I’ve got a whole bucket full of old hard drives, CDs and DVDs, and I’m starting the process of backing up as much as still works to a 4TB drive.

It’s gonna be a long journey and lots of files, many prone to being duplicates from some of the drives.

What sorts of software do you Linux users recommend?

I’m on Linux Mint MATE, if that matters much.

Edit: One of the programs I’m accustomed to from my Windows days is FolderMatch, which is a step above simple duplicate file scanning: it also scans for duplicate or semi-duplicate folders and breaks down the individual file differences when comparing two folders.

I see I’ve already gotten some responses, and I thank everyone in advance. I’m on a road trip right now, so I’ll be checking out the software you folks recommend later this evening, or as soon as I can anyway.
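For readers who want a rough idea of what a FolderMatch-style comparison involves, here is a minimal sketch using Python's standard `filecmp` module. It is only an illustration, not how FolderMatch itself works; the function name `compare_folders` and the report keys are made up for this example. Note that `filecmp.dircmp` skips a few names (such as `.git` and `CVS`) by default.

```python
import filecmp
from pathlib import Path


def compare_folders(left, right):
    """Recursively compare two folders and report per-file differences.

    Returns a dict with four sorted lists of relative paths:
    'only_left', 'only_right', 'different', and 'identical'.
    """
    report = {"only_left": [], "only_right": [], "different": [], "identical": []}

    def walk(cmp, prefix):
        # Entries (files or folders) that exist on one side only.
        report["only_left"] += [str(prefix / n) for n in cmp.left_only]
        report["only_right"] += [str(prefix / n) for n in cmp.right_only]
        # dircmp's own same/diff lists rely on a shallow (size + mtime) check,
        # so re-compare the common files by content instead.
        match, mismatch, errors = filecmp.cmpfiles(
            cmp.left, cmp.right, cmp.common_files, shallow=False
        )
        report["identical"] += [str(prefix / n) for n in match]
        # Files that could not be read are reported as different.
        report["different"] += [str(prefix / n) for n in mismatch + errors]
        for name, sub in cmp.subdirs.items():
            walk(sub, prefix / name)

    walk(filecmp.dircmp(left, right), Path("."))
    for key in report:
        report[key].sort()
    return report
```

Calling `compare_folders("/mnt/4TB/source1", "/mnt/4TB/source2")` (hypothetical paths) would list what is unique to each side and which shared files differ in content.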

  • MonkderVierte@lemmy.ml
    6 hours ago

    That is filesystem-level. Btrfs and, I think, ZFS support deduplication.

    Btrfs freed up 150 GB on my 2 TB gaming disk that way.

  • serenissi@lemmy.world
    14 hours ago

    I’m not recommending software here, but since you mentioned old hard disks: it is better to copy the files, or better still dd the whole disks, onto an SSD first. That way, indexing and finding duplicates will be faster, because you only have to read the old disks once, and if you dd them you don’t have to care about fragmentation.
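As a rough illustration of what imaging a disk block by block means, here is a minimal Python sketch that behaves like a plain `dd` copy and also returns a checksum. The function name `image_device` is invented for this example, it does no error recovery on bad sectors, and reading a real device such as a hypothetical `/dev/sdX` normally requires root.

```python
import hashlib


def image_device(source, image_path, block_size=4 * 1024 * 1024):
    """Copy `source` to `image_path` block by block, like a plain `dd`.

    Returns (bytes_copied, sha256_hex) so the image can be verified later.
    """
    digest = hashlib.sha256()
    copied = 0
    with open(source, "rb") as src, open(image_path, "wb") as dst:
        while True:
            block = src.read(block_size)
            if not block:  # empty read means end of device or file
                break
            dst.write(block)
            digest.update(block)
            copied += len(block)
    return copied, digest.hexdigest()
```

Because the source is read sequentially from start to end, the old disk is touched exactly once, and all later scanning can run against the image.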

  • doeknius_gloek@discuss.tchncs.de
    1 day ago

    I’ve had great success with restic. It will handle your 4TB just fine; here are some stats of mine:

    Total File Count: 78374
    Total Size: 13.324 TiB
    

    And another one, not as large but with lots of files:

    Total File Count: 1295210
    Total Size: 2.717 TiB
    

    Restic will automatically deduplicate your data so your duplicates won’t waste storage at your backup location.

    I’ve recently learned about backrest, which can serve as a restic UI if you’re not comfortable with the CLI, but I haven’t used it myself.

    To clean your duplicates at the source I would look into Czkawka as another lemming already suggested.
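To make the idea of deduplicating backup storage more concrete, here is a toy content-addressed store in Python. It is a simplified sketch and not restic's actual implementation: restic splits files with content-defined chunking, while this example assumes fixed-size chunks, and the class name `ChunkStore` is invented for illustration.

```python
import hashlib


class ChunkStore:
    """Toy content-addressed store: identical chunks are kept only once."""

    def __init__(self, chunk_size=1024 * 1024):
        self.chunk_size = chunk_size
        self.chunks = {}  # sha256 hex digest -> chunk bytes
        self.files = {}   # file name -> ordered list of chunk digests

    def add(self, name, data):
        """Split `data` into chunks and store only the chunks not seen before."""
        digests = []
        for start in range(0, len(data), self.chunk_size):
            chunk = data[start:start + self.chunk_size]
            key = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(key, chunk)  # no-op if already stored
            digests.append(key)
        self.files[name] = digests

    def restore(self, name):
        """Rebuild a file from its list of chunk digests."""
        return b"".join(self.chunks[key] for key in self.files[name])

    def stored_bytes(self):
        """Total size of the unique chunks actually kept."""
        return sum(len(chunk) for chunk in self.chunks.values())
```

Backing up the same file from two different drives adds a second entry to `files` but no new chunks, which is why duplicates cost almost nothing at the backup location.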

    • Ekpu@lemmy.world
      13 hours ago

      I use backrest self-hosted on my server running YunoHost. It is pretty much set and forget. I love it.

    • Squizzy@lemmy.world
      18 hours ago

      Hey, does this have a GUI? I am new to Linux and can’t quite handle doing work like this without a GUI.

  • Dessalines@lemmy.ml
    21 hours ago

    A nightly rsync job in crontab works well enough, if it’s an external hard drive.

    If you’re going over a network, syncthing.

  • SayCyberOnceMore@feddit.uk
    12 hours ago

    There’s BeyondCompare and Meld if you want a GUI, but, if I understand this correctly, rmlint and fdupes might be helpful here.

    I’ve done similar in the past - I prefer commandline for this…

    What I’d do is create a “final destination” folder on the 4TB drive and then other working folders for each hdd / cd / dvd that you’re working through

    I.e.:

    /mnt/4TB/finaldestination
    /mnt/4TB/source1
    /mnt/4TB/source2
    …

    Obviously finaldestination is empty to start with so it could just be a direct copy of your first hdd - so make that the largest drive.

    (I’m saying copy here, presuming you want to keep the old drives for now, just in case you accidentally delete the wrong stuff on the 4TB drive)

    Maybe clean up any obvious stuff

    Remove that first drive

    Mount the next and copy the data to /mnt/4TB/source2

    Now use rmlint or fdupes to do a dry run between source2 and finaldestination and get a feel for whether they’re similar or not. Then you’ll know whether to just move it all to finaldestination or to use the GUI tools.

    You might completely empty /mnt/4TB/source2, or it might still have something in it; that depends on how you feel it’s going.

    Repeat for the rest, working on smaller & smaller drives, comparing with the finaldestination first and then moving the data.

    Slow? Yep. Satisfying that you know there’s only 1 version there? Yep.

    Then do a backup 😉
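The dry-run step in this workflow can be sketched in plain Python for anyone curious what such a check does under the hood. This is not how rmlint or fdupes are implemented; it simply hashes everything in the destination and reports which source files are already there by content. The names `file_digest`, `walk_files` and `dry_run` are made up for the example, and nothing is moved or deleted.

```python
import hashlib
import os


def file_digest(path, block_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, read in blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def walk_files(top):
    """Yield the path of every regular file below `top`."""
    for folder, _dirs, names in os.walk(top):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path):
                yield path


def dry_run(source_dir, final_dir):
    """Report which files in source_dir already exist (by content) in final_dir.

    Returns (duplicates, unique) as sorted lists of paths relative to source_dir.
    """
    known = {file_digest(path) for path in walk_files(final_dir)}
    duplicates, unique = [], []
    for path in walk_files(source_dir):
        relative = os.path.relpath(path, source_dir)
        (duplicates if file_digest(path) in known else unique).append(relative)
    return sorted(duplicates), sorted(unique)
```

If `unique` comes back nearly empty, the source folder is mostly redundant; if it is long, a closer look with a GUI comparison tool is probably worthwhile.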

    • over_clox@lemmy.world (OP)
      36 minutes ago

      The way I’m organizing the main backups to start with is with folder names such as 20250505 Laptop Backup, 20250508 Media Backup, etc.

      Eventually I plan on organizing things in bulk folders with simple straightforward names such as Movies, Music, Game ROMs, Virtual Machines, etc.

      Yes, thankfully I already got all my main files, music and movies backed up. Right now I’m backing up my software, games, emulator ROMs, etc.

      Hopefully that drive finishes backing up before the weather gets bad, cuz I’m definitely shutting things down when there’s lightning around…

  • solrize@lemmy.world
    1 day ago

    I’m using Borg and it’s fine at that scale. I don’t know if it would still be viable with 100TB or whatever. The initial backup will be kind of slow but it encrypts everything, and deduplicates it too if I’m not mistaken. In any case, it deduplicates the common situation where you back up another snapshot later. Only the differences get written in the second backup. So you can save new snapshots fairly quickly and without much additional space.

    • over_clox@lemmy.world (OP)
      1 day ago

      I don’t even want this data encrypted. Quite the opposite actually.

      This is mostly the category of files getting deleted from the Internet Archive every day. I want to preserve what I got before it gets erased…

      • solrize@lemmy.world
        22 hours ago

        You can turn off Borg encryption but maybe what you really want is an object store (S3 style). Those exist too.

  • billwashere@lemmy.world
    1 day ago

    Honestly, I maintain a list of file types I care about and copy those off. It’s mostly things I’ve created or specifically accumulated: things like mp3, mkv, gcode, stl, jpeg, doc, txt, etc. I find all of those and copy them off. I also find any files over a certain size and copy them off, unless they are things like library files, DLLs, that sort of thing. Am I possibly going to miss something? Yeah. But I’ll get most of the things I care about.

    • over_clox@lemmy.world (OP)
      1 day ago

      Not everything is an individual file though, a lot of the stuff needs to be stored and maintained as bulk folders.

      I mod operating systems and occasionally games, plus I write software. I can’t just dump all text files into a single folder; that would put every readme.txt into a single TXT folder, losing the association with the project folders they came from.
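One way to reconcile the two positions above is to filter by file type while keeping each file's relative path, so a readme.txt stays inside its project folder. The following Python sketch shows the idea; the function name `copy_by_extension` is invented for this example, and it assumes the destination is not located inside the source tree.

```python
import shutil
from pathlib import Path


def copy_by_extension(source_root, dest_root, extensions):
    """Copy files with the given extensions, preserving their folder layout.

    Returns the list of destination paths that were written.
    """
    source_root, dest_root = Path(source_root), Path(dest_root)
    # Accept "txt" as well as ".txt", and match case-insensitively.
    wanted = {ext.lower().lstrip(".") for ext in extensions}
    copied = []
    for path in sorted(source_root.rglob("*")):
        if path.is_file() and path.suffix.lower().lstrip(".") in wanted:
            target = dest_root / path.relative_to(source_root)
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also keeps timestamps
            copied.append(target)
    return copied
```

This still drops files that are not on the list, so it does not help for project folders that must be kept whole; those are better copied as complete directories.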

      • billwashere@lemmy.world
        1 day ago

        Isn’t all the code in git somewhere? I would totally do that for code projects.

        I do the same thing with arduino code so I know where you’re coming from.

          • billwashere@lemmy.world
            21 hours ago

            I feel you. I started coding before the internet even existed (well technically it existed, just nobody had access to it)

  • catloaf@lemm.ee
    1 day ago

    Do you need any of it? Usually I’ve not even thought about what might be on an old drive.

    If I was worried about the slim chance there’s something of critical importance I’d need later, I’d just look over each device and pick out individual files I might want, and dump the rest.

    If you’re extremely paranoid, I’d take a block-level backup of each device and archive it.

    • over_clox@lemmy.world (OP)
      1 day ago

      It’s not about whether I need any of the data or not. It’s about the fact that I have many archives, scattered across many smaller drives, of things getting deleted from the internet every day.

      It’s about data preservation. And suddenly I have 2X 4TB hard drives and a 2TB hard drive? A total of 10TB, just suddenly found in a dumpster, and all the SMART stats check out?! 👍

      I’m looking to back up everything I have from the past 25+ years!

      Just a drop in the bucket, one of my drives has like almost all the SNES game ROMs…

      • lemming741@lemmy.world
        1 day ago

        If it’s just buckets of data, mergerfs can pool the drives together, and then you can dedupe the whole lot.

        Or consider buying a surplus 20TB drive, copy everything to it, dedupe the 20TB, and write it back to the 4+4+2 as cold spares. Those surplus drives are $10-14 per TB and I’ve had fantastic luck with them.

        • over_clox@lemmy.world (OP)
          1 day ago

          These 4+4+2TB drives are fresh new to me, amazing they all seem to check out.

          Right now, the drives I’ll be pulling data from range anywhere from 40GB to 320GB, from a variety of different file systems. And that’s not counting the many optical discs that need to be archived before disc rot sets in (I’m sure some have already, but looking better than I expected).

          I don’t necessarily need a 20TB; just one of these 4TB drives ought to do the trick. Besides, it’s already gonna take me months to pull all my backups from the Internet Archive…

          • kylian0087@lemmy.dbzer0.com
            7 hours ago

            Sounds like you are a data hoarder, haha. Can’t blame you. But for such hobbies, perhaps a ZFS system with deduplication, plus a second ZFS system to back up the first, is what you want.

            Does get costly though.

  • just_another_person@lemmy.world
    1 day ago

    Deduping only works for a single target or context at a time, so if you’re working with many drives, you’ll need to sort your data into unified locations on the backup target first, THEN run dedupe tools against it all.

    Second, if all of your data from these drives fits uncompressed on the target drive, rsync will be the fastest to get the data from A to B.
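To illustrate why consolidating first helps, here is a minimal Python sketch of what a dedupe pass over a single target does: it groups files by size, then hashes only the files that share a size. Real tools such as the ones named in this thread are far more refined; the function name `find_duplicate_groups` is invented for this example, and it only reports groups without deleting anything.

```python
import hashlib
import os
from collections import defaultdict


def find_duplicate_groups(root):
    """Return sorted groups of paths under `root` that have identical content."""
    # Pass 1: group by size, since files of different sizes cannot be equal.
    by_size = defaultdict(list)
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.isfile(path) and not os.path.islink(path):
                by_size[os.path.getsize(path)].append(path)

    # Pass 2: hash only the files that share their size with another file.
    groups = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue
        by_hash = defaultdict(list)
        for path in paths:
            digest = hashlib.sha256()
            with open(path, "rb") as handle:
                for block in iter(lambda: handle.read(1024 * 1024), b""):
                    digest.update(block)
            by_hash[digest.hexdigest()].append(path)
        groups.extend(sorted(group) for group in by_hash.values() if len(group) > 1)
    return sorted(groups)
```

Run against the backup target once every drive has its own subfolder, this finds copies that span drives, which a per-drive scan could never see.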

    • over_clox@lemmy.world (OP)
      1 day ago

      Of course.

      Goal #1 is to migrate what data I can (which is a fucking lot) over to the 4TB, in separate folders for each drive. Only after that will I worry about scanning for dupes and organizing things.

      I’m just looking for advice in advance on what software is recommended for dealing with such large tasks.

      I’ve actually got 2X 4TB drives plus a single 2TB drive. But yeah, I know the best and easiest way is to consolidate it all on one drive first.

      • just_another_person@lemmy.world
        1 day ago

        Then rsync is your friend, like so: rsync -av /drive1/ /target2/drive1/ (-a already implies -p, and -z only adds compression overhead on a local copy).

        That will copy all the files from drive1 to a destination folder in the backup drive called ‘drive1’.

        • over_clox@lemmy.world (OP)
          1 day ago

          Joy oh joy, I got like 75+ optical discs and like 10+ hard drives (whatever still works) to back up.

          This is already gonna take months I know, just my free time at the end of the day.

          This is gonna be fun. /s

          Thank you and everyone for the advice though.

          Side note, I think one of my drives has almost all the SNES game ROMS…