I want to transfer 80 TB of data to another location. I already have the drives for it. The idea is to copy everything onto the drives, fly them to the target location, and then either use the data directly from the drives or copy it onto the server there.

What filesystem would you use, and would you use a RAID configuration? Currently I lean towards 8 single-disk ext4 filesystems on the 10 TB drives, because it is simple. I considered ZFS because of the possibility to scrub at the target destination and/or pool all drives, but ZFS may not be available at the target.

Then there is btrfs, which should be available everywhere because it is in mainline Linux while ZFS is not. But as far as I know, btrfs would require LVM to pool disks together the way ZFS can do natively.

Pooling the drives would also be a problem if one disk gets lost in transit. If I keep everything on 8 single disks, at least the remaining data can be used at the target, and they only have to wait for the missing part.
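
Roughly what I have in mind for the single-disk variant; the device paths and labels below are only placeholders:

```bash
# One plain ext4 filesystem per 10 TB drive, labelled so the target side
# can tell the disks apart regardless of enumeration order.
for i in 1 2 3 4 5 6 7 8; do
    dev="/dev/disk/by-id/ata-EXAMPLE-DISK-$i"   # placeholder device path
    mkfs.ext4 -L "transfer$i" "$dev"
    mkdir -p "/mnt/transfer$i"
    mount "$dev" "/mnt/transfer$i"
done
```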

I'd like to read your opinions or practical experience with similar challenges.

  • mko@discuss.tchncs.de · 1 day ago

    Will the disks be permanently in place there, or are they just a means of transport? Either way, traveling with that much spinning rust, there is always a good chance of bit flips or damage.

    ZFS is up to the task if you can connect all the disks at the same time at the target location. You don’t really have to keep track of the order of the disks - ZFS will figure it out when mounting the pool. The act of copying the data from the disks will effectively perform a scrub at the same time.

    If you can only attach one disk at a time, it is a bit more of a coin toss, although ZFS single-disk volumes do support scrubbing as well.
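
    Assuming a pool name of, say, transfer (just an example), the target side would look roughly like this:

    ```bash
    # Scan the attached disks and import the pool, in whatever order they show up.
    zpool import transfer

    # Explicitly verify every block against its checksum.
    zpool scrub transfer
    zpool status -v transfer   # progress and any checksum errors
    ```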

    Disk corruption in transit would be one of my worries: X-ray scans, vibration and plain handling can do things to the bits. Tgz, zip or rar archives with low or no compression provide error detection, though little in the way of recovery. Checksum files can also help with detection. Any files that fail can perhaps be re-transferred over the network for recovery.
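
    A plain checksum manifest is cheap to add on top of whatever filesystem you pick; something like this per disk before shipping, with the manifests kept separate from the disks (paths are placeholders):

    ```bash
    # On the source: one manifest per disk, stored outside the disk itself.
    cd /mnt/transfer1
    find . -type f -print0 | xargs -0 sha256sum > /root/transfer1.sha256

    # At the target: list only the files that fail verification.
    cd /mnt/transfer1
    sha256sum -c --quiet /path/to/transfer1.sha256
    ```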

    • poinck@lemmy.worldOP · 13 hours ago

      Thx.

      The disks are only meant for transport at this time.

      The more I think about it, the more I lean towards btrfs, because even if they don’t use btrfs on the target server, the copy process will still do error detection based on the checksums btrfs keeps itself. I hope btrfs handles this the same way ZFS does.
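
      To make that check explicit, a scrub of each transport disk after arrival would be roughly (the label is a placeholder):

      ```bash
      mount /dev/disk/by-label/transfer1 /mnt/transfer1
      btrfs scrub start -B /mnt/transfer1   # -B: run in the foreground and report on completion
      btrfs scrub status /mnt/transfer1     # checksum errors found; whether they can be repaired depends on the data/metadata profile
      ```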

      • mko@discuss.tchncs.de · 14 minutes ago

        It’s a good idea to use what you know. I don’t have much experience with btrfs but if it does what it says on the tin then it should be safe to use.

        Copying the contents at the target is a good strategy. If the drives are to be put into 24/7 use later, I would probably wipe them and run an integrity test before putting them to work; once they are in use it will be too late (and the doubt would stay in the back of my mind).
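
        For that kind of check, something like a destructive badblocks pass plus a long SMART self-test would do; note that this wipes the disk (the device name is a placeholder):

        ```bash
        # Destructive write/read test over the whole disk (all data on it is lost).
        badblocks -wsv /dev/sdX

        # Long SMART self-test; check the result once it has finished.
        smartctl -t long /dev/sdX
        smartctl -l selftest /dev/sdX
        ```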

    • atzanteol@sh.itjust.works · 1 day ago

      Either way, traveling with that much spinning rust there is always a good chance for bit-flips or damage.

      What? Lol no. They’ll travel fine.

        • atzanteol@sh.itjust.works · 24 hours ago

          You kids think HDDs just fail daily or something. I flew all over the place with a laptop with an HDD for years, as did many others. It’ll be fine, especially since it’s unlikely they would be using the drives while traveling.

          • mko@discuss.tchncs.de · 19 hours ago

            From a position of handling corporate data on a daily basis, I am pretty confident that data integrity is top of mind.

            • poinck@lemmy.worldOP · 12 hours ago

              I agree with both of you. Somehow I don’t worry about the drive in my laptop but 80 TB of scientific data is another thing, and I want to make sure it is the same data when it arrives.

      • frongt@lemmy.zip · 1 day ago

        Really? Then why is there an explicit SMART conveyance test?

        It’s to test for damage that may have occurred during shipping.
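
        For example (the device name is a placeholder):

        ```bash
        smartctl -t conveyance /dev/sdX   # short self-test aimed at transport damage
        smartctl -l selftest /dev/sdX     # check the result once the test has finished
        ```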

          • frongt@lemmy.zip · 17 hours ago

            Often enough that there’s a test designed to detect it specifically. If you want hard data you’ll have to find it on your own; I don’t have any handy.

            • poinck@lemmy.worldOP · 12 hours ago

              This is scientific data.

              Fun fact: I recently did a scrub on the offline backup drive of my work PC. It corrected around 250 errors. I wouldn’t have noticed any problems if I had used ext4 instead of btrfs.