This cross-platform script and guide should get you started creating encrypted backups on 100GB M-Discs, with a manifest to track chunks and checksums, plus optional ISO creation and automated burning steps.

1. Overview: Individually Encrypted 100GB Archives

Basic Idea

  1. Figure out which files belong to which 100GB set.

    • You can gather files until their combined uncompressed size is ~100GB (or 95GB if you want some buffer for overhead).
    • Put them in a "chunk_001" grouping, then "chunk_002," etc.
  2. Create a TAR for each group, then compress with lz4, then encrypt with gpg (see the example after this list).

    • Result: chunk_001.tar.lz4.gpg, chunk_002.tar.lz4.gpg, etc.
    • Each chunk is fully independent: if you only have chunk_004.tar.lz4.gpg, you can decrypt it, decompress it, and restore the files that were in chunk #4.
  3. Burn each chunk onto its own M-Disc.

    • Optionally, create ISO images (e.g., genisoimage -o chunk_001.iso chunk_001.tar.lz4.gpg) and then burn them.
  4. To restore any subset, you just decrypt the chunk you want, decompress, and extract it. No other chunks are required.
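
For example, a single chunk could be produced with a pipeline along these lines (a minimal sketch; chunk_001.list is a hypothetical text file listing the paths chosen for that chunk, and the actual script adds --batch for unattended use):

# Archive the listed files, compress with lz4, and symmetrically encrypt in one pass
tar -cf - -T chunk_001.list | lz4 -c | gpg --symmetric --cipher-algo AES256 -o chunk_001.tar.lz4.gpg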

Pros

  • Each 100GB chunk is an autonomous backup.
  • Damage/loss of one disc only affects that chunk's files.

Cons

  • Less efficient if you have many smaller files (no cross-chunk deduplication).
  • Slightly more complex to create "balanced" 100GB sets.
  • Big single files that exceed 100GB are a problem unless you handle them specially.
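
One way to deal with an oversized file (not handled by the script below) is to split its encrypted archive into disc-sized pieces and rejoin them at restore time. A sketch, assuming a hypothetical bigfile.tar.lz4.gpg:

# Split into 95GB pieces that each fit on a disc
split -b 95G bigfile.tar.lz4.gpg bigfile.tar.lz4.gpg.part_
# Rejoin the pieces before decrypting
cat bigfile.tar.lz4.gpg.part_* | gpg --decrypt | lz4 -d | tar -xvf -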

2. Sample Script: backup2mdisc.sh

This is a Bash script that:

  1. Collects all files in a specified source directory.
  2. Iterates over them in ascending order by size (you can adjust if you prefer a different approach).
  3. Accumulates files into a "chunk" until you're about to exceed the chunk size limit (see the sketch after this list).
  4. When the chunk is "full," it creates a tar archive, pipes it into lz4, then encrypts with gpg.
  5. Moves on to the next chunk until all files are processed.
  6. Generates a manifest with checksums for each .tar.lz4.gpg.
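
A simplified sketch of that grouping loop (not the full script; SOURCE_DIR and CHUNK_SIZE_BYTES are assumed to be set, finalize_chunk is a hypothetical function that runs the tar | lz4 | gpg pipeline on the accumulated list, and the GNU find -printf syntax shown needs a stat-based replacement on macOS/FreeBSD):

chunk_num=1
chunk_bytes=0
file_list="$(mktemp)"
# Walk files smallest-first; start a new chunk whenever the next file would overflow the limit
while read -r size path; do
  if [ -s "$file_list" ] && [ $((chunk_bytes + size)) -gt "$CHUNK_SIZE_BYTES" ]; then
    finalize_chunk "$chunk_num" "$file_list"   # hypothetical: tar | lz4 | gpg on this list
    chunk_num=$((chunk_num + 1))
    chunk_bytes=0
    : > "$file_list"
  fi
  printf '%s\n' "$path" >> "$file_list"
  chunk_bytes=$((chunk_bytes + size))
done < <(find "$SOURCE_DIR" -type f -printf '%s %p\n' | sort -n)
# Finalize the last, partially filled chunk
[ -s "$file_list" ] && finalize_chunk "$chunk_num" "$file_list"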

Disclaimer:

  • This script uses file-size-based grouping. If you have one single file larger than the chunk limit, it won't fit. You'd need advanced splitting or a different solution.
  • On macOS or FreeBSD, you might need to install or alias sha256sum. If it's unavailable, replace it with shasum -a 256 (see the snippet after this list).
  • This script does not automatically burn discs (though it shows how you might add that step).
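
A small portability shim for the checksum tool might look like this (an illustrative snippet, not taken from the script):

# Use whichever SHA-256 tool the platform provides
if command -v sha256sum >/dev/null 2>&1; then
  CHECKSUM_CMD="sha256sum"
else
  CHECKSUM_CMD="shasum -a 256"
fi
$CHECKSUM_CMD chunk_001.tar.lz4.gpg >> manifest_individual_chunks.txt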

How This Script Works

  1. Collect Files and Sort

    • We use find to list all files in SOURCE_DIR, capturing both size and path.
    • Sorting by size ensures the script packs smaller files first. (You can remove sorting if you prefer alphabetical or another method.)
  2. Accumulate Files Until the Chunk Is ~100GB

    • We convert CHUNK_SIZE from something like 100G into bytes (see the sketch after this list), then compare the sum of file sizes to that limit.
    • If adding a new file would exceed the chunk limit, we finalize the current chunk and create a new one.
  3. Create a TAR, Compress with lz4, Then Encrypt

    • We pipe the TAR stream into lz4 for fast compression, and then pipe that into gpg --batch -c for symmetric encryption with AES256.
    • Each chunk is written to chunk_XXX.tar.lz4.gpg.
    • No chunk depends on the others.
  4. Write Checksums to the Manifest

    • We compute a SHA-256 checksum of the resulting chunk_XXX.tar.lz4.gpg and store it in manifest_individual_chunks.txt for integrity checks.
  5. Repeat

    • The script moves on to the next chunk and continues until all files have been processed.
  6. Result

    • You get multiple .tar.lz4.gpg archives in your DEST_DIR, each below your chosen chunk size and fully independent.
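
For step 2, the size string can be turned into bytes without any GNU-only tools. A minimal sketch (it handles only G and M suffixes, which is enough here):

# Convert a human-readable CHUNK_SIZE like "100G" into a byte count
CHUNK_SIZE="100G"
num="${CHUNK_SIZE%[GM]}"
case "$CHUNK_SIZE" in
  *G) CHUNK_SIZE_BYTES=$((num * 1024 * 1024 * 1024)) ;;
  *M) CHUNK_SIZE_BYTES=$((num * 1024 * 1024)) ;;
  *)  CHUNK_SIZE_BYTES="$CHUNK_SIZE" ;;
esac
echo "$CHUNK_SIZE_BYTES"   # 107374182400 for 100G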

Burning to M-Disc

You can then burn each chunk to a separate disc. For example:

cd /path/to/work_dir
genisoimage -o chunk_001.iso chunk_001.tar.lz4.gpg
# Then burn chunk_001.iso
growisofs -Z /dev/sr0=chunk_001.iso

Repeat for each chunk. On macOS, you might use:

hdiutil burn chunk_001.iso

(Adjust device paths and commands as needed.)

Restoring Data

To restore from a single chunk (e.g., chunk_002.tar.lz4.gpg), do:

gpg --decrypt chunk_002.tar.lz4.gpg | lz4 -d | tar -xvf -

You'll be prompted for the same passphrase you used when creating the archive. After extraction, you'll see all the files that chunk contained.

  • If one disc is lost, you can still decrypt and restore the other discs. You only lose the files in the missing chunk.
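
You can also inspect a chunk or pull out individual paths without unpacking everything (path/to/some/file is a placeholder):

# List a chunk's contents without extracting
gpg --decrypt chunk_002.tar.lz4.gpg | lz4 -d | tar -tvf -
# Extract only a specific path
gpg --decrypt chunk_002.tar.lz4.gpg | lz4 -d | tar -xvf - path/to/some/file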

Why lz4 Over xz?

  • lz4 is extremely fast compared to xz, especially for decompression.
  • xz typically yields better compression (smaller output size), but at a much higher CPU cost.
  • For backups where speed is the priority (and you have enough disc space), lz4 is a great choice.
  • If you need to cram as much data as possible into 100GB, you might prefer xz with a high compression setting—but your backup process and restoration would be slower.
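
If you do choose density over speed, the pipeline swaps cleanly. A sketch, reusing the hypothetical chunk_001.list from earlier and an arbitrary compression level of -9:

# Create: trade CPU time for a smaller archive
tar -cf - -T chunk_001.list | xz -9 | gpg --symmetric --cipher-algo AES256 -o chunk_001.tar.xz.gpg
# Restore
gpg --decrypt chunk_001.tar.xz.gpg | xz -d | tar -xvf -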

Final Thoughts

With this script and approach:

  • You gain independently decryptable 100GB archives.
  • If a single disc is damaged, you only lose that chunk's data; all other chunks remain fully restorable.
  • lz4 + gpg is a solid combo for speed (lz4 for compression, gpg for encryption).
  • Always test your workflow on smaller data sets before doing a large 2TB backup.
  • Keep your passphrase secure, and consider verifying your burned discs with checksums.
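
One way to verify a burned disc is to checksum what the drive reads back and compare it against the ISO you burned. A sketch (Linux device path shown; adjust for your system):

# Checksum the ISO you burned
sha256sum chunk_001.iso
# Read the same number of 2048-byte sectors back from the disc and checksum them
dd if=/dev/sr0 bs=2048 count=$(( $(wc -c < chunk_001.iso) / 2048 )) | sha256sum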

That's it! You now have a fast, chunked, and individually encrypted backup solution for your M-Discs.