## 1. Overview: Individually Encrypted 100GB Archives
### Basic Idea

- **Figure out which files belong to which 100GB set.**
  - You can gather files until their combined uncompressed size is ~100GB (or 95GB if you want some buffer for overhead).
  - Put them in a "chunk_001" grouping, then "chunk_002," etc.
- **Create a TAR for each group, then compress with `lz4`, then encrypt with `gpg`.**
  - Result: `chunk_001.tar.lz4.gpg`, `chunk_002.tar.lz4.gpg`, etc.
  - Each chunk is fully independent: if you only have `chunk_004.tar.lz4.gpg`, you can decrypt it, decompress it, and restore the files that were in chunk #4.
- **Burn each chunk onto its own M-Disc.**
  - Optionally, create ISO images (e.g., `genisoimage -o chunk_001.iso chunk_001.tar.lz4.gpg`) and then burn them.
- **To restore any subset, just decrypt the chunk you want, decompress, and extract it.** No other chunks are required. (A minimal one-chunk pipeline is sketched below.)
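As a sketch of the core pipeline, creating one chunk by hand might look like this, where `chunk_001_files.txt` is a hypothetical list of the files assigned to chunk #1, one path per line:

```bash
# Pack the listed files, compress with lz4, encrypt symmetrically with gpg (AES256).
tar -cf - -T chunk_001_files.txt | lz4 -c | gpg -c --cipher-algo AES256 -o chunk_001.tar.lz4.gpg
```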
### Pros

- Each 100GB chunk is an autonomous backup.
- Damage or loss of one disc only affects that chunk's files.

### Cons

- Less efficient if you have many smaller files (no cross-chunk deduplication).
- Slightly more complex to create "balanced" 100GB sets.
- Big single files that exceed 100GB are a problem unless you handle them specially.
## 2. Sample Script: `backup2mdisc.sh`
This is a Bash script that:

- Collects all files in a specified source directory.
- Iterates over them in ascending order by size (you can adjust this if you prefer a different approach).
- Accumulates files into a "chunk" until adding the next file would exceed the chunk size limit.
- When the chunk is "full," it creates a tar archive, pipes it into `lz4`, then encrypts with `gpg`.
- Moves on to the next chunk until all files are processed.
- Generates a manifest with checksums for each `.tar.lz4.gpg`.
**Disclaimer:**

- This script uses file-size-based grouping. If you have one single file larger than the chunk limit, it won't fit. You'd need advanced splitting or a different solution.
- On macOS or FreeBSD, you might need to install or alias `sha256sum`. If it's unavailable, replace it with `shasum -a 256`.
- This script does not automatically burn discs (though it shows how you might add that step).
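For reference, here is a condensed sketch of what such a script might look like. It assumes GNU findutils/coreutils (`find -printf`, `sort`, `numfmt`, `sha256sum`), `tar`, `lz4`, and GnuPG 2.1+; the variable names and the loopback passphrase handling are illustrative, not a polished implementation:

```bash
#!/usr/bin/env bash
# backup2mdisc.sh (sketch) -- chunk, compress, and encrypt a directory tree.
set -euo pipefail

SOURCE_DIR="$1"                          # directory to back up
DEST_DIR="$2"                            # where chunk archives are written
CHUNK_SIZE="${3:-100G}"                  # e.g., 100G
LIMIT_BYTES=$(numfmt --from=iec "$CHUNK_SIZE")

mkdir -p "$DEST_DIR"
MANIFEST="$DEST_DIR/manifest_individual_chunks.txt"
: > "$MANIFEST"

# Read the passphrase once; passed to gpg via loopback so each chunk
# encrypts non-interactively. (Illustrative; --passphrase-file also works.)
read -rsp "Encryption passphrase: " PASSPHRASE; echo

chunk_num=1
current_size=0
file_list=$(mktemp)

finalize_chunk() {
  [ -s "$file_list" ] || return 0        # nothing accumulated yet
  local out
  out=$(printf '%s/chunk_%03d.tar.lz4.gpg' "$DEST_DIR" "$chunk_num")
  # TAR the listed files, compress with lz4, encrypt symmetrically with gpg.
  tar -cf - -T "$file_list" \
    | lz4 -c \
    | gpg --batch --pinentry-mode loopback --passphrase "$PASSPHRASE" \
          -c --cipher-algo AES256 -o "$out"
  sha256sum "$out" >> "$MANIFEST"
  chunk_num=$((chunk_num + 1))
  current_size=0
  : > "$file_list"
}

# Emit "size<TAB>path" for every file, sorted ascending by size.
while IFS=$'\t' read -r size path; do
  if [ "$((current_size + size))" -gt "$LIMIT_BYTES" ]; then
    finalize_chunk
  fi
  printf '%s\n' "$path" >> "$file_list"
  current_size=$((current_size + size))
done < <(find "$SOURCE_DIR" -type f -printf '%s\t%p\n' | sort -n)

finalize_chunk                           # flush the final partial chunk
rm -f "$file_list"
echo "Done. Chunks and manifest are in $DEST_DIR"
```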
### How This Script Works

- **Collect Files and Sort**
  - We use `find` to list all files in `SOURCE_DIR`, capturing both size and path.
  - Sorting by size ensures the script packs smaller files first. (You can remove the sorting if you prefer alphabetical order or another method.)
- **Accumulate Files Until the Chunk Is ~100GB**
  - We convert `CHUNK_SIZE` from something like `100G` into bytes, then compare the running sum of file sizes to that limit.
  - If adding a new file would exceed the chunk limit, we finalize the current chunk and create a new one.
- **Create a TAR, Compress with lz4, Then Encrypt**
  - We pipe the TAR stream into `lz4` for fast compression, then pipe that into `gpg --batch -c` for symmetric encryption with AES256.
  - Each chunk is written to `chunk_XXX.tar.lz4.gpg`.
  - No chunk depends on the others.
- **Write Checksums to the Manifest**
  - We run SHA-256 on each resulting `chunk_XXX.tar.lz4.gpg` and store it in `manifest_individual_chunks.txt` for integrity checks (see the verification example after this list).
- **Repeat**
  - The next chunk continues until all files have been processed.
- **Result**
  - You get multiple `.tar.lz4.gpg` archives in your `DEST_DIR`, each below your chosen chunk size and fully independent.
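To check the archives later (for example, after copying them back from a disc), you can verify the manifest with `sha256sum` in check mode:

```bash
cd /path/to/DEST_DIR                     # wherever the chunks and manifest live
sha256sum -c manifest_individual_chunks.txt
# Each line should report "OK"; any mismatch indicates a corrupted chunk.
```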
### Burning to M-Disc

You can then burn each chunk to a separate disc. For example:

```bash
cd /path/to/work_dir
genisoimage -o chunk_001.iso chunk_001.tar.lz4.gpg
# Then burn chunk_001.iso
growisofs -Z /dev/sr0=chunk_001.iso
```

Repeat for each chunk. On macOS, you might use:

```bash
hdiutil burn chunk_001.iso
```

(Adjust device paths and commands as needed.)
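If you have many chunks, one way to batch the process (a sketch, assuming the same `/dev/sr0` device and a manual disc swap between burns) is:

```bash
for f in chunk_*.tar.lz4.gpg; do
  iso="${f%.tar.lz4.gpg}.iso"
  genisoimage -o "$iso" "$f"                 # wrap the chunk in an ISO image
  read -rp "Insert a blank M-Disc for $iso, then press Enter..."
  growisofs -Z /dev/sr0="$iso"               # adjust the device path as needed
done
```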
### Restoring Data

To restore from a single chunk (e.g., `chunk_002.tar.lz4.gpg`), do:

```bash
gpg --decrypt chunk_002.tar.lz4.gpg | lz4 -d | tar -xvf -
```

You'll be prompted for the same passphrase you used when creating the archive. After extraction, you'll see all the files that chunk contained.
- If one disc is lost, you can still decrypt and restore the other discs. You only lose the files in the missing chunk.
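You don't have to extract everything; tar can list the stream or pull out individual paths. For example (the path shown is hypothetical and must match how the file was stored in the archive):

```bash
# List a chunk's contents without extracting anything:
gpg --decrypt chunk_002.tar.lz4.gpg | lz4 -d | tar -tvf -

# Extract just one file from the chunk:
gpg --decrypt chunk_002.tar.lz4.gpg | lz4 -d | tar -xvf - path/to/file.txt
```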
### Why lz4 Over xz?
- lz4 is extremely fast compared to xz, especially for decompression.
- xz typically yields better compression (smaller output size), but at a much higher CPU cost.
- For backups where speed is the priority (and you have enough disc space), lz4 is a great choice.
- If you need to cram as much data as possible into 100GB, you might prefer xz with a high compression setting, but both your backup process and restoration would be slower. (An xz variant of the pipeline is sketched below.)
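The pipeline changes only in the compression stage. A minimal sketch, reusing the hypothetical `chunk_001_files.txt` list from the overview:

```bash
# Create: xz -9 trades CPU time for a smaller archive.
tar -cf - -T chunk_001_files.txt | xz -9 | gpg -c --cipher-algo AES256 -o chunk_001.tar.xz.gpg

# Restore:
gpg --decrypt chunk_001.tar.xz.gpg | xz -d | tar -xvf -
```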
## Final Thoughts
With this script and approach:
- You gain independently decryptable 100GB archives.
- If a single disc is damaged, you only lose that chunk's data; all other chunks remain fully restorable.
- lz4 + gpg is a solid combo for speed (lz4 for compression, gpg for encryption).
- Always test your workflow on smaller data sets before doing a large 2TB backup.
- Keep your passphrase secure, and consider verifying your burned discs with checksums.
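On that last point, one way to verify a burned disc (assuming it mounts at `/mnt/mdisc`; device paths and mount points vary by system):

```bash
mount /dev/sr0 /mnt/mdisc
sha256sum /mnt/mdisc/chunk_001.tar.lz4.gpg
# Compare the output against the matching line in manifest_individual_chunks.txt.
umount /mnt/mdisc
```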
That's it! You now have a fast, chunked, and individually encrypted backup solution for your M-Discs.