# 1. Overview: Individually Encrypted 100GB Archives
### Basic Idea

1. **Figure out which files belong to which 100GB set.**
   - You can gather files until their combined uncompressed size is ~100GB (or ~95GB if you want some buffer for overhead).
   - Put them in a "chunk_001" grouping, then "chunk_002", and so on.

2. **Create a TAR for each group**, then **compress** with `lz4`, then **encrypt** with `gpg`.
   - Result: `chunk_001.tar.lz4.gpg`, `chunk_002.tar.lz4.gpg`, etc. (a one-line example of this pipeline follows the list).
   - Each chunk is fully independent: if you only have `chunk_004.tar.lz4.gpg`, you can decrypt it, decompress it, and restore the files that were in chunk #4.

3. **Burn each chunk** onto its own M-Disc.
   - Optionally, create ISO images (e.g., `genisoimage -o chunk_001.iso chunk_001.tar.lz4.gpg`) and burn those.

4. **To restore** any subset, you just decrypt the chunk you want, decompress it, and extract it. No other chunks are required.
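For example, step 2's pipeline for a single chunk could look like this (the file list name is a placeholder, and `gpg -c` will prompt for a passphrase):

```bash
# chunk_001.filelist holds the paths chosen for this ~100GB set.
tar -cf - -T chunk_001.filelist \
  | lz4 -c \
  | gpg -c --cipher-algo AES256 -o chunk_001.tar.lz4.gpg
```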
### Pros

- Each 100GB chunk is an autonomous backup.
- Damage or loss of one disc only affects that chunk's files.

### Cons

- Less efficient if you have many smaller files (no cross-chunk deduplication).
- Slightly more complex to create "balanced" 100GB sets.
- Single files larger than 100GB are a problem unless you handle them specially (see the example below).
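One common workaround for that last point is to split oversized files before archiving and rejoin them after restore. A sketch with GNU `split` (file names and sizes are illustrative):

```bash
# Break a file that is bigger than one disc into pieces that fit in a chunk...
split -b 95G huge_disk_image.img huge_disk_image.img.part_
# ...and reassemble it after restoring the chunks that contain the pieces:
cat huge_disk_image.img.part_* > huge_disk_image.img
```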
---

# 2. Sample Script: `backup2mdisc.sh`

This is a **Bash** script that:

1. Collects **all files** in a specified source directory.
2. Iterates over them in ascending order of size (you can adjust this if you prefer a different approach).
3. Accumulates files into a "chunk" until adding the next file would exceed the chunk size limit.
4. When the chunk is "full," creates a **tar** archive, pipes it into **lz4**, then **encrypts** it with `gpg`.
5. Moves on to the next chunk until all files are processed.
6. Generates a manifest with checksums for each `.tar.lz4.gpg`.

> **Disclaimer**:
> - This script uses file-size-based grouping. If a single file is larger than the chunk limit, it won't fit; you'd need advanced splitting or a different solution.
> - On macOS or FreeBSD, you may need to install or alias `sha256sum`. If it's unavailable, replace it with `shasum -a 256`.
> - This script **does not** automatically burn discs (though it shows how you might add that step).
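The script itself is not reproduced here, so the following is only a minimal sketch of the logic described above, not the original. It assumes a GNU userland (`find -printf`, `sort`, `numfmt`, `sha256sum`), GnuPG 2.1+ (for `--pinentry-mode loopback`), and file names without tabs or newlines; the argument names are illustrative.

```bash
#!/usr/bin/env bash
#
# Illustrative sketch only -- not the original backup2mdisc.sh.
# Groups files from SOURCE_DIR into ~CHUNK_SIZE sets and writes one
# independently decryptable chunk_XXX.tar.lz4.gpg per set into DEST_DIR,
# plus a manifest of SHA-256 checksums.
set -euo pipefail

SOURCE_DIR="${1:?usage: $0 SOURCE_DIR DEST_DIR [CHUNK_SIZE]}"
DEST_DIR="${2:?usage: $0 SOURCE_DIR DEST_DIR [CHUNK_SIZE]}"
CHUNK_SIZE="${3:-100G}"                          # e.g. 100G or 95G

CHUNK_LIMIT=$(numfmt --from=iec "$CHUNK_SIZE")   # "100G" -> bytes
MANIFEST="$DEST_DIR/manifest_individual_chunks.txt"

mkdir -p "$DEST_DIR"
: > "$MANIFEST"

read -rsp "Encryption passphrase: " PASSPHRASE
echo

chunk_num=1
chunk_bytes=0
file_list=$(mktemp)
trap 'rm -f "$file_list"' EXIT

finalize_chunk() {
    [[ -s "$file_list" ]] || return 0
    local name out
    name=$(printf 'chunk_%03d.tar.lz4.gpg' "$chunk_num")
    out="$DEST_DIR/$name"
    echo "Creating $out ..."
    # tar the accumulated file list, compress with lz4, encrypt with AES256.
    tar -cf - --files-from="$file_list" \
        | lz4 -c \
        | gpg --batch --yes -c --cipher-algo AES256 \
              --pinentry-mode loopback --passphrase "$PASSPHRASE" -o "$out"
    # Record the checksum relative to DEST_DIR so `sha256sum -c` works there.
    ( cd "$DEST_DIR" && sha256sum "$name" ) >> "$MANIFEST"
    chunk_num=$((chunk_num + 1))
    chunk_bytes=0
    : > "$file_list"
}

# Walk all files, smallest first, and pack them greedily into chunks.
while IFS=$'\t' read -r size path; do
    if (( size > CHUNK_LIMIT )); then
        echo "WARNING: '$path' exceeds $CHUNK_SIZE, skipping." >&2
        continue
    fi
    if (( chunk_bytes + size > CHUNK_LIMIT )); then
        finalize_chunk
    fi
    printf '%s\n' "$path" >> "$file_list"
    chunk_bytes=$((chunk_bytes + size))
done < <(find "$SOURCE_DIR" -type f -printf '%s\t%p\n' | sort -n)

finalize_chunk   # flush the final, partially filled chunk
echo "Done. Checksums are in $MANIFEST"
```

A typical invocation would be something like `./backup2mdisc.sh /data /path/to/work_dir 100G`.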
---
## How This Script Works

1. **Collect Files and Sort**
   - We use `find` to list all files in `SOURCE_DIR`, capturing both size and path.
   - Sorting by size ensures the script packs smaller files first. (You can remove the sorting if you prefer alphabetical order or another method.)

2. **Accumulate Files Until the Chunk Is ~100GB**
   - We convert `CHUNK_SIZE` from something like `100G` into bytes, then compare the running sum of file sizes against that limit (see the conversion example after this list).
   - If adding a new file would exceed the chunk limit, we finalize the current chunk and start a new one.

3. **Create a TAR, Compress with lz4, Then Encrypt**
   - We pipe the TAR stream into `lz4` for fast compression, and then pipe **that** into `gpg --batch -c` for symmetric encryption with AES256.
   - Each chunk is written to `chunk_XXX.tar.lz4.gpg`.
   - No chunk depends on the others.

4. **Write Checksums to the Manifest**
   - We run SHA-256 on each resulting `chunk_XXX.tar.lz4.gpg` and store it in `manifest_individual_chunks.txt` for integrity checks (see the verification example after this list).

5. **Repeat**
   - The next chunk continues until all files have been processed.

6. **Result**
   - You get multiple `.tar.lz4.gpg` archives in your `DEST_DIR`, each below your chosen chunk size and fully independent.
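For step 2, GNU `numfmt` (also used in the sketch above) handles the `100G`-to-bytes conversion:

```bash
numfmt --from=iec 100G    # prints 107374182400 (bytes)
```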
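And because the manifest from step 4 is plain `sha256sum` output, you can later re-verify every chunk in one command, run from the directory holding the chunks:

```bash
sha256sum -c manifest_individual_chunks.txt
# (on macOS/FreeBSD: shasum -a 256 -c manifest_individual_chunks.txt)
```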
## Burning to M-Disc

You can then burn each chunk to a separate disc. For example:

```bash
cd /path/to/work_dir
genisoimage -o chunk_001.iso chunk_001.tar.lz4.gpg
# Then burn chunk_001.iso
growisofs -Z /dev/sr0=chunk_001.iso
```

Repeat for each chunk. On macOS, you might use:

```bash
hdiutil burn chunk_001.iso
```

(Adjust device paths and commands as needed.)
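Before shelving a disc, it's worth checking the burn against the manifest. One way to do that (device node and mount point are examples):

```bash
sudo mkdir -p /mnt/mdisc
sudo mount -o ro /dev/sr0 /mnt/mdisc
sha256sum /mnt/mdisc/chunk_001.tar.lz4.gpg        # compare with the manifest entry
grep chunk_001 manifest_individual_chunks.txt
sudo umount /mnt/mdisc
```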
## Restoring Data

To restore from a single chunk (e.g., `chunk_002.tar.lz4.gpg`), run:

```bash
gpg --decrypt chunk_002.tar.lz4.gpg | lz4 -d | tar -xvf -
```

You'll be prompted for the same passphrase you used when creating the archive. After extraction, you'll see all the files that chunk contained.

- **If one disc is lost**, you can still decrypt and restore the other discs. You only lose the files in the missing chunk.
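To see what a chunk contains without extracting anything, swap `tar -x` for `tar -t` in the same pipeline:

```bash
gpg --decrypt chunk_002.tar.lz4.gpg | lz4 -d | tar -tvf -
```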
---

# Why lz4 Over xz?

- **lz4** is extremely fast compared to xz, especially for decompression.
- **xz** typically yields better compression (smaller output), but at a much higher CPU cost.
- For backups where speed is the priority (and you have enough disc space), lz4 is a great choice.
- If you need to cram as much data as possible into 100GB, you might prefer xz with a high compression setting, but your backup and restore would be slower (see the example below).
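If density matters more than speed, only the compression stage of the pipeline changes. A sketch using the same placeholder file list as earlier:

```bash
# xz -9 compresses much harder (and slower) than lz4; -T0 uses all CPU cores.
tar -cf - -T chunk_001.filelist \
  | xz -9 -T0 \
  | gpg -c --cipher-algo AES256 -o chunk_001.tar.xz.gpg

# Restoring is symmetric:
gpg --decrypt chunk_001.tar.xz.gpg | xz -d | tar -xvf -
```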
---

## Final Thoughts

With this script and approach:

- You gain **independently decryptable** 100GB archives.
- If a single disc is damaged, you only lose that chunk's data; all other chunks remain fully restorable.
- lz4 + gpg is a solid combo for speed (lz4 for compression, gpg for encryption).
- Always **test** your workflow on smaller data sets before doing a large 2TB backup.
- Keep your passphrase secure, and consider verifying your burned discs with checksums.

That's it! You now have a **fast, chunked, and individually encrypted** backup solution for your M-Discs.