diff --git a/README.md b/README.md
index 599c34c..1991253 100644
--- a/README.md
+++ b/README.md
@@ -1,125 +1,81 @@
-# 1. Overview: Individually Encrypted 100GB Archives
+# backup2mdisc
+## How It Works
 
-### Basic Idea
+1. **File Collection & Sorting**
+   - The script uses `find` to list all files in your `SOURCE_DIR` with their sizes.
+   - It sorts them in ascending order by size so it can pack smaller files first (you can remove `| sort -n` if you prefer a different method).
 
-1. **Figure out which files belong to which 100GB set**.
-   - You can gather files until their combined uncompressed size is ~100GB (or 95GB if you want some buffer for overhead).
-   - Put them in a "chunk_001" grouping, then "chunk_002," etc.
+2. **Chunk Accumulation**
+   - It iterates over each file, summing up file sizes into a "current chunk."
+   - If adding a new file would exceed `CHUNK_SIZE` (default 100GB), it **finalizes** the current chunk (creates `.tar.lz4.gpg`) and starts a new one.
 
-2. **Create a TAR for each group**, then **compress** with `lz4`, then **encrypt** with `gpg`.
-   - Result: `chunk_001.tar.lz4.gpg`, `chunk_002.tar.lz4.gpg`, etc.
-   - Each chunk is fully independent: if you only have `chunk_004.tar.lz4.gpg`, you can decrypt it, decompress it, and restore the files that were in chunk #4.
+3. **Archive, Compress, Encrypt**
+   - For each chunk, it creates a `.tar.lz4.gpg` file. Specifically:
+     1. `tar -cf - -T $TMP_CHUNK_LIST` (archives the files in that chunk)
+     2. Pipe into `lz4 -c` for fast compression
+     3. Pipe into `gpg --batch -c` (symmetric encryption with AES256, using your passphrase)
+   - The result is a self-contained file like `chunk_001.tar.lz4.gpg` (a sketch of this pipeline follows the list).
 
-3. **Burn each chunk** onto its own M-Disc.
-   - Optionally, create ISO images (e.g., `genisoimage -o chunk_001.iso chunk_001.tar.lz4.gpg`) and then burn them.
+4. **Checksums & Manifest**
+   - It calculates the SHA-256 sum of each chunk archive and appends it to a manifest file along with the list of included files.
+   - That manifest is stored in `$WORK_DIR`.
 
-4. **To restore** any subset, you just decrypt the chunk you want, decompress, and extract it. No other chunks are required.
+5. **Optional ISO Creation** (`--create-iso`)
+   - After each chunk is created, the script can build an ISO image containing just that `.tar.lz4.gpg`.
+   - This step uses `genisoimage` (or `mkisofs`). The resulting file is `chunk_001.iso`, etc.
 
-### Pros
+6. **Optional Burning** (`--burn`)
+   - If you specify `--burn`, the script will pause after creating each chunk/ISO and prompt you to insert a fresh M-Disc.
+   - On **Linux**, it tries `growisofs`.
+   - On **macOS**, it tries `hdiutil` (if creating an ISO).
+   - If it doesn't find these commands, it'll instruct you to burn manually.
 
-- Each 100GB chunk is an autonomous backup.
-- Damage/loss of one disc only affects that chunk's files.
-
-### Cons
-
-- Less efficient if you have many smaller files (no cross-chunk deduplication).
-- Slightly more complex to create "balanced" 100GB sets.
-- Big single files that exceed 100GB are a problem unless you handle them specially.
+7. **Repeat**
+   - The script loops until all files have been placed into chunk(s).
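+
+In shell terms, one pass through steps 1-6 looks roughly like the sketch below. This is a simplified illustration based on the description above, not the script itself: the directory paths, the fixed `chunk_001` names, and the `manifest.txt` filename are placeholders, and the real script adds looping, passphrase handling, and error checks.
+
+```bash
+# Simplified sketch of one chunk's life cycle (illustrative values, no error handling).
+SOURCE_DIR="/data/to/backup"             # example source directory
+WORK_DIR="/path/to/work_dir"             # example destination for chunks and manifest
+CHUNK_SIZE_BYTES=$((100 * 1000 ** 3))    # ~100GB per chunk
+TMP_CHUNK_LIST="$WORK_DIR/chunk_001.list"
+CHUNK_ARCHIVE="$WORK_DIR/chunk_001.tar.lz4.gpg"
+
+# Steps 1-2: list files with sizes, smallest first, and collect paths until the
+# next file would overflow the chunk. (GNU find shown; BSD/macOS find has no -printf.)
+total=0
+: > "$TMP_CHUNK_LIST"
+find "$SOURCE_DIR" -type f -printf '%s %p\n' | sort -n |
+while read -r size path; do
+  if (( total + size > CHUNK_SIZE_BYTES )); then
+    break                                # the script finalizes here and starts chunk_002
+  fi
+  printf '%s\n' "$path" >> "$TMP_CHUNK_LIST"
+  total=$(( total + size ))
+done
+
+# Step 3: tar the chunk's file list, compress with lz4, encrypt symmetrically with gpg.
+# (The script itself uses gpg --batch -c; passphrase handling is omitted here.)
+tar -cf - -T "$TMP_CHUNK_LIST" | lz4 -c | gpg -c --cipher-algo AES256 -o "$CHUNK_ARCHIVE"
+
+# Step 4: record the archive's checksum in the manifest.
+sha256sum "$CHUNK_ARCHIVE" >> "$WORK_DIR/manifest.txt"
+
+# Steps 5-6 (optional): wrap the chunk in an ISO and burn it (Linux example).
+genisoimage -o "$WORK_DIR/chunk_001.iso" "$CHUNK_ARCHIVE"
+growisofs -Z /dev/sr0="$WORK_DIR/chunk_001.iso"
+```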
 
 ---
 
-# 2. Sample Script: `backup2mdisc.sh`
+## Restoring Your Data
 
-This is a **Bash** script that:
+- **Disc is self-contained**: If you have disc #4 containing `chunk_004.tar.lz4.gpg`, you can restore it independently of the others.
+- **Decrypt & Extract**:
+  ```bash
+  gpg --decrypt chunk_004.tar.lz4.gpg | lz4 -d | tar -xvf -
+  ```
+  This will prompt for the passphrase you used during backup.
 
-1. Collects **all files** in a specified source directory.
-2. Iterates over them in ascending order by size (you can adjust if you prefer a different approach).
-3. Accumulates files into a "chunk" until you're about to exceed the chunk size limit.
-4. When the chunk is "full," it creates a **tar** archive, pipes it into **lz4**, then **encrypts** with `gpg`.
-5. Moves on to the next chunk until all files are processed.
-6. Generates a manifest with checksums for each `.tar.lz4.gpg`.
-
-> **Disclaimer**:
-> - This script uses file-size-based grouping. If you have one single file larger than the chunk limit, it won't fit. You'd need advanced splitting or a different solution.
-> - On macOS or FreeBSD, you might need to install or alias `sha256sum`. If unavailable, replace with `shasum -a 256`.
-> - This script **does not** automatically burn discs (though it shows how you might add that step).
+- If one disc is lost, you only lose the files in that chunk; all other chunks remain restorable.
 
 ---
 
-## How This Script Works
+## Why lz4?
 
-1. **Collect Files and Sort**
-   - We use `find` to list all files in `SOURCE_DIR`, capturing both size and path.
-   - Sorting by size ensures the script packs smaller files first. (You can remove sorting if you prefer alphabetical or another method.)
-
-2. **Accumulate Files Until the Chunk Is ~100GB**
-   - We convert `CHUNK_SIZE` from something like `100G` into bytes. Then we compare the sum of file sizes to that limit.
-   - If adding a new file would exceed the chunk limit, we finalize the current chunk and create a new one.
-
-3. **Create a TAR, Compress with lz4, Then Encrypt**
-   - We pipe the TAR stream into `lz4` for fast compression, and then pipe **that** into `gpg --batch -c` for symmetric encryption with AES256.
-   - Each chunk is written to `chunk_XXX.tar.lz4.gpg`.
-   - No chunk depends on the others.
-
-4. **Write Checksums to the Manifest**
-   - We run a SHA-256 on the resulting `chunk_XXX.tar.lz4.gpg` and store that in `manifest_individual_chunks.txt` for integrity checks.
-
-5. **Repeat**
-   - Next chunk continues until all files have been processed.
-
-6. **Result**
-   - You get multiple `.tar.lz4.gpg` archives in your `DEST_DIR`, each below your chosen chunk size and fully independent.
-
-## Burning to M-Disc
-
-You can then burn each chunk to a separate disc. For example:
-
-```bash
-cd /path/to/work_dir
-genisoimage -o chunk_001.iso chunk_001.tar.lz4.gpg
-# Then burn chunk_001.iso
-growisofs -Z /dev/sr0=chunk_001.iso
-```
-
-Repeat for each chunk. On macOS, you might use:
-
-```bash
-hdiutil burn chunk_001.iso
-```
-
-(Adjust device paths and commands as needed.)
-
-## Restoring Data
-
-To restore from a single chunk (e.g., chunk_002.tar.lz4.gpg), do:
-
-```bash
-gpg --decrypt chunk_002.tar.lz4.gpg | lz4 -d | tar -xvf -
-```
-
-You'll be prompted for the same passphrase you used when creating the archive. After extraction, you'll see all the files that chunk contained.
-
-- **If one disc is lost**, you can still decrypt and restore the other discs. You only lose the files in the missing chunk.
+- **Speed**: `lz4` is extremely fast at both compression and decompression.
+- **Lower compression ratio** than xz, but if your priority is speed (and 100GB per disc is enough space), `lz4` is a great choice.
+- For maximum compression at the cost of time, you could replace `lz4` with `xz -9`, but expect slower backups and restores.
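+
+For example, the same pipeline with `xz` swapped in might look like the sketch below (illustration only, not part of the script; `$TMP_CHUNK_LIST` and the file names are placeholders):
+
+```bash
+# Back up a chunk with xz instead of lz4 (smaller output, much slower).
+tar -cf - -T "$TMP_CHUNK_LIST" | xz -9 | gpg -c --cipher-algo AES256 -o chunk_001.tar.xz.gpg
+
+# Restore that chunk (prompts for the passphrase).
+gpg --decrypt chunk_001.tar.xz.gpg | xz -d | tar -xvf -
+```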
 
 ---
 
-# Why lz4 Over xz?
+## Tips & Caveats
 
-- **lz4** is extremely fast compared to xz, especially for decompression.
-- **xz** typically yields better compression (smaller output size), but at a much higher CPU cost.
-- For backups where speed is the priority (and you have enough disc space), lz4 is a great choice.
-- If you need to cram as much data as possible into 100GB, you might prefer xz with a high compression setting—but your backup process and restoration would be slower.
+1. **Large Files**
+   - A single file larger than your chunk size (e.g., a 101GB file with a 100GB chunk limit) won't fit. This script doesn't handle that gracefully. You'd need to split such a file (e.g., with `split`) before archiving, or use a backup tool that can split individual files.
+
+2. **Verification**
+   - Always verify your discs after burning. Mount them and compare the chunk's SHA-256 with the manifest to ensure data integrity (see the example at the end of this README).
+
+3. **Incremental or Deduplicated Backups**
+   - For advanced features (incremental backups, deduplication, partial-chunk checksums), consider specialized backup programs such as Borg, restic, or Duplicati. Keep in mind, however, that their repositories generally can't be restored from a single disc in isolation; you need the complete set.
+
+4. **Cross-Platform**
+   - On FreeBSD or macOS, you might need to tweak the commands for hashing (`sha256sum` vs. `shasum -a 256`) or ISO creation (`mkisofs` vs. `genisoimage`).
+   - For burning, Linux uses `growisofs`, macOS uses `hdiutil`, and FreeBSD may require `cdrecord` or another tool.
 
 ---
 
-## Final Thoughts
+**Now you can enjoy the best of both worlds**:
+- **Independently decryptable** (and restorable) archives on each M-Disc.
+- Automatic ISO creation and optional disc burning in the same script.
+- Fast compression via lz4.
 
-With this script and approach:
-
-- You gain **independently decryptable** 100GB archives.
-- If a single disc is damaged, you only lose that chunk's data; all other chunks remain fully restorable.
-- lz4 + gpg is a solid combo for speed (lz4 for compression, gpg for encryption).
-- Always **test** your workflow on smaller data sets before doing a large 2TB backup.
-- Keep your passphrase secure, and consider verifying your burned discs with checksums.
-
-That's it! You now have a **fast, chunked, and individually encrypted** backup solution for your M-Discs.
\ No newline at end of file
+This gives you a **self-contained** backup on each disc without chain-dependency across your entire 2TB backup set!
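+
+As a closing example for the verification step mentioned in Tips & Caveats, here is one way a post-burn check might look. The device path, mount point, and manifest name are examples, not values taken from the script:
+
+```bash
+# Mount the burned disc (Linux example; adjust the device and mount point).
+sudo mount /dev/sr0 /mnt/mdisc
+
+# Hash the chunk as it exists on the disc.
+sha256sum /mnt/mdisc/chunk_001.tar.lz4.gpg     # macOS/FreeBSD: shasum -a 256 ...
+
+# Compare against the value recorded in the manifest at backup time.
+grep 'chunk_001.tar.lz4.gpg' /path/to/work_dir/manifest.txt
+```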