Archiver
Want to archive folders however you want?
Context
I had been trying to figure out a more efficient way of backing up directories on my computers. For example, I have a huge directory where my Obsidian vault lives. This vault is filled with PDF and Markdown files that I use for my everyday notes. I like keeping this folder backed up in case my laptop or PC dies.
The way I used to back up this folder was by simply turning it into a GitHub repo and uploading everything privately to my GitHub account. This is not what GitHub was made for, and I was nearing 2 GB for the entire repo. It was time to search for a different solution.
Enter the infamous vulnerability. This put compression libraries on my radar. So I decided to get the entire directory and compress it down using and . I noticed that the compression on was far better than . For my purposes I wanted to optimize for size at the cost of compression speed. Additionally the nice people at r/DataHoarder had mentioned that a really cool software program called was a thing. I could recover any amount I wanted from a specific file, using bit parity magic.
For my purposes, I want to store big directories as small as possible, store their MD5-Sum to make sure I can verify the files are correct, and store their parity file to be able to recover any part of them I want (up to 30% in my case). This is because sharing things on a local NAS or even with Syncthing this way would be very easy and efficient.
Some downsides to this system are:
- is not as common as I thought and it's been somewhat complicated to move files to systems that don't have this utility.
- Incremental backups are not possible.
PS: I recognize that might have vulnerabilities that might not be.
Archiver
Archiver is a bash script that helps you back up whatever you want, however you want.
Usage
For example:
You specify a list of attribute sets with three key parts:
- : the target directory you wish to backup
- : the name of the archive and where you want to store the archive
- : the command you wish to use to back up this specific directory
Other details like and are useful for other purposes if you wish to climb under the hood to use them. The MD5-Sum is also useful if you wish to verify the legitimacy of your files after retrieving them.
PS: an example is provided.
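The MD5-Sum check mentioned above can be sketched as a small helper. This is only an illustration, not part of Archiver itself: the function name `verify_md5` is hypothetical, and it assumes GNU coreutils `md5sum` is installed and that you have the recorded digest at hand.

```shell
# verify_md5 ARCHIVE EXPECTED_MD5
# Hypothetical helper: returns 0 when the archive's MD5 digest matches
# the digest that was recorded at backup time, non-zero otherwise.
verify_md5() {
  local archive="$1" expected="$2" actual
  # md5sum prints "<digest>  <filename>"; keep only the digest.
  actual="$(md5sum "$archive" | awk '{print $1}')"
  [ "$actual" = "$expected" ]
}
```

After retrieving an archive you could run something like `verify_md5 ./my-archive "$recorded_md5" && echo "archive is intact"`, where both arguments are placeholders for your own file and stored digest.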
Methodology
- Your gets converted into a tarball.
- That tarball is compressed into an archive.
- This format was chosen because of its excellent compression ratio.
- Though in the future I would like to implement multiple formats for this.
- A parity archive is created from that compressed tarball.
- Uses the utilities.
- A single block file with 30% redundancy is created.
- Additionally, you can use the index file that's created, but it isn't really needed.
- A Unix timestamp and MD5-Sum are taken from the archived tarball.
- Your hook is run at the end, and a second timestamp is taken once it finishes.
- Your file is updated with all of the fresh timestamps and MD5-Sum.
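The steps above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Archiver's actual code: `archive_dir` is a hypothetical name, gzip stands in for the compression format, the parity step is left as a comment because it depends on the parity tool you use, the hook is treated as optional, and the results are printed rather than written back to a config file.

```shell
# archive_dir TARGET OUTPUT_PREFIX [HOOK_COMMAND]
# Hypothetical sketch of the pipeline; gzip is an assumed stand-in format.
archive_dir() {
  local target="$1" prefix="$2" hook="${3:-}"
  local archive="${prefix}.tar.gz"
  local archived_at md5 hook_done_at

  # Turn the target directory into a tarball and compress it in one pass.
  tar -C "$(dirname "$target")" -czf "$archive" "$(basename "$target")" || return 1

  # (A parity archive for "$archive" would be created here.)

  # Record a Unix timestamp and the MD5 digest of the compressed tarball.
  archived_at="$(date +%s)"
  md5="$(md5sum "$archive" | awk '{print $1}')"

  # Run the hook (if one was given), then take a second timestamp.
  if [ -n "$hook" ]; then
    sh -c "$hook" || return 1
  fi
  hook_done_at="$(date +%s)"

  # Print the fresh values that would be stored alongside the entry.
  printf 'archive=%s\narchived_at=%s\nmd5=%s\nhook_done_at=%s\n' \
    "$archive" "$archived_at" "$md5" "$hook_done_at"
}
```

A call might look like `archive_dir "$HOME/vault" /mnt/nas/vault-backup 'echo done'`, where the paths and the hook command are placeholders.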