{
  "$type": "site.standard.document",
  "path": "/posts/archiver/",
  "publishedAt": "2024-12-09T00:20:32.000Z",
  "site": "at://did:plc:6n2ngs7zpcpwxz3jaoxj56tu/site.standard.publication/3mo6y7ludvn2h",
  "tags": [
    "programming"
  ],
  "textContent": "Want to archive folders however you want?\n\nContext\n\nI had been looking for a more efficient way to back up directories on my computers. For example, I have a huge directory where my Obsidian vault lives. The vault is full of PDF and Markdown files that I use for my everyday notes, and I like to keep it backed up in case my laptop or PC dies.\n\nI used to back up this folder by turning it into a GitHub repo and uploading everything to a private repository on my account. That is not what GitHub is made for, and the repo was nearing 2GB. It was time to look for a different solution.\n\nEnter the infamous  vulnerability. It put compression libraries on my radar, so I decided to take the entire directory and compress it using  and . I noticed that  compressed far better than . For my purposes, I wanted to optimize for size at the cost of compression speed. The nice people at r/DataHoarder had also mentioned a really cool program called , which uses parity data to repair damaged files, up to whatever level of redundancy you choose.\n\nIn short, I want to store big directories as compactly as possible, store their MD5-Sum so I can verify the files are intact, and store a parity file so I can recover damaged parts of them (up to 30% in my case). 
Archives stored this way are also easy and efficient to share on a local NAS or even through Syncthing.\n\nThis system does have some downsides:\n\n-  is not as common as I thought, so moving files to systems that don't have this utility has been somewhat complicated.\n- Incremental backups are not possible; every run archives the whole directory again.\n\nPS: I recognize that  might have vulnerabilities that might not be.\n\nArchiver\n\nArchiver is a Bash script that helps you back up whatever you want, however you want.\n\nUsage\n\nFor example:\n\nYou specify a list of attribute sets, each with three key parts:\n\n- : the target directory you want to back up\n- : the name of the archive and where you want to store it\n- : the command you want to use to back up this specific directory\n\nOther fields, such as  and , are useful if you want to dig under the hood. The stored MD5-Sum also lets you verify the integrity of your files after retrieving them.\n\nPS: An example  is provided.\n\nMethodology\n\n1. Your  gets converted into a tarball.\n2. That tarball is compressed into an  archive.\n    - This format was chosen for its excellent compression ratio.\n    - In the future, I would like to support multiple formats.\n3. A parity archive is created from that compressed tarball.\n    - This uses the  utilities.\n    - A single recovery file with 30% redundancy is created.\n    - An index file is also created; you can use it, but  doesn't really need it.\n4. A Unix timestamp and an MD5-Sum are taken from the archived tarball.\n5. Your  hook is run, and a second timestamp is taken once it finishes.\n6. Your  file is updated with all of the fresh timestamps and the MD5-Sum.",
  "title": "Archiver"
}