Need privacy-focused file formats
(I was about to reply to this thread, and realized I was really starting a new topic.)
Removing file metadata is much harder than it should be. Granted, there are a ton of file formats, and even a given format can change over time. But I think developers and maintainers of these formats need to do more to isolate metadata and support simple, one-shot methods for removing it without undermining the integrity of the file (that is, the file should work just fine without it). Similarly, it should be trivially easy to view the metadata.
I know this is easier said than done. Some files are weird hybrids - like a PowerPoint that contains videos, images, etc. - each of which may have its own embedded metadata. But I also believe we can get there, and we should start working it out.
Of course, we do have some tools that are good - but not perfect. Dangerzone can be used for this, too, but it’s lossy and complicated.
I also know that you can ‘fingerprint’ files… but that’s just because our file formats (particularly PDFs and Office XML) have gotten ridiculously complicated, and so can include lots of clues about their owner/origin.
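To give a sense of how much an ordinary Office file can say about its author, here is a minimal sketch in Python (standard library only) that reads a few identifying fields from the core-properties part of a .docx, .xlsx or .pptx file. It assumes the conventional docProps/core.xml location (strictly, the package's _rels/.rels file says where that part lives), and it only looks at one of the many places metadata can hide: docProps/app.xml, comments, revision history and embedded media are not touched. The helper name `read_core_properties` is hypothetical, just for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespaces used by the Office Open XML core-properties part.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# A few fields that commonly point back to who created or edited a document.
FIELDS = {
    "creator": "dc:creator",
    "lastModifiedBy": "cp:lastModifiedBy",
    "created": "dcterms:created",
    "modified": "dcterms:modified",
}


def read_core_properties(source):
    """Return identifying core properties from an Office Open XML file.

    `source` may be a file path or a binary file-like object.
    Assumes (hypothetically) the conventional docProps/core.xml location.
    """
    with zipfile.ZipFile(source) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return {}  # no core-properties part at the usual location
        root = ET.fromstring(zf.read("docProps/core.xml"))
    result = {}
    for name, tag in FIELDS.items():
        element = root.find(tag, NS)
        if element is not None and element.text:
            result[name] = element.text
    return result
```

Called as `read_core_properties("report.docx")`, it returns a dict containing whichever of those fields are present. That is the "trivially easy to view" half; stripping them cleanly is harder, because other parts of the package (such as [Content_Types].xml and _rels/.rels) refer to docProps/core.xml, so simply deleting it can leave the file inconsistent.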
Maybe what we need instead is a set of “clean”, well-defined base formats… like converting a Word doc to Markdown. That is, some well-known file formats that are inherently free of metadata. We may have to define these… but fine, let’s do that.