Hister: A free & self-hosted personal search engine
Privacy Guides Community [Unofficial]
May 25, 2026
That’s great to hear. I’m a software developer, so I might be able to help out, although I don’t have that much time available…
I do think it’s a hard problem to solve. Off the top of my head, I can see several challenges: index size, the risk of people sharing indexed private or personal data without realising, bad actors sharing “poisoned” data linking to fraudulent sites, and people submitting indexed illegal content such as CSAM. Then, if the index is hosted by a third party, there’s the issue of how to prevent the index server from tracking people’s search queries.
For the index size, I guess it could be broken down as you mention, maybe by language and category. To keep personal data out of the index, the indexer should probably visit sites without any cookies or auth headers, but even then there’s the risk of indexing pages that are publicly accessible but have hard-to-guess URLs and aren’t meant to be indexed or found by crawlers, like Google Docs etc. Maybe reading the robots.txt would be enough to avoid indexing pages that aren’t meant to be indexed? As for the “poisoned” or illegal content… that’s probably the hardest issue, and as long as random people can submit content, the problem will always be there. Maybe some sort of key-pair signing could help: the nodes hosting the index choose which public keys’ contributions to accept, so index moderation is done by the nodes hosting the indices. Which, as I’m typing this, kind of rhymes with NOSTR. Maybe that could be the decentralised index distribution system. For anonymous queries, people could rely on Tor, either to download the index from the nodes/relays and then run queries offline, or to query the remotely hosted indices directly.
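To make the robots.txt idea a bit more concrete, here is a rough sketch in Python using only the standard library's `urllib.robotparser`. The user-agent name `HisterIndexer`, the `may_index` helper and the example rules are all made up for illustration; I'm not claiming this is how Hister works. It also only covers paths the site owner actually listed, so it wouldn't help with the hard-to-guess-URL case on its own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user-agent name, used only for this illustration.
USER_AGENT = "HisterIndexer"


def may_index(url: str, robots_txt: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the robots.txt content as a list of lines,
    # so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up robots.txt content for the example.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

print(may_index("https://example.com/blog/post", EXAMPLE_ROBOTS))
print(may_index("https://example.com/private/doc", EXAMPLE_ROBOTS))
```

In a real indexer the robots.txt text would have to be fetched from the site first (without cookies or auth headers, as above), and the check would run before a page is added to the shared index.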