PriEco Quality Benchmarking
Intro
Hi! Some of you may know PriEco, and you may know the results aren’t the best yet. I’m committed to improving it as much as I can, and I’ve decided to document that here. Hopefully I’m allowed to. The idea is to provide a coherent source of information, show some transparency on my side, and give us a place for discussion.
Why here? Because of the community, and because I like Privacy Guides. I could do it on X/Mastodon/Bsky or on my own blog (that would be the first post), but I don’t trust that people would read it there. Please tell me if it’s appreciated here, and if not I’ll stop.
I still believe the main reason PriEco lags behind is index size. It’s just too small compared to more established web search engines.
Index size
Google (40-50B results, 400B known URLs)
Google doesn’t publicly report its index size, but we can estimate it from other sources. WorldWideWebSize estimates Google at roughly 40-50B pages. (It puts Bing at 1-3B, which I find hard to believe, as both Mojeek and Brave Search report much higher numbers.) I’d say the generally agreed-upon numbers online are ~50B results and 400B known URLs, though there are also claims as low as 8B (maybe they mean domains).
Bing (8-14B)
Sources: SEJournal, IndexMachine. Again, this is just an estimate.
Brave search (8B+)
Source. From this we can estimate that Brave Search is roughly Bing’s size, as it was already at 8B over 5 years ago. I believe we can all agree Brave Search has gotten a lot better since then.
Mojeek (9B)
Mojeek is pretty transparent about this. They replied to me: they reached 9B in 2025 (it’s in their timeline).
That said, there is a meaningful distinction between how many URLs (web pages) a web search engine knows about (crawling) and how many it actually stores and serves as results (indexing). Personally I don’t care for now how many URLs PriEco knows about, but for the sake of this post I ran a script: 2.1B. I only care about how many results it can deliver to you: LIVE STATS (I need to improve that page’s design).
While writing this I stumbled across
Just so you know, that was likely version 1 of my crawler. It’s now at version 3, because the earlier versions produced unusable results. PriEco has only been crawling the web for a few months, and only recently at a reasonable speed.
PriEco (300M results, 2.1B known URLs)
I already mentioned these numbers, but I’m repeating them for people scanning through this. Again, the known-URL count is irrelevant to me; the results count is LIVE.
Ranking
Ranking information is even more hidden than index sizes. PriEco does:
- Concurrent full-text search (keyword matching) and IVF vector search (semantic understanding)
- RRF (reciprocal rank fusion) merge & deduplication: merges the results of both searches into 1 list
- Hand ranking: a set of hand-picked rules that boost or penalize a result’s score
- Examples are: SSL, loading time, whether the page is in the user-set lang/loc, bad URL patterns, measured confidence and effort of the page, whether the page is a homepage…
- I made up each signal’s weight. I’m now looking into a query log optimization against Google & Brave Search to improve them
- Reranker model + PageRank
- Cap the SERP (search engine results page) at max 3 results from a single domain
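To make the merge and cap steps from the list above more concrete, here is a minimal Python sketch. It is not PriEco’s actual code: the function names, the constant k=60 (a commonly used RRF default) and the example URLs are all assumptions for illustration. Each URL gets a score of 1/(k + rank) from every list it appears in, the scores are summed, and the merged list is then limited to 3 results per host.

```python
from collections import defaultdict
from urllib.parse import urlparse


def rrf_merge(rankings, k=60):
    """Fuse several ranked URL lists into one with reciprocal rank fusion.

    k=60 is an assumed default, not a value taken from PriEco.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        seen = set()
        for rank, url in enumerate(ranking, start=1):
            if url in seen:  # ignore repeats inside a single list
                continue
            seen.add(url)
            scores[url] += 1.0 / (k + rank)
    # Every URL appears once in the output, so merging also deduplicates.
    # Ties are broken alphabetically to keep the order deterministic.
    return sorted(scores, key=lambda url: (-scores[url], url))


def cap_per_domain(urls, max_per_domain=3):
    """Keep at most max_per_domain results per host, preserving order."""
    counts = defaultdict(int)
    capped = []
    for url in urls:
        host = urlparse(url).netloc
        if counts[host] < max_per_domain:
            counts[host] += 1
            capped.append(url)
    return capped


# Hypothetical result lists from the keyword and the vector search.
keyword_hits = ["https://a.example/1", "https://b.example/1", "https://a.example/2"]
vector_hits = ["https://b.example/1", "https://a.example/3",
               "https://a.example/4", "https://a.example/1"]

serp = cap_per_domain(rrf_merge([keyword_hits, vector_hits]))
print(serp)
```

A nice property of RRF is that it only needs ranks, so the full-text and vector scores never have to be put on the same scale.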
There is a lot more information online about how to do ranking, and I’m looking for ways to include it in my ranking pipeline to improve result quality. Right now it’s about putting some logic behind the hand ranking weights.
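As a rough illustration of what hand ranking with per-signal weights can look like, here is a small Python sketch. The signal names and every weight value are placeholders I picked for the example, not PriEco’s real rules or numbers. The point is only that once the weights live in one table, they can later be tuned against a measurable metric instead of being guessed.

```python
# Hypothetical signal weights: positive values boost, negative values penalize.
# These numbers are placeholders, not PriEco's actual weights.
WEIGHTS = {
    "https": 0.05,
    "fast_load": 0.05,
    "matches_user_lang": 0.10,
    "is_homepage": 0.03,
    "bad_url_pattern": -0.15,
}


def apply_hand_rules(base_score, signals, weights=WEIGHTS):
    """Add the weight of every signal that fired to the base score.

    `signals` maps a signal name to True/False; unknown names are ignored.
    """
    adjustment = sum(
        weights.get(name, 0.0) for name, fired in signals.items() if fired
    )
    return base_score + adjustment


# A page served over HTTPS, in the user's language, with a spammy URL pattern.
page_signals = {"https": True, "matches_user_lang": True, "bad_url_pattern": True}
print(round(apply_hand_rules(0.50, page_signals), 4))
```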
Tests
A product becomes what you optimize it for. We need a proper “gold standard”, or a measurable metric to optimize PriEco against, so that we can reliably measure whether it’s getting better. We could measure against Google, like so many before me did.
NDCG (AI helped, I don’t have a proper test yet)
I took 50 Google, Brave Search, Mojeek and PriEco SERPs for the same queries. Sample of the queries:
- simple: youtube, netflix sign in, wikipedia english main page
- products: best video editing software 2026, best wireless earbuds for working out
- questions: difference between ssd and hdd sequential read write speeds, why does rust borrow checker reject mutable references in loops
The questions and test code were AI-made for now. These are the results:
- Google (measured against): 1.0000
- Brave Search: 0.5599
- Mojeek: 0.2932
- PriEco: 0.0856
We can clearly see PriEco scored the worst. BUT! It’s workable, considering that PriEco’s index is 30 times smaller than Mojeek’s (300M vs 9B results) and PriEco’s ranking isn’t very smart yet.
That said, the test likely wasn’t entirely optimal: it contained a lot of long queries. But it’s still good for comparing how PriEco scores against Brave Search and Mojeek, which in my view makes the score pretty reasonable.
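For anyone wondering how an NDCG score against Google can be computed, here is a minimal Python sketch. It is not the test code I used. The grading scheme is an assumption: the reference engine’s top result gets the highest gain, each following position gets one less, and URLs the reference doesn’t show get 0. The function name, the cutoff k=10 and the example URLs are made up for illustration.

```python
import math


def ndcg_against_reference(candidate, reference, k=10):
    """NDCG@k of a candidate SERP, treating the reference SERP as ground truth."""
    # Drop duplicate URLs while keeping order, then cut both lists at k.
    ref = list(dict.fromkeys(reference))[:k]
    cand = list(dict.fromkeys(candidate))[:k]

    # Assumed grading: gain falls by 1 per reference position, 0 if absent.
    gain = {url: len(ref) - i for i, url in enumerate(ref)}

    def dcg(urls):
        # Discounted cumulative gain: gains are divided by log2(position + 1).
        return sum(
            gain.get(url, 0) / math.log2(pos + 1)
            for pos, url in enumerate(urls, start=1)
        )

    ideal = dcg(ref)  # the reference order is the best possible order
    return dcg(cand) / ideal if ideal > 0 else 0.0


# Hypothetical SERPs for one query.
google = ["https://a.example", "https://b.example", "https://c.example"]
other = ["https://c.example", "https://x.example", "https://a.example"]

print(ndcg_against_reference(google, google))  # the reference scores itself 1.0
print(ndcg_against_reference(other, google))   # somewhere between 0 and 1
```

The per-engine score is then the average over all queries. With this kind of grading, an engine is punished both for missing Google’s results and for showing them in a different order, so a small index drags the score down even if the ranking itself is fine.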
Final words
First of all, this isn’t done yet. I just wrote this to communicate the current state of PriEco. I will keep this post updated as I improve the ranking, run more tests and grow the index.
Excited for any of your replies to this topic