Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifquwrzvqn6qnpfr4cuy76i25rfwcbxfjcs7vpvdppj2re2c4vf4i",
    "uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3moigqfjl2uh2"
  },
  "coverImage": {
    "$type": "blob",
    "ref": {
      "$link": "bafkreidi3n5xs7gf2gvgpgxaqdoh2e6zg6okzixz6aajimpzmyya45ulyu"
    },
    "mimeType": "image/webp",
    "size": 232912
  },
  "path": "/sidswirl/the-knowledge-authority-layer-what-your-agents-cant-get-from-the-outside-f4i",
  "publishedAt": "2026-06-17T13:28:55.000Z",
  "site": "https://dev.to",
  "tags": [
    "ai",
    "rag",
    "llm",
    "mcp",
    "benchmarked three retrieval strategies"
  ],
  "textContent": "Every enterprise AI conversation right now starts in the same place: \"connect the model to our data.\" Then it stalls in the same place: _which_ data, copied _where_ , governed by _whom_.\n\nI build retrieval for a living (I wrote the original open-source SWIRL), so let me make an argument that runs against the current default - and then show the architecture it implies.\n\n##  The default is a second copy of your data\n\nThe standard RAG recipe is: crawl your sources, chunk them, embed them, and load the vectors into a database. Now your model can retrieve. It also means you have a _second copy_ of your content living in an index you have to secure, keep in sync, and explain to whoever owns compliance. You've recreated every permission boundary by hand, and you'll eventually get one wrong.\n\nFor a lot of teams that copy is simply not allowed. Regulated content, client-confidential material, anything privileged - copying it into a vendor store is exposure you don't get paid to take on.\n\n##  You probably don't need the vector database\n\nHere's the part people don't want to hear. Meta's XetHub team benchmarked three retrieval strategies: keyword-only (BM25), vector-only, and hybrid (keyword to pull candidates, then re-rank). Keyword-only came last. Vector-only did better.\n\n**Hybrid won** - and their conclusion was blunt: \"No vector database necessary.\"\n\nThat matches what we see in production. Vector similarity is a great _high-precision filter_ , not a great _first pass_. Lead with exact matches and quoted terms, then let embeddings and a cross-encoder re-rank what's left.\n\n##  What \"make your LLM better\" actually means\n\nIt's not a slogan; it's a pipeline. In SWIRL, relevance is three passes, and both models run locally:\n\n  1. **Federate and match.** Query every connected source in parallel - keyword + BM25 - and honor quoted phrases and exact terms first.\n  2. **Embedding re-rank.** Re-rank candidates with `E5-large-v2`, using title-aware chunking and hybrid keyword+vector fusion (RRF). No vector database to build or secure.\n  3. **Cross-encoder re-rank.** An `MS-MARCO` cross-encoder reads the query and document _together_ and scores real relevance, not vector distance.\n\n\n\nFeed _that_ to your LLM - whatever model you've chosen, including an on-prem one - and the answer gets better, because the context got better. Same model, sharper input.\n\n##  The layer no model supplies from the outside\n\nThe stack is settling: foundation models orchestrate, MCP is the retrieval interface, the chat UI is a commodity. The piece none of them provide from outside your walls is **knowledge authority** - which document is official, which clause your org actually uses, which answer carries approval.\n\nSo we made it a first-class layer. SWIRL 5 exposes an MCP server. Any agent - Claude, Copilot, ChatGPT, your own - calls SWIRL and gets ranked, permissioned, _organization-approved_ answers. A team pins the canonical result for a query once; every agent gets it after that. And no copy of your data leaves your tenant.\n\n##  Why this shape\n\nThree properties fall out of it, and they're the whole reason to build it this way:\n\n  * **Private by architecture.** Data stays in place; permissions are enforced live; there's no second index to govern.\n  * **The answer, not a guess.** Cross-encoder ranking plus canonical answers means people and agents get the result the org trusts.\n  * **The safe on-ramp to AI.** Headless and MCP-native, deployed in your tenant - the lowest-risk way to give agents enterprise reach.\n\n\n\nIf you're wiring agents into enterprise data and the \"just copy everything into a vector store\" step is making your security team twitch, there's another shape available. SWIRL 5 goes GA July 15; the preview is open if you want to point it at your own stack. Either way - I'd genuinely like to hear how you're handling the authority problem, because I don't think the industry has it figured out yet.\n\n_Sid Probstein is the creator of SWIRL and CEO of SWIRL AI._",
  "title": "The knowledge-authority layer: what your agents can't get from the outside"
}