External Publication

The knowledge-authority layer: what your agents can't get from the outside

DEV Community [Unofficial] June 17, 2026

Every enterprise AI conversation right now starts in the same place: "connect the model to our data." Then it stalls in the same place: which data, copied where, governed by whom.

I build retrieval for a living (I wrote the original open-source SWIRL), so let me make an argument that runs against the current default - and then show the architecture it implies.

The default is a second copy of your data

The standard RAG recipe is: crawl your sources, chunk them, embed them, and load the vectors into a database. Now your model can retrieve. It also means you have a second copy of your content living in an index you have to secure, keep in sync, and explain to whoever owns compliance. You've recreated every permission boundary by hand, and you'll eventually get one wrong.

For a lot of teams that copy is simply not allowed. Regulated content, client-confidential material, anything privileged - copying it into a vendor store is exposure you don't get paid to take on.

You probably don't need the vector database

Here's the part people don't want to hear. The XetHub team benchmarked three retrieval strategies: keyword-only (BM25), vector-only, and hybrid (keyword to pull candidates, then re-rank). Keyword-only came last. Vector-only did better.

Hybrid won - and their conclusion was blunt: "No vector database necessary."

That matches what we see in production. Vector similarity is a great high-precision filter, not a great first pass. Lead with exact matches and quoted terms, then let embeddings and a cross-encoder re-rank what's left.
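To make "keyword first, embeddings second" concrete, here is a minimal sketch in plain Python. It is not SWIRL's code and not the benchmark's code. The function names are hypothetical, the BM25 uses the common `log(1 + ...)` IDF variant, and `toy_embed` is a character-trigram stand-in for a real embedding model so the example runs without downloads.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with plain Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    avg_len = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def toy_embed(text):
    """ASSUMPTION: stand-in for a real embedding model (trigram counts)."""
    t = text.lower()
    return Counter(t[i:i + 3] for i in range(len(t) - 2))


def cosine(a, b):
    dot = sum(a[key] * b[key] for key in a if key in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def hybrid_search(query, docs, first_pass=3, top_k=2):
    # Pass 1: keyword match pulls a small candidate set; no index of vectors.
    kw = bm25_scores(query, docs)
    matched = [i for i in range(len(docs)) if kw[i] > 0]
    candidates = sorted(matched, key=lambda i: kw[i], reverse=True)[:first_pass]
    # Pass 2: embed only the candidates, on the fly, and re-rank them.
    q_vec = toy_embed(query)
    reranked = sorted(
        candidates,
        key=lambda i: cosine(q_vec, toy_embed(docs[i])),
        reverse=True,
    )
    return [docs[i] for i in reranked[:top_k]]


DOCS = [
    "master services agreement indemnification clause",
    "office holiday party schedule",
    "indemnification policy for vendors",
    "cafeteria menu",
]
results = hybrid_search("indemnification clause", DOCS)
```

The point of the shape is that embeddings are computed only for the handful of candidates the keyword pass returns, at query time, so nothing has to be pre-embedded and stored.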

What "make your LLM better" actually means

It's not a slogan; it's a pipeline. In SWIRL, relevance is three passes, and both models run locally:

Federate and match. Query every connected source in parallel - keyword + BM25 - and honor quoted phrases and exact terms first.
Embedding re-rank. Re-rank candidates with E5-large-v2, using title-aware chunking and hybrid keyword+vector fusion (RRF). No vector database to build or secure.
Cross-encoder re-rank. An MS-MARCO cross-encoder reads the query and document together and scores real relevance, not vector distance.
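The fusion step in the second pass is worth a small illustration. Below is a minimal sketch of reciprocal rank fusion (RRF), which combines ranked lists using only rank positions, so keyword scores and vector distances never have to be put on the same scale. The function name, the document IDs, and the constant `k=60` (a commonly used default) are assumptions for illustration, not SWIRL's actual configuration.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one.

    Each document earns 1 / (k + rank) from every list it appears in,
    with rank starting at 1. Higher total means better fused position.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical outputs of the keyword pass and the embedding pass.
keyword_ranking = ["contract-7", "memo-2", "policy-4"]
vector_ranking = ["policy-4", "contract-7", "faq-9"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)  # ['contract-7', 'policy-4', 'memo-2', 'faq-9']
```

Documents that both passes agree on rise to the top, and a document that only one pass found still survives into the list the cross-encoder sees.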

Feed that to your LLM - whatever model you've chosen, including an on-prem one - and the answer gets better, because the context got better. Same model, sharper input.

The layer no model supplies from the outside

The stack is settling: foundation models orchestrate, MCP is the retrieval interface, the chat UI is a commodity. The piece none of them provide from outside your walls is knowledge authority - which document is official, which clause your org actually uses, which answer carries approval.

So we made it a first-class layer. SWIRL 5 exposes an MCP server. Any agent - Claude, Copilot, ChatGPT, your own - calls SWIRL and gets ranked, permissioned, organization-approved answers. A team pins the canonical result for a query once; every agent gets it after that. And no copy of your data leaves your tenant.
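As a rough illustration of "pin once, every agent gets it," here is a minimal sketch of an authority layer that filters results by the caller's permissions at query time and then promotes the pinned document. Every name here (`Result`, `AuthorityLayer`, `pin`, `answer`, the group labels) is hypothetical, and the in-memory group check is a stand-in for whatever the source systems actually enforce. It is not SWIRL's API.

```python
from dataclasses import dataclass, field


@dataclass
class Result:
    doc_id: str
    score: float
    allowed_groups: frozenset  # stand-in for a live permission check


@dataclass
class AuthorityLayer:
    pins: dict = field(default_factory=dict)  # normalized query -> doc_id

    @staticmethod
    def _normalize(query):
        return " ".join(query.lower().split())

    def pin(self, query, doc_id):
        """Record the organization-approved result for a query."""
        self.pins[self._normalize(query)] = doc_id

    def answer(self, query, ranked, user_groups):
        # Permissions first: a pin never exposes a document the caller
        # could not already see.
        visible = [r for r in ranked if r.allowed_groups & user_groups]
        visible.sort(key=lambda r: r.score, reverse=True)
        pinned = self.pins.get(self._normalize(query))
        # Stable sort: the pinned document moves to the front and the
        # rest keep their relevance order.
        visible.sort(key=lambda r: r.doc_id != pinned)
        return [r.doc_id for r in visible]


layer = AuthorityLayer()
layer.pin("NDA template", "nda-v3")

ranked = [
    Result("board-minutes", 0.95, frozenset({"board"})),
    Result("nda-v1", 0.90, frozenset({"legal", "sales"})),
    Result("nda-v3", 0.70, frozenset({"legal", "sales"})),
]

sales_view = layer.answer("nda  template", ranked, frozenset({"sales"}))
hr_view = layer.answer("NDA template", ranked, frozenset({"hr"}))
```

In this sketch a sales user gets `nda-v3` ahead of the higher-scoring `nda-v1`, while a user with no access to either gets nothing, pinned or not.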

Why this shape

Three properties fall out of it, and they're the whole reason to build it this way:

Private by architecture. Data stays in place; permissions are enforced live; there's no second index to govern.
The answer, not a guess. Cross-encoder ranking plus canonical answers means people and agents get the result the org trusts.
The safe on-ramp to AI. Headless and MCP-native, deployed in your tenant - the lowest-risk way to give agents enterprise reach.

If you're wiring agents into enterprise data and the "just copy everything into a vector store" step is making your security team twitch, there's another shape available. SWIRL 5 goes GA July 15; the preview is open if you want to point it at your own stack. Either way - I'd genuinely like to hear how you're handling the authority problem, because I don't think the industry has it figured out yet.

Sid Probstein is the creator of SWIRL and CEO of SWIRL AI.
