Dotty: An open source framework for LLM-website communication

Hugging Face Forums [Unofficial] June 9, 2026

I built this because I kept running into the same problem: LLM agents have no standard way to query a website’s content semantically. You either scrape HTML on the fly (slow, expensive), rely on third-party search indices (stale, no privacy), or build a full hosted RAG pipeline (vendor lock-in).

Dotty is a two-endpoint protocol. A website pre-vectorises its content offline using whichever embedding models it wants, stores the vectors in a sqlite-vec file, and exposes a search endpoint that accepts a query vector and returns ranked text chunks. The agent embeds the query itself (it already has model access), using the same model the site used for its content, since vectors from different embedding models aren't comparable, and sends the vector. No embedding happens on the server at query time.
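
A minimal Python sketch of that division of work, under loud assumptions: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model; `build_index`, `search`, `handle_search`, `DIM` and the JSON fields `vector`, `k` and `results` are invented for illustration, not taken from the Dotty spec; and to stay dependency-free it stores float32 blobs in a plain `sqlite3` table with brute-force cosine ranking in Python, in place of a sqlite-vec `vec0` table and its KNN query. What it shows is the split: the site embeds chunks once, offline, and at query time the server only ranks stored vectors against a vector the agent computed with the same model.

```python
import json
import math
import re
import sqlite3
import struct
import zlib

DIM = 256  # hypothetical vector size; a real embedding model fixes its own


def toy_embed(text: str) -> list[float]:
    """Hypothetical stand-in for a real embedding model (hashed bag of words).

    Site and agent must use the SAME model, or the vectors are not comparable.
    """
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def pack(vec: list[float]) -> bytes:
    """Serialise a vector as a compact float32 blob."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def build_index(chunks: list[str], path: str = ":memory:") -> sqlite3.Connection:
    """Offline, on the website: embed every chunk once and store the vectors."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, embedding BLOB)")
    db.executemany(
        "INSERT INTO chunks (text, embedding) VALUES (?, ?)",
        [(chunk, pack(toy_embed(chunk))) for chunk in chunks],
    )
    db.commit()
    return db


def search(db: sqlite3.Connection, query_vec: list[float], k: int = 3) -> list[dict]:
    """Query time, on the website: no embedding, only ranking of stored vectors.

    Brute-force cosine ranking stands in for sqlite-vec's indexed KNN search.
    """
    if len(query_vec) != DIM:
        raise ValueError("query vector dimension does not match the index")
    results = [
        {"text": text, "score": cosine(query_vec, unpack(blob))}
        for text, blob in db.execute("SELECT text, embedding FROM chunks")
    ]
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:k]


def handle_search(db: sqlite3.Connection, request_body: str) -> str:
    """Hypothetical JSON contract: {"vector": [...], "k": n} -> {"results": [...]}."""
    request = json.loads(request_body)
    results = search(db, request["vector"], int(request.get("k", 3)))
    return json.dumps({"results": results})


# Site side (offline): build the index from some illustrative text chunks.
db = build_index([
    "Dotty stores page vectors in a sqlite-vec file.",
    "Our opening hours are nine to five on weekdays.",
    "Returns are accepted within thirty days of purchase.",
])

# Agent side: embed the query locally; the request carries the vector, not the text.
body = json.dumps({"vector": toy_embed("opening hours"), "k": 2})
for hit in json.loads(handle_search(db, body))["results"]:
    print(f"{hit['score']:.3f}  {hit['text']}")
```

Replacing the Python loop with a sqlite-vec `vec0` table and its KNN query (`... WHERE embedding MATCH ? ORDER BY distance LIMIT ?`) would keep the same request/response contract. For unit-length vectors, ranking by L2 distance gives the same order as ranking by cosine similarity.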

The analogy I keep using: it’s like robots.txt or sitemap.xml — a simple machine-readable convention that gives website owners control over how automated agents interact with their content.

GitHub (github.com): dotty-protocol/dotty, an open protocol for websites to expose semantic search to LLM

S8 Knowledge Integration (s8knowledge.co.uk): Introducing Dotty: a semantic search protocol for...

Happy to answer questions about the design, the sqlite-vec choice, or the chunking strategy.
