Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiftuogv7nizawathlbgbkfcw3i2c6h7q67v2ztxsuqxaigfhtb654",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mntpmwfm5d52"
  },
  "path": "/t/dotty-an-open-source-framework-for-llm-website-communication/176637#post_1",
  "publishedAt": "2026-06-09T07:31:02.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "github.com",
    "GitHub - dotty-protocol/dotty: An open protocol for websites to expose semantic...",
    "s8knowledge.co.uk",
    "S8 Knowledge Integration — Introducing Dotty: a semantic search protocol for..."
  ],
  "textContent": "I built this because I kept running into the same problem: LLM agents have no standard way to query a website’s content semantically. You either scrape HTML on the fly (slow, expensive), rely on third-party search indices (stale, no privacy), or build a full hosted RAG pipeline (vendor lock-in).\n\nDotty is a two-endpoint protocol — a website pre-vectorises its content offline using whichever embedding models it wants, stores the vectors in a sqlite-vec file, and exposes a search endpoint that accepts a query vector and returns ranked text chunks. The agent embeds the query itself (it already has model access) and sends the vector. No embedding happens on the server at query time.\n\nThe analogy I keep using: it’s like robots.txt or sitemap.xml — a simple machine-readable convention that gives website owners control over how automated agents interact with their content.\n\ngithub.com\n\n### GitHub - dotty-protocol/dotty: An open protocol for websites to expose semantic...\n\nAn open protocol for websites to expose semantic search to LLM\n\ns8knowledge.co.uk\n\n### S8 Knowledge Integration — Introducing Dotty: a semantic search protocol for...\n\nHappy to answer questions about the design, the sqlite-vec choice, or the chunking strategy.",
  "title": "Dotty: An open source framework for LLM-website communication"
}
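
The search step described in the post (the agent sends a query vector, the site returns ranked text chunks) can be illustrated with a minimal sketch. This is not Dotty's actual implementation or wire format: the in-memory `INDEX` list stands in for the sqlite-vec file, cosine similarity is an assumed ranking metric, and the names `search`, `cosine_similarity`, and `INDEX` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

# (chunk text, precomputed embedding). A real deployment would read these
# from the site's vector store; a plain list is an assumption made here
# to keep the sketch self-contained.
Chunk = Tuple[str, Sequence[float]]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def search(query_vector: Sequence[float], chunks: List[Chunk],
           top_k: int = 3) -> List[Tuple[float, str]]:
    """Rank stored chunks against a query vector supplied by the agent.

    No embedding happens here: the caller has already embedded the query.
    """
    scored = [(cosine_similarity(query_vector, vec), text)
              for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Toy three-dimensional "embeddings", invented purely for illustration.
INDEX: List[Chunk] = [
    ("Pricing page", [1.0, 0.0, 0.0]),
    ("Docs: getting started", [0.0, 1.0, 0.0]),
    ("Blog: release notes", [0.6, 0.8, 0.0]),
]

results = search([0.0, 1.0, 0.0], INDEX, top_k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

With these toy vectors the docs chunk ranks first and the blog chunk second. The point of the design is that the server only compares vectors, so the query must be embedded with the same model the site used for its chunks.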