{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreieprasf7kejwljsuiutcjwy4v524aosec655by6ze3aouepoogace",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mhqzjfqg4js2"
  },
  "path": "/t/architecture-suggestions-for-a-chatbot-website-widget/174553#post_1",
  "publishedAt": "2026-03-23T14:08:23.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "**Personal Project:** My idea is to create a chatbot widget for a state-government-focused website. The goal is for the AI to answer questions based on a robust database (PDFs, legislation, and metadata). Currently, I use a RAG (Retrieval-Augmented Generation) system that handles simple metadata searches (e.g., “What is the Gazette for date X?”). I built this using pre-defined prefixes where the user asks a question, the system searches the database, checks the extracted metadata, and returns it. However, if a user asks, “Summarize the ordinance from Gazette X,” the system fails because it lacks the AI logic for that type of processing.\n\nI am facing technical limitations in two main pillars:\n\n  1. **Scalability:** How can I support 50+ concurrent users while maintaining performance?\n\n  2. **Synthesis Capability:** The current system locates the document but cannot “read” or summarize the internal content (e.g., “Summarize the ordinances from day X”) efficiently.\n\n\n\n\n### The Challenge\n\nMy database is structured, but the current retrieval logic is limited to search filters rather than an LLM operating over the file content. I need to evolve the architecture so the AI doesn’t just find the file but processes the text within it to generate contextual answers.\n\n### Specific Questions:\n\n  * **Orchestration:** For 50 concurrent users, what is the best stack to manage request queues and concurrency?\n\n  * **Context Processing:** How do you handle extensive documents (Gazettes/Laws) so that summaries fit within the LLM’s context window without losing crucial information?\n\n  * **Vector Infrastructure:** Which vector database do you recommend for this workload to ensure low latency?\n\n  * **Cost-Benefit Ratio:** Considering scale, is it more cost-effective to use API-based models (OpenAI/Anthropic) or local instances (e.g., Llama 3 via vLLM) to process these summaries?\n\n\n\n\n### Desired Workflow Example:\n\nThe user would ask:\n\n  * _“When did street parking legislation first emerge?”_\n\n  * _“How do I submit a law for floor approval?”_\n\n  * _“How do I file a complaint?”_\n\n\n\n\nThe AI should answer the question directly, similar to existing AIs (Gemini, ChatGPT, etc.). The user should have a fluid experience—asking about legislative processes or history and receiving a synthesized response based on my database, rather than just a link to a PDF.",
  "title": "Architecture Suggestions for a Chatbot (Website Widget)"
}