Architecture Suggestions for a Chatbot (Website Widget)
Personal Project: My idea is to create a chatbot widget for a state-government-focused website. The goal is for the AI to answer questions based on a robust database (PDFs, legislation, and metadata). Currently, I use a RAG (Retrieval-Augmented Generation) system that handles simple metadata searches (e.g., “What is the Gazette for date X?”). I built this using pre-defined prefixes: the user asks a question, the system matches it to a prefix, searches the database, checks the extracted metadata, and returns the result. However, if a user asks, “Summarize the ordinance from Gazette X,” the system fails because no step passes the document text to an LLM for that type of processing.
I am facing technical limitations in two main areas:
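A minimal sketch of the prefix-based routing described above may make the failure mode concrete. Everything here (the `GAZETTES` store, the handler name, the sample record) is hypothetical and only illustrates the pattern, not the actual implementation:

```python
from typing import Callable, Optional

# Hypothetical metadata store: gazette date -> metadata record.
GAZETTES = {
    "2024-03-01": {"number": "Gazette 101", "file": "gazette_101.pdf"},
}

def find_gazette_by_date(arg: str) -> Optional[dict]:
    """Metadata lookup: return the record stored for a date, if any."""
    return GAZETTES.get(arg.strip().rstrip("?"))

# Each pre-defined prefix maps to exactly one metadata handler.
PREFIX_HANDLERS: dict[str, Callable[[str], Optional[dict]]] = {
    "what is the gazette for date": find_gazette_by_date,
}

def answer(question: str) -> str:
    q = question.strip().lower()
    for prefix, handler in PREFIX_HANDLERS.items():
        if q.startswith(prefix):
            record = handler(q[len(prefix):])
            if record is None:
                return "No match found."
            return f"{record['number']} ({record['file']})"
    # No prefix matched, and nothing here reads the document text,
    # so a request like "Summarize ..." cannot be served.
    return "Sorry, I can't handle that question."
```

A question that starts with a known prefix returns metadata, while “Summarize the ordinance from Gazette X” falls through to the fallback because no handler ever opens the file.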
Scalability: How can I support 50+ concurrent users while maintaining performance?
Synthesis Capability: The current system locates the document efficiently but cannot “read” or summarize its internal content (e.g., “Summarize the ordinances from day X”).
The Challenge
My database is structured, but the current retrieval logic is limited to search filters rather than an LLM operating over the file content. I need to evolve the architecture so the AI doesn’t just find the file but processes the text within it to generate contextual answers.
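The shift described above, from returning a file to processing its text, could look roughly like this. It is a sketch under assumptions: `DOCUMENTS` stands in for text already extracted from the PDFs, and `llm` is a placeholder callable for whichever model ends up being used (API or local), not a real client:

```python
from typing import Callable

# Hypothetical store: metadata plus text already extracted from each PDF.
DOCUMENTS = [
    {"gazette": "101", "type": "ordinance",
     "text": "Ordinance 12 sets parking fees downtown."},
    {"gazette": "102", "type": "decree",
     "text": "Decree 7 names a new transit board."},
]

def retrieve(gazette: str) -> list[dict]:
    """Step 1: the existing metadata filter still finds the documents."""
    return [d for d in DOCUMENTS if d["gazette"] == gazette]

def build_prompt(question: str, docs: list[dict]) -> str:
    """Step 2: put the document text itself into the prompt."""
    context = "\n\n".join(f"[Gazette {d['gazette']}] {d['text']}" for d in docs)
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )

def answer(question: str, gazette: str, llm: Callable[[str], str]) -> str:
    """Step 3: the model generates the answer from the retrieved text."""
    docs = retrieve(gazette)
    if not docs:
        return "No matching document found."
    return llm(build_prompt(question, docs))
```

Passing the model in as a plain callable keeps the retrieval code independent of the API-versus-local decision.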
Specific Questions:
Orchestration: For 50 concurrent users, what is the best stack to manage request queues and concurrency?
Context Processing: How do you handle extensive documents (Gazettes/Laws) so that summaries fit within the LLM’s context window without losing crucial information?
Vector Infrastructure: Which vector database do you recommend for this workload to ensure low latency?
Cost-Benefit Ratio: Considering scale, is it more cost-effective to use API-based models (OpenAI/Anthropic) or local instances (e.g., Llama 3 via vLLM) to process these summaries?
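One pattern that touches both the orchestration and context-window questions is map-reduce summarization behind a concurrency limit: split the document into overlapping chunks, summarize each chunk, then summarize the partial summaries, with a semaphore capping how many model calls run at once. The sketch below assumes a hypothetical async `summarize` callable standing in for the real model client; the chunk sizes and the limit are illustrative, not recommendations:

```python
import asyncio
from typing import Awaitable, Callable

# Placeholder type for the real model call (API or local server).
Summarizer = Callable[[str], Awaitable[str]]

def chunk_text(text: str, max_chars: int = 2000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows."""
    if overlap >= max_chars:
        raise ValueError("overlap must be smaller than max_chars")
    chunks: list[str] = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # the last window already reaches the end of the text
        start += max_chars - overlap
    return chunks

async def summarize_document(
    text: str,
    summarize: Summarizer,
    limiter: asyncio.Semaphore,
    max_chars: int = 2000,
) -> str:
    """Map: summarize each chunk. Reduce: summarize the partial summaries."""
    async def bounded(piece: str) -> str:
        # The shared semaphore caps concurrent model calls across all users.
        async with limiter:
            return await summarize(piece)

    chunks = chunk_text(text, max_chars)
    if not chunks:
        return ""
    partials = await asyncio.gather(*(bounded(c) for c in chunks))
    if len(partials) == 1:
        return partials[0]
    return await bounded("\n".join(partials))
```

For very long Gazettes the joined partial summaries may still exceed the context window, in which case the reduce step would need to be applied recursively. Character-based chunking is also a simplification; splitting on article or section boundaries would likely preserve legal meaning better.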
Desired Workflow Example:
The user would ask questions such as:
“When did street parking legislation first emerge?”
“How do I submit a law for floor approval?”
“How do I file a complaint?”
The AI should answer the question directly, similar to existing AIs (Gemini, ChatGPT, etc.). The user should have a fluid experience: asking about legislative processes or history and receiving a synthesized response based on my database, rather than just a link to a PDF.