Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreig6opretcjfcrxw6cz2kvsgfhp5hnb62ecidldndfstfbbotedp4m",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3moumfuc6mmu2"
  },
  "path": "/t/how-to-build-an-ai-that-can-answer-questions-about-your-website-tutorial/667079#post_5",
  "publishedAt": "2026-06-22T09:16:31.000Z",
  "site": "https://community.openai.com",
  "textContent": "For anyone coming to this later, I’d treat this less like “scrape a website and fine-tune the model” and more like a basic RAG setup.\n\nThe usual flow is:\n\n\n    >collect website content\n    >clean it into plain text\n    >split it into chunks\n    >create embeddings\n    >store chunks in a vector store\n    >on each user question, retrieve the most relevant chunks\n    >send those chunks + the question to the model\n\n\nThat way the model is not permanently trained on the website. It just gets the right page content at answer time.\n\nIf you own the website, I’d avoid scraping when possible. Pull the content directly from the CMS, database, sitemap, docs, product feed, or exported pages. It is cleaner and easier to keep updated.\n\nThis is also roughly the same idea behind simple embeddable AI chatbot widgets like Elfsight AI Chatbot or Chatling, where you add URLs, files, or FAQs as the bot’s knowledge source. The difference with a custom OpenAI setup is that you build and control the retrieval layer yourself.\n\nFor current OpenAI API work, I’d look at vector stores / retrieval and the newer Responses API flow rather than copying older `Completion.create` examples.",
  "title": "How to build an AI that can answer questions about your website - Tutorial"
}
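
The flow listed in the post's `textContent` (collect, clean, chunk, embed, store, retrieve, send) can be sketched roughly as below. This is a minimal illustration, not the poster's implementation: the word-count `embed` function is a toy stand-in for a real embedding model, the in-memory list is a stand-in for a vector store, and `PAGES` and every helper name here are hypothetical. The final model call is left out on purpose; the sketch stops at building the prompt that would be sent along with the question.

```python
import math
import re
from collections import Counter

def clean(html: str) -> str:
    """Strip tags and collapse whitespace (crude; a real cleaner would use an HTML parser)."""
    text = re.sub(r"<[^>]+>", " ", html)
    return re.sub(r"\s+", " ", text).strip()

def chunk(text: str, max_words: int = 40) -> list[str]:
    """Split plain text into fixed-size word windows."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]

def embed(text: str) -> Counter:
    """Toy embedding (ASSUMPTION): lowercase word counts instead of a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_store(pages: dict[str, str]) -> list[tuple[str, str, Counter]]:
    """Clean, chunk, and embed every page; the list plays the role of a vector store."""
    store = []
    for url, html in pages.items():
        for piece in chunk(clean(html)):
            store.append((url, piece, embed(piece)))
    return store

def retrieve(store, question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(store, key=lambda row: cosine(q, row[2]), reverse=True)
    return [(url, piece) for url, piece, _ in ranked[:k]]

def build_prompt(question: str, hits: list[tuple[str, str]]) -> str:
    """Combine retrieved chunks and the question into the text sent to the model."""
    context = "\n\n".join(f"[{url}]\n{piece}" for url, piece in hits)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"

# Hypothetical site content, e.g. exported from a CMS rather than scraped.
PAGES = {
    "/pricing": "<h1>Pricing</h1><p>The Pro plan costs 20 dollars per month.</p>",
    "/shipping": "<h1>Shipping</h1><p>Orders ship within 3 business days.</p>",
}

store = build_store(PAGES)
question = "How much does the Pro plan cost?"
hits = retrieve(store, question, k=1)
prompt = build_prompt(question, hits)
```

In a real setup, `embed` would call an embedding model and `store` would be a hosted or self-managed vector store, but the shape stays the same: the model only sees the retrieved chunks at answer time, and nothing is fine-tuned.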