{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreichb3omnaxlgruuhf2mw274awpnf3ynvcjpy2lbiql4sx4v7jxo6m",
"uri": "at://did:plc:25rdn5elo5izoxrmtis34zuk/app.bsky.feed.post/3mogxq3wqe7p2"
},
"coverImage": {
"$type": "blob",
"ref": {
"$link": "bafkreibqkkmj442hrvlfm7q2q5f2oek22orbshqf43bdn65wppi4y7beyq"
},
"mimeType": "image/webp",
"size": 76132
},
"path": "/vishalmysore/foundation-vs-instruct-vs-chat-models-one-question-three-answers-3gi",
"publishedAt": "2026-06-16T23:08:32.000Z",
"site": "https://dev.to",
"tags": [
"ai",
"beginners",
"llm",
"tutorial",
"foundation_instruct_chat_tutorial.ipynb"
],
"textContent": "_A hands-on tutorial you can run for free in Google Colab._\n\n> **Run it yourself:** open foundation_instruct_chat_tutorial.ipynb in Google Colab and run every cell top to bottom. It uses the **SmolLM2-135M** family — small enough for a free CPU runtime, no GPU needed.\n\n## Why this confuses everyone\n\nPeople say \"LLM,\" \"GPT,\" \"an AI model,\" and \"ChatGPT\" as if they were the same thing. They aren't. There's a ladder of training stages between \"a model that read the internet\" and \"an assistant you can chat with,\" and the words **foundation** , **instruct** , and **chat** mark the rungs.\n\nThe cleanest way to feel the difference is to do something deliberately unfair: ask the **exact same question** to three versions of the **same model family** and watch how differently they behave. Our question is deliberately boring so the _behavior_ stands out:\n\n> **\"What is the capital of France?\"**\n\nWe use three checkpoints from Hugging Face's SmolLM2 family:\n\nModel type | Hugging Face ID | One-line summary\n---|---|---\nFoundation (base) | `HuggingFaceTB/SmolLM2-135M` | Predicts the next token. Knows things, isn't helpful.\nInstruct | `HuggingFaceTB/SmolLM2-135M-Instruct` | Fine-tuned to follow a single instruction.\nChat | `HuggingFaceTB/SmolLM2-135M-Instruct` (used conversationally) | Same weights, driven through a multi-turn message list.\n\nNotice that the chat row reuses the instruct checkpoint. That's not a shortcut — it's the honest reality, and we'll come back to why.\n\n## 1. The foundation model: a brilliant autocomplete\n\nA **foundation model** (also called a _base_ or _pretrained_ model) is trained on exactly one objective: given a stretch of text, **predict the next token**. Nothing else. It reads a huge slice of the internet and gets very good at continuing text in a statistically plausible way.\n\nWhat it is _never_ taught is that a question deserves an answer. 
So when you feed it:\n\n\n\n    What is the capital of France?\n\n\nit doesn't think _\"I should answer that.\"_ It thinks _\"On the internet, what usually **comes after** a line like this?\"_ And the answer is often… **more quiz questions**, a worksheet, or a tangent:\n\n\n\n    What is the capital of France? What is the capital of Germany? What is the\n    capital of Italy? ...\n\n\nIn the notebook we pass the raw string straight into the pipeline with no formatting:\n\n\n\n    from transformers import pipeline\n    test_query = \"What is the capital of France?\"  # the question used throughout\n    # Base checkpoint, raw string in: no chat template is applied\n    base_pipe = pipeline(\"text-generation\", model=\"HuggingFaceTB/SmolLM2-135M\")\n    base_raw_out = base_pipe(test_query, max_new_tokens=30, do_sample=False)\n    print(base_raw_out[0]['generated_text'])\n\n\n**Takeaway:** a foundation model is a **text completer**, not an assistant. It contains enormous knowledge but has no concept of being _helpful_. It's the raw clay everything else is shaped from.\n\n## 2. The instruct model: teaching the model to answer\n\nAn **instruct model** starts from that same base model and goes through a second stage of training — **fine-tuning on (instruction → response) pairs**. The training data is thousands to millions of examples of the shape _\"Here's a request. Here's a good response.\"_ This teaches the model a new contract: **when the user asks for something, actually do it and then stop.**\n\nBut there's a crucial detail people miss: an instruct model only behaves correctly when you wrap your text in the **exact special format it was trained on.** That format uses control tokens — for SmolLM2 they look like this:\n\n\n\n    <|im_start|>user\n    What is the capital of France?<|im_end|>\n    <|im_start|>assistant\n\n\nYou don't type those tokens by hand. 
SmolLM2's instruct checkpoint, like most instruct models, ships with a **chat template** baked into its tokenizer that builds them for you:\n\n\n\n    from transformers import AutoTokenizer\n    instruct_id = \"HuggingFaceTB/SmolLM2-135M-Instruct\"\n    tokenizer = AutoTokenizer.from_pretrained(instruct_id)\n    formatted_prompt = tokenizer.apply_chat_template(\n        [{\"role\": \"user\", \"content\": test_query}],\n        tokenize=False,\n        add_generation_prompt=True,  # appends the 'assistant' cue\n    )\n\n\nFeed _that_ to the same-sized model and you get a clean, direct answer:\n\n\n\n    The capital of France is Paris.\n\n\nThe notebook prints the formatted prompt **before** generating, so you can literally see the hidden scaffolding the model receives. That \"aha\" — _oh, there's a whole structure under the hood_ — is the most important thing in the tutorial.\n\n**Takeaway:** an instruct model = a base model **+ instruction tuning + a required prompt format**. Skip the format and even a well-trained instruct model can fall back to rambling.\n\n## 3. The chat model: memory across turns\n\nHere's the part that surprises people: a **chat model is usually the same weights as the instruct model.** The difference isn't _what_ the model is — it's _how you drive it._\n\nInstead of one instruction in, one response out, you maintain a **running list of role-tagged messages**:\n\n\n\n    from transformers import pipeline\n    # Assumed setup: chat_pipe wraps the same instruct checkpoint as above\n    chat_pipe = pipeline(\"text-generation\", model=\"HuggingFaceTB/SmolLM2-135M-Instruct\")\n    chat_history = [\n        {\"role\": \"user\", \"content\": \"What is the capital of France?\"},\n    ]\n    chat_out = chat_pipe(chat_history, max_new_tokens=30)\n\n\nThe pipeline applies the chat template for you and returns the **whole conversation** with the assistant's reply appended. For a single turn, that looks identical to the instruct example. 
The magic only appears when the conversation **continues**.\n\nSo in the notebook we take the returned conversation, append a deliberately vague follow-up, and run it again:\n\n\n\n    conversation = chat_out[0]['generated_text']  # user + assistant so far\n    conversation.append({\"role\": \"user\",\n                         \"content\": \"And what is a famous landmark there?\"})\n    follow_up = chat_pipe(conversation, max_new_tokens=40)\n\n\nThe word **\"there\"** is meaningless on its own. But because we passed the _entire history_, the model resolves \"there\" → **Paris** and names a landmark. That carried-over context is what turns a one-shot Q&A into something that feels like a conversation.\n\n**Takeaway:** a chat model is an instruct model **driven through a multi-turn message list**, so each new turn can use the previous turns as context. The system prompt, the `user`/`assistant` roles, and the growing history are the \"chat\" part.\n\n## The whole picture in one table\n\nModel | Trained to… | You give it… | Reply to _\"What is the capital of France?\"_\n---|---|---|---\n**Foundation** | continue text | a raw string | echoes / continues the document — may never answer\n**Instruct** | follow one instruction | a chat-templated string | a direct answer: _\"The capital of France is Paris.\"_\n**Chat** | converse over many turns | a list of messages | a direct answer **+ earlier turns available as context** for follow-ups\n\nRead top to bottom, it's a progression, not three unrelated things:\n\n 1. **Foundation** learns the world by predicting text.\n 2. **Instruct** fine-tunes that knowledge into _do-what-I-ask_ behavior — and demands a specific prompt format.\n 3. **Chat** wraps the instruct model in a _multi-turn interface_ so context flows across turns.\n\n\n\nWhen you talk to a commercial assistant, you're using stage 3, sitting on stage 2, built on stage 1.\n\n## A note on honesty and scale\n\nSmolLM2-135M is **tiny** — about 135 million parameters, versus the tens or hundreds of _billions_ in frontier models. 
At this size the model will sometimes get a fact wrong, repeat itself, or trail off. **That's expected, and it's not the point.** The tutorial is designed to make the _behavioral_ gap between the three modes visible on a free laptop or Colab CPU — not to win a trivia contest. The exact same three-stage structure scales all the way up to the largest models in production.\n\n## Run it and tinker\n\n 1. Open `foundation_instruct_chat_tutorial.ipynb` in Google Colab (`File → Open notebook → Upload`, or push it to GitHub and use the Colab badge).\n 2. Run all cells (`Runtime → Run all`). The first run downloads the models — give it a minute.\n 3. Experiment:\n * Change `test_query` to something open-ended like `\"Write a haiku about the sea.\"` and watch how the three modes diverge even more.\n * Set `do_sample=True` with `temperature=0.7` for more varied, creative output.\n * Swap in a larger sibling such as `HuggingFaceTB/SmolLM2-360M-Instruct` and feel the quality jump.\n\n\n\nOnce you've _seen_ the three behaviors with your own eyes, the vocabulary — base, instruct, chat, chat template, system prompt — stops being jargon and starts being obvious.\n\n_Happy experimenting!_ 🚀",
"title": "Foundation vs. Instruct vs. Chat Models: One Question, Three Answers"
}