External Publication

Foundation vs. Instruct vs. Chat Models: One Question, Three Answers

DEV Community [Unofficial] June 16, 2026

A hands-on tutorial you can run for free in Google Colab.

Run it yourself: open foundation_instruct_chat_tutorial.ipynb in Google Colab and run every cell top to bottom. It uses the SmolLM2-135M family — small enough for a free CPU runtime, no GPU needed.

Why this confuses everyone

People say "LLM," "GPT," "an AI model," and "ChatGPT" as if they were the same thing. They aren't. There's a ladder of training stages between "a model that read the internet" and "an assistant you can chat with," and the words foundation, instruct, and chat mark the rungs.

The cleanest way to feel the difference is to do something deliberately unfair: ask the exact same question to three versions of the same model family and watch how differently they behave. Our question is boring on purpose, so the behavior stands out:

"What is the capital of France?"

We use three checkpoints from Hugging Face's SmolLM2 family:

Model type	Hugging Face ID	One-line summary
Foundation (base)	`HuggingFaceTB/SmolLM2-135M`	Predicts the next token. Knows things, isn't helpful.
Instruct	`HuggingFaceTB/SmolLM2-135M-Instruct`	Fine-tuned to follow a single instruction.
Chat	`HuggingFaceTB/SmolLM2-135M-Instruct` (used conversationally)	Same weights, driven through a multi-turn message list.

Notice that the chat row reuses the instruct checkpoint. That's not a shortcut — it's the honest reality, and we'll come back to why.

1. The foundation model: a brilliant autocomplete

A foundation model (also called a base or pretrained model) is trained on exactly one objective: given a stretch of text, predict the next token. Nothing else. It reads a huge slice of the internet and gets very good at continuing text in a statistically plausible way.
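To make "predict the next token" concrete, here is a toy sketch in plain Python. It is not how SmolLM2 works internally (a real model uses a neural network over subword tokens); it is a hypothetical bigram counter trained on a made-up three-line quiz sheet, which is enough to show why continuing text is not the same as answering.

```python
from collections import Counter, defaultdict

# Made-up "training corpus": a quiz sheet, the kind of text found on the web.
corpus = (
    "What is the capital of France ? "
    "What is the capital of Germany ? "
    "What is the capital of Italy ? "
)

def train_bigrams(text):
    """Count which token follows each token in the corpus."""
    tokens = text.split()
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def continue_text(follows, prompt, max_new_tokens=5):
    """Greedy decoding: keep appending the most frequent next token."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = follows.get(tokens[-1])
        if not candidates:
            break  # this token was never followed by anything in the corpus
        tokens.append(candidates.most_common(1)[0][0])
    return " ".join(tokens)

model = train_bigrams(corpus)
continuation = continue_text(model, "What is the capital of France ?")
print(continuation)
# What is the capital of France ? What is the capital of
```

The toy model has only ever seen questions followed by more questions, so that is what it produces. A real base model is vastly more capable, but it is optimizing the same kind of objective.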

What it is never taught is that a question deserves an answer. So when you feed it:

What is the capital of France?

it doesn't think "I should answer that." It thinks "On the internet, what usually comes after a line like this?" And the answer is often… more quiz questions, a worksheet, or a tangent:

What is the capital of France? What is the capital of Germany? What is the
capital of Italy? ...

In the notebook we pass the raw string straight into the pipeline with no formatting:

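from transformers import pipeline
test_query = "What is the capital of France?"  # the same question goes to every model
base_pipe = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M")
# Raw string in, greedy decoding (do_sample=False) so the output is repeatable
base_raw_out = base_pipe(test_query, max_new_tokens=30, do_sample=False)
print(base_raw_out[0]['generated_text'])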

Takeaway: a foundation model is a text completer, not an assistant. It contains enormous knowledge but has no concept of being helpful. It's the raw clay everything else is shaped from.

2. The instruct model: teaching the model to answer

An instruct model starts from that same base model and goes through a second stage of training: fine-tuning on (instruction → response) pairs, thousands to millions of examples of the shape "Here's a request. Here's a good response." This teaches the model a new contract: when the user asks for something, actually do it and then stop.

But there's a crucial detail people miss: an instruct model behaves reliably only when you wrap your text in the exact special format it was trained on. That format uses control tokens. For SmolLM2 they look like this:

<|im_start|>user
What is the capital of France?<|im_end|>
<|im_start|>assistant

You don't type those tokens by hand. Instruct models on Hugging Face typically ship with a chat template baked into their tokenizer that builds them for you:

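from transformers import AutoTokenizer

instruct_id = "HuggingFaceTB/SmolLM2-135M-Instruct"
tokenizer = AutoTokenizer.from_pretrained(instruct_id)
formatted_prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": test_query}],
    tokenize=False,              # return a string, not token IDs
    add_generation_prompt=True,  # appends the 'assistant' cue
)
print(formatted_prompt)  # shows the control tokens the model will see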

Feed that formatted prompt to the instruct model, which is the same size as the base model, and you get a clean, direct answer:

The capital of France is Paris.

The notebook prints the formatted prompt before generating, so you can literally see the hidden scaffolding the model receives. That "aha" — oh, there's a whole structure under the hood — is the most important thing in the tutorial.
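If you want to demystify that scaffolding further, you can rebuild the layout shown above by hand. The function below is a hypothetical helper (render_chatml is not part of transformers), and it is a simplification: the template that ships with the tokenizer is the authority and may add more, such as a default system message, so always use apply_chat_template for real prompts.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Hand-rolled sketch of the <|im_start|> / <|im_end|> layout shown above.

    Illustration only: the real chat template lives in the tokenizer
    and may differ in details.
    """
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model knows it is its turn to write.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)

prompt = render_chatml(
    [{"role": "user", "content": "What is the capital of France?"}]
)
print(prompt)
```

Seen this way, a chat template is plain string formatting: each message becomes a role-tagged block, and the trailing assistant cue is what invites an answer instead of a continuation.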

Takeaway: an instruct model = a base model + instruction tuning + a required prompt format. Skip the format and even a well-trained instruct model can fall back to rambling.

3. The chat model: memory across turns

Here's the part that surprises people: a chat model is usually the same weights as the instruct model. The difference isn't what the model is — it's how you drive it.

Instead of one instruction in, one response out, you maintain a running list of role-tagged messages:

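# Same instruct checkpoint, now driven with a message list instead of a string
chat_pipe = pipeline("text-generation", model="HuggingFaceTB/SmolLM2-135M-Instruct")
chat_history = [
    {"role": "user", "content": "What is the capital of France?"},
]
chat_out = chat_pipe(chat_history, max_new_tokens=30)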

The pipeline applies the chat template for you and returns the whole conversation with the assistant's reply appended. For a single turn, that looks identical to the instruct example. The magic only appears when the conversation continues.

So in the notebook we append the reply and ask a deliberately vague follow-up:

conversation = chat_out[0]['generated_text']        # user + assistant so far
conversation.append({"role": "user",
                     "content": "And what is a famous landmark there?"})
follow_up = chat_pipe(conversation, max_new_tokens=40)

The word "there" is meaningless on its own. But because we passed the entire history, the model resolves "there" → Paris and names a landmark. That carried-over context is what turns a one-shot Q&A into something that feels like a conversation.
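It helps to see that the "memory" lives entirely in the list you pass in, not in the model. The sketch below swaps the real model for a hypothetical stand-in (fake_assistant, with hard-coded replies) so it runs instantly. The part to study is chat_turn, which re-sends the whole history on every call.

```python
def fake_assistant(messages):
    """Hypothetical stand-in for a model, with hard-coded replies.

    It can only resolve "there" if Paris appears earlier in the history.
    """
    earlier_text = " ".join(m["content"] for m in messages[:-1])
    question = messages[-1]["content"]
    if "capital of France" in question:
        return "The capital of France is Paris."
    if "there" in question and "Paris" in earlier_text:
        return "A famous landmark in Paris is the Eiffel Tower."
    return "Where do you mean?"

def chat_turn(history, user_text, reply_fn):
    """Append the user message, get a reply from the full history, append it."""
    history = history + [{"role": "user", "content": user_text}]
    reply = reply_fn(history)
    return history + [{"role": "assistant", "content": reply}]

follow_up_question = "And what is a famous landmark there?"

history = chat_turn([], "What is the capital of France?", fake_assistant)
with_history = chat_turn(history, follow_up_question, fake_assistant)
without_history = chat_turn([], follow_up_question, fake_assistant)

print(with_history[-1]["content"])     # A famous landmark in Paris is the Eiffel Tower.
print(without_history[-1]["content"])  # Where do you mean?
```

Drop the history and the same follow-up has nothing to anchor to. That is the whole trick behind a conversation that feels continuous.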

Takeaway: a chat model is an instruct model driven through a multi-turn message list, so each new turn can use the previous turns as context. The system prompt, the user/assistant roles, and the growing history are the "chat" part.

The whole picture in one table

Model	Trained to…	You give it…	Reply to "What is the capital of France?"
Foundation	continue text	a raw string	echoes / continues the document — may never answer
Instruct	follow one instruction	a chat-templated string	a direct answer: "The capital of France is Paris."
Chat	converse over many turns	a list of messages	a direct answer + remembers context for follow-ups

Read top to bottom, it's a progression, not three unrelated things:

Foundation learns the world by predicting text.
Instruct fine-tunes that knowledge into do-what-I-ask behavior — and demands a specific prompt format.
Chat wraps the instruct model in a multi-turn interface so context flows across turns.

When you talk to a commercial assistant, you're using stage 3, sitting on stage 2, built on stage 1.

A note on honesty and scale

SmolLM2-135M is tiny — about 135 million parameters, versus the tens or hundreds of billions in frontier models. At this size the model will sometimes get a fact wrong, repeat itself, or trail off. That's expected, and it's not the point. The tutorial is designed to make the behavioral gap between the three modes visible on a free laptop or Colab CPU — not to win a trivia contest. The exact same three-stage structure scales all the way up to the largest models in production.
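A quick back-of-the-envelope calculation shows why this size fits a free runtime. The sketch below counts weights only and assumes 4 bytes per parameter (32-bit floats); the 70-billion-parameter model at 2 bytes per parameter is an illustrative assumption for comparison, not a reference to any specific model. Real memory use is higher once activations and framework overhead are included.

```python
def weight_memory_mb(num_parameters, bytes_per_parameter):
    """Approximate size of the weights alone, in megabytes (10**6 bytes)."""
    return num_parameters * bytes_per_parameter / 1_000_000

# SmolLM2-135M, assuming 32-bit floats (4 bytes each)
small_mb = weight_memory_mb(135_000_000, 4)

# Hypothetical 70B-parameter model, assuming 16-bit floats (2 bytes each)
large_mb = weight_memory_mb(70_000_000_000, 2)

print(f"135M parameters: about {small_mb:.0f} MB of weights")
print(f"70B parameters:  about {large_mb / 1000:.0f} GB of weights")
```

Roughly half a gigabyte of weights sits comfortably in the RAM of a free CPU runtime, which is why no GPU is needed here.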

Run it and tinker

Open foundation_instruct_chat_tutorial.ipynb in Google Colab (File → Open notebook → Upload, or push it to GitHub and use the Colab badge).
Run all cells (Runtime → Run all). The first run downloads the models — give it a minute.
Experiment:
- Change test_query to something open-ended like "Write a haiku about the sea." and watch how the three modes diverge even more.
- Set do_sample=True with temperature=0.7 for more varied, creative output.
- Swap in a larger sibling such as HuggingFaceTB/SmolLM2-360M-Instruct and feel the quality jump.
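The do_sample and temperature knobs from the list above can be sketched without a model. The logits below are made-up scores for three candidate tokens, and the function is a minimal illustration of temperature scaling, not the library's actual implementation. With do_sample=False the top-scoring token always wins; with sampling, the next token is drawn from a probability distribution whose shape depends on the temperature.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

tokens = ["Paris", "Lyon", "France"]  # hypothetical candidate tokens
logits = [2.0, 1.0, 0.0]              # made-up scores for illustration

probs_default = softmax_with_temperature(logits, 1.0)
probs_cool = softmax_with_temperature(logits, 0.7)

# Greedy decoding (do_sample=False): always take the highest-scoring token.
greedy_choice = tokens[logits.index(max(logits))]

# Sampling (do_sample=True): draw a token according to the probabilities.
sampled_choice = random.choices(tokens, weights=probs_cool, k=1)[0]

print(greedy_choice, sampled_choice)
```

At temperature 0.7 the top token gets a larger share of the probability than at 1.0, but the other tokens can still be drawn. That residual randomness is where the extra variety comes from.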

Once you've seen the three behaviors with your own eyes, the vocabulary — base, instruct, chat, chat template, system prompt — stops being jargon and starts being obvious.

Happy experimenting! 🚀