What prompt format/template should I use for training Unsloth/Phi-3.5-mini-instruct?

Hugging Face Forums [Unofficial] June 25, 2026

Hmm… personally, when the model ships with a chat template (very old models sometimes do not), I think using it is often the safer choice:


Short answer

I would keep your raw {input, output} data structure, but I would not use the hand-written

### Input: ...
### Output: ...
<|endoftext|>

string as my default training format for Phi-3.5-mini-instruct.

For this model, I would start with this route instead:

  1. keep your raw examples as {input, output};
  2. convert each row into messages;
  3. render those messages with the tokenizer/model chat template;
  4. use add_generation_prompt=False for training;
  5. use the same chat-style structure again when serving the model through your API.

The key idea is to separate how you store your dataset from the final text/token format the model sees.

A minimal shape would be:

import json

messages = [
    {
        "role": "system",
        "content": (
            "You convert receipt data into the requested spending insight JSON. "
            "Return only valid JSON matching the expected output shape."
        ),
    },
    {
        "role": "user",
        "content": json.dumps(example["input"], ensure_ascii=False),
    },
    {
        "role": "assistant",
        "content": json.dumps(example["output"], ensure_ascii=False),
    },
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=False,
)

Then use that rendered text in your dataset.

This is not a claim that your current ### Input / ### Output string can never work for training. It can probably be trained on as plain text-completion data. I would just treat the Phi-3.5 chat template as the lower-risk default for an already instruction-tuned chat model.


Recommended data flow

I would structure it like this:

raw {input, output}
    -> messages: system / user / assistant
    -> tokenizer.apply_chat_template(...)
    -> text column
    -> SFTTrainer

So your current raw data can stay mostly as-is. The main change is the formatting function.

def format_example(example, tokenizer):
    messages = [
        {
            "role": "system",
            "content": (
                "You convert receipt data into the requested spending insight JSON. "
                "Return only valid JSON matching the expected output shape."
            ),
        },
        {
            "role": "user",
            "content": json.dumps(example["input"], ensure_ascii=False),
        },
        {
            "role": "assistant",
            "content": json.dumps(example["output"], ensure_ascii=False),
        },
    ]

    return {
        "text": tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=False,
        )
    }

Before training, print one rendered example.

formatted = format_example(train_data[0], tokenizer)
print(formatted["text"])

You want to confirm that the final string actually looks like a Phi-style chat conversation, not like a mixture of multiple prompt formats.
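If you want something slightly more systematic than reading the printout, a rough sketch like the one below can flag obvious problems. The helper name check_rendered is hypothetical, and the marker strings are my assumptions about what a Phi-style template emits, so compare them against tokenizer.chat_template before trusting the result.

```python
# Hypothetical helper. The role markers are ASSUMED to match the Phi-3.5
# chat template; verify them against tokenizer.chat_template first.
ASSUMED_ROLE_MARKERS = ("<|user|>", "<|assistant|>")

# Illustrative markers from other prompt conventions (Llama-2 style,
# ChatML style, Alpaca style). Extend or trim this list as needed.
FOREIGN_MARKERS = ("[INST]", "<|im_start|>", "### Response:")


def check_rendered(text, role_markers=ASSUMED_ROLE_MARKERS,
                   foreign_markers=FOREIGN_MARKERS):
    """Return a list of human-readable problems found in one rendered example."""
    problems = []
    for marker in role_markers:
        if marker not in text:
            problems.append(f"missing expected marker: {marker}")
    for marker in foreign_markers:
        if marker in text:
            problems.append(f"found marker from another prompt format: {marker}")
    return problems
```

Calling check_rendered(formatted["text"]) and getting an empty list back only means the string passed these simple substring checks. It does not replace reading at least one example yourself.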


Minimal checks before a real training run

Before starting a full job, I would inspect the tokenizer and one formatted sample.

print("=== chat template ===")
print(tokenizer.chat_template)

print("=== eos token ===")
print(tokenizer.eos_token)

print("=== special tokens ===")
print(tokenizer.special_tokens_map)

print("=== rendered example ===")
print(text)

Things I would check:

| Check | Why |
| --- | --- |
| The rendered text contains a user turn | Confirms the input JSON is in the expected role. |
| The rendered text contains an assistant turn | Confirms the output JSON is treated as the target answer. |
| The assistant JSON is complete | Avoids training on truncated/malformed outputs. |
| Turn boundaries are consistent | Avoids mixing multiple stop/end conventions. |
| You are not manually inserting extra special tokens | Avoids EOS / stop-token confusion. |
| The same message structure can be used by the API | Avoids train/serve mismatch. |
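Two of these checks can be run on the raw rows before anything is rendered. The sketch below is only an illustration: validate_row is a hypothetical helper, and the special-token list is an assumption that you would want to rebuild from tokenizer.special_tokens_map for the tokenizer you actually load.

```python
import json

# ASSUMED special-token strings; replace with the values reported by
# tokenizer.special_tokens_map for your tokenizer.
ASSUMED_SPECIAL_TOKENS = (
    "<|endoftext|>", "<|end|>", "<|system|>", "<|user|>", "<|assistant|>",
)


def validate_row(example, special_tokens=ASSUMED_SPECIAL_TOKENS):
    """Check one raw {input, output} row before it is rendered."""
    problems = []
    for key in ("input", "output"):
        if key not in example:
            problems.append(f"missing key: {key}")
            continue
        value = example[key]
        if isinstance(value, str):
            # json.dumps on a string would wrap it in another layer of quotes.
            problems.append(f"{key} is already a string; it may get double-encoded")
        serialized = json.dumps(value, ensure_ascii=False)
        # Round-trip to confirm the serialized JSON parses back to the same value.
        if json.loads(serialized) != value:
            problems.append(f"{key} does not round-trip through JSON")
        for token in special_tokens:
            if token in serialized:
                problems.append(f"{key} contains special token text: {token}")
    return problems
```

Running this over the whole training set and printing only the rows with a non-empty result is usually enough to catch truncated outputs or stray stop tokens early.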


If you like ### Input, keep it inside the user message

If you prefer the readability of headings like ### Input, I would put those inside the user content, not use them as the outer conversation template.

For example:

messages = [
    {
        "role": "system",
        "content": "Return only the requested JSON object.",
    },
    {
        "role": "user",
        "content": (
            "### Input\n"
            + json.dumps(example["input"], ensure_ascii=False)
            + "\n\n### Task\n"
            + "Generate the spending insight JSON."
        ),
    },
    {
        "role": "assistant",
        "content": json.dumps(example["output"], ensure_ascii=False),
    },
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=False,
)

This keeps the outer format aligned with the model’s chat template, while still letting your task content have readable headings.


Practical summary

My first implementation would be:

import json

def format_example(example, tokenizer):
    messages = [
        {
            "role": "system",
            "content": (
                "You convert receipt data into the requested spending insight JSON. "
                "Return only valid JSON matching the expected output shape."
            ),
        },
        {
            "role": "user",
            "content": json.dumps(example["input"], ensure_ascii=False),
        },
        {
            "role": "assistant",
            "content": json.dumps(example["output"], ensure_ascii=False),
        },
    ]

    return {
        "text": tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=False,
        )
    }

Then inspect before training:

formatted = format_example(train_data[0], tokenizer)
print(formatted["text"])

And for API inference, use the same message structure, but without the assistant answer:

messages = [
    {
        "role": "system",
        "content": (
            "You convert receipt data into the requested spending insight JSON. "
            "Return only valid JSON matching the expected output shape."
        ),
    },
    {
        "role": "user",
        "content": json.dumps(input_payload, ensure_ascii=False),
    },
]

If using Transformers directly:

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
)
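One way to reduce the risk of train/serve mismatch is to build both message lists from a single helper, so the system prompt and the user serialization cannot drift apart. This is a minimal sketch; build_messages and SYSTEM_PROMPT are names I am introducing for illustration, not part of any library.

```python
import json

# Shared system prompt, copied from the examples in this post.
SYSTEM_PROMPT = (
    "You convert receipt data into the requested spending insight JSON. "
    "Return only valid JSON matching the expected output shape."
)


def build_messages(input_payload, output_payload=None):
    """Build chat messages; the assistant turn is added only for training rows."""
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": json.dumps(input_payload, ensure_ascii=False)},
    ]
    if output_payload is not None:
        messages.append(
            {
                "role": "assistant",
                "content": json.dumps(output_payload, ensure_ascii=False),
            }
        )
    return messages
```

For training you would pass build_messages(example["input"], example["output"]) to apply_chat_template with add_generation_prompt=False, and for serving build_messages(input_payload) with add_generation_prompt=True.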

So my practical recommendation would be:

  • keep {input, output} as the raw data format;
  • convert rows into messages;
  • use the Phi-3.5 tokenizer chat template for the final training text;
  • use add_generation_prompt=False for training;
  • use the same chat structure for API inference;
  • do not manually append <|endoftext|> until you have inspected what the tokenizer already emits;
  • treat response-only loss and production JSON validation as separate follow-up layers.
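On the <|endoftext|> point, if you do decide after inspection that an end-of-sequence token needs to be added, a small guard like the following avoids doubling it. This is a sketch under assumptions: ensure_single_eos is a hypothetical helper, it only looks at the tail of the rendered string, and it cannot detect a trainer that appends its own EOS later.

```python
def ensure_single_eos(text, eos_token):
    """Append eos_token only if the rendered text does not already end with it."""
    # Ignore trailing newlines when looking for an existing EOS token.
    if text.rstrip("\n").endswith(eos_token):
        return text
    return text + eos_token
```

In practice you would call it as ensure_single_eos(formatted["text"], tokenizer.eos_token), and still print the result once to confirm the ending looks the way you expect.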
