Automatic -100 masking of the questions in Labels
Seems expected behaviour?
Automatic -100 masking of question/user tokens in TRL SFT: messages vs prompt/completion
Summary
The behavior you are seeing is expected.
transformers.DataCollatorForLanguageModeling does not automatically mask the user/question part of a chat example. It is not role-aware. It does not know what a system, user, or assistant message is. For ordinary causal language modeling with mlm=False, it prepares next-token-prediction labels from the input sequence, with padding ignored. In other words, if nothing else intervenes, the usual pattern is roughly:
labels = input_ids.copy()
labels[padding_positions] = -100
That is padding masking, not prompt/question masking.
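To make the pattern above concrete, here is a minimal sketch in plain Python lists rather than tensors. The helper name `causal_lm_labels` and the token ids are hypothetical and chosen only for illustration; this is not the collator's actual implementation, just the same idea of copying `input_ids` and ignoring padding.

```python
IGNORE_INDEX = -100  # label value that the loss skips


def causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids into labels, replacing padding tokens with IGNORE_INDEX."""
    return [IGNORE_INDEX if tok == pad_token_id else tok for tok in input_ids]


# Hypothetical ids: 0 is the pad token; 11-13 stand for the question, 21-22 for the answer.
input_ids = [11, 12, 13, 21, 22, 0, 0]
labels = causal_lm_labels(input_ids, pad_token_id=0)
print(labels)  # [11, 12, 13, 21, 22, -100, -100]
```

Note that the question ids (11, 12, 13) survive as labels. Only the padding is ignored, which matches what you observed.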
The current TRL replacement for the old DataCollatorForCompletionOnlyLM path is not “use the Transformers LM collator and hope it infers the answer span.” The current route is:
- use SFTTrainer
- use the correct dataset format
- use the correct SFTConfig loss option
- inspect one real batch before training
The key distinction is:
{"messages": [...]} -> conversational language-modeling dataset
{"prompt": ..., "completion": ...} -> prompt-completion dataset
So, for your current messages dataset (system, user, assistant), completion_only_loss=True is not the right mental model unless you first convert the examples to a real prompt + completion format.
Relevant docs / source:
- TRL SFTTrainer docs — dataset formats, assistant-only loss, completion-only loss
- TRL SFTTrainer source — collator routing, dataset_text_field, completion_only_loss, masks
- Transformers causal language modeling docs — DataCollatorForLanguageModeling(..., mlm=False)
- TRL discussion — DataCollatorForCompletionOnlyLM removed; use SFTConfig(completion_only_loss=True) with prompt-completion data
What went wrong conceptually
You currently have data shaped like this:
{
"messages": [
{"role": "system", "content": "..."},
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."},
]
}
That is a conversational dataset.
It tells TRL:
This is a conversation. Apply the model's chat template.
It does not automatically mean:
system/user tokens -> labels = -100
assistant tokens -> labels = token ids
That extra masking requires either:
- a prompt-completion boundary, using {"prompt": ..., "completion": ...} plus completion_only_loss=True, or
- assistant-span masks, using {"messages": [...]} plus assistant_only_loss=True, if the chat template supports assistant masks.
Why your printed labels have no -100
You printed something like:
labels = tensor([
248045, 8678, 198, 2523, 513, 264, 10631, 1558, 421,
5529, 2708, 321, 61446, 10926, 13, 248046, 198, 248045,
846, 198, 4199, 1599, 12417, 1467, 68677, 3222, 506,
18773, 30, 248046, 198, 248045, 74455, 198, 248068, 271,
248069, 271, 11280, 12417, 25, 7884, 24093, 10254, 318,
30425, 220, 17, 15, 15, 18, 310, 6443, 220,
17, 15, 15, 19, 553, 248046, 198
])
This looks like full-sequence causal-LM labels: the labels are token IDs across the whole rendered chat sequence. That usually means the trainer/collator path is training on the full conversation, not just the assistant answer.
Conceptually, if the rendered input is:
<system>
You are a helpful assistant.
</system>
<user>
What is the answer?
</user>
<assistant>
The answer is ...
</assistant>
ordinary causal-LM labeling supervises the whole sequence:
<system> -> supervised
system text -> supervised
<user> -> supervised
question -> supervised
<assistant> -> supervised
answer -> supervised
But what you expected was:
<system> -> -100
system text -> -100
<user> -> -100
question -> -100
<assistant> -> maybe supervised, maybe ignored depending on template
answer -> supervised
That second behavior is not provided by transformers.DataCollatorForLanguageModeling.
The first model’s advice: what was right and what was wrong
Claim 1
Use from trl import DataCollatorForCompletionOnlyLM; it handles -100 masking automatically.
This was historically plausible advice for older TRL examples, but it is not the current path. The class has been removed in current TRL versions. A TRL maintainer explicitly recommends using completion_only_loss=True in SFTConfig with a prompt-completion dataset instead:
- TRL discussion: DataCollatorForCompletionOnlyLM removed
So the correction was valid: importing DataCollatorForCompletionOnlyLM is no longer the right current solution.
Claim 2
Use transformers.DataCollatorForLanguageModeling with a prompt-completion dataset and completion_only_loss=True; that will take care of the masking.
This is partly right but imprecise enough to be misleading.
The precise version is:
Use TRL SFTTrainer with a prompt-completion dataset and SFTConfig(completion_only_loss=True). Do not rely on transformers.DataCollatorForLanguageModeling itself to mask user/question tokens.
completion_only_loss=True is a TRL SFT setting. It is not a setting of transformers.DataCollatorForLanguageModeling.
The two correct ways to get response-only masking
Option A — Recommended for your case: convert messages to prompt + completion
For examples that are basically:
system + user question -> assistant answer
the cleanest format is:
{
"prompt": [
{"role": "system", "content": "SYSTEM"},
{"role": "user", "content": "QUESTION"},
],
"completion": [
{"role": "assistant", "content": "ANSWER"},
],
}
Then use:
SFTConfig(completion_only_loss=True)
The TRL docs describe completion-only training as the path for prompt-completion datasets:
- TRL docs — train on completion only
Conversion code
from trl import SFTConfig, SFTTrainer


def to_prompt_completion(example):
    messages = example["messages"]

    if not isinstance(messages, list):
        raise TypeError("Expected example['messages'] to be a list.")

    if len(messages) < 2:
        raise ValueError("Expected at least one prompt message and one assistant response.")

    if messages[-1]["role"] != "assistant":
        raise ValueError(
            f"Expected final message to be assistant, got {messages[-1]['role']!r}."
        )

    return {
        "prompt": messages[:-1],
        "completion": [messages[-1]],
    }


train_dataset = train_dataset.map(
    to_prompt_completion,
    remove_columns=train_dataset.column_names,
)

args = SFTConfig(
    output_dir="sft-out",
    completion_only_loss=True,
    packing=False,    # keep False until masking is verified
    max_length=2048,  # adjust for your model/data
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)
This is the option I would use for your described dataset.
Why? Because your real training objective is:
Given system + user, learn to generate assistant.
That is a prompt-completion objective.
Option B — Keep messages, but use assistant_only_loss=True
If you want to keep the original conversational format:
{
"messages": [
{"role": "system", "content": "..."},
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."},
]
}
then the relevant TRL setting is:
SFTConfig(assistant_only_loss=True)
The TRL docs describe this as training only on assistant messages while ignoring user/system messages:
- TRL docs — train on assistant messages only
However, there is an important caveat: assistant-only loss requires the chat template to support assistant-token masks using {% generation %} and {% endgeneration %} markers. TRL can patch some known model-family templates, but for other models you should check.
Template check
template = tokenizer.chat_template or ""
print("has generation start:", "{% generation %}" in template)
print("has generation end: ", "{% endgeneration %}" in template)
If both are false, do not blindly trust assistant_only_loss=True.
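One limitation of the exact-substring check is that Jinja also allows whitespace-control variants such as `{%- generation -%}`, which a literal `"{% generation %}" in template` test would miss. A slightly more tolerant sketch, assuming a hypothetical helper name `has_generation_markers` and made-up template fragments, could look like this:

```python
import re

# Match {% generation %} as well as whitespace-control forms like {%- generation -%}.
GENERATION_START = re.compile(r"\{%-?\s*generation\s*-?%\}")
GENERATION_END = re.compile(r"\{%-?\s*endgeneration\s*-?%\}")


def has_generation_markers(template):
    """Return True only if both a start and an end generation marker are present."""
    template = template or ""  # tokenizer.chat_template may be None
    return bool(GENERATION_START.search(template)) and bool(GENERATION_END.search(template))


# Hypothetical template fragments, not taken from any real model.
with_markers = "{%- generation %}{{ message['content'] }}{%- endgeneration %}"
without_markers = "{{ message['role'] }}: {{ message['content'] }}"

print(has_generation_markers(with_markers))     # True
print(has_generation_markers(without_markers))  # False
print(has_generation_markers(None))             # False
```

In practice you would call it as `has_generation_markers(tokenizer.chat_template)`. A `False` result is a prompt to inspect a real batch, not proof that masking will fail, since TRL may patch some templates itself.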
This is why, for simple one-turn system-user-assistant examples, I prefer converting to prompt + completion: it makes the boundary explicit and is easier to audit.
Which option should you choose?
For your case, I would choose Option A:
{"prompt": messages[:-1], "completion": [messages[-1]]}
with:
SFTConfig(completion_only_loss=True)
Use Option B only if you have genuine multi-turn conversations and want to train on every assistant turn.
Example where assistant_only_loss=True makes sense:
{
"messages": [
{"role": "system", "content": "..."},
{"role": "user", "content": "Q1"},
{"role": "assistant", "content": "A1"},
{"role": "user", "content": "Q2"},
{"role": "assistant", "content": "A2"},
]
}
If you want to train only on the final assistant turn, even in a multi-turn conversation, prompt + completion is still clearer:
{
"prompt": [
{"role": "system", "content": "..."},
{"role": "user", "content": "Q1"},
{"role": "assistant", "content": "A1"},
{"role": "user", "content": "Q2"},
],
"completion": [
{"role": "assistant", "content": "A2"},
],
}
Does dataset_text_field="messages" help?
No, not in the way you are asking.
dataset_text_field is not a response-mask setting. It does not mean:
system/user -> -100
assistant -> train
It is for identifying the text column in standard text datasets. In the TRL source, dataset_text_field is described as relevant for standard dataset format, and the collator routing distinguishes language-modeling examples from prompt-completion examples:
- examples with "messages" or the configured text field go through the language-modeling path
- examples with "prompt" and "completion" go through the prompt-completion path
Relevant source:
- TRL SFTTrainer source
So this:
dataset_text_field="messages"
does not turn a messages column into a prompt-completion dataset. It also does not request assistant-only masking.
Use these mental models instead:
# Plain text language modeling
{"text": "..."}
SFTConfig(dataset_text_field="text")
# Conversational assistant-only SFT
{"messages": [...]}
SFTConfig(assistant_only_loss=True)
# Completion-only SFT
{"prompt": [...], "completion": [...]}
SFTConfig(completion_only_loss=True)
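If it helps, the three mental models above can be written as a small row check that you run on one example before configuring the trainer. The function `describe_row_format` is a hypothetical helper of my own, not part of TRL, and it only mirrors the pairing described in this post rather than TRL's internal routing.

```python
def describe_row_format(example):
    """Map a dataset row to the loss option this post pairs it with."""
    if "prompt" in example and "completion" in example:
        return "prompt-completion: completion_only_loss=True"
    if "messages" in example:
        return "conversational: assistant_only_loss=True"
    if "text" in example:
        return "plain text: full-sequence language modeling"
    return "unrecognized format"


# Hypothetical rows for illustration.
chat_row = {"messages": [{"role": "user", "content": "Q"},
                         {"role": "assistant", "content": "A"}]}
pc_row = {"prompt": [{"role": "user", "content": "Q"}],
          "completion": [{"role": "assistant", "content": "A"}]}

print(describe_row_format(chat_row))
print(describe_row_format(pc_row))
```

Running this on `train_dataset[0]` after any `map` call is a cheap way to confirm that the row shape still matches the loss option you set.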
Why completion_only_loss=True is not enough for plain messages
completion_only_loss=True requires a completion boundary.
A plain messages row has roles, but it does not explicitly split the example into:
prompt part
completion part
A prompt-completion row does:
{
"prompt": [...],
"completion": [...],
}
That explicit boundary is what lets TRL create a completion mask.
So this is not enough:
{
"messages": [
{"role": "system", "content": "..."},
{"role": "user", "content": "..."},
{"role": "assistant", "content": "..."},
]
}
with:
SFTConfig(completion_only_loss=True)
For this dataset shape, use either:
SFTConfig(assistant_only_loss=True)
or convert to:
{
"prompt": messages[:-1],
"completion": [messages[-1]],
}
and then use:
SFTConfig(completion_only_loss=True)
Do not pass the wrong collator
If you want TRL SFT masking, I would not pass this:
from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(
tokenizer=tokenizer,
mlm=False,
)
That collator is for ordinary causal LM. It is not the mechanism that masks user/question tokens.
Prefer:
trainer = SFTTrainer(
model=model,
args=SFTConfig(completion_only_loss=True),
train_dataset=prompt_completion_dataset,
processing_class=tokenizer,
)
and let TRL’s SFT pipeline create/apply the relevant masks.
The TRL source shows the important mechanism: labels are initially cloned from input_ids, padding positions are set to -100, and then completion masks / assistant masks can further set non-target labels to -100.
Relevant source:
- TRL SFTTrainer source
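The layering described above can be sketched in plain Python. This is an illustration of the idea only, not TRL's code: the function name `build_labels` is hypothetical, and I am assuming padding is identified by `attention_mask == 0` and that the completion and assistant masks use 1 for target tokens and 0 for everything else.

```python
IGNORE_INDEX = -100


def build_labels(input_ids, attention_mask, completion_mask=None, assistant_masks=None):
    """Start from a copy of input_ids, then ignore padding and any non-target tokens."""
    labels = list(input_ids)
    for i in range(len(labels)):
        if attention_mask[i] == 0:
            labels[i] = IGNORE_INDEX  # padding
        if completion_mask is not None and completion_mask[i] == 0:
            labels[i] = IGNORE_INDEX  # prompt tokens
        if assistant_masks is not None and assistant_masks[i] == 0:
            labels[i] = IGNORE_INDEX  # non-assistant tokens
    return labels


# Hypothetical ids: 1-3 are the prompt, 4-5 the answer, 0 is padding.
input_ids = [1, 2, 3, 4, 5, 0]
attention_mask = [1, 1, 1, 1, 1, 0]
completion_mask = [0, 0, 0, 1, 1, 0]

print(build_labels(input_ids, attention_mask))                   # [1, 2, 3, 4, 5, -100]
print(build_labels(input_ids, attention_mask, completion_mask))  # [-100, -100, -100, 4, 5, -100]
```

The first call reproduces what you saw (only padding ignored). The second is what you were expecting, and it needs a completion mask that only exists when the dataset carries a prompt/completion boundary.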
Important caveat: formatting_func can remove the boundary
Be careful with formatting_func.
If you convert your examples to prompt + completion, but then pass a formatting_func that renders everything into one string, you may accidentally turn the dataset back into ordinary language modeling.
Risky pattern:
def formatting_func(example):
return render_prompt_and_answer_as_one_string(example)
trainer = SFTTrainer(
model=model,
args=SFTConfig(completion_only_loss=True),
train_dataset=prompt_completion_dataset,
formatting_func=formatting_func,
processing_class=tokenizer,
)
Safer pattern:
trainer = SFTTrainer(
model=model,
args=SFTConfig(completion_only_loss=True),
train_dataset=prompt_completion_dataset,
processing_class=tokenizer,
)
Let the trainer see the structured prompt and completion fields.
How to verify that masking is correct
Do not only print the numeric tensor. Decode the supervised span.
Helper 1 — Decode the full input and supervised labels
def inspect_supervised_span(trainer, tokenizer, row=0):
    batch = next(iter(trainer.get_train_dataloader()))

    input_ids = batch["input_ids"][row].tolist()
    labels = batch["labels"][row].tolist()

    supervised_ids = [
        token_id
        for token_id, label_id in zip(input_ids, labels)
        if label_id != -100
    ]

    ignored_count = sum(label_id == -100 for label_id in labels)
    supervised_count = sum(label_id != -100 for label_id in labels)

    print("=" * 100)
    print("FULL INPUT")
    print(tokenizer.decode(input_ids, skip_special_tokens=False))

    print("=" * 100)
    print("SUPERVISED LABEL SPAN")
    print(tokenizer.decode(supervised_ids, skip_special_tokens=False))

    print("=" * 100)
    print("COUNTS")
    print("total tokens:     ", len(input_ids))
    print("ignored tokens:   ", ignored_count)
    print("supervised tokens:", supervised_count)
Good result:
FULL INPUT
<system>...<user>QUESTION<assistant>ANSWER...
SUPERVISED LABEL SPAN
<assistant>ANSWER...
Also possibly good, depending on template boundary:
SUPERVISED LABEL SPAN
ANSWER...
Bad result:
SUPERVISED LABEL SPAN
<system>...<user>QUESTION<assistant>ANSWER...
That means you are still training on the question.
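If you would rather have a pass/fail check than read decoded text, a rough sketch is to test whether the supervised positions form one contiguous block that does not start at position 0. The helper names below are hypothetical, and the check only makes sense for single-turn, response-only rows; multi-turn assistant-only masking legitimately produces several spans.

```python
IGNORE_INDEX = -100


def supervised_positions(labels):
    """Indices whose label takes part in the loss."""
    return [i for i, label in enumerate(labels) if label != IGNORE_INDEX]


def is_single_response_span(labels):
    """True if supervision is one contiguous block that starts after the prompt."""
    positions = supervised_positions(labels)
    if not positions or positions[0] == 0:
        return False  # nothing supervised, or supervision starts at the first token
    return positions == list(range(positions[0], positions[-1] + 1))


# Hypothetical label rows: the first is response-only, the second is full-sequence.
masked = [-100, -100, -100, 4, 5, -100]
unmasked = [1, 2, 3, 4, 5, -100]

print(is_single_response_span(masked))    # True
print(is_single_response_span(unmasked))  # False
```

You could feed it `batch["labels"][row].tolist()` from the same dataloader batch that Helper 1 uses.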
Helper 2 — Token-by-token label table
def print_label_table(trainer, tokenizer, n_tokens=200, row=0):
    batch = next(iter(trainer.get_train_dataloader()))

    input_ids = batch["input_ids"][row].tolist()
    labels = batch["labels"][row].tolist()

    for i, (token_id, label_id) in enumerate(zip(input_ids, labels)):
        if i >= n_tokens:
            break

        token = tokenizer.decode([token_id], skip_special_tokens=False)
        status = "LOSS" if label_id != -100 else "IGNORED"

        print(
            f"{i:04d} | {status:7s} | "
            f"input_id={token_id:8d} | label={label_id:8d} | {token!r}"
        )
Expected pattern:
0000 | IGNORED | system marker
0001 | IGNORED | system text
...
0015 | IGNORED | user marker
0016 | IGNORED | user question
...
0030 | LOSS | assistant marker or answer
0031 | LOSS | answer text
Bad pattern:
0000 | LOSS | system marker
...
0015 | LOSS | user marker
0016 | LOSS | user question
Should the assistant role marker be supervised?
You may see either of these after conversion:
SUPERVISED LABEL SPAN
<assistant>ANSWER<end>
or:
SUPERVISED LABEL SPAN
ANSWER<end>
Both can be defensible.
Usually acceptable:
<assistant>ANSWER<end>
This teaches the model to produce a full assistant turn in the model’s chat format.
Also acceptable:
ANSWER<end>
This treats the assistant-start marker as part of the prompt and supervises only the answer body.
Usually not what you want:
<system>...<user>QUESTION<assistant>ANSWER<end>
That trains on the system/user prompt too.
The key thing is that the user/question tokens should be -100 if your goal is response-only SFT.
What -100 means here
-100 is the ignore index commonly used by PyTorch/Hugging Face causal-LM training. In this context:
label = token id -> compute loss
label = -100 -> ignore this token position in the loss
It does not mean the token is hidden from the model.
The ideal response-only SFT setup is:
system/user tokens:
visible to the model as context
ignored in the loss
assistant tokens:
visible to the model
included in the loss
This is exactly what you usually want for instruction tuning: the model conditions on the prompt, but it is not trained to reproduce the prompt.
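To show the effect on the loss itself, here is a toy sketch that averages the negative log-likelihood only over positions whose label is not -100. The function `masked_mean_nll` and the probabilities are made up for illustration, and the sketch deliberately leaves out the one-position shift between inputs and targets that causal-LM training applies.

```python
import math

IGNORE_INDEX = -100


def masked_mean_nll(token_probs, labels):
    """Mean negative log-likelihood over positions whose label is not IGNORE_INDEX.

    token_probs[i] is the probability the model gave to the correct token at position i.
    """
    losses = [
        -math.log(prob)
        for prob, label in zip(token_probs, labels)
        if label != IGNORE_INDEX
    ]
    if not losses:
        raise ValueError("No supervised positions; every label is ignored.")
    return sum(losses) / len(losses)


# Hypothetical values: the model is poor at predicting the prompt, better on the answer.
token_probs = [0.1, 0.2, 0.5, 0.5]
full_labels = [7, 8, 9, 10]            # everything supervised
masked_labels = [-100, -100, 9, 10]    # only the answer supervised

print(round(masked_mean_nll(token_probs, full_labels), 4))
print(round(masked_mean_nll(token_probs, masked_labels), 4))
```

With full-sequence labels, the poorly predicted prompt tokens contribute to the loss. With masked labels, only the answer positions do, even though the model still reads the prompt as context.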
Practical final answer
There is no Transformers DataCollatorForLanguageModeling setting that says “mask user/question tokens.” That collator is not chat-role-aware.
There are TRL SFT settings that can do the masking, but they depend on dataset format:
- For a messages conversational dataset, use:
SFTConfig(assistant_only_loss=True)
but verify that the chat template supports assistant masks.
- For a prompt-completion dataset, use:
SFTConfig(completion_only_loss=True)
with examples shaped like:
{
"prompt": [
{"role": "system", "content": "..."},
{"role": "user", "content": "..."},
],
"completion": [
{"role": "assistant", "content": "..."},
],
}
For your system-user-assistant examples, I would convert to prompt + completion and use completion_only_loss=True.
Minimal working pattern for your case
from trl import SFTConfig, SFTTrainer
def to_prompt_completion(example):
    messages = example["messages"]
    if messages[-1]["role"] != "assistant":
        raise ValueError("Expected the final message to be the assistant answer.")
    return {
        "prompt": messages[:-1],
        "completion": [messages[-1]],
    }
train_dataset = train_dataset.map(
to_prompt_completion,
remove_columns=train_dataset.column_names,
)
args = SFTConfig(
output_dir="sft-out",
completion_only_loss=True,
packing=False,
max_length=2048,
)
trainer = SFTTrainer(
model=model,
args=args,
train_dataset=train_dataset,
processing_class=tokenizer,
)
inspect_supervised_span(trainer, tokenizer)
Only after the supervised span is correct:
trainer.train()
Checklist
Use this checklist before training:
Do I want to train only on the final assistant answer?
- Use prompt + completion.
- Use completion_only_loss=True.
Do I want to train on all assistant turns in a multi-turn conversation?
- Keep messages.
- Use assistant_only_loss=True.
- Verify the chat template supports {% generation %} / {% endgeneration %}.
Am I using transformers.DataCollatorForLanguageModeling and expecting question masking?
- That is the wrong expectation.
Am I setting dataset_text_field="messages" and expecting role masking?
- That is not what dataset_text_field does.
Did I decode labels != -100 from a real dataloader batch?
- If not, do that before training.
Bottom line
Your printed tensor shows ordinary full-sequence causal-LM behavior.
The fix is not a hidden DataCollatorForLanguageModeling setting.
For your case, the clearest fix is:
messages -> prompt/completion
then:
SFTConfig(completion_only_loss=True)
and verify with:
decode(input_ids where labels != -100)
dataset_text_field="messages" does not make messages behave like prompt + completion, and it does not create automatic user/question masking.