
Automatic -100 masking of the questions in Labels

Hugging Face Forums [Unofficial] May 21, 2026

Seems expected behaviour?


Automatic -100 masking of question/user tokens in TRL SFT: messages vs prompt/completion

Summary

The behavior you are seeing is expected.

transformers.DataCollatorForLanguageModeling does not automatically mask the user/question part of a chat example. It is not role-aware. It does not know what a system, user, or assistant message is. For ordinary causal language modeling with mlm=False, it prepares next-token-prediction labels from the input sequence, with padding ignored. In other words, if nothing else intervenes, the usual pattern is roughly:

labels = input_ids.clone()                         # labels start as a copy of the inputs
labels[labels == tokenizer.pad_token_id] = -100    # only padding positions are ignored

That is padding masking, not prompt/question masking.
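To make the difference concrete, here is a minimal pure-Python sketch of padding-only labeling. The function name causal_lm_labels and the token IDs are made up for illustration; the sketch mimics the behavior described above and is not the Transformers implementation.

```python
IGNORE_INDEX = -100  # label value that the loss skips


def causal_lm_labels(input_ids, pad_token_id):
    """Copy input_ids into labels and ignore only the padding positions."""
    return [
        IGNORE_INDEX if token_id == pad_token_id else token_id
        for token_id in input_ids
    ]


# Hypothetical token IDs: 0 is the pad token, everything else is real text.
input_ids = [11, 12, 13, 21, 22, 0, 0]
labels = causal_lm_labels(input_ids, pad_token_id=0)
print(labels)  # [11, 12, 13, 21, 22, -100, -100]
```

Every non-padding token keeps its ID as a label, whether it came from the system, user, or assistant part. Nothing in this logic knows about roles.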

The current TRL replacement for the old DataCollatorForCompletionOnlyLM path is not “use the Transformers LM collator and hope it infers the answer span.” The current route is:

  • use SFTTrainer
  • use the correct dataset format
  • use the correct SFTConfig loss option
  • inspect one real batch before training

The key distinction is:

{"messages": [...]}                 -> conversational language-modeling dataset
{"prompt": ..., "completion": ...}   -> prompt-completion dataset

So, for your current messages dataset (system, user, assistant), completion_only_loss=True is not the right option unless you first convert the examples to a real prompt + completion format.

Relevant docs / source:

  • TRL SFTTrainer docs — dataset formats, assistant-only loss, completion-only loss
  • TRL SFTTrainer source — collator routing, dataset_text_field, completion_only_loss, masks
  • Transformers causal language modeling docs — DataCollatorForLanguageModeling(..., mlm=False)
  • TRL discussion — DataCollatorForCompletionOnlyLM removed; use SFTConfig(completion_only_loss=True) with prompt-completion data

What went wrong conceptually

You currently have data shaped like this:

{
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
    ]
}

That is a conversational dataset.

It tells TRL:

This is a conversation. Apply the model's chat template.

It does not automatically mean:

system/user tokens -> labels = -100
assistant tokens   -> labels = token ids

That extra masking requires either:

  1. a prompt-completion boundary, using {"prompt": ..., "completion": ...} plus completion_only_loss=True, or
  2. assistant-span masks, using {"messages": [...]} plus assistant_only_loss=True, if the chat template supports assistant masks.

Why your printed labels have no -100

You printed something like:

labels = tensor([
    248045,   8678,    198,   2523,    513,    264,  10631,   1558,    421,
      5529,   2708,    321,  61446,  10926,     13, 248046,    198, 248045,
       846,    198,   4199,   1599,  12417,   1467,  68677,   3222,    506,
     18773,     30, 248046,    198, 248045,  74455,    198, 248068,    271,
    248069,    271,  11280,  12417,     25,   7884,  24093,  10254,    318,
     30425,    220,     17,     15,     15,     18,    310,   6443,    220,
        17,     15,     15,     19,    553, 248046,    198
])

This looks like full-sequence causal-LM labels: the labels are token IDs across the whole rendered chat sequence. That usually means the trainer/collator path is training on the full conversation, not just the assistant answer.

Conceptually, if the rendered input is:

<system>
You are a helpful assistant.
</system>
<user>
What is the answer?
</user>
<assistant>
The answer is ...
</assistant>

ordinary causal-LM labeling supervises the whole sequence:

<system>     -> supervised
system text  -> supervised
<user>       -> supervised
question     -> supervised
<assistant>  -> supervised
answer       -> supervised

But what you expected was:

<system>     -> -100
system text  -> -100
<user>       -> -100
question     -> -100
<assistant>  -> maybe supervised, maybe ignored depending on template
answer       -> supervised

That second behavior is not provided by transformers.DataCollatorForLanguageModeling.


The first model’s advice: what was right and what was wrong

Claim 1

Use from trl import DataCollatorForCompletionOnlyLM; it handles -100 masking automatically.

This was historically plausible advice for older TRL examples, but it is not the current path. The class has been removed in current TRL versions. A TRL maintainer explicitly recommends using completion_only_loss=True in SFTConfig with a prompt-completion dataset instead:

  • TRL discussion: DataCollatorForCompletionOnlyLM removed

So the correction was valid: importing DataCollatorForCompletionOnlyLM is no longer the right current solution.

Claim 2

Use transformers.DataCollatorForLanguageModeling with a prompt-completion dataset and completion_only_loss=True; that will take care of the masking.

This is partly right but imprecise enough to be misleading.

The precise version is:

Use TRL's SFTTrainer with a prompt-completion dataset and SFTConfig(completion_only_loss=True). Do not rely on transformers.DataCollatorForLanguageModeling itself to mask user/question tokens.

completion_only_loss=True is a TRL SFT setting. It is not a setting of transformers.DataCollatorForLanguageModeling.


The two correct ways to get response-only masking

Option A — Recommended for your case: convert messages to prompt + completion

For examples that are basically:

system + user question -> assistant answer

the cleanest format is:

{
    "prompt": [
        {"role": "system", "content": "SYSTEM"},
        {"role": "user", "content": "QUESTION"},
    ],
    "completion": [
        {"role": "assistant", "content": "ANSWER"},
    ],
}

Then use:

SFTConfig(completion_only_loss=True)

The TRL docs describe completion-only training as the path for prompt-completion datasets:

  • TRL docs — train on completion only

Conversion code

from trl import SFTConfig, SFTTrainer


def to_prompt_completion(example):
    messages = example["messages"]

    if not isinstance(messages, list):
        raise TypeError("Expected example['messages'] to be a list.")

    if len(messages) < 2:
        raise ValueError("Expected at least one prompt message and one assistant response.")

    if messages[-1]["role"] != "assistant":
        raise ValueError(
            f"Expected final message to be assistant, got {messages[-1]['role']!r}."
        )

    return {
        "prompt": messages[:-1],
        "completion": [messages[-1]],
    }


train_dataset = train_dataset.map(
    to_prompt_completion,
    remove_columns=train_dataset.column_names,
)

args = SFTConfig(
    output_dir="sft-out",
    completion_only_loss=True,
    packing=False,        # keep False until masking is verified
    max_length=2048,      # adjust for your model/data
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)

This is the option I would use for your described dataset.

Why? Because your real training objective is:

Given system + user, learn to generate assistant.

That is a prompt-completion objective.


Option B — Keep messages, but use assistant_only_loss=True

If you want to keep the original conversational format:

{
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
    ]
}

then the relevant TRL setting is:

SFTConfig(assistant_only_loss=True)

The TRL docs describe this as training only on assistant messages while ignoring user/system messages:

  • TRL docs — train on assistant messages only

However, there is an important caveat: assistant-only loss requires the chat template to support assistant-token masks using {% generation %} and {% endgeneration %} markers. TRL can patch some known model-family templates, but for other models you should check.

Template check

template = tokenizer.chat_template or ""

print("has generation start:", "{% generation %}" in template)
print("has generation end:  ", "{% endgeneration %}" in template)

If both are false, do not blindly trust assistant_only_loss=True.
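The plain substring check can miss Jinja whitespace-control variants such as {%- generation -%}. A slightly more tolerant check might look like the sketch below. The helper name has_generation_markers and the example templates are hypothetical, and a positive result only says the markers are present, not that the resulting mask is correct.

```python
import re

# Match {% generation %} as well as whitespace-control forms like {%- generation -%}.
GENERATION_START = re.compile(r"\{%-?\s*generation\s*-?%\}")
GENERATION_END = re.compile(r"\{%-?\s*endgeneration\s*-?%\}")


def has_generation_markers(template):
    """Return True only if the template contains both a start and an end marker."""
    template = template or ""  # tokenizer.chat_template may be None
    return bool(GENERATION_START.search(template)) and bool(
        GENERATION_END.search(template)
    )


# Made-up templates, for illustration only.
with_markers = "{%- generation -%}{{ message['content'] }}{%- endgeneration -%}"
without_markers = "{% for m in messages %}{{ m['content'] }}{% endfor %}"

print(has_generation_markers(with_markers))
print(has_generation_markers(without_markers))
```

With a real tokenizer you would call has_generation_markers(tokenizer.chat_template). Decoding the supervised span from a real batch is still the more reliable test.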

This is why, for simple one-turn system-user-assistant examples, I prefer converting to prompt + completion: it makes the boundary explicit and is easier to audit.


Which option should you choose?

For your case, I would choose Option A:

{"prompt": messages[:-1], "completion": [messages[-1]]}

with:

SFTConfig(completion_only_loss=True)

Use Option B only if you have genuine multi-turn conversations and want to train on every assistant turn.

Example where assistant_only_loss=True makes sense:

{
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "Q1"},
        {"role": "assistant", "content": "A1"},
        {"role": "user", "content": "Q2"},
        {"role": "assistant", "content": "A2"},
    ]
}

If you want to train only on the final assistant turn, even in a multi-turn conversation, prompt + completion is still clearer:

{
    "prompt": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "Q1"},
        {"role": "assistant", "content": "A1"},
        {"role": "user", "content": "Q2"},
    ],
    "completion": [
        {"role": "assistant", "content": "A2"},
    ],
}
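If you want every assistant turn supervised but prefer explicit prompt-completion rows (for example because the chat template lacks generation markers), one possible workaround is to expand each conversation into one row per assistant turn. This is a sketch under that assumption, not something the TRL docs prescribe; the helper name expand_assistant_turns is hypothetical.

```python
def expand_assistant_turns(messages):
    """Return one prompt-completion row per assistant message in a conversation."""
    rows = []
    for index, message in enumerate(messages):
        if message["role"] != "assistant":
            continue
        if index == 0:
            raise ValueError("An assistant message needs at least one preceding message.")
        # Everything before this assistant turn is context; the turn itself is the target.
        rows.append({"prompt": messages[:index], "completion": [message]})
    return rows


conversation = [
    {"role": "system", "content": "..."},
    {"role": "user", "content": "Q1"},
    {"role": "assistant", "content": "A1"},
    {"role": "user", "content": "Q2"},
    {"role": "assistant", "content": "A2"},
]

rows = expand_assistant_turns(conversation)
print(len(rows))  # 2
```

The trade-off is cost: earlier turns are re-encoded as context in every later row, so the dataset grows with the number of assistant turns.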

Does dataset_text_field="messages" help?

No, not in the way you are asking.

dataset_text_field is not a response-mask setting. It does not mean:

system/user -> -100
assistant   -> train

It identifies the text column in standard (plain-text) datasets. In the TRL source, dataset_text_field is documented as applying to the standard dataset format, and the collator routing distinguishes language-modeling examples from prompt-completion examples:

  • examples with "messages" or the configured text field go through the language-modeling path
  • examples with "prompt" and "completion" go through the prompt-completion path

Relevant source:

  • TRL SFTTrainer source

So this:

dataset_text_field="messages"

does not turn a messages column into a prompt-completion dataset. It also does not request assistant-only masking.

Use these mental models instead:

# Plain text language modeling
{"text": "..."}
SFTConfig(dataset_text_field="text")



# Conversational assistant-only SFT
{"messages": [...]}
SFTConfig(assistant_only_loss=True)



# Completion-only SFT
{"prompt": [...], "completion": [...]}
SFTConfig(completion_only_loss=True)
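The three mental models above can be turned into a small pre-flight check on one dataset row. This is only a sketch: the helper name expected_loss_option is hypothetical, and it inspects column names rather than anything TRL does internally.

```python
def expected_loss_option(row):
    """Map a dataset row's keys to the SFTConfig option that matches its format."""
    keys = set(row)
    if {"prompt", "completion"} <= keys:
        return "completion_only_loss"
    if "messages" in keys:
        return "assistant_only_loss"
    if "text" in keys:
        return None  # plain language modeling: no response-only mask applies
    raise ValueError(f"Unrecognized dataset row keys: {sorted(keys)}")


print(expected_loss_option({"messages": []}))                  # assistant_only_loss
print(expected_loss_option({"prompt": [], "completion": []}))  # completion_only_loss
print(expected_loss_option({"text": "..."}))                   # None
```

Running it on train_dataset[0] before building the trainer is a cheap way to notice that a messages dataset is being paired with completion_only_loss=True.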

Why completion_only_loss=True is not enough for plain messages

completion_only_loss=True requires a completion boundary.

A plain messages row has roles, but it does not explicitly split the example into:

prompt part
completion part

A prompt-completion row does:

{
    "prompt": [...],
    "completion": [...],
}

That explicit boundary is what lets TRL create a completion mask.

So this is not enough:

{
    "messages": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
        {"role": "assistant", "content": "..."},
    ]
}

with:

SFTConfig(completion_only_loss=True)

For this dataset shape, use either:

SFTConfig(assistant_only_loss=True)

or convert to:

{
    "prompt": messages[:-1],
    "completion": [messages[-1]],
}

and then use:

SFTConfig(completion_only_loss=True)

Do not pass the wrong collator

If you want TRL SFT masking, I would not pass this:

from transformers import DataCollatorForLanguageModeling

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer,
    mlm=False,
)

That collator is for ordinary causal LM. It is not the mechanism that masks user/question tokens.

Prefer:

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(completion_only_loss=True),
    train_dataset=prompt_completion_dataset,
    processing_class=tokenizer,
)

and let TRL’s SFT pipeline create/apply the relevant masks.

The TRL source shows the important mechanism: labels are initially cloned from input_ids, padding positions are set to -100, and then completion masks / assistant masks can further set non-target labels to -100.

Relevant source:

  • TRL SFTTrainer source
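The order of operations described above (clone, mask padding, then mask non-target positions) can be sketched in plain Python. The function build_labels and its argument names are illustrative assumptions that mirror the masks named in the text; this is not TRL's code.

```python
IGNORE_INDEX = -100


def build_labels(input_ids, attention_mask, completion_mask=None, assistant_mask=None):
    """Clone input_ids, then set every non-target position to IGNORE_INDEX."""
    labels = list(input_ids)  # step 1: labels start as a copy of the inputs
    for i in range(len(labels)):
        if attention_mask[i] == 0:
            labels[i] = IGNORE_INDEX  # step 2: padding never contributes
        if completion_mask is not None and completion_mask[i] == 0:
            labels[i] = IGNORE_INDEX  # step 3a: prompt tokens are ignored
        if assistant_mask is not None and assistant_mask[i] == 0:
            labels[i] = IGNORE_INDEX  # step 3b: non-assistant tokens are ignored
    return labels


# Hypothetical IDs: 101-103 are prompt tokens, 201-202 the answer, 0 is padding.
input_ids = [101, 102, 103, 201, 202, 0]
attention_mask = [1, 1, 1, 1, 1, 0]
completion_mask = [0, 0, 0, 1, 1, 0]

print(build_labels(input_ids, attention_mask))
# [101, 102, 103, 201, 202, -100]
print(build_labels(input_ids, attention_mask, completion_mask=completion_mask))
# [-100, -100, -100, 201, 202, -100]
```

Without a completion or assistant mask, only padding is ignored, which matches the full-sequence labels you printed.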

Important caveat: formatting_func can remove the boundary

Be careful with formatting_func.

If you convert your examples to prompt + completion, but then pass a formatting_func that renders everything into one string, you may accidentally turn the dataset back into ordinary language modeling.

Risky pattern:

def formatting_func(example):
    return render_prompt_and_answer_as_one_string(example)

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(completion_only_loss=True),
    train_dataset=prompt_completion_dataset,
    formatting_func=formatting_func,
    processing_class=tokenizer,
)

Safer pattern:

trainer = SFTTrainer(
    model=model,
    args=SFTConfig(completion_only_loss=True),
    train_dataset=prompt_completion_dataset,
    processing_class=tokenizer,
)

Let the trainer see the structured prompt and completion fields.


How to verify that masking is correct

Do not only print the numeric tensor. Decode the supervised span.

Helper 1 — Decode the full input and supervised labels

def inspect_supervised_span(trainer, tokenizer, row=0):
    batch = next(iter(trainer.get_train_dataloader()))

    input_ids = batch["input_ids"][row].tolist()
    labels = batch["labels"][row].tolist()

    supervised_ids = [
        token_id
        for token_id, label_id in zip(input_ids, labels)
        if label_id != -100
    ]

    ignored_count = sum(label_id == -100 for label_id in labels)
    supervised_count = sum(label_id != -100 for label_id in labels)

    print("=" * 100)
    print("FULL INPUT")
    print(tokenizer.decode(input_ids, skip_special_tokens=False))

    print("=" * 100)
    print("SUPERVISED LABEL SPAN")
    print(tokenizer.decode(supervised_ids, skip_special_tokens=False))

    print("=" * 100)
    print("COUNTS")
    print("total tokens:     ", len(input_ids))
    print("ignored tokens:   ", ignored_count)
    print("supervised tokens:", supervised_count)

Good result:

FULL INPUT
<system>...<user>QUESTION<assistant>ANSWER...

SUPERVISED LABEL SPAN
<assistant>ANSWER...

Also possibly good, depending on template boundary:

SUPERVISED LABEL SPAN
ANSWER...

Bad result:

SUPERVISED LABEL SPAN
<system>...<user>QUESTION<assistant>ANSWER...

That means you are still training on the question.

Helper 2 — Token-by-token label table

def print_label_table(trainer, tokenizer, n_tokens=200, row=0):
    batch = next(iter(trainer.get_train_dataloader()))

    input_ids = batch["input_ids"][row].tolist()
    labels = batch["labels"][row].tolist()

    for i, (token_id, label_id) in enumerate(zip(input_ids, labels)):
        if i >= n_tokens:
            break

        token = tokenizer.decode([token_id], skip_special_tokens=False)
        status = "LOSS" if label_id != -100 else "IGNORED"

        print(
            f"{i:04d} | {status:7s} | "
            f"input_id={token_id:8d} | label={label_id:8d} | {token!r}"
        )

Expected pattern:

0000 | IGNORED | system marker
0001 | IGNORED | system text
...
0015 | IGNORED | user marker
0016 | IGNORED | user question
...
0030 | LOSS    | assistant marker or answer
0031 | LOSS    | answer text

Bad pattern:

0000 | LOSS | system marker
...
0015 | LOSS | user marker
0016 | LOSS | user question
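The good and bad patterns can also be checked programmatically instead of by eye. The sketch below is a coarse guard under stated assumptions: the helper names are hypothetical, it works on a plain list of label IDs, and it assumes right padding (with left padding the first position is ignored even when the prompt is not masked).

```python
IGNORE_INDEX = -100


def summarize_labels(labels):
    """Count ignored and supervised positions and locate the first supervised one."""
    supervised = [i for i, label in enumerate(labels) if label != IGNORE_INDEX]
    return {
        "total": len(labels),
        "ignored": len(labels) - len(supervised),
        "supervised": len(supervised),
        "first_supervised": supervised[0] if supervised else None,
    }


def assert_prompt_is_masked(labels):
    """Raise if nothing is supervised or if supervision starts at the first token."""
    summary = summarize_labels(labels)
    if summary["supervised"] == 0:
        raise ValueError("No supervised tokens: the answer was masked or truncated away.")
    if summary["first_supervised"] == 0:
        raise ValueError("The first token is supervised: the prompt is not masked.")
    return summary


good_labels = [-100, -100, -100, 201, 202, -100]
print(assert_prompt_is_masked(good_labels))
# {'total': 6, 'ignored': 4, 'supervised': 2, 'first_supervised': 3}
```

With a real batch, pass batch["labels"][row].tolist(). A passing check does not replace decoding the supervised span; it only catches the fully unmasked case early.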

Should the assistant role marker be supervised?

You may see either of these after conversion:

SUPERVISED LABEL SPAN
<assistant>ANSWER<end>

or:

SUPERVISED LABEL SPAN
ANSWER<end>

Both can be defensible.

Usually acceptable:

<assistant>ANSWER<end>

This teaches the model to produce a full assistant turn in the model’s chat format.

Also acceptable:

ANSWER<end>

This treats the assistant-start marker as part of the prompt and supervises only the answer body.

Usually not what you want:

<system>...<user>QUESTION<assistant>ANSWER<end>

That trains on the system/user prompt too.

The key thing is that the user/question tokens should be -100 if your goal is response-only SFT.


What -100 means here

-100 is the default ignore_index of PyTorch's cross-entropy loss, which Hugging Face causal-LM training relies on. In this context:

label = token id -> compute loss
label = -100     -> ignore this token position in the loss

It does not mean the token is hidden from the model.

The ideal response-only SFT setup is:

system/user tokens:
  visible to the model as context
  ignored in the loss

assistant tokens:
  visible to the model
  included in the loss

This is exactly what you usually want for instruction tuning: the model conditions on the prompt, but it is not trained to reproduce the prompt.
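A small numeric sketch shows what "ignored in the loss" means. It is simplified on purpose: the log-probabilities are invented, the function name masked_mean_nll is hypothetical, and real trainers also shift labels by one position before computing the loss.

```python
import math

IGNORE_INDEX = -100


def masked_mean_nll(token_log_probs, labels):
    """Average the negative log-likelihood over positions whose label is not -100."""
    kept = [
        -log_prob
        for log_prob, label in zip(token_log_probs, labels)
        if label != IGNORE_INDEX
    ]
    if not kept:
        raise ValueError("Every position is ignored; there is nothing to average.")
    return sum(kept) / len(kept)


# Invented log-probabilities the model assigned to the correct token at each position.
log_probs = [math.log(0.5), math.log(0.25), math.log(0.5), math.log(0.125)]

full_loss = masked_mean_nll(log_probs, [7, 8, 9, 10])          # all four positions count
answer_loss = masked_mean_nll(log_probs, [-100, -100, 9, 10])  # only the last two count

print(round(full_loss, 4), round(answer_loss, 4))
```

In the second call the first two positions still exist in the sequence and would still be attended to by a real model; they simply contribute nothing to the averaged loss.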


Practical final answer

There is no Transformers DataCollatorForLanguageModeling setting that says “mask user/question tokens.” That collator is not chat-role-aware.

There are TRL SFT settings that can do the masking, but they depend on dataset format:

  1. For a messages conversational dataset, use:
SFTConfig(assistant_only_loss=True)

but verify that the chat template supports assistant masks.

  2. For a prompt-completion dataset, use:
SFTConfig(completion_only_loss=True)

with examples shaped like:

{
    "prompt": [
        {"role": "system", "content": "..."},
        {"role": "user", "content": "..."},
    ],
    "completion": [
        {"role": "assistant", "content": "..."},
    ],
}

For your system-user-assistant examples, I would convert to prompt + completion and use completion_only_loss=True.


Minimal working pattern for your case

from trl import SFTConfig, SFTTrainer


def to_prompt_completion(example):
    messages = example["messages"]

    if messages[-1]["role"] != "assistant":
        raise ValueError("Expected the final message to be the assistant answer.")

    return {
        "prompt": messages[:-1],
        "completion": [messages[-1]],
    }


train_dataset = train_dataset.map(
    to_prompt_completion,
    remove_columns=train_dataset.column_names,
)

args = SFTConfig(
    output_dir="sft-out",
    completion_only_loss=True,
    packing=False,
    max_length=2048,
)

trainer = SFTTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    processing_class=tokenizer,
)

inspect_supervised_span(trainer, tokenizer)

Only after the supervised span is correct:

trainer.train()

Checklist

Use this checklist before training:

  • Do I want to train only on the final assistant answer?

    • Use prompt + completion.
    • Use completion_only_loss=True.
  • Do I want to train on all assistant turns in a multi-turn conversation?

    • Keep messages.
    • Use assistant_only_loss=True.
    • Verify the chat template supports {% generation %} / {% endgeneration %}.
  • Am I using transformers.DataCollatorForLanguageModeling and expecting question masking?

    • That is the wrong expectation.
  • Am I setting dataset_text_field="messages" and expecting role masking?

    • That is not what dataset_text_field does.
  • Did I decode labels != -100 from a real dataloader batch?

    • If not, do that before training.

Bottom line

Your printed tensor shows ordinary full-sequence causal-LM behavior.

The fix is not a hidden DataCollatorForLanguageModeling setting.

For your case, the clearest fix is:

messages -> prompt/completion

then:

SFTConfig(completion_only_loss=True)

and verify with:

decode(input_ids where labels != -100)

dataset_text_field="messages" does not make messages behave like prompt + completion, and it does not create automatic user/question masking.
