Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreigy2pidlhoj5xovhrbqxncwcp3oird7ltpjgj6te6nlrbufbntvhm",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mivuaikqfpu2"
  },
  "path": "/t/deprecation-of-assistant-only-loss/175041#post_2",
  "publishedAt": "2026-04-07T09:21:34.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "Hugging Face",
    "GitHub"
  ],
  "textContent": "Hmm…?\n\n* * *\n\nThis is mainly a **TRL `SFTTrainer` story**, not a plain “Transformers deprecated a flag” story. In current TRL docs, `assistant_only_loss` is still supported for **conversational datasets**. What changed is the preferred training path: TRL now centers on **prompt-completion datasets** and `completion_only_loss`, while `assistant_only_loss` is treated as a narrower option that only works when the chat template can return assistant-token masks via `{% generation %}` / `{% endgeneration %}`. (Hugging Face)\n\n## The idea in one sentence\n\nThe stack moved from:\n\n  * **old**: “figure out the trainable region from rendered chat text afterward”\n\nto:\n\n  * **new**: “define the trainable region explicitly in the dataset when possible” (GitHub)\n\n* * *\n\n## Chronological timeline\n\nPeriod | What changed | Why it mattered | Sources\n---|---|---|---\n**2023** | The common pattern was still the old collator-based masking workflow, especially `DataCollatorForCompletionOnlyLM`. Around this time, users were already hitting practical limitations such as incompatibility with `packing=True`. | This showed that masking “after formatting” did not fit cleanly with modern efficient SFT pipelines. | (Hugging Face)\n**Early 2024** | A Transformers feature request asked for `apply_chat_template(..., tokenize=True)` to return token masks so users could compute loss only on assistant tokens in multi-turn chats. | This was the first clear signal that delimiter-based masking was too weak for real chat data with multiple turns. | (GitHub)\n**Mid to late 2024** | Transformers added assistant-token-mask support in chat templating, but only for templates that support it. In practice, tokenizer/template bugs appeared for some models, including Llama 3 and Qwen2.5, and truncation could also break assistant masks. | The feature existed, but it proved fragile because it depended on template markup, tokenization, and truncation all lining up correctly. | (Hugging Face)\n**2025** | TRL formalized dataset-type-aware SFT: conversational datasets can use `assistant_only_loss`; prompt-completion datasets use `completion_only_loss`, which is on by default for that dataset type unless overridden. | This is the architectural pivot. The training target moved from “infer it from text” to “read it from the dataset schema.” | (Hugging Face)\n**Late 2025** | The old `DataCollatorForCompletionOnlyLM` was removed. A TRL maintainer explicitly told users to switch to `completion_only_loss=True` with a prompt-completion dataset. Around the same period, users reported prompt-completion labeling issues while migrating, which led to fixes and clearer warnings. | This is the practical migration point most users noticed. The old masking tool was gone, and the new expected path was explicit prompt/completion supervision. | (GitHub)\n**2026 / current state** | Current TRL docs still support `assistant_only_loss=True`, but only for conversational datasets with templates that can return assistant-token masks. They also say completion-only training is compatible with assistant-only training when using a conversational prompt-completion dataset. | So the correct reading is not “`assistant_only_loss` disappeared.” The correct reading is “it became a specialized, template-dependent option, while prompt/completion became the safer default.” | (Hugging Face)\n\n* * *\n\n## Old workflow vs new workflow vs why the warning appears\n\nTopic | Old workflow | New workflow | Why the warning appears | Sources\n---|---|---|---|---\n**Where the target is defined** | The target span was often inferred from rendered text using templates or delimiters. | The target span is preferably defined in the dataset itself as `prompt` + `completion`. | If TRL cannot reliably recover assistant spans from the template, it warns or errors instead of silently guessing. | (GitHub)\n**Typical data shape** | Often `messages` or already-rendered chat text. | Prefer `{\"prompt\": ..., \"completion\": ...}` or conversational prompt-completion. | A plain conversational dataset does not automatically make assistant masking reliable; the template must expose assistant spans. | (Hugging Face)\n**Loss mode** | Old code often used `DataCollatorForCompletionOnlyLM` to mask labels after formatting. | `completion_only_loss` is the intended path for prompt-completion datasets. `assistant_only_loss` remains available for conversational datasets. | The warning often appears when users expect assistant-only masking to work on a template that does not support `return_assistant_tokens_mask`. | (GitHub)\n**Dependency on Jinja template** | High, but often hidden. The boundary was recovered indirectly from formatting. | Still relevant for conversational data, but less central for prompt-completion because the boundary is already explicit in the dataset. | If the template lacks `{% generation %}`, assistant masks can be empty, and TRL will complain. | (Hugging Face)\n**Common failure modes** | Delimiter mismatch, tokenization quirks, packing incompatibility. | Fewer boundary-inference problems for prompt-completion, though template-related issues still exist for assistant masks. | The warning can also appear if truncation causes all assistant tokens to fall outside the retained sequence. | (GitHub)\n**What the library now prefers** | Heuristic masking on top of rendered text. | Explicit prompt/completion supervision, then optional assistant masking when the template supports it. | The warning is TRL nudging you away from a fragile path toward the explicit one. | (Hugging Face)\n\n* * *\n\n## What the warning usually means\n\nIn plain English, it usually comes down to one of these:\n\n  1. **Your chat template does not emit assistant masks.**\nTransformers’ tokenizer docs say `return_assistant_tokens_mask=True` only works for chat templates that support it via `{% generation %}`. TRL’s docs say the same thing for `assistant_only_loss=True`. (Hugging Face)\n\n  2. **The assistant mask came back empty.**\nTRL has an error path that says: if `assistant_only_loss=True` but an example has no assistant tokens, that usually means the template does not generate assistant masks and may be missing `{% generation %}`. (GitHub)\n\n  3. **Truncation can also cause it.**\nReported issues show that assistant masks can become effectively empty after truncation, even when the template itself supports them. (GitHub)\n\n* * *\n\n## Bottom line\n\nThe clean takeaway is:\n\n  * **Not really deprecated:** `assistant_only_loss` still exists and is documented. (Hugging Face)\n  * **What actually changed:** the old collator-centric workflow was removed, and TRL now prefers `prompt` / `completion` data with `completion_only_loss`. (GitHub)\n  * **Why:** explicit dataset boundaries are more robust than inferring assistant spans from Jinja-rendered chat text. (GitHub)\n\n",
  "title": "Deprecation of assistant_only_loss"
}
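The post contrasts two masking paths. Below is a minimal sketch of both in plain Python, written so it runs without TRL or Transformers installed. It is not TRL's implementation. The helper names `labels_from_prompt_completion` and `labels_from_assistant_mask` are hypothetical and exist only for illustration. The sketch makes two assumptions:

* Label positions set to `-100` are ignored by the loss. This is PyTorch's `CrossEntropyLoss` default and the convention Hugging Face causal-LM training uses.
* Sequences are right-truncated to a fixed `max_length`.

The first helper builds labels from an explicit prompt/completion boundary. The second builds labels from a per-token 0/1 assistant mask, standing in for what `return_assistant_tokens_mask=True` can return. Both fail loudly when nothing trainable is left, which covers the "template missing `{% generation %}`" and "truncation removed every assistant token" cases.

```python
from typing import List, Sequence, Tuple

# PyTorch's CrossEntropyLoss ignores targets equal to -100 by default,
# and Hugging Face causal-LM training uses the same convention for masked labels.
IGNORE_INDEX = -100


def labels_from_prompt_completion(
    prompt_ids: Sequence[int],
    completion_ids: Sequence[int],
    max_length: int,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: completion-only labels from an explicit boundary.

    The boundary is simply len(prompt_ids), taken from the dataset's
    prompt/completion split, so nothing is recovered from rendered chat text.
    """
    input_ids = list(prompt_ids) + list(completion_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(completion_ids)
    # Assumption: right-truncation, i.e. keep the first max_length tokens.
    input_ids, labels = input_ids[:max_length], labels[:max_length]
    if all(label == IGNORE_INDEX for label in labels):
        raise ValueError(
            "No completion tokens left after truncation; nothing to train on."
        )
    return input_ids, labels


def labels_from_assistant_mask(
    input_ids: Sequence[int],
    assistant_mask: Sequence[int],
    max_length: int,
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: assistant-only labels from a per-token 0/1 mask.

    The mask stands in for what a chat template with {% generation %} blocks
    can return; a template without them marks no assistant positions.
    """
    if len(input_ids) != len(assistant_mask):
        raise ValueError("input_ids and assistant_mask must be the same length")
    # Same right-truncation assumption, applied to ids and mask together.
    ids = list(input_ids)[:max_length]
    mask = list(assistant_mask)[:max_length]
    if not any(mask):
        # Either the template never marked assistant tokens, or truncation
        # removed all of them.
        raise ValueError(
            "Empty assistant mask: the chat template may be missing "
            "{% generation %}, or truncation dropped every assistant token."
        )
    # Keep assistant tokens as targets; mask everything else out of the loss.
    labels = [tok if keep else IGNORE_INDEX for tok, keep in zip(ids, mask)]
    return ids, labels


if __name__ == "__main__":
    print(labels_from_prompt_completion([1, 2, 3], [4, 5], max_length=8))
    print(labels_from_assistant_mask([10, 11, 12, 13], [0, 0, 1, 1], max_length=8))
```

Run as a script, the two calls print `([1, 2, 3, 4, 5], [-100, -100, -100, 4, 5])` and `([10, 11, 12, 13], [-100, -100, 12, 13])`. The contrast is the point of the sketch. The prompt/completion helper only needs `len(prompt_ids)`. The mask helper is only as reliable as the mask it receives, which is why an empty mask is treated as an error rather than silently training on nothing.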