{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiclqy7axyzui3dwjvthkdu57ra3fpswkd7wyqg2exfk6soy7brn6q",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmexey6qcxx2"
  },
  "path": "/t/automatic-100-masking-of-the-questions-in-labels/176151#post_1",
  "publishedAt": "2026-05-21T16:26:21.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "Hi. A frontier model tried to convince me to use `from trl import DataCollatorForCompletionOnlyLM` instead of `from transformers import DataCollatorForLanguageModeling`, the reason being that “the `DataCollatorForCompletionOnlyLM` **handles the `-100` masking automatically** of the question part in labels.”\n\nWhen I countered that trl's `DataCollatorForCompletionOnlyLM` was deprecated, it acknowledged its error and stated that transformers' `DataCollatorForLanguageModeling`, coupled with a prompt-completion dataset format and `completion_only_loss=True`, will take care of the -100 masking automatically, with no need for a custom data_collator function to mask manually.\n\nMy dataset is in the conversational format (messages: system-user-assistant). I printed the batch coming out of the data_collator, and I can see that no -100 masking of the question part of the labels took place (as shown below). Is there a transformers data_collator setting or an SFT setting that can be used to force it? Does `dataset_text_field=\"messages\"` play a role in this case, the same way `dataset_text_field=\"text\"` does with a prompt-completion dataset format?\n\nAny hints will be greatly appreciated!\n\nLabels (Batch 0):\ntensor([248045, 8678, 198, 2523, 513, 264, 10631, 1558, 421,\n5529, 2708, 321, 61446, 10926, 13, 248046, 198, 248045,\n846, 198, 4199, 1599, 12417, 1467, 68677, 3222, 506,\n18773, 30, 248046, 198, 248045, 74455, 198, 248068, 271,\n248069, 271, 11280, 12417, 25, 7884, 24093, 10254, 318,\n30425, 220, 17, 15, 15, 18, 310, 6443, 220,\n17, 15, 15, 19, 553, 248046, 198])",
  "title": "Automatic -100 masking of the questions in Labels"
}