{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreied6shlbcdqe5g4inqabru7nldmp4pwmzxinnmcjvpatfr634utla",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmpaxjnubwv2"
},
"path": "/t/sfttrainerflags-blocks-assistant-only-loss-true/176210#post_3",
"publishedAt": "2026-05-25T18:24:03.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"@John6666"
],
"textContent": "Hi @John6666 thank you so much for your truly valuable feedback as always. I really appreciate it!\n\nRegarding the action items you mentioned:\n\n * **force the text/tokenizer path by explicitly passing the tokenizer as`processing_class`:**\n\n\n\nI’m currently doing it. However, at first, I was not directly passing the tokenizer to the trainer as an SFTTrainer parameter. Instead, I was passing it to the data_collator which is then passed to the trainer:\n\n_from transformers import DataCollatorForLanguageModeling_\n_data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)_\n\ntrainer = SFTTrainer(\nmodel=model,\ntrain_dataset=train_dataset,\neval_dataset=eval_dataset,\ndata_collator=data_collator,\nargs=sft_config,\npeft_config=peft_config,\n)\n\nAfter I read your message, I passed it to the sfttrainer and the error disappeared:\n\ntrainer = SFTTrainer(\nmodel=model,\ntrain_dataset=train_dataset,\neval_dataset=eval_dataset,\nprocessing_class=tokenizer,\ndata_collator=data_collator,\nargs=sft_config,\npeft_config=peft_config,\n)\n\nI will trade the buff output tomorrow to check if any masking took place and I’ll update you. Thanks!",
"title": "SFTTrainerflags blocks assistant_only_loss=True"
}