Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreied6shlbcdqe5g4inqabru7nldmp4pwmzxinnmcjvpatfr634utla",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mmpaxjnubwv2"
  },
  "path": "/t/sfttrainerflags-blocks-assistant-only-loss-true/176210#post_3",
  "publishedAt": "2026-05-25T18:24:03.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "@John6666"
  ],
  "textContent": "Hi @John6666, thank you so much for your truly valuable feedback, as always. I really appreciate it!\n\nRegarding the action items you mentioned:\n\n  * **force the text/tokenizer path by explicitly passing the tokenizer as `processing_class`:**\n\nI’m doing this now. At first, however, I was not passing the tokenizer to SFTTrainer directly as a parameter. Instead, I was passing it only to the data collator, which was then passed to the trainer:\n\nfrom transformers import DataCollatorForLanguageModeling\nfrom trl import SFTTrainer\n\ndata_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)\n\ntrainer = SFTTrainer(\n    model=model,\n    train_dataset=train_dataset,\n    eval_dataset=eval_dataset,\n    data_collator=data_collator,\n    args=sft_config,\n    peft_config=peft_config,\n)\n\nAfter reading your message, I also passed it to SFTTrainer as `processing_class`, and the error disappeared:\n\ntrainer = SFTTrainer(\n    model=model,\n    train_dataset=train_dataset,\n    eval_dataset=eval_dataset,\n    processing_class=tokenizer,\n    data_collator=data_collator,\n    args=sft_config,\n    peft_config=peft_config,\n)\n\nTomorrow I will inspect the output to check whether any masking took place, and I’ll update you. Thanks!",
  "title": "SFTTrainerflags blocks assistant_only_loss=True"
}
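For the planned check of whether any masking took place, here is a minimal sketch. It assumes the collated batch has been converted to plain lists of token ids under the keys `input_ids` and `labels`, and that ignored positions use `-100`, the default `ignore_index` of PyTorch's `CrossEntropyLoss` and the convention Hugging Face models follow. The helpers `masking_report` and `only_padding_masked` are hypothetical names for illustration, not part of TRL or `transformers`.

```python
# Hypothetical helpers (not part of TRL or transformers) for inspecting how
# many label positions in a collated batch are excluded from the loss.
from typing import Dict, List, Sequence

IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def masking_report(
    input_ids: Sequence[Sequence[int]],
    labels: Sequence[Sequence[int]],
    ignore_index: int = IGNORE_INDEX,
) -> List[Dict[str, int]]:
    """Per-example counts of supervised vs. ignored label positions."""
    report = []
    for ids, labs in zip(input_ids, labels):
        if len(ids) != len(labs):
            raise ValueError("input_ids and labels must have the same length")
        ignored = sum(1 for lab in labs if lab == ignore_index)
        report.append(
            {"total": len(labs), "ignored": ignored, "supervised": len(labs) - ignored}
        )
    return report


def only_padding_masked(
    input_ids: Sequence[Sequence[int]],
    labels: Sequence[Sequence[int]],
    pad_token_id: int,
) -> bool:
    """True if every non-pad position keeps its own token as the label,
    i.e. the labels are a plain copy of input_ids with only padding ignored."""
    for ids, labs in zip(input_ids, labels):
        for tok, lab in zip(ids, labs):
            if tok == pad_token_id:
                continue  # padding is expected to be ignored either way
            if lab != tok:
                return False  # a real token was masked (e.g. a prompt token)
    return True


if __name__ == "__main__":
    # Toy batch with pad_token_id = 0; example 0 masks a 3-token "prompt".
    input_ids = [[5, 6, 7, 8, 9, 0], [5, 6, 8, 9, 0, 0]]
    labels = [[-100, -100, -100, 8, 9, -100], [5, 6, 8, 9, -100, -100]]
    print(masking_report(input_ids, labels))
    # [{'total': 6, 'ignored': 4, 'supervised': 2}, {'total': 6, 'ignored': 2, 'supervised': 4}]
    print(only_padding_masked(input_ids, labels, pad_token_id=0))
    # False
```

With a real trainer, one way to get a batch is `batch = next(iter(trainer.get_train_dataloader()))`, then pass `batch["input_ids"].tolist()` and `batch["labels"].tolist()`. For reference, `transformers`' `DataCollatorForLanguageModeling` with `mlm=False` builds `labels` by copying `input_ids` and setting padding positions to -100, so `only_padding_masked(...)` returning `True` would suggest that no assistant-only masking reached the loss.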