Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreihzmjgltzorofabk3csheforch6qiww5rx5gzju4qrtdtcjxhdhua",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mlnp77p5orb2"
  },
  "path": "/t/fine-tuning-microsoft-harrier-oss-v1-270m-with-sentencetransformertrainer-is-it-supported/175947#post_1",
  "publishedAt": "2026-05-12T09:26:53.000Z",
  "site": "https://discuss.huggingface.co",
  "textContent": "I recently fine-tuned BAAI/bge-m3 for a Portuguese QA retrieval task using SentenceTransformerTrainer with MultipleNegativesRankingLoss, and it works well.\n\nI’d now like to try microsoft/harrier-oss-v1-270m as the base model, since it achieves better results on Multilingual MTEB v2. The model card confirms it is compatible with SentenceTransformers, so that part is clear.\n\nHowever, I have some questions specific to fine-tuning this model:\n\n  1. The model card states that queries should include a task instruction (e.g. `Instruct: ... Query: ...`) but documents should not. When fine-tuning with MultipleNegativesRankingLoss, should the instruction prefix be applied to the anchor texts during training, or only at inference?\n  2. Are there any known challenges or recommended adaptations when fine-tuning decoder-only embedding models with SentenceTransformers, compared to encoder-based models like BGE-M3?\n  3. Any recommended starting hyperparameters (learning rate, batch size) for this architecture?\n\nAny guidance or pointers to examples would be appreciated.",
  "title": "Fine-tuning microsoft/harrier-oss-v1-270m with SentenceTransformerTrainer — is it supported?"
}