Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifyhu7x7ohaww3ewv2or6e6evvqlrty27gu5m4mvpcxwelhr4skn4",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mhjoal5zf7s2"
  },
  "path": "/t/numerical-instability-when-finetuning-deberta-v3-small/174444#post_1",
  "publishedAt": "2026-03-20T16:03:23.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "SANDWiCH",
    "DebertaV2ForSequenceClassification",
    "microsoft/deberta-v3-small"
  ],
  "textContent": "I’m trying to reproduce the results from the SANDWiCH word sense disambiguation paper. To do this, I’m fine-tuning a DebertaV2ForSequenceClassification model with microsoft/deberta-v3-small as the base model, using the same training parameters as given in the paper.\nHowever, I keep running into numerical instability. As the charts below show, at some point during training there is a huge spike in the loss, after which the gradient norm becomes NaN. What can I do to diagnose and resolve this?",
  "title": "Numerical instability when finetuning deberta-v3-small"
}