{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreifyhu7x7ohaww3ewv2or6e6evvqlrty27gu5m4mvpcxwelhr4skn4",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mhjoal5zf7s2"
},
"path": "/t/numerical-instability-when-finetuning-deberta-v3-small/174444#post_1",
"publishedAt": "2026-03-20T16:03:23.000Z",
"site": "https://discuss.huggingface.co",
"tags": [
"SANDWiCH",
"DebertaV2ForSequenceClassification",
"microsoft/deberta-v3-small"
],
"textContent": "I’m trying to reproduce the results from the SANDWiCH word sense disambiguation paper. To do this, I’m fine-tuning a DebertaV2ForSequenceClassification model with microsoft/deberta-v3-small as the base model, using the same training hyperparameters as the paper.\nHowever, I keep running into numerical instability. As the charts below show, at some point during training the loss spikes sharply, after which the gradient norm becomes NaN. What can I do to diagnose and resolve this?",
"title": "Numerical instability when finetuning deberta-v3-small"
}