{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreiavyeqwvyj4xmo4332owkbknmk2hrabs2ssgijo75on6tvp5he52y",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mfuioipyj5s2"
},
"path": "/t/catastrophic-forgetting-by-language-models/173863#post_1",
"publishedAt": "2026-02-27T18:08:43.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "To all the awesome AI/ML experts out there: I need a favor.\nLanguage models (SLMs/LLMs) tend to lose previously learned knowledge when they are trained on new data sequentially, a problem known as ‘catastrophic forgetting’.\n\nTo address it, I came up with an adapter called the Constrained Residual Mixing Adapter (CRMA) that enables continual learning. I tested it on TinyLlama 1.1B and Mistral 7B. The result: -0.1% drift across 4 sequential domains. Essentially zero forgetting.\n\nCRMA: -0.1% drift. Naive: +351% forgetting. Same model, same data, same hardware.\n\nThe result holds at both 1.1B and 7B. No replay, no EWC, no KD needed.\n\nCRMA Modular vs Naive, Mistral 7B (4 sequential domains)\n\n┌─────────┬────────────┬──────────────────┐\n│ Task │ CRMA Drift │ Naive Forgetting │\n├─────────┼────────────┼──────────────────┤\n│ Medical │ -0.2% │ +228% │\n├─────────┼────────────┼──────────────────┤\n│ Legal │ -0.1% │ +593% │\n├─────────┼────────────┼──────────────────┤\n│ Code │ -0.1% │ +233% │\n├─────────┼────────────┼──────────────────┤\n│ Finance │ +0.0% │ — │\n├─────────┼────────────┼──────────────────┤\n│ Average │ -0.1% │ +351% │\n└─────────┴────────────┴──────────────────┘\n\nNow the favor: if you’re interested in independently verifying these results, I’d love to hear from you. DM me and I’ll share what you need to reproduce it. Thank you, and best wishes.",
"title": "Catastrophic Forgetting by Language Models"
}