{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreidtbr3vom2owrxfn37ophxstgegawcsswpeqdkq3hi4nxtezlmxzi",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mng3cjorobu2"
},
"path": "/t/why-does-naive-replay-still-beat-most-sophisticated-continual-learning-methods-in-practice/176513#post_1",
"publishedAt": "2026-06-03T20:39:47.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "I’ve been going deep on catastrophic forgetting in LLM fine-tuning and keep running into the same puzzle.\n\nThe literature has a lot of elegant methods, EWC and its variants (penalize changes to important weights), orthogonal-subspace approaches like O-LoRA, gradient-projection methods, etc. But in a lot of practical reports and benchmarks, plain experience replay (just mixing a slice of old/general data back into the new training set) ends up matching or beating them ,despite being the least clever option.\n\nA few things I’m trying to understand:\n\n 1. Is replay actually winning, or just winning on the benchmarks we use? A lot of CL benchmarks are short task sequences (4–5 tasks). Does replay’s advantage hold at 10–20 sequential domains, or does the replay buffer just\nbecome unmanageable?\n\n 2. What’s the real reason the regularization methods (EWC etc.) underperform? My intuition: estimating per-parameter importance (the Fisher matrix) is expensive and noisy at LLM scale, so the penalty ends up either too weak (still forgets) or too strong (won’t learn). Is that the consensus, or am I missing something?\n\n 3. For people running this in production: when your domain data changes and you need to re-fine-tune, what do you actually reach for replay, freezing, low-rank adapters, or just retrain from base? And what made you pick it?\n\n\n\n\nI have my own opinions forming (I lean toward constraining the update itself so it can’t overwrite prior capabilities, rather than fighting the pull after the fact with replay/penalties), but I’d genuinely like to hear what’s working for people in the wild before I commit to that view.",
"title": "Why does naive replay still beat most \"sophisticated\" continual-learning methods in practice?"
}