CRMA: Drop-in adapter for fine-tuning + continual learning — zero catastrophic forgetting at 7B scale
Hugging Face Forums [Unofficial]
February 27, 2026
I built CRMA (Constrained Residual Mixing Adapter), a small adapter that attaches to every layer of a language model
during fine-tuning. It applies a mathematical constraint that keeps training stable, so the model can learn new
information without overwriting what it already knows.
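The post doesn't say what the constraint actually is. Purely to illustrate the general idea of bounding how much an adapter can change a layer's output, here is a hypothetical sketch in plain Python: a low-rank update `up @ (down @ h)` is rescaled so its norm never exceeds `tau * ||h||`. The names `constrained_residual`, `down`, `up`, and `tau` are all assumptions, and this is not CRMA's mechanism:

```python
import math

# HYPOTHETICAL sketch: CRMA's real constraint is not disclosed in the post.
# Shows one generic way to cap an adapter's residual contribution.

def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def norm(v):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def constrained_residual(h, down, up, tau=0.1):
    """Return h + delta, where delta = up @ (down @ h) is rescaled so that
    ||delta|| <= tau * ||h|| (an assumed, illustrative constraint)."""
    delta = matvec(up, matvec(down, h))    # low-rank adapter update
    limit = tau * norm(h)                  # maximum allowed update size
    d = norm(delta)
    if d > limit:                          # only shrink updates that are too large
        scale = limit / d
        delta = [x * scale for x in delta]
    return [a + b for a, b in zip(h, delta)]

# Rank-1 example: the raw update [6, 0] is far larger than 0.1 * ||h|| = 0.5,
# so it is scaled down to roughly [0.5, 0], giving roughly [3.5, 4.0].
out = constrained_residual([3.0, 4.0], down=[[1.0, 0.0]], up=[[2.0], [0.0]])
```

Bounding the update relative to the existing activation is one simple way to keep a layer's output close to the base model; the actual CRMA design may work quite differently.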
Fine-tuning results (Mistral-7B):
* CRMA holdout loss: 0.1426 vs. standard LoRA: 0.1519 (6.1% lower)
* Peak gradient norm reduced by 39-84% across 3 independent runs
* Tested on TinyLlama-1.1B, Mistral-7B-v0.3, Gemma-2-2b-it
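The 6.1% figure can be checked directly from the two holdout losses. A minimal sketch; `pct_change` is a hypothetical helper name, and a negative result means lower loss than the LoRA baseline:

```python
def pct_change(new, baseline):
    """Relative change of `new` versus `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0

crma_loss, lora_loss = 0.1426, 0.1519   # holdout losses reported above
print(f"{pct_change(crma_loss, lora_loss):.1f}%")  # prints -6.1%
```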
Continual learning results (4 domains sequentially: Medical, Legal, Code, Finance):
* CRMA modular drift: -0.1% (the model slightly improves on earlier domains)
* Standard sequential fine-tuning: +351.4% forgetting
* That's roughly a 3,500x reduction in catastrophic forgetting (351.4 / 0.1 ≈ 3,514, comparing magnitudes)
* No replay buffers, no knowledge distillation, no frozen teacher copy, no extra compute
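The post doesn't define how "modular drift" or forgetting is measured. One common convention, assumed here, is the relative change in held-out loss on an earlier domain after training on later domains (positive means forgetting). Under that reading, the headline ratio follows from the two reported percentages. `forgetting_pct` is a hypothetical helper, and the aggregation across domains is not stated in the post:

```python
def forgetting_pct(loss_before, loss_after):
    """ASSUMED definition: percent change in held-out loss on an earlier
    domain after sequential training on later domains. Positive = forgetting."""
    return (loss_after - loss_before) / loss_before * 100.0

# Aggregate figures reported above (in percent).
crma_drift = -0.1
sequential_ft = 351.4

# Compare magnitudes, since CRMA's figure is slightly negative (an improvement).
ratio = abs(sequential_ft) / abs(crma_drift)
print(f"{ratio:,.0f}x")  # prints 3,514x
```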
How it compares:
┌──────────┬────────────┬────────┐
│ Method   │ Forgetting │ Needs  │
├──────────┼────────────┼────────┤
│ EWC      │ +58%       │ Replay │
├──────────┼────────────┼────────┤
│ SDFT     │ -0.1pt     │ 2x inf │
├──────────┼────────────┼────────┤
│ O-LoRA   │ Less       │ Track  │
├──────────┼────────────┼────────┤
│ Adaption │ N/A        │ $50M   │
├──────────┼────────────┼────────┤
│ CRMA     │ -0.1%      │ None   │
└──────────┴────────────┴────────┘
The API is live and testable now. A free tier is available (3 runs/day, TinyLlama), with usage-based pricing for
larger models.
API: CRMA Fine-Tuner & Continual Learning API - Swagger UI
Full technical report (with methodology and ablation history) available on request. Happy to answer questions.
— Kiran