
CRMA: Drop-in adapter for fine-tuning + continual learning — zero catastrophic forgetting at 7B scale

Hugging Face Forums [Unofficial] February 27, 2026
I built CRMA (Constrained Residual Mixing Adapter), a small adapter that attaches to every layer of a language model during fine-tuning. It applies a mathematical constraint that keeps training stable: the model learns new information but can’t overwrite what it already knows.

Fine-tuning results (Mistral-7B):
* CRMA holdout loss: 0.1426 vs. standard LoRA: 0.1519 (6.1% lower)
* Peak gradient norm reduced 39-84% across 3 independent runs
* Tested on TinyLlama-1.1B, Mistral-7B-v0.3, Gemma-2-2b-it

Continual learning results (4 domains trained sequentially: Medical, Legal, Code, Finance):
* CRMA modular drift: -0.1% (the model actually improves slightly on earlier domains)
* Standard sequential fine-tuning forgetting: +351.4%
* That’s roughly a 3,500x reduction in catastrophic forgetting
* No replay buffers, no knowledge distillation, no frozen teacher copy, no extra compute

How it compares:

┌──────────┬────────┬────────┐
│ Method   │ Forget │ Needs  │
├──────────┼────────┼────────┤
│ EWC      │ +58%   │ Replay │
├──────────┼────────┼────────┤
│ SDFT     │ -0.1pt │ 2x inf │
├──────────┼────────┼────────┤
│ O-LoRA   │ Less   │ Track  │
├──────────┼────────┼────────┤
│ Adaption │ N/A    │ $50M   │
├──────────┼────────┼────────┤
│ CRMA     │ -0.1%  │ None   │
└──────────┴────────┴────────┘

API is live and testable right now. Free tier available (3 runs/day, TinyLlama). Usage-based pricing for larger models.

API: CRMA Fine-Tuner & Continual Learning API - Swagger UI

Full technical report (with methodology and ablation history) available on request. Happy to answer questions.

Kiran
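For anyone who wants to sanity-check the headline figures, here is a minimal sketch of the arithmetic. It assumes the percentages are plain relative changes and that the "3,500x" figure compares the two forgetting numbers by magnitude. The post does not spell out either definition, so treat both as assumptions. The helper `pct_change` is illustrative and not part of the CRMA API.

```python
def pct_change(baseline: float, value: float) -> float:
    """Percent change of `value` relative to `baseline` (negative means lower)."""
    return (value - baseline) / baseline * 100.0


# Holdout losses on Mistral-7B, as reported in the post.
lora_loss = 0.1519
crma_loss = 0.1426
loss_delta = pct_change(lora_loss, crma_loss)

# Reported forgetting figures in percent. Assumption: the reduction factor
# is the ratio of their magnitudes, since the two values have opposite signs.
sequential_forgetting = 351.4
crma_drift = -0.1
reduction_factor = abs(sequential_forgetting) / abs(crma_drift)

print(f"CRMA vs LoRA holdout loss: {loss_delta:.1f}%")    # -6.1%
print(f"Forgetting reduction: ~{reduction_factor:.0f}x")  # ~3514x
```

Under these assumptions the exact ratio is about 3,514x, which the post rounds to 3,500x.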
