Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreicnfnnu7obiyjfvjiehmxmy5ojskbcshy3ryerfgeiu2btpr343ta",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mfszqe3s7y72"
  },
  "path": "/t/crma-drop-in-adapter-for-fine-tuning-continual-learning-zero-catastrophic-forgetting-at-7b-scale/173818#post_1",
  "publishedAt": "2026-02-27T01:07:11.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "CRMA Fine-Tuner & Continual Learning API - Swagger UI"
  ],
  "textContent": "I built CRMA (Constrained Residual Mixing Adapter) — a small adapter that attaches to every layer of a language model\nduring fine-tuning. It applies a mathematical constraint that keeps training stable: the model learns new information\nbut can’t overwrite what it already knows.\n\nFine-tuning results (Mistral-7B):\n\n  * CRMA holdout loss: 0.1426 vs standard LoRA: 0.1519 (-6.1% improvement)\n  * Peak gradient norm reduced 39-84% across 3 independent runs\n  * Tested on TinyLlama-1.1B, Mistral-7B-v0.3, Gemma-2-2b-it\n\n\n\nContinual learning results (4 domains sequentially: Medical, Legal, Code, Finance):\n\n  * CRMA modular drift: -0.1% (model actually slightly improves on earlier domains)\n  * Standard sequential fine-tuning forgetting: +351.4%\n  * That’s a 3,500x reduction in catastrophic forgetting\n  * No replay buffers, no knowledge distillation, no frozen teacher copy, no extra compute\n\n\n\nHow it compares:\n\n┌──────────┬────────┬────────┐\n│ Method │ Forget │ Needs │\n├──────────┼────────┼────────┤\n│ EWC │ +58% │ Replay │\n├──────────┼────────┼────────┤\n│ SDFT │ -0.1pt │ 2x inf │\n├──────────┼────────┼────────┤\n│ O-LoRA │ Less │ Track │\n├──────────┼────────┼────────┤\n│ Adaption │ N/A │ $50M │\n├──────────┼────────┼────────┤\n│ CRMA │ -0.1% │ None │\n└──────────┴────────┴────────┘\n\nAPI is live and testable right now. Free tier available (3 runs/day, TinyLlama). Usage-based pricing for larger\nmodels.\n\nAPI: CRMA Fine-Tuner & Continual Learning API - Swagger UI\n\nFull technical report (with methodology and ablation history) available on request. Happy to answer questions.\n\n— Kiran",
  "title": "CRMA: Drop-in adapter for fine-tuning + continual learning — zero catastrophic forgetting at 7B scale"
}