{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiessav6x7u2xsezjpigdadez4fxfjj3bac7n5wqhet7rl6dua6zku",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mf6h2qfdv2n2"
  },
  "path": "/t/looking-for-cpu-compute-grant-flashlm-ternary-cpu-only-language-model/173626#post_1",
  "publishedAt": "2026-02-18T23:24:25.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "https://github.com/changcheng967/FlashLM",
    "https://huggingface.co/changcheng967/flashlm-v4-bolt"
  ],
  "textContent": "Hi everyone. I’m building FlashLM, an open-source ternary (1.58-bit) language model designed to run entirely on CPU: inference uses only additions and subtractions, with no floating-point multiplies.\n\nWe just validated a v5 architecture that scores 88% on associative recall benchmarks, versus 3% for v4. Now we need CPU compute (ideally EPYC or Xeon with a large L3 cache) to train on real data and get bits-per-character (BPC) numbers. So far, everything has been trained on the Deepnote free tier (2-core CPU, 5 GB RAM).\n\nRepo: https://github.com/changcheng967/FlashLM\nModel & weights: https://huggingface.co/changcheng967/flashlm-v4-bolt\n\nDoes anyone know whether Hugging Face offers CPU compute grants, or have suggestions for where to get donated CPU time for open-source research?",
  "title": "Looking for CPU compute grant — FlashLM, ternary CPU-only language model"
}