{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreie7mnn6ofsaghmv3utfk46smedbivioie75t6ykgdz66keteafhfm",
"uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mji4dhjggcl2"
},
"path": "/t/fine-tuning-gemma-4-e2b-on-macbook-m3/175228#post_5",
"publishedAt": "2026-04-14T18:27:45.000Z",
"site": "https://discuss.huggingface.co",
"textContent": "**Sharing some success with Gemma-4-E4B (Q8_0) tuning!**\n\nI’ve been experimenting with the “overclocking” feel of parameter tweaks on my headless ROCm server. I found that making subtle “shuttle changes” to the inference settings really tightened up the model’s performance.\n\nFor those running the E4B variant, these settings significantly cut back on “rambling” and reduced thinking latency without the logic falling apart:\n\n * **Temperature:** `0.8` (keeps it opinionated)\n\n * **Top_P:** `0.85` / **Top_K:** `40` (narrower, faster search area)\n\n * **Repeat Penalty:** `1.1` (just enough to kill the logic loops)\n\n\n\n\nIt feels like tuning a GPU—if you push too hard, it breaks, but hitting this sweet spot makes it feel much more surgical. Anyone else found a “magic” parameter set for the Gemma-4 family?",
"title": "Fine-tuning Gemma-4-E2B on MacBook M3"
}