
Fine-tuning Gemma-4-E2B on MacBook M3

Hugging Face Forums [Unofficial] April 14, 2026

Sharing some success with Gemma-4-E4B (Q8_0) tuning!

I’ve been experimenting with inference-parameter tweaks on my headless ROCm server, and it has an “overclocking” feel to it. Making small, incremental changes to the sampling settings really tightened up the model’s output.

For those running the E4B variant, these settings significantly cut back on “rambling” and reduced thinking latency without the logic falling apart:

Temperature: 0.8 (keeps it opinionated)
Top_P: 0.85 / Top_K: 40 (a narrower pool of candidate tokens at each step)
Repeat Penalty: 1.1 (just enough to kill the logic loops)
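For anyone less familiar with these knobs, here is a minimal pure-Python sketch of how the four settings above typically act on next-token logits. It is an illustration under stated assumptions, not what any particular runtime does internally. The step order (repeat penalty, then temperature, then top-k, then top-p) roughly follows Hugging Face transformers. llama.cpp-based runtimes may order the steps differently and, by default, apply the repeat penalty only over a window of recent tokens. The function names and toy logits are hypothetical.

```python
import math
import random


def apply_repeat_penalty(logits, recent_tokens, penalty):
    """Make tokens that already appeared in the context less likely.

    Common rule (assumed here): positive logits are divided by the penalty,
    negative logits are multiplied by it, so the token always loses score.
    """
    out = list(logits)
    for tok in set(recent_tokens):
        out[tok] = out[tok] / penalty if out[tok] > 0 else out[tok] * penalty
    return out


def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def filter_candidates(logits, temperature, top_k, top_p):
    """Return (token_id, probability) pairs that survive temperature, top-k, top-p."""
    # Temperature: dividing by a value below 1.0 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    probs = softmax([scaled[i] for i in ranked])
    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in zip(ranked, probs):
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    # Renormalize so the surviving probabilities sum to 1.
    total = sum(p for _, p in kept)
    return [(tok, p / total) for tok, p in kept]


def sample_next_token(logits, recent_tokens, temperature=0.8, top_k=40,
                      top_p=0.85, repeat_penalty=1.1, rng=random):
    """Sample one token id using the parameter set from the post."""
    penalized = apply_repeat_penalty(logits, recent_tokens, repeat_penalty)
    candidates = filter_candidates(penalized, temperature, top_k, top_p)
    tokens = [t for t, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.5, 0.5, -1.0, -3.0]  # hypothetical 5-token vocabulary
    print(filter_candidates(apply_repeat_penalty(toy_logits, [0], 1.1), 0.8, 40, 0.85))
    print(sample_next_token(toy_logits, recent_tokens=[0], rng=random.Random(0)))
```

With these toy logits, top-p at 0.85 already trims the pool to two tokens even though top-k allows 40. On a real vocabulary the two limits stack: top-k caps the tail at 40 tokens, and top-p then drops whichever of those fall outside the top 85% of probability mass.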

It feels like tuning a GPU: push too hard and it breaks, but hitting this sweet spot makes it feel much more surgical. Has anyone else found a “magic” parameter set for the Gemma-4 family?
