Squeeze Gemma 4 26b on a 4060ti with NVFP4
Hugging Face Forums [Unofficial]
April 30, 2026
Good question. Short version: probably not "comfortably" on a 5060 Ti 16GB yet, at least not in a clean plug-and-play OpenClaw setup.
What matters:
- NVFP4 availability
- Community NVFP4 checkpoints exist for Gemma 4 26B-A4B, but they are a different format from the mainstream GGUF workflows.
- Example model card: CyberFitz/gemma-4-26B-A4B-it-NVFP4
- VRAM headroom
- Even that card reports a model size of ~16 GB and a minimum of ~18 GB of GPU memory for serving, before leaving comfortable headroom for the KV cache.
- On a 16GB card, it may only load with tight limits or offloading, and will then likely feel slow.
- “No vision tower”
- Gemma 4 26B-A4B is a multimodal architecture; removing the vision tower is not a standard toggle in typical runtimes.
- You can run text-only inference without sending images, but physically stripping vision components is model surgery and usually breaks compatibility unless specifically supported.
- OpenClaw compatibility
- OpenClaw is the orchestration layer; actual support depends on the kernels in whichever backend/runtime you use (vLLM, TensorRT, llama.cpp, Ollama).
- If your backend doesn’t support this NVFP4 format end-to-end, it won’t help.
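To see why ~16 GB is roughly the floor, here is a back-of-envelope estimate of raw weight size. It is a sketch under stated assumptions: NVFP4 is modeled as a 4-bit E2M1 value per weight plus one FP8 scale per block of 16 weights (the per-tensor FP32 scale is ignored), and the parameter count is the nominal "26B" from the model name. Because this is an MoE model (the A4B suffix refers to roughly 4B active parameters per token), all experts still have to sit in memory unless you offload, so the total count is what matters here.

```python
# Back-of-envelope NVFP4 weight-size estimate (a sketch, not a measurement).
# Assumption: each weight is a 4-bit E2M1 value, plus one FP8 (E4M3) scale
# shared by every block of 16 weights; the per-tensor FP32 scale is ignored.

def nvfp4_weight_bytes(num_params: float, block_size: int = 16) -> float:
    """Estimate bytes needed for num_params weights stored in NVFP4."""
    bits_per_weight = 4 + 8 / block_size  # 4-bit payload + amortized FP8 block scale
    return num_params * bits_per_weight / 8


def bf16_weight_bytes(num_params: float) -> float:
    """Bytes needed for the same weights in BF16 (2 bytes each), for comparison."""
    return num_params * 2


if __name__ == "__main__":
    params = 26e9  # nominal "26B" from the model name; the exact count is an assumption
    print(f"NVFP4 weights: ~{nvfp4_weight_bytes(params) / 1e9:.1f} GB")
    print(f"BF16 weights:  ~{bf16_weight_bytes(params) / 1e9:.1f} GB")
```

This gives roughly 14.6 GB if every weight were quantized. Real checkpoints usually keep some tensors (embeddings, and plausibly the vision tower) in higher precision, which is consistent with the ~16 GB the model card reports, and that is before the KV cache and runtime overhead.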
Practical recommendation:
- If you want reliability on 16GB today, use a text-focused quantized path with proven OpenClaw backend support.
- If you want Gemma 4 26B NVFP4 specifically, expect experimentation and likely compromises (lower context, offload, slower throughput).
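The "lower context" compromise comes from simple arithmetic: whatever VRAM is left after weights and runtime overhead is all the KV cache gets. The sketch below makes that concrete. The layer count, KV-head count and head dimension are HYPOTHETICAL placeholders, not Gemma 4's real config (read them from the model's `config.json`), the 1.5 GB overhead is a guess, and it assumes every layer uses full attention, which overestimates the cache for models that mix in sliding-window layers.

```python
# Rough max-context estimate for the VRAM left after loading weights.
# All architecture numbers used below are HYPOTHETICAL placeholders, not
# Gemma 4's real config; take them from the model's config.json instead.

def kv_bytes_per_token(num_layers: int, num_kv_heads: int, head_dim: int,
                       bytes_per_elem: float = 2.0) -> float:
    """Bytes of K and V cached per token across all layers (full attention assumed)."""
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem


def max_context_tokens(vram_gb: float, weights_gb: float, overhead_gb: float,
                       per_token_bytes: float) -> int:
    """Tokens of KV cache that fit in the remaining budget (0 if nothing is left)."""
    free_bytes = (vram_gb - weights_gb - overhead_gb) * 1e9
    return max(0, int(free_bytes // per_token_bytes))


if __name__ == "__main__":
    # Hypothetical architecture values, FP16/BF16 KV cache (2 bytes per element).
    per_tok = kv_bytes_per_token(num_layers=30, num_kv_heads=8, head_dim=128)
    # ~16 GB of weights per the NVFP4 model card; 1.5 GB runtime overhead is a guess.
    print("16 GB card:", max_context_tokens(16, 16, 1.5, per_tok))
    print("24 GB card:", max_context_tokens(24, 16, 1.5, per_tok))
```

With these placeholder numbers a 16 GB card has no room left for any KV cache without offloading, while 24 GB leaves room for tens of thousands of tokens. The exact figures will differ for the real config, but the shape of the trade-off is the point.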
Base model reference: google/gemma-4-26B-A4B-it
OpenClaw repo/docs entry point: openclaw/openclaw