Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreiaebh5otoxmbec7qr5e4ktvk2jly63hoac4ehms42ouk4ecmjg7wy",
    "uri": "at://did:plc:pgryn3ephfd2xgft23qokfzt/app.bsky.feed.post/3mkpsj2mn5qz2"
  },
  "path": "/t/squeeze-gemma-4-26b-on-a-4060ti-with-nvfp4/175654#post_2",
  "publishedAt": "2026-04-30T13:01:02.000Z",
  "site": "https://discuss.huggingface.co",
  "tags": [
    "CyberFitz/gemma-4-26B-A4B-it-NVFP4",
    "google/gemma-4-26B-A4B-it",
    "openclaw/openclaw"
  ],
  "textContent": "Good question. Short version: **probably not “comfortably” on a 5060 Ti 16GB yet**, at least not in a clean plug-and-play OpenClaw setup.\n\nWhat matters:\n\n1. **NVFP4 availability**\n   * There are community NVFP4 checkpoints for Gemma 4 26B-A4B, but they are not the same as mainstream GGUF flows.\n   * Example model card: CyberFitz/gemma-4-26B-A4B-it-NVFP4\n\n2. **VRAM headroom**\n   * Even that model card reports a **model size of ~16 GB** and a **minimum of ~18 GB of GPU memory** for serving, before leaving comfortable headroom for the KV cache.\n   * On a 16GB card, it may load only with tight limits or offloading, and then feel slow.\n\n3. **“No vision tower”**\n   * Gemma 4 26B-A4B is a multimodal architecture; removing the vision tower is not a standard toggle in typical runtimes.\n   * You can run **text-only inference** simply by not sending images, but physically stripping the vision components is model surgery and usually breaks compatibility unless your runtime specifically supports it.\n\n4. **OpenClaw compatibility**\n   * OpenClaw is the orchestration layer; actual support depends on the kernels in whichever backend/runtime you use (vLLM, TensorRT, llama.cpp, Ollama).\n   * If your backend doesn’t support this NVFP4 format end-to-end, the checkpoint won’t help.\n\nPractical recommendation:\n\n* If you want reliability on 16GB today, use a **text-focused quantized path** with proven OpenClaw backend support.\n* If you want Gemma 4 26B NVFP4 specifically, expect experimentation and likely compromises (shorter context, offloading, lower throughput).\n\nBase model reference: google/gemma-4-26B-A4B-it\nOpenClaw repo/docs entry point: openclaw/openclaw",
  "title": "Squeeze Gemma 4 26b on a 4060ti with NVFP4"
}