{
"$type": "site.standard.document",
"bskyPostRef": {
"cid": "bafyreifrsjt2fs3hamkrbozqs6j5b5nhsj7azfhlkxmxstls7xtbjxbkte",
"uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3me72kikf2l72"
},
"path": "/t/guidance-needed-gpt-oss-20b-fine-tuning-with-unsloth-gguf-ollama-triton-vllm-tensorrt-llm/1373515#post_1",
"publishedAt": "2026-02-06T12:08:22.000Z",
"site": "https://community.openai.com",
"textContent": "I am currently fine-tuning the **GPT-OSS 20B** model using **Unsloth** with **HuggingFace TRL (SFTTrainer)**.\n\n**Long-term goal**\n\n * Serve the model in production using **Triton** with either **vLLM** or **TensorRT-LLM** as the backend\n\n * **Short-term / initial deployment** using **Ollama (GGUF)**\n\n\n\n\n**Current challenge**\nGPT-OSS uses a **Harmony-style chat template** , which includes:\n\n * `developer` role\n\n * Explicit EOS handling\n\n * `thinking` / `analysis` channels\n\n * Tool / function calling structure\n\n\n\n\nWhen converting the fine-tuned model to **GGUF** and deploying it in **Ollama** using the **default GPT-OSS Modelfile** , I am running into ambiguity around:\n\n 1. Whether the **default Jinja chat template** provided by GPT-OSS should be **modified** for Ollama compatibility\n\n 2. How to correctly handle:\n\n * EOS token behavior\n\n * Internal reasoning / analysis channels\n\n * Developer role alignment\n\n 3. How to do this **without degrading the model’s default performance or alignment**\n\n\n\n\n**Constraints / Intent**\n\n * I already have training data prepared strictly in **system / user / assistant** format\n\n * I want to:\n\n * Preserve GPT-OSS’s native behavior as much as possible\n\n * Perform **accurate, non-destructive fine-tuning**\n\n * Avoid hacks that work short-term but break compatibility with **vLLM / TensorRT-LLM** later\n\n\n\n\n**What I’m looking for**\n\n * Has anyone successfully:\n\n * Fine-tuned GPT-OSS\n\n * Converted it to GGUF\n\n * Deployed it with **Ollama**\n\n * While preserving the Harmony template behavior?\n\n * If yes:\n\n * Did you modify the **chat template / Modelfile**?\n\n * How did you handle EOS + reasoning channels?\n\n * Any pitfalls to avoid to keep it production-ready for Triton later?\n\n\n\n\nAny concrete guidance, references, or proven setups would be extremely helpful.",
"title": "Guidance Needed: GPT-OSS 20B Fine-Tuning with Unsloth → GGUF → Ollama → Triton (vLLM / TensorRT-LLM)"
}