Raw Record Source

{
  "$type": "site.standard.document",
  "bskyPostRef": {
    "cid": "bafyreifrsjt2fs3hamkrbozqs6j5b5nhsj7azfhlkxmxstls7xtbjxbkte",
    "uri": "at://did:plc:lk3jfj3zq4k4wxnk474axylu/app.bsky.feed.post/3me72kikf2l72"
  },
  "path": "/t/guidance-needed-gpt-oss-20b-fine-tuning-with-unsloth-gguf-ollama-triton-vllm-tensorrt-llm/1373515#post_1",
  "publishedAt": "2026-02-06T12:08:22.000Z",
  "site": "https://community.openai.com",
  "textContent": "I am currently fine-tuning the **GPT-OSS 20B** model using **Unsloth** with **Hugging Face TRL (SFTTrainer)**.\n\n**Goals**\n\n  * **Long-term:** serve the model in production using **Triton** with either **vLLM** or **TensorRT-LLM** as the backend\n\n  * **Short-term / initial deployment:** serve it with **Ollama (GGUF)**\n\n**Current challenge**\n\nGPT-OSS uses a **Harmony-style chat template**, which includes:\n\n  * `developer` role\n\n  * Explicit EOS handling\n\n  * `thinking` / `analysis` channels\n\n  * Tool / function calling structure\n\nWhen converting the fine-tuned model to **GGUF** and deploying it in **Ollama** using the **default GPT-OSS Modelfile**, I am unsure about:\n\n  1. Whether the **default Jinja chat template** provided by GPT-OSS should be **modified** for Ollama compatibility\n\n  2. How to correctly handle:\n\n     * EOS token behavior\n\n     * Internal reasoning / analysis channels\n\n     * Developer role alignment\n\n  3. How to do this **without degrading the model’s default performance or alignment**\n\n**Constraints / Intent**\n\n  * My training data is already prepared strictly in **system / user / assistant** format\n\n  * I want to:\n\n    * Preserve GPT-OSS’s native behavior as much as possible\n\n    * Perform **accurate, non-destructive fine-tuning**\n\n    * Avoid hacks that work short-term but break compatibility with **vLLM / TensorRT-LLM** later\n\n**What I’m looking for**\n\n  * Has anyone successfully done all of the following?\n\n    * Fine-tuned GPT-OSS\n\n    * Converted it to GGUF\n\n    * Deployed it with **Ollama**\n\n    * Preserved the Harmony template behavior throughout\n\n  * If yes:\n\n    * Did you modify the **chat template / Modelfile**?\n\n    * How did you handle EOS and the reasoning channels?\n\n    * Are there any pitfalls to avoid so the setup stays production-ready for Triton later?\n\nAny concrete guidance, references, or proven setups would be extremely helpful.",
  "title": "Guidance Needed: GPT-OSS 20B Fine-Tuning with Unsloth → GGUF → Ollama → Triton (vLLM / TensorRT-LLM)"
}