
Using a Hugging Face Model offline to support code generation in VSCode

Hugging Face Forums [Unofficial] March 25, 2026

I am trying to use an offline Hugging Face model with VSCode, served locally at http://localhost:11434/api/generate (spoofing Ollama). Once that works, I might try it with OpenClaw.

So far I am unable to get VSCode to access the model. I have tried Continue, LM Studio, CodeGPT and AI Tools in VSCode. Each time I run into a wall of non-functionality, or a demand that I log in via Google or something similar. I want no logins. AI Tools tried once to access /api/tags, so I looked up what Ollama returns for tags and wrote code to spoof that in my API. I just want VSCode to send a prompt to the (spoofed Ollama) interface, wait for the dictionary with a “response” field, and use it when it arrives.

I am a beginner in AI/Python/VSCode, but not a beginner in a lot of “old” languages. I have:

* Downloaded example code for running Qwen-2.5 Coder 3B offline on my GPU (6GB).
* Used the LLM to help me learn Python and expand that code into a chatbot (copy and paste from chatbot to VSCode, tinker, debug, expand again…).
* Used the model to learn more Python in two other expansions.
* Developed code to run the model behind a (spoofed Ollama) interface using uvicorn, tested it with curl, and also created and used a /api/chatbot interface.

I can tell when something attempts to contact the local LLM from the connections printed in the terminal window for my uvicorn server.

I want to use Hugging Face, not Ollama. I want completely private sessions: no tokens, no tracking, no telemetry, no logins. I have achieved that with Hugging Face. The model I chose just works with my card; I will try others later.

If VSCode (OpenClaw) is just intentionally incompatible with Hugging Face, fine; a link to an explanation of why would be appreciated. If this can be made to work, please provide a link to the clearest explanation of how.

Thank you
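For illustration, the spoofed interface described above might look roughly like the following sketch. It is not the code from this post: it uses only the Python standard library instead of uvicorn, and `MODEL_NAME`, `generate_text` and `serve` are hypothetical placeholders. The response shapes are assumed from Ollama's public API documentation and are reduced to a few fields.

```python
import json
from datetime import datetime, timezone
from http.server import BaseHTTPRequestHandler, HTTPServer

MODEL_NAME = "qwen2.5-coder:3b"  # hypothetical tag advertised to clients


def generate_text(prompt: str) -> str:
    """Placeholder for the real Hugging Face model call (assumption)."""
    return f"(model output for: {prompt})"


def tags_payload() -> dict:
    """Reduced shape of a GET /api/tags reply: the locally available models."""
    return {"models": [{"name": MODEL_NAME, "model": MODEL_NAME}]}


def generate_payload(request: dict) -> dict:
    """Reduced shape of one non-streaming POST /api/generate reply."""
    return {
        "model": request.get("model", MODEL_NAME),
        "created_at": datetime.now(timezone.utc).isoformat(),
        "response": generate_text(request.get("prompt", "")),
        "done": True,
    }


class OllamaLikeHandler(BaseHTTPRequestHandler):
    def _send_json(self, payload: dict) -> None:
        # Serialize the payload and send it as a complete JSON body.
        body = json.dumps(payload).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def do_GET(self) -> None:
        if self.path == "/api/tags":
            self._send_json(tags_payload())
        else:
            self.send_error(404)

    def do_POST(self) -> None:
        if self.path != "/api/generate":
            self.send_error(404)
            return
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        request = json.loads(self.rfile.read(length) or b"{}")
        self._send_json(generate_payload(request))


def serve(host: str = "127.0.0.1", port: int = 11434) -> None:
    """Listen on the port used in this post until interrupted."""
    HTTPServer((host, port), OllamaLikeHandler).serve_forever()
```

Calling `serve()` starts the listener. The sketch only returns a single JSON object per request. Ollama's documented default for /api/generate is a streamed reply, so a client that does not send `"stream": false` may expect newline-delimited JSON chunks instead, which this sketch does not produce.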
