Using a Hugging Face Model offline to support code generation in VSCode
Hugging Face Forums [Unofficial]
March 25, 2026
I am trying to use an offline Hugging Face model with VSCode, served at http://localhost:11434/api/generate (spoofing the Ollama interface). Once that works, I might try it with OpenClaw.
I am unable to get VSCode to access the model.
I have tried Continue, LM Studio, CodeGPT and AI Tools in VSCode. Each time I run into a wall of non-functionality, or a demand that I log in via Google or something similar; I want no logins. AI Tools tried once to access /api/tags, so I looked up what Ollama returns for tags and wrote code to spoof that in my API. I just want VSCode to send a prompt to the (spoofed Ollama) interface, wait for the dictionary with a “response” field, and use it when it arrives.
I am a beginner in AI/Python/VSCode, but not a beginner in a lot of “old” languages. I have:
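A minimal sketch of the two reply shapes described above, for reference. The field names follow Ollama's documented /api/tags and /api/generate responses as far as I know them; the function names, the model tag, and the placeholder values are hypothetical, and a given extension may insist on more fields than these.

```python
import json
from datetime import datetime, timezone

# Hypothetical tag; it should match whatever model name the client is configured to request.
MODEL_TAG = "qwen2.5-coder:3b"


def now_iso() -> str:
    """Current UTC time as an ISO 8601 string."""
    return datetime.now(timezone.utc).isoformat()


def tags_payload(model_tag: str = MODEL_TAG) -> dict:
    """Shape of a GET /api/tags reply: a 'models' list with one entry per model."""
    return {
        "models": [
            {
                "name": model_tag,
                "model": model_tag,
                "modified_at": now_iso(),
                "size": 0,      # placeholder, not a real file size
                "digest": "",   # placeholder, not a real digest
                "details": {
                    "format": "",
                    "family": "",
                    "parameter_size": "",
                    "quantization_level": "",
                },
            }
        ]
    }


def generate_payload(text: str, model_tag: str = MODEL_TAG) -> dict:
    """Shape of a non-streaming POST /api/generate reply carrying the model output."""
    return {
        "model": model_tag,
        "created_at": now_iso(),
        "response": text,
        "done": True,
    }


if __name__ == "__main__":
    print(json.dumps(tags_payload(), indent=2))
    print(json.dumps(generate_payload("print('hello')"), indent=2))
```

Note that Ollama streams newline-delimited JSON chunks by default and only returns a single object like the one above when the request sets "stream": false, so a client that leaves streaming on may not accept a single-object reply.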
* Downloaded example code for running Qwen-2.5 Coder 3B offline on my GPU (6 GB)
* Used the model to help me learn Python and expand my code into a chatbot (copy and paste from chatbot to VSCode, tinker, debug, expand again…)
* Used the model to learn more Python in two other expansions.
* Developed code to run the model behind a (spoofed Ollama) interface using uvicorn, tested it with curl, then added a /api/chatbot interface and used that too. From the connections printed in the terminal window of my uvicorn server, I can tell when something attempts to contact the local LLM.
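The server setup in the last bullet could look roughly like the sketch below. It is a bare ASGI application with no framework, so it is not necessarily how the original code is written; `run_model` is a hypothetical stand-in for the real Hugging Face generation call, and the reply is the single-object (non-streaming) form. Every request is printed, which is what makes it possible to see which endpoints an editor extension actually probes.

```python
import asyncio
import json


def run_model(prompt: str) -> str:
    """Hypothetical stand-in for the real Hugging Face generation call."""
    return f"echo: {prompt}"


async def read_body(receive) -> bytes:
    """Collect the full request body from ASGI http.request messages."""
    body = b""
    while True:
        message = await receive()
        body += message.get("body", b"")
        if not message.get("more_body", False):
            return body


async def send_json(send, status: int, payload: dict) -> None:
    """Send one JSON document as a complete HTTP response."""
    data = json.dumps(payload).encode("utf-8")
    await send({
        "type": "http.response.start",
        "status": status,
        "headers": [
            (b"content-type", b"application/json"),
            (b"content-length", str(len(data)).encode("ascii")),
        ],
    })
    await send({"type": "http.response.body", "body": data})


async def app(scope, receive, send):
    """Bare ASGI app; answers /api/generate and reports everything else as 404."""
    if scope["type"] != "http":
        return  # ignore lifespan and websocket scopes in this sketch
    method, path = scope["method"], scope["path"]
    print(f"{method} {path}")  # log every endpoint a client tries
    if method == "POST" and path == "/api/generate":
        request = json.loads(await read_body(receive) or b"{}")
        text = run_model(request.get("prompt", ""))
        await send_json(send, 200, {
            "model": request.get("model", ""),
            "response": text,
            "done": True,
        })
    else:
        await send_json(send, 404, {"error": f"not implemented: {method} {path}"})
```

Assuming the file is saved as spoof.py (a hypothetical name), it can be served with `uvicorn spoof:app --host 127.0.0.1 --port 11434`. The 404 branch is deliberate: any path that shows up in the log with a 404 is an endpoint the extension expects and the spoof does not yet provide.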
I want to use Hugging Face, not Ollama. I want completely private sessions: no tokens, no tracking, no telemetry, no logins. I have achieved that with Hugging Face. The model I chose just works with my card; I will try others later.
If VSCode (or OpenClaw) is simply, intentionally incompatible with Hugging Face, fine; a link to an explanation of why would be appreciated.
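One way to enforce the no-network requirement in the process that loads the model, assuming it is loaded through transformers and huggingface_hub, is sketched below. The three environment variables are switches those libraries document; the function name is hypothetical, and this is not necessarily how the setup described above achieves it.

```python
import os


def force_offline() -> dict:
    """Set Hugging Face offline and telemetry switches for this process.

    Call this before importing transformers or huggingface_hub, because
    those libraries read the variables when they are imported.
    """
    settings = {
        "HF_HUB_OFFLINE": "1",            # use only files already in the local cache
        "TRANSFORMERS_OFFLINE": "1",      # same intent, read by transformers
        "HF_HUB_DISABLE_TELEMETRY": "1",  # do not send usage telemetry
    }
    os.environ.update(settings)
    return settings


force_offline()
```

With these set, loading a model that is not already cached fails with an error rather than reaching out to the network.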
If this can be made to work, please provide a link to the clearest explanation of how.
Thank you