DeepSeek? Qwen?
Hugging Face Forums [Unofficial]
June 9, 2026
Guys, I don't even know where to begin. I'm used to working with RTX 3090s and the models that fit on that kind of setup. Recently I got the chance to work with a server with an H200 and 2 TB of RAM, and now I have no idea what to run on it. I was thinking of using DeepSeek v4 Flash, but in conversations with ChatGPT, it keeps telling me I won't get good results. Does anyone have experience with vLLM on a setup like this? Or can anyone tell me what the best options are for hardware of this size? Initially it will be for some internal applications working with text; we'll mostly be testing to see whether it performs well.
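A rough way to narrow the options is to check whether a model's weights fit in GPU memory at all. The sketch below is only a back-of-the-envelope estimate. It assumes a single H200 with 141 GB of HBM, a 90% memory utilization target (in line with what I understand to be vLLM's default `gpu_memory_utilization`), and a hypothetical 20 GB reserve for the KV cache. The function names and the example parameter counts are illustrative and are not measurements of any specific model.

```python
# Back-of-the-envelope check: do a model's weights fit on one GPU?
# All numbers below are assumptions for illustration, not benchmarks.

H200_HBM_GB = 141  # assumed HBM capacity of a single H200


def weight_gb(params_billion: float, bytes_per_param: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 params * bytes_per_param / 1e9 bytes per GB
    simplifies to params_billion * bytes_per_param.
    """
    return params_billion * bytes_per_param


def fits(
    params_billion: float,
    bytes_per_param: float,
    gpu_gb: float = H200_HBM_GB,
    utilization: float = 0.90,    # assumed fraction of GPU memory made usable
    kv_reserve_gb: float = 20.0,  # hypothetical headroom for the KV cache
) -> bool:
    """Return True if the weights fit in the budget left after the KV reserve."""
    budget_gb = gpu_gb * utilization - kv_reserve_gb
    return weight_gb(params_billion, bytes_per_param) <= budget_gb


if __name__ == "__main__":
    # Hypothetical model sizes: (label, parameters in billions, bytes per parameter)
    candidates = [
        ("70B dense, BF16", 70, 2),
        ("70B dense, FP8", 70, 1),
        ("32B dense, BF16", 32, 2),
    ]
    for label, params_b, bpp in candidates:
        verdict = "fits" if fits(params_b, bpp) else "does not fit"
        print(f"{label}: ~{weight_gb(params_b, bpp):.0f} GB of weights, {verdict}")
```

This ignores activation memory, framework overhead, and CPU offloading into the 2 TB of system RAM, so treat it as a first filter rather than a sizing guarantee.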