Need help getting started with image generation
When using open-source generative AI models, there are still some limitations with AMD GPUs. While things have improved significantly on Linux and Windows 11 + WSL2 environments today, options remain quite limited on Windows 10…
What you’re setting up (two separate local apps)
- Image generation: Stable Diffusion 1.5 “weights” + a GUI that runs locally (you open it in your browser at 127.0.0.1).
- Prompt enhancement: a small local text model that turns “an idea” into POSITIVE / NEGATIVE / SETTINGS you copy/paste into the image GUI.
Keeping them separate is the simplest “offline + no-coding” workflow.
The most realistic Windows 10 + AMD path (no WSL2)
Best first-success route
SD.Next + ONNX Runtime + DirectML (DmlExecutionProvider)
SD.Next explicitly supports ONNX Runtime and notes you can select DmlExecutionProvider by installing onnxruntime-directml, and that DirectX 12 is required. (GitHub)
Alternatives (only if you want them later)
- AUTOMATIC1111 + Microsoft DirectML extension: uses ONNX Runtime + DirectML, but requires models optimized via Olive (more moving parts). (GitHub) AMD’s own guide for that extension calls it “preview” and (in that guide) states only SD 1.5 is supported. (AMD)
- A1111 main repo on Windows+AMD: not officially supported; their wiki points to DirectML-focused forks/approaches instead. (GitHub)
- SD.Next + ZLUDA: can be a speed/compatibility upgrade on some AMD cards, but it’s an option for after you already have a working setup. SD.Next documents launching it with --use-zluda and notes HIP SDK version constraints. (GitHub)
Step-by-step: SD 1.5 image generation with SD.Next (Windows 10 + AMD)
0) Put it in an easy folder
Use something like:
C:\AI\sdnext\
Avoid OneDrive/Desktop/Program Files. (This prevents many permissions/path problems.)
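If you want to sanity-check a folder before installing, here is a minimal sketch. The list of "risky" folder names comes from the advice above; the extra warning about spaces is my own cautious assumption, not something SD.Next requires.

```python
from pathlib import PureWindowsPath

# Folder names the advice above says to avoid.
RISKY_PARTS = ("onedrive", "desktop", "program files")

def path_warnings(path: str) -> list[str]:
    """Return human-readable warnings for a proposed install folder."""
    parts = [p.lower() for p in PureWindowsPath(path).parts]
    warnings = []
    for risky in RISKY_PARTS:
        if any(risky in part for part in parts):
            warnings.append(f"path goes through a '{risky}' folder")
    # Assumption: spaces are not forbidden, but short space-free paths
    # tend to cause fewer surprises with command-line tools.
    if " " in path:
        warnings.append("path contains spaces")
    return warnings

if __name__ == "__main__":
    for candidate in (r"C:\AI\sdnext", r"C:\Users\me\OneDrive\sdnext"):
        print(candidate, "->", path_warnings(candidate) or "looks fine")
```

An empty list means none of the known problem patterns were found; it does not guarantee the folder is writable.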
1) Install the basics (one-time)
- Latest AMD GPU driver + reboot
- Git for Windows
- Python (many SD Windows setups are happiest on Python 3.10.x)
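To confirm the basics are in place before cloning anything, a small check like the one below may help. It is only a sketch: the 3.10 target reflects the "happiest on Python 3.10.x" note above, and a different version is not necessarily broken.

```python
import shutil
import sys

def python_ok(version=sys.version_info, wanted=(3, 10)) -> bool:
    """True when the interpreter's major.minor pair matches the wanted pair."""
    return tuple(version[:2]) == wanted

def git_found() -> bool:
    """True when a 'git' executable is reachable through PATH."""
    return shutil.which("git") is not None

if __name__ == "__main__":
    print("Python version:", ".".join(str(n) for n in sys.version_info[:3]))
    print("Matches 3.10.x:", python_ok())
    print("Git on PATH:", git_found())
```

Run it with the same Python that Command Prompt finds when you type python, since that is the one the launcher is likely to pick up.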
2) Install + start SD.Next (use cmd.exe, not PowerShell)
Open Command Prompt and run:
cd C:\AI
git clone https://github.com/vladmandic/sdnext.git
cd sdnext
webui.bat --debug
SD.Next documents launching on Windows with webui.bat --debug. (GitHub)
When it finishes starting, it prints a local URL (often http://127.0.0.1:7860). Open that in your browser.
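If the browser shows nothing and you are not sure whether the server is actually up, you can test the port directly. This sketch assumes the default address and port mentioned above (127.0.0.1:7860); if the console printed a different URL, pass that port instead.

```python
import socket

def is_listening(host: str = "127.0.0.1", port: int = 7860, timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False

if __name__ == "__main__":
    if is_listening():
        print("Something is listening on 127.0.0.1:7860")
    else:
        print("Nothing is listening yet; check the console output for errors")
```

A True result only shows that some program holds the port, not that it is SD.Next, so treat it as a quick hint rather than proof.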
3) Add an SD 1.5 model file (the “weights”)
A common starter SD 1.5 checkpoint is:
v1-5-pruned-emaonly.safetensors (license shown as creativeml-openrail-m) (Hugging Face)
Place the .safetensors file into SD.Next’s model folder (SD.Next “Getting Started” covers the basic “generate with a few clicks” workflow and model handling). (GitHub)
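A common beginner problem is a download that was cut short or that saved an HTML error page under the .safetensors name. The sketch below reads only the file header to catch that. It relies on the safetensors layout (an 8-byte little-endian length followed by a JSON header); the file path in the example is hypothetical, so point it at wherever you saved the model.

```python
import json
import struct
from pathlib import Path

def read_safetensors_header(path) -> dict:
    """Parse the JSON header of a .safetensors file; raise ValueError if it looks wrong."""
    path = Path(path)
    size = path.stat().st_size
    with path.open("rb") as f:
        prefix = f.read(8)
        if len(prefix) < 8:
            raise ValueError("file is too small to be a safetensors file")
        (header_len,) = struct.unpack("<Q", prefix)  # unsigned 64-bit, little-endian
        if header_len > size - 8:
            raise ValueError("header length exceeds file size (truncated or wrong format)")
        try:
            header = json.loads(f.read(header_len).decode("utf-8"))
        except ValueError as exc:  # covers bad UTF-8 and bad JSON
            raise ValueError("header is not valid JSON") from exc
    if not isinstance(header, dict):
        raise ValueError("header is not a JSON object")
    return header

if __name__ == "__main__":
    model = Path(r"C:\AI\sdnext\models\v1-5-pruned-emaonly.safetensors")  # hypothetical location
    if model.exists():
        print("tensor entries:", len(read_safetensors_header(model)))
    else:
        print("model file not found at", model)
```

This does not verify the tensor data itself, only that the file starts like a real safetensors file.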
4) Turn on AMD GPU acceleration (ONNX Runtime + DirectML)
In SD.Next, switch to the ONNX Runtime pipeline and choose DmlExecutionProvider (DirectML). SD.Next notes:
- DML EP becomes available by installing onnxruntime-directml
- DirectX 12 is required (GitHub)
Why this matters: ONNX Runtime’s DirectML EP has specific constraints (for example, it does not support memory-pattern optimizations or parallel execution in ORT sessions). (ONNX Runtime)
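If DmlExecutionProvider does not show up in the UI, you can ask ONNX Runtime which providers it sees. This is a sketch that assumes you run it inside the same Python environment SD.Next uses; run from a different Python, it may report a different set of packages.

```python
def has_directml(providers) -> bool:
    """True when the DirectML execution provider is in the given list."""
    return "DmlExecutionProvider" in providers

if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this Python environment")
    else:
        providers = ort.get_available_providers()
        print("Available providers:", providers)
        print("DirectML available:", has_directml(providers))
```

If only CPUExecutionProvider is listed, the plain onnxruntime package is probably installed instead of onnxruntime-directml.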
5) First “known-stable” test settings (prove it works)
Start conservative:
- 512×512
- Steps: 20
- CFG: ~7
- Batch size: 1
Test prompts:
- Positive: portrait photo, soft studio lighting, sharp focus
- Negative: lowres, blurry, watermark, text, bad anatomy, extra fingers
Once you can generate one image reliably, then raise resolution/complexity.
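If you like keeping your test settings in a file so you can go back to a known-good baseline, the sketch below bundles the values above into one dictionary. The field names follow the A1111-style txt2img API naming (prompt, negative_prompt, cfg_scale, and so on); whether your SD.Next build accepts them as-is is an assumption, so this only builds and prints the data and does not send it anywhere.

```python
import json

def build_baseline_settings(positive: str, negative: str, width: int = 512,
                            height: int = 512, steps: int = 20,
                            cfg_scale: float = 7.0, batch_size: int = 1) -> dict:
    """Collect the conservative first-test settings into one dictionary."""
    return {
        "prompt": positive,
        "negative_prompt": negative,
        "width": width,
        "height": height,
        "steps": steps,
        "cfg_scale": cfg_scale,
        "batch_size": batch_size,
    }

if __name__ == "__main__":
    baseline = build_baseline_settings(
        "portrait photo, soft studio lighting, sharp focus",
        "lowres, blurry, watermark, text, bad anatomy, extra fingers",
    )
    print(json.dumps(baseline, indent=2))
```

Changing one value at a time from this baseline makes it easier to tell which setting caused a crash or a slowdown.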
Quick troubleshooting (the fastest fixes)
A) Start in “safe mode” to remove extension problems
webui.bat --debug --safe
--safe disables user extensions and is recommended for troubleshooting. (GitHub)
B) UI acts broken / buttons don’t work
SD.Next recommends deleting ui-config.json if it’s bloated (old settings can override new defaults and break the UI). (GitHub)
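Rather than deleting the file outright, you may prefer to rename it so you can restore it later. A minimal sketch, assuming ui-config.json sits in the SD.Next install folder (the C:\AI\sdnext path is the example folder from earlier; adjust it if yours differs):

```python
import time
from pathlib import Path

def backup_ui_config(install_dir, name: str = "ui-config.json"):
    """Rename the UI config to a timestamped .bak file; return the new path or None."""
    src = Path(install_dir) / name
    if not src.exists():
        return None  # nothing to reset
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dst = src.with_name(f"{src.stem}.{stamp}.bak")
    src.rename(dst)
    return dst

if __name__ == "__main__":
    moved = backup_ui_config(r"C:\AI\sdnext")
    print("Backed up to:" if moved else "No ui-config.json found", moved or "")
```

Close SD.Next before running this, then start it again so it can recreate the file with current defaults.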
C) DirectML crashes / weird ORT errors
DirectML EP requires certain ORT options (mem-pattern + parallel execution) to be disabled; enabling them can cause errors. (ONNX Runtime)
If you see errors like 80070057, they’re commonly associated with those constraints; ONNX Runtime has issue reports in this area. (GitHub)
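For anyone curious what "disabled" looks like at the ONNX Runtime level, here is a sketch of the two session options involved. SD.Next manages its own sessions, so you normally do not need this; it is mainly useful if you test an ONNX model on your own. The function takes the onnxruntime module as an argument purely so the example stays importable without it, and model.onnx in the comment is a hypothetical file name.

```python
def make_dml_session_options(ort):
    """Build SessionOptions with the settings the DirectML EP expects."""
    options = ort.SessionOptions()
    options.enable_mem_pattern = False                         # memory-pattern optimization off
    options.execution_mode = ort.ExecutionMode.ORT_SEQUENTIAL  # no parallel execution
    return options

if __name__ == "__main__":
    try:
        import onnxruntime as ort
    except ImportError:
        print("onnxruntime is not installed in this Python environment")
    else:
        options = make_dml_session_options(ort)
        print("mem pattern enabled:", options.enable_mem_pattern)
        print("execution mode:", options.execution_mode)
        # With a real model file, a session could then be created like this:
        # session = ort.InferenceSession("model.onnx", sess_options=options,
        #                                providers=["DmlExecutionProvider"])
```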
Prompt enhancement (offline, GUI-first)
Pick one “local chat” app
Option 1: Jan (desktop GUI, open source, offline)
Jan is presented as an open-source ChatGPT-like app for running models locally. (GitHub)
Option 2: KoboldCpp (single EXE + browser UI; good AMD hint)
KoboldCpp releases explicitly recommend the Vulkan option in the nocuda build for AMD. (GitHub)
Option 3: Ollama (simple installer)
Ollama’s Windows docs state it does not require Administrator and installs in your home directory by default. (Ollama Official Document)
Good beginner prompt-enhancer models (small + practical)
Specialized prompt optimizers (often best for SD prompting):
- TIPO-200M (prompt optimization for text-to-image workflows). (Hugging Face)
- DART v2 (generates Danbooru-style tags; useful if you like tag prompts). (Hugging Face)
General small instruct model (good at structured output):
- SmolLM2-1.7B-Instruct (compact “run on-device” class model). (Hugging Face)
Copy/paste template for your prompt enhancer
Use this once as your “system prompt” (or first message).
You write prompts for Stable Diffusion 1.5.
Return exactly these sections:
POSITIVE:
NEGATIVE:
SETTINGS:
VARIATIONS:
Rules:
- POSITIVE: 1–2 lines. Include subject, environment, lighting, camera/framing, style/medium.
- NEGATIVE: comma-separated. Include common artifacts: lowres, blurry, watermark, text, deformed hands, extra fingers.
- SETTINGS: suggest resolution (start 512x512), steps (20–30), CFG (6–8).
- VARIATIONS: 5 short alternate POSITIVE prompts that keep the same idea but change lighting/camera/mood.
User idea: <paste your idea here>
Workflow:
1) Write your idea → 2) copy POSITIVE/NEGATIVE/SETTINGS → 3) paste into SD.Next → 4) generate.
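If copying sections by hand gets tedious, a small script can split the enhancer's reply into its four parts. This is a sketch that assumes the model follows the template literally, with each section name at the start of a line followed by a colon. Small models sometimes add markdown or extra chatter, which this simple parser will not handle.

```python
import re

SECTIONS = ("POSITIVE", "NEGATIVE", "SETTINGS", "VARIATIONS")
_HEADER = re.compile(r"^\s*(%s)\s*:\s*(.*)$" % "|".join(SECTIONS), re.IGNORECASE)

def parse_enhancer_output(text: str) -> dict[str, str]:
    """Split a templated reply into a {section name: content} dictionary."""
    result = {name: "" for name in SECTIONS}
    current = None
    for line in text.splitlines():
        match = _HEADER.match(line)
        if match:
            # A new section starts; keep any text that follows the colon.
            current = match.group(1).upper()
            result[current] = match.group(2).strip()
        elif current is not None and line.strip():
            # Continuation line: append it to the section being read.
            result[current] = (result[current] + "\n" + line.strip()).strip()
    return result

if __name__ == "__main__":
    reply = """POSITIVE: portrait photo of a fox, soft studio lighting
NEGATIVE: lowres, blurry, watermark
SETTINGS:
512x512, steps 20, CFG 7
VARIATIONS:
- fox at dusk
- fox in snow"""
    for name, content in parse_enhancer_output(reply).items():
        print(f"[{name}]\n{content}\n")
```

Sections the model left out come back as empty strings, which makes it easy to spot an incomplete reply before pasting anything into SD.Next.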