ComfyUI on Windows: Your First Image in 15 Minutes
Why ComfyUI?
If you already read the Automatic1111 Windows guide, you might be wondering why there's a whole other program that does the same thing.
A1111 gives you a traditional form — type a prompt, tweak some settings, hit generate. ComfyUI gives you a node graph where you wire the pieces together visually. It looks more complicated at first, but it's faster, uses less VRAM, and gives you way more control once you learn it.
For your first image though? It's just as easy. ComfyUI comes with a default workflow already wired up. You load a checkpoint, type a prompt, and hit go.
What You Need
- Windows 10 or 11
- An NVIDIA GPU with at least 4GB of VRAM. ComfyUI is more memory-efficient than A1111, so it runs on weaker cards. 6GB+ is comfortable.
- About 15GB of free disk space to start
Step 1: Download ComfyUI
Go to the ComfyUI GitHub releases page and download the latest Windows portable package. It's a .7z or .zip file.
Extract it somewhere — your Desktop or a dedicated folder. You'll get a folder called something like ComfyUI_windows_portable.
That's it. No Python install, no Git, no command line. The portable version bundles everything.
Step 2: Download a Checkpoint
Same as with A1111 — you need a model to generate images. If you already have one from following the A1111 guide, you can use the same file.
Good first checkpoints by style:
For anime/illustration:
- Anything V5 — classic anime, forgiving with prompts
- MeinaMix — clean anime with good anatomy
For realistic/photographic:
- Realistic Vision — go-to for realistic portraits
- epiCRealism — natural-looking skin and lighting
For stylized/3D:
- DreamShaper — versatile, does everything
- RevAnimated — fantasy and 3D-style art
Download the .safetensors file and put it in:
ComfyUI_windows_portable\ComfyUI\models\checkpoints\
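If you want to double-check the file landed in the right place, here's a small Python sketch that lists what ComfyUI will show in its checkpoint dropdown. The folder path is the one from this guide; adjust it if you extracted the portable package somewhere else.

```python
from pathlib import Path

# Folder from this guide; adjust if you extracted ComfyUI somewhere else.
checkpoints = Path(r"ComfyUI_windows_portable\ComfyUI\models\checkpoints")

def list_checkpoints(folder: Path) -> list[str]:
    """Return the .safetensors files ComfyUI will see in its dropdown."""
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.safetensors"))

print(list_checkpoints(checkpoints))
```

If this prints an empty list, the model is in the wrong folder (or still has a `.zip`/`.7z` extension from a partial download).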
Step 3: Launch It
Double-click run_nvidia_gpu.bat in the portable folder.
A terminal window opens. Wait until you see:
To see the GUI go to: http://127.0.0.1:8188
Open that in your browser. You'll see the node graph — a bunch of connected boxes. Don't panic. The default workflow is already set up for basic text-to-image generation.
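That address serves both the browser UI and ComfyUI's HTTP API on the same port. If the page won't load, a quick hedged check like this (plain standard-library Python, no extra packages) tells you whether the server is actually answering yet:

```python
import urllib.request
import urllib.error

def comfy_is_up(host: str = "127.0.0.1", port: int = 8188) -> bool:
    """Return True if the ComfyUI web server answers on host:port."""
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if comfy_is_up():
    print("ComfyUI is running: open http://127.0.0.1:8188 in your browser")
else:
    print("No server yet: is run_nvidia_gpu.bat still starting up?")
```

The usual culprit when this returns False is that the terminal window is still loading, or was closed by accident.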
Step 4: Your First Image
The default workflow has everything wired up. You just need to do three things:
- Pick your checkpoint. Find the node labeled "Load Checkpoint" (usually top-left). Click the dropdown and select the model you downloaded. If it doesn't show up, click the refresh button or restart ComfyUI.
- Type your prompt. Find the "CLIP Text Encode" node connected to the KSampler's positive input. Click in the text box and type something simple:
a girl standing in a field of flowers, sunset, beautiful lighting
There's a second CLIP Text Encode node for the negative prompt. You can leave it empty or type bad quality, blurry — it's not critical right now.
- Set your size. Find the "Empty Latent Image" node. This is where you set width and height. Start with:
- 512 x 768 for portrait (SD 1.5 checkpoints)
- 832 x 1216 for portrait (SDXL checkpoints)
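Those numbers aren't arbitrary: Stable Diffusion generates in a latent space that's 8x smaller than the final image, so width and height should be multiples of 8. A tiny helper (hypothetical, just for illustration) snaps any requested size to a valid one:

```python
def snap(pixels: int, multiple: int = 8) -> int:
    """Round a requested dimension to the nearest multiple of 8 (the
    latent-space downscale factor), with a one-step minimum."""
    return max(multiple, (pixels + multiple // 2) // multiple * multiple)

# The guide's portrait sizes are already valid latent sizes:
assert snap(512) == 512 and snap(768) == 768
assert snap(832) == 832 and snap(1216) == 1216

print(snap(1001))  # -> 1000
```

The ComfyUI width/height widgets enforce this for you, but it's worth knowing why 832 x 1216 and not, say, 830 x 1215.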
Click Queue Prompt (the button in the sidebar, or press Ctrl+Enter).
Your image appears in the "Save Image" or "Preview Image" node when it's done. The first generation is slower because the model has to load into VRAM; after that, expect roughly 10-30 seconds depending on your GPU.
Step 5: Play With It
Generate a few images. Change the prompt. Try different sizes.
The main thing to experiment with early:
- Steps. Find the "KSampler" node. The steps value controls quality vs speed. Start at 20. Bump to 30 if you want more detail. Don't bother going above 40.
- Size. Change width and height in the Empty Latent Image node. Taller = portrait. Wider = landscape. Bigger = slower but more detail.
Leave cfg, sampler_name, scheduler, and denoise at their defaults for now. They matter, but not for your first day.
The Node Graph Looks Scary But It's Simple
Here's what the default workflow actually does, left to right:
- Load Checkpoint — loads the AI model
- CLIP Text Encode (positive) — your prompt, what you want to see
- CLIP Text Encode (negative) — what you don't want to see
- Empty Latent Image — the canvas size
- KSampler — the brain, takes all the inputs and generates the image
- VAE Decode — converts the raw output into a viewable image
- Save/Preview Image — shows you the result
That's the entire pipeline. Every workflow in ComfyUI — no matter how complex — is just a fancier version of this same chain.
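You can see the same chain in the JSON that ComfyUI's API accepts (what "Save (API Format)" exports): an object keyed by node id, where links are [source_node_id, output_index] pairs. The sketch below builds that default chain and queues it over the same port the browser uses. The class names and the /prompt endpoint match current ComfyUI builds, but treat this as an illustration, not a reference.

```python
import json
import urllib.request

def default_workflow(prompt: str, negative: str = "",
                     ckpt: str = "model.safetensors",
                     width: int = 512, height: int = 768,
                     steps: int = 20) -> dict:
    """The default text-to-image chain in ComfyUI's API workflow format.
    Keys are node ids; links are [source_node_id, output_index] pairs."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt}},
        "2": {"class_type": "CLIPTextEncode",            # positive prompt
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",            # negative prompt
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",          # the canvas
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler",                  # the brain
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": 0, "steps": steps, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",                 # latent -> pixels
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
    }

def queue_prompt(workflow: dict, host: str = "127.0.0.1", port: int = 8188):
    """POST the workflow to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(f"http://{host}:{port}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)
```

Notice how the JSON mirrors the graph: the KSampler pulls from the checkpoint, both prompts, and the empty latent, exactly as the wires do on screen.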
What's Next?
Once you're comfortable generating:
- Why I don't start with a prompt — the mindset shift that changes everything
- The Three Starting Points — how to go from reference images to better prompts
- How I Remix Any Prompt — how to take someone else's prompt and make it your own
Welcome to the rabbit hole.