ComfyUI on Windows: Your First Image in 15 Minutes
Why ComfyUI?
If you already read the Automatic1111 Windows guide, you might be wondering why there's a whole other program that does the same thing.
A1111 gives you a traditional form — type a prompt, tweak some settings, hit generate. ComfyUI gives you a node graph where you wire the pieces together visually. It looks more complicated at first, but it's faster, uses less VRAM, and gives you way more control once you learn it.
For your first image though? It's just as easy. ComfyUI comes with a default workflow already wired up. You load a checkpoint, type a prompt, and hit go.
What You Need
- Windows 10 or 11
- An NVIDIA GPU with at least 4GB of VRAM. ComfyUI is more memory-efficient than A1111, so it runs on weaker cards. 6GB+ is comfortable.
- About 15GB of free disk space to start
Step 1: Download ComfyUI
Go to the ComfyUI GitHub releases page and download the latest Windows portable package. It's a .7z or .zip file.
Extract it somewhere — your Desktop or a dedicated folder. You'll get a folder called something like ComfyUI_windows_portable.
That's it. No Python install, no Git, no command line. The portable version bundles everything.
Step 2: Download a Checkpoint
Same as with A1111 — you need a model to generate images. If you already have one from following the A1111 guide, you can use the same file.
Good first checkpoints by style:
For anime/illustration:
- Anything V5 — classic anime, forgiving with prompts
- MeinaMix — clean anime with good anatomy
For realistic/photographic:
- Realistic Vision — go-to for realistic portraits
- epiCRealism — natural-looking skin and lighting
For stylized/3D:
- DreamShaper — versatile, does everything
- RevAnimated — fantasy and 3D-style art
Download the .safetensors file and put it in:
ComfyUI_windows_portable\ComfyUI\models\checkpoints\
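If you want to double-check the file landed in the right place, here's a small Python sketch that lists what ComfyUI will show in its checkpoint dropdown. The folder path is the one from this guide; adjust it if you extracted the portable package somewhere else.

```python
from pathlib import Path

# Folder from this guide; adjust if you extracted ComfyUI somewhere else.
checkpoints = Path(r"ComfyUI_windows_portable\ComfyUI\models\checkpoints")

def list_checkpoints(folder: Path) -> list[str]:
    """Return the .safetensors files ComfyUI will see in its dropdown."""
    if not folder.is_dir():
        return []
    return sorted(p.name for p in folder.glob("*.safetensors"))

print(list_checkpoints(checkpoints))
```

If this prints an empty list, the model is in the wrong folder (or still has a `.zip`/`.7z` extension from a partial download).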
Step 3: Launch It
Double-click run_nvidia_gpu.bat in the portable folder.
A terminal window opens. Wait until you see:
To see the GUI go to: http://127.0.0.1:8188
Open that in your browser. You'll see the node graph — a bunch of connected boxes. Don't panic. The default workflow is already set up for basic text-to-image generation.
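That address serves both the browser UI and ComfyUI's HTTP API on the same port. If the page won't load, a quick hedged check like this (plain standard-library Python, no extra packages) tells you whether the server is actually answering yet:

```python
import urllib.request
import urllib.error

def comfy_is_up(host: str = "127.0.0.1", port: int = 8188) -> bool:
    """Return True if the ComfyUI web server answers on host:port."""
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

if comfy_is_up():
    print("ComfyUI is running: open http://127.0.0.1:8188 in your browser")
else:
    print("No server yet: is run_nvidia_gpu.bat still starting up?")
```

The usual culprit when this returns False is that the terminal window is still loading, or was closed by accident.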
Step 4: Your First Image
The default workflow has everything wired up. You just need to do three things:
- Pick your checkpoint. Find the node labeled "Load Checkpoint" (usually top-left). Click the dropdown and select the model you downloaded. If it doesn't show up, click the refresh button or restart ComfyUI.
- Type your prompt. Find the "CLIP Text Encode" node connected to the KSampler's positive input. Click in the text box and type something simple:
a girl standing in a field of flowers, sunset, beautiful lighting
There's a second CLIP Text Encode node for the negative prompt. You can leave it empty or type bad quality, blurry — it's not critical right now.
- Set your size. Find the "Empty Latent Image" node. This is where you set width and height. Start with:
- 512 x 768 for portrait (SD 1.5 checkpoints)
- 832 x 1216 for portrait (SDXL checkpoints)
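Those numbers aren't arbitrary: Stable Diffusion generates in a latent space that's 8x smaller than the final image, so width and height should be multiples of 8. A tiny helper (hypothetical, just for illustration) snaps any requested size to a valid one:

```python
def snap(pixels: int, multiple: int = 8) -> int:
    """Round a requested dimension to the nearest multiple of 8 (the
    latent-space downscale factor), with a one-step minimum."""
    return max(multiple, (pixels + multiple // 2) // multiple * multiple)

# The guide's portrait sizes are already valid latent sizes:
assert snap(512) == 512 and snap(768) == 768
assert snap(832) == 832 and snap(1216) == 1216

print(snap(1001))  # -> 1000
```

The ComfyUI width/height widgets enforce this for you, but it's worth knowing why 832 x 1216 and not, say, 830 x 1215.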
Click Queue Prompt (the button in the sidebar, or press Ctrl+Enter).
Your image appears in the "Save Image" or "Preview Image" node when it's done. The first generation is slower because the model has to load into VRAM; after that, expect roughly 10-30 seconds depending on your GPU.
Step 5: Play With It
Generate a few images. Change the prompt. Try different sizes.
The main thing to experiment with early:
- Steps. Find the "KSampler" node. The steps value controls quality vs speed. Start at 20. Bump to 30 if you want more detail. Don't bother going above 40.
- Size. Change width and height in the Empty Latent Image node. Taller = portrait. Wider = landscape. Bigger = slower but more detail.
Leave cfg, sampler_name, scheduler, and denoise at their defaults for now. They matter, but not for your first day.
The Node Graph Looks Scary But It's Simple
Here's what the default workflow actually does, left to right:
- Load Checkpoint — loads the AI model
- CLIP Text Encode (positive) — your prompt, what you want to see
- CLIP Text Encode (negative) — what you don't want to see
- Empty Latent Image — the canvas size
- KSampler — the brain, takes all the inputs and generates the image
- VAE Decode — converts the raw output into a viewable image
- Save/Preview Image — shows you the result
That's the entire pipeline. Every workflow in ComfyUI — no matter how complex — is just a fancier version of this same chain.
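You can see the same chain in the JSON that ComfyUI's API accepts (what "Save (API Format)" exports): an object keyed by node id, where links are [source_node_id, output_index] pairs. The sketch below builds that default chain and queues it over the same port the browser uses. The class names and the /prompt endpoint match current ComfyUI builds, but treat this as an illustration, not a reference.

```python
import json
import urllib.request

def default_workflow(prompt: str, negative: str = "",
                     ckpt: str = "model.safetensors",
                     width: int = 512, height: int = 768,
                     steps: int = 20) -> dict:
    """The default text-to-image chain in ComfyUI's API workflow format.
    Keys are node ids; links are [source_node_id, output_index] pairs."""
    return {
        "1": {"class_type": "CheckpointLoaderSimple",
              "inputs": {"ckpt_name": ckpt}},
        "2": {"class_type": "CLIPTextEncode",            # positive prompt
              "inputs": {"text": prompt, "clip": ["1", 1]}},
        "3": {"class_type": "CLIPTextEncode",            # negative prompt
              "inputs": {"text": negative, "clip": ["1", 1]}},
        "4": {"class_type": "EmptyLatentImage",          # the canvas
              "inputs": {"width": width, "height": height, "batch_size": 1}},
        "5": {"class_type": "KSampler",                  # the brain
              "inputs": {"model": ["1", 0], "positive": ["2", 0],
                         "negative": ["3", 0], "latent_image": ["4", 0],
                         "seed": 0, "steps": steps, "cfg": 7.0,
                         "sampler_name": "euler", "scheduler": "normal",
                         "denoise": 1.0}},
        "6": {"class_type": "VAEDecode",                 # latent -> pixels
              "inputs": {"samples": ["5", 0], "vae": ["1", 2]}},
        "7": {"class_type": "SaveImage",
              "inputs": {"images": ["6", 0], "filename_prefix": "ComfyUI"}},
    }

def queue_prompt(workflow: dict, host: str = "127.0.0.1", port: int = 8188):
    """POST the workflow to a running ComfyUI server's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode()
    req = urllib.request.Request(f"http://{host}:{port}/prompt", data=body,
                                 headers={"Content-Type": "application/json"})
    urllib.request.urlopen(req, timeout=10)
```

Notice how the JSON mirrors the graph: the KSampler pulls from the checkpoint, both prompts, and the empty latent, exactly as the wires do on screen.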
What's Next?
Once you're comfortable generating:
- Why I don't start with a prompt — the mindset shift that changes everything
- The Three Starting Points — how to go from reference images to better prompts
- How I Remix Any Prompt — how to take someone else's prompt and make it your own
Welcome to the rabbit hole.