How to Run a Checkpoint Comparison Sweep

admin · May 27, 2026

This is the exact methodology I used to evaluate eight candidate checkpoints
against my production model across seven rounds of blind testing. It works for
plain checkpoints, for merges, and for LoRA ...

Related Tutorials

Reading Blind Eval Results

How to read a blind eval scoreboard without lying to yourself: placement frequency, head-to-head vs aggregate scoring, and why a 6-4 win in 10 prompts is closer to a coin flip than a verdict.

The 10% Accent Rule: Composites That Beat Their Ingredients

You ran a graft-comparison round at 30%. One candidate placed surprisingly high in a small early eval, then collapsed when you verified with more prompts, but the model has a real visual character you don't want to lose. Most people drop it and pick from the remaining survivors. The better move: keep it as a 10% accent on top of the survivors. The composite usually beats every ingredient, including the accent candidate itself at 30%. Here's the rule, when it applies, and why a primary-secondary-accent split at roughly 70/20/10 is the structure that works.

Why Baked LoRAs Behave Differently Than Runtime LoRAs

You tested a LoRA stack at runtime (included in the prompt at specific weights) and the output was great. You baked the same stack into the model at the same weights, expecting the same output. Instead you got a neon nightmare, blown-out colors, or just a noticeably weaker version of what worked at runtime. Same weights, same LoRAs, same base model. Why does the bake behave differently? Three reasons that compound: CFG amplification math, fp16 precision drift, and sequential layering effects. Understanding each tells you why some recipes will never bake, no matter how much you tune.
