How to Run a Checkpoint Comparison Sweep
against my production model across seven rounds of blind testing. It works for
plain checkpoints, for merges, and for LoRA ...
Related Tutorials
Reading Blind Eval Results
How to read a blind eval scoreboard without lying to yourself: placement frequency, head-to-head vs aggregate scoring, and why a 6-4 win in 10 prompts is closer to a coin flip than a verdict.
The 10% Accent Rule: Composites That Beat Their Ingredients
You ran a graft-comparison round at 30%. One candidate placed surprisingly high in a small early eval, then collapsed when you verified with more prompts — but the model has a real visual character you don't want to lose. Most people drop it and pick from the remaining survivors. The better move: keep it as a 10% accent on top of the survivors. The composite usually beats every ingredient including itself at 30%. Here's the rule, when it applies, and why a primary-secondary-accent split at roughly 70/20/10 is the structure that works.
Why Baked LoRAs Behave Differently Than Runtime LoRAs
You tested a LoRA stack at runtime, including it in the prompt at specific weights, and the output was great. You baked the same stack into the model at the same weights, expecting the same output. Instead you got a neon nightmare, blown-out colors, or just a noticeably weaker version of what worked at runtime. Same weights, same LoRAs, same base model. Why does the bake behave differently? Three reasons that compound: CFG amplification math, fp16 precision drift, and sequential layering effects. Understanding each tells you why some recipes will never bake, no matter how much you tune.
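To put a number on the "6-4 is closer to a coin flip" point from the Reading Blind Eval Results blurb above, here is a minimal sketch. It assumes each prompt is an independent head-to-head with no ties and that the two models are truly equal (a 50% win chance per prompt). The function name `p_at_least` is hypothetical and is not taken from any of the tutorials.

```python
from math import comb


def p_at_least(wins: int, prompts: int, p_win: float = 0.5) -> float:
    """Probability of seeing at least `wins` wins in `prompts` independent
    head-to-head prompts when the per-prompt win chance is `p_win`."""
    return sum(
        comb(prompts, k) * p_win**k * (1 - p_win) ** (prompts - k)
        for k in range(wins, prompts + 1)
    )


# Chance that two equally good models produce a 6-4 or better split
# for one of them in 10 prompts.
print(f"{p_at_least(6, 10):.3f}")  # 0.377
```

Under those assumptions, a model that is no better than its opponent still scores 6-4 or better about 38% of the time, so a single 10-prompt round says very little on its own.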