Diagnosing Intermittent Checkpoint Failures with a Tensor Health Scan
Related Tutorials
Tame Your SD Output Library: Sort by Model, Then Browse
Two small scripts for anyone drowning in generated images: sort a mixed output folder into per-model subdirectories by reading each file's embedded metadata, then page through the result in a keyboard-driven 2x2 grid.
Production-Grade Blind Evaluation: Four Pipeline Gotchas That Will Bite You
You wrote a script that auto-generates images across N checkpoints and feeds them into a blind eval. It works once. It breaks the next time. The failure modes are subtle: filename gaps that shift every subsequent image's label by one; stale prompt directories from yesterday's run leaking into today's; new checkpoints that stay invisible because the API cached its model list at startup; non-realism checkpoints saturating to black on prompts with heavy double-parens. None of these announce themselves; you just get a result that's quietly wrong. Here are the four gotchas, exactly what each one does, and the specific code fix for each.
Bake Stability Diagnostics — When Your Recipe Won't Bake
You found a perfect LoRA recipe at runtime. Tournament-tested it across multiple rounds. Picked the winner. You bake it into the checkpoint and the output is a neon nightmare at every CFG. Lowering CFG doesn't help. Lighter weights don't help. Fresh base doesn't help. You've burned half a day on a recipe that won't survive being baked. Here's the diagnostic batch I use when this happens: five controlled variant bakes running in parallel, each isolating a different cause. By the time the batch finishes, you know exactly what broke (and usually it's something you couldn't have predicted from runtime behavior).