Diagnosing Intermittent Checkpoint Failures with a Tensor Health Scan

admin · Jun 3, 2026 · 6 min read

Some checkpoint failures are constant — a washed VAE hazes *every* image, a broken merge breaks *every* render. Those are easy. The hard ones are intermittent: a model that produces a gray blob on rou...
