You're not closing one hole; you're designing the agent loop. A one-shot build passes the instances that dodge its blind spot and bleeds the rest; naming the rules lifts the floor; a verify-and-retry loop reaches back and recovers the tail. Generalization is the skill — and a capable model doesn't save the one-shot, because a frozen prompt can't adapt to inputs it never saw.
Compose it yourself
real sandbox · runs across 8 hidden instances
Author a frozen pipeline — name the rules (decompose), turn on verify+retry, set a budget — then run it across instances you never see. Load a preset and tweak it: watch a one-shot stall, then add structure and climb.
step 1
Author a pipeline, then freeze it
Chain move-cards — frame, decompose, build, verify, retry, fallback. You write the strategy, never any instance's answer.
step 2
It runs across a world you never saw
The frozen pipeline executes per-instance over a hidden sample of the distribution — different instances trip different bugs.
step 3
Generalization is the score
Not 'did one build go green' — what fraction of unseen instances your loop survives, modulated by how few builds it spent.
Three pipelines, the same hidden world
real captured grades · press play
Each frozen pipeline is swept across the same 14 hidden instances (oversampled toward the recency cases a vague prompt blinds itself to). Watch the one-shot stall at its floor while naming the rules lifts the field and a verify+retry loop reaches back to flip the red tiles. Every cell is a real grade.
One swing
FRAMEBUILD
—
generalization 0/14
Name the rules
FRAMEDECOMPOSEBUILD
—
generalization 0/14
Decompose + verify + retry
FRAMEDECOMPOSEBUILDVERIFYRETRY ·1FALLBACK
—
generalization 0/14
instance 0 / 14 recovered by retry one-shot floor
Why this isn't The Closer: The Closer is a single-build loop — you drive one cache to green. The Conductor is a distributionarena — you author one strategy and it must survive instances you never saw. Depth vs breadth. The verb is "field a strategy across a tournament," not "sink one putt."
A recorded demo, scored by the same deterministic function the live arena will. The play-it-yourself composer — drag your own move-cards, set your verify gate and retry budget, run it live against the hidden sample — slots onto this machinery next. ← The Closer