Two loops. We run the outer one.
A loop is a pattern that repeats — and loops nest: a step can be a loop, and the outer loop wraps many inner ones. The inner loop is the agent iterating toward a goal. The outer loop is the system that runs, verifies, governs, and evolves those loops across runs.
The hard parts — verification you can trust, comprehension debt, cognitive surrender — are all outer-loop problems. The frame that sells is the one that's true.
Your loop survives the crash.
A hyperloop is a durable workflow, not a chat. It is enqueued, claimed, executed, and finalized, and if the machine dies mid-run, it picks up from its last committed step instead of starting over.
Crash between execute and finalize? The next worker reclaims the run from its step cursor and continues — it does not start over.
A run resumes from its step cursor after a crash, so steps that already finished are not run again.
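Here is a minimal sketch of resuming from a step cursor. Everything in it is an assumption for illustration:
- `RunRecord`, `claim`, and `drive` are hypothetical names.
- The store is an in-memory object rather than a durable one.
- "Finalizing" a step means committing its result and advancing the cursor together.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class RunRecord:
    # Hypothetical in-memory stand-in for a durably persisted run record.
    run_id: str
    steps: List[str]
    cursor: int = 0                  # index of the first step not yet finalized
    status: str = "queued"           # queued -> claimed -> done
    owner: Optional[str] = None
    results: Dict[str, str] = field(default_factory=dict)


def claim(run: RunRecord, worker: str) -> None:
    # The claiming worker takes ownership; after a crash, a new worker reclaims the run.
    run.owner = worker
    run.status = "claimed"


def drive(run: RunRecord, do_step: Callable[[str], str],
          crash_at: Optional[int] = None) -> None:
    # Execute steps from the cursor onward. Finalizing a step commits its
    # result and advances the cursor as one unit.
    while run.cursor < len(run.steps):
        step = run.steps[run.cursor]
        result = do_step(step)                       # execute
        if crash_at == run.cursor:
            raise RuntimeError("worker died between execute and finalize")
        run.results[step] = result                   # finalize: commit the result...
        run.cursor += 1                              # ...and advance the cursor
    run.status = "done"


calls: List[str] = []


def do_step(name: str) -> str:
    calls.append(name)
    return name.upper()


run = RunRecord("run-1", ["fetch", "clean", "score"])
claim(run, "worker-a")
try:
    drive(run, do_step, crash_at=1)   # simulated crash while "clean" is in flight
except RuntimeError:
    pass
claim(run, "worker-b")
drive(run, do_step)                   # resumes at the cursor; "fetch" is not re-run
```

In this sketch `calls` ends up as `["fetch", "clean", "clean", "score"]`. The finished step is skipped. The step that crashed before finalizing runs a second time, so steps in a design like this should be idempotent.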
Cron, interval, or event triggers. The loop runs whether or not you are watching.
Step outcomes stream live over the wire — tail a run as it happens.
A cost ceiling can pause a run before it exceeds its budget, and quality gates can pause it when a check fails.
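Here is a sketch of a per-run cost ceiling and quality gate. It assumes costs are tracked in integer cents. Each step reports an estimated cost up front and a pass/fail quality result afterwards. `RunGates` and `run_steps` are hypothetical names. For brevity, the sketch charges the estimate as the actual cost.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RunGates:
    ceiling_cents: int               # hypothetical per-run budget, in integer cents
    spent_cents: int = 0
    paused: bool = False
    reason: str = ""

    def may_start(self, estimated_cents: int) -> bool:
        # Check the estimate before the step runs, so the run pauses instead of
        # overspending. Also returns False if a gate has already paused the run.
        if self.spent_cents + estimated_cents > self.ceiling_cents:
            self.paused, self.reason = True, "cost ceiling"
        return not self.paused

    def record(self, actual_cents: int, quality_ok: bool) -> None:
        # Charge the actual cost, then pause if the step failed its quality check.
        self.spent_cents += actual_cents
        if not quality_ok:
            self.paused, self.reason = True, "quality gate"


def run_steps(gates: RunGates, steps: List[Tuple[int, bool]]) -> int:
    """Each step is (cost_cents, quality_ok); returns how many steps ran."""
    ran = 0
    for cost, ok in steps:
        if not gates.may_start(cost):
            break
        gates.record(cost, ok)
        ran += 1
    return ran
```

Checking the estimate before the step is what lets the gate pause before money is spent. Integer cents keep the running total exact.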
The loop rewrites the loop.
A meta-loop proposes a change to a loop's steps or prompts. The change passes through real gates and is either held for a human or applied in a controlled sandbox. Every revision is diffed and attributable.
Two separate switches, default-deny. Auto-apply is an explicit operator decision — off in production.
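The copy does not name the two switches, so this sketch assumes the following:
- One switch (`proposals_enabled`) lets the meta-loop submit changes at all.
- The other (`auto_apply_enabled`) lets a change that passed its gates skip human review and go to the sandbox.

Both names are hypothetical. The point is that each defaults to off, so doing nothing means denying.

```python
from dataclasses import dataclass
from enum import Enum


class Disposition(Enum):
    REJECTED = "rejected"
    HELD_FOR_HUMAN = "held_for_human"
    APPLIED_IN_SANDBOX = "applied_in_sandbox"


@dataclass(frozen=True)
class Switches:
    # Hypothetical switch names; both default to False (default-deny).
    proposals_enabled: bool = False
    auto_apply_enabled: bool = False


def route(passed_gates: bool, switches: Switches) -> Disposition:
    # Nothing moves unless proposals are switched on and the change passed its gates.
    if not switches.proposals_enabled or not passed_gates:
        return Disposition.REJECTED
    # Auto-apply is a separate, explicit operator decision.
    if switches.auto_apply_enabled:
        return Disposition.APPLIED_IN_SANDBOX
    return Disposition.HELD_FOR_HUMAN
```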
It can't file down its own brakes.
The optimizer's action space cannot name the kill switch, the budget, the audit key, or its own grader — not because a runtime check forbids it, but because a change that tries simply fails to parse. The cage is a type boundary, and a build-time test fails the build if anyone ever widens it.
The floor is small and load-bearing. Everything above it can change; it cannot.
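Here is a sketch of the cage as a closed type, with hypothetical names throughout. The proposer's changes are parsed into a `Change` whose `target` must be a member of the `Target` enum. Nothing on the floor is a member, so a change that names it raises `ValueError` during parsing. Python enforces this at parse time rather than compile time. In a statically typed language, the same idea would be a closed union or enum checked by the compiler.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Mapping


class Target(Enum):
    # The whole action space: the only kinds of thing a proposed change can name.
    STEP = "step"
    PROMPT = "prompt"


# Hypothetical names for the protected floor. None of them is a Target value.
PROTECTED = frozenset({"kill_switch", "budget", "audit_key", "grader"})


@dataclass(frozen=True)
class Change:
    target: Target
    step_id: str
    new_text: str


def parse_change(raw: Mapping[str, str]) -> Change:
    # Target(...) raises ValueError for any value outside the enum, so a change
    # aimed at the floor never becomes a Change object at all.
    return Change(Target(raw["target"]), raw["step_id"], raw["new_text"])


def test_action_space_is_not_widened() -> None:
    # Intended to run in CI: fails the build if anyone adds a protected name to Target.
    assert PROTECTED.isdisjoint(t.value for t in Target)
```

The test plays the role of the build-time check: widening `Target` to include a protected name makes it fail.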
Not expressible in any diff the proposer can emit.
Kill, pause, and quarantine are always one click away. Counts shown are illustrative.
Every decision leaves a signed trace.
Reliability, latency, and cost — measured on every run, not estimated.
An output-contract grader the loop cannot rewrite checks form, not correctness.
Every decision links to the previous one. Tamper with a block and the chain breaks.
Did the cage cost more than it saved? The ledger answers in cents.
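One common way to make a trace tamper-evident is a hash chain. Each record carries the signature of the one before it, and each signature covers both the payload and that back-link. This sketch assumes HMAC-SHA256 under a single audit key, from Python's standard library. A production system might use asymmetric signatures instead, so that verifiers never hold the signing key. `append` and `verify` are hypothetical names.

```python
import hashlib
import hmac
import json
from typing import Any, Dict, List

GENESIS = "0" * 64  # stand-in "previous signature" for the first block


def _sign(key: bytes, prev: str, payload: Dict[str, Any]) -> str:
    # Canonical JSON (sorted keys) so identical content always yields identical bytes.
    body = json.dumps({"prev": prev, "payload": payload}, sort_keys=True).encode()
    return hmac.new(key, body, hashlib.sha256).hexdigest()


def append(chain: List[Dict[str, Any]], key: bytes, payload: Dict[str, Any]) -> None:
    # Link each new block to the signature of the block before it.
    prev = chain[-1]["sig"] if chain else GENESIS
    chain.append({"prev": prev, "payload": payload, "sig": _sign(key, prev, payload)})


def verify(chain: List[Dict[str, Any]], key: bytes) -> bool:
    # Walk from the start, checking every back-link and every signature.
    prev = GENESIS
    for block in chain:
        if block["prev"] != prev:
            return False
        if not hmac.compare_digest(block["sig"], _sign(key, prev, block["payload"])):
            return False
        prev = block["sig"]
    return True
```

Changing a payload after the fact invalidates that block's signature. Re-signing the block changes its `sig`, which breaks the `prev` link of the next block.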
These cheap mechanical signals measure reliability, latency, cost, and output form, not correctness. We do not claim the machine knows “good.” That is why the next section exists.
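Here is a sketch of an output-contract grader for a data-QA classifier. It assumes a hypothetical contract:
- The output is a JSON object with a `label` from a fixed set.
- `confidence` is a number in [0, 1].
- `reason` is a non-empty string.

The grader scores the fraction of checks passed and never looks at whether the label is right.

```python
import json

# Hypothetical label set for the contract.
ALLOWED_LABELS = {"valid", "invalid", "needs_review"}


def form_score(raw_output: str) -> float:
    """Fraction of contract checks passed. Says nothing about whether the label is correct."""
    try:
        obj = json.loads(raw_output)
    except json.JSONDecodeError:
        return 0.0
    if not isinstance(obj, dict):
        return 0.0
    label = obj.get("label")
    conf = obj.get("confidence")
    reason = obj.get("reason")
    checks = [
        isinstance(label, str) and label in ALLOWED_LABELS,
        # bool is a subclass of int in Python, so exclude it explicitly.
        isinstance(conf, (int, float)) and not isinstance(conf, bool) and 0.0 <= conf <= 1.0,
        isinstance(reason, str) and len(reason) > 0,
    ]
    return sum(checks) / len(checks)
```

A confidently wrong label in perfect form scores 1.0. That is exactly the gap between form and correctness.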
We ran it hot and tried to break our own thesis.
A blind optimizer (it never sees the answer key) evolves a data-QA classifier under a cheap, label-free form signal. Then, out of band and behind a firewall, we measure whether that signal actually drags real accuracy up, or whether quality stays flat while the proxy looks perfect. That second outcome has a name: Goodhart's law.
The sweep is still running. We publish the verdict when the numbers settle — including if the answer is no.
Form and held-out quality are scored independently. The whole point is to catch them diverging. Values shown are placeholders until the run is wired in.
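Here is a sketch of the divergence check. It assumes the optimizer sees only per-generation form scores, while held-out accuracy is computed separately and compared only afterwards. The comparison rule (first generation versus last) and both thresholds are hypothetical choices for illustration, not the sweep's actual criteria.

```python
from typing import Sequence


def goodhart_flag(form: Sequence[float], heldout: Sequence[float],
                  min_proxy_gain: float = 0.05, min_real_gain: float = 0.01) -> bool:
    """True when the proxy rose by at least min_proxy_gain while
    held-out quality rose by less than min_real_gain."""
    if len(form) != len(heldout) or len(form) < 2:
        raise ValueError("need aligned scores for at least two generations")
    proxy_gain = form[-1] - form[0]        # what the blind optimizer was rewarded for
    real_gain = heldout[-1] - heldout[0]   # measured out of band, never fed back
    return proxy_gain >= min_proxy_gain and real_gain < min_real_gain
```

Keeping the held-out series out of the optimizer's inputs is what makes the comparison meaningful. Once it leaks in, it becomes just another proxy.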
Few infrastructure projects will show you their own falsification test. We built one because the self-improvement claim is a bet — and a bet you can't lose isn't worth making.
today, list, run, tail, trace, deploy. Your loops in the terminal, with live event streams.
Expose your fleet to Claude Code, Cursor, Codex, or Continue. Loops become tools any agent can call.
REST for state, events for streams, a typed client for everything. Scoped, bearer-auth keys.
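Here is a sketch of checking a scoped bearer key on the server side. The following are all assumptions for illustration, not the product's actual API:
- The scope names, such as `runs:read`.
- The `scopes_by_key_hash` table.
- Storing keys as SHA-256 digests rather than plaintext.

```python
import hashlib
from typing import FrozenSet, Mapping, Optional


def bearer_token(authorization: Optional[str]) -> Optional[str]:
    # Accept "Bearer <token>"; the auth scheme name is matched case-insensitively.
    if not authorization:
        return None
    scheme, _, token = authorization.strip().partition(" ")
    token = token.strip()
    if scheme.lower() != "bearer" or not token:
        return None
    return token


def authorize(authorization: Optional[str], required_scope: str,
              scopes_by_key_hash: Mapping[str, FrozenSet[str]]) -> bool:
    token = bearer_token(authorization)
    if token is None:
        return False
    # Look the key up by its digest so plaintext keys never need to be stored.
    digest = hashlib.sha256(token.encode()).hexdigest()
    return required_scope in scopes_by_key_hash.get(digest, frozenset())
```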