Every routine is a hundred small decisions. Decisions are glucose; decisions are willpower. Spend them on the work, not on remembering how to begin it.
Not a prompt. A loop.
A recipe is text you paste once. A loop is an object you run, fork, schedule, evolve, and audit — a repeated pattern with everything it needs to be trusted on its own.
The recipe is where you start. The runtime is what makes it safe to repeat.
The empty cells are the product.
Codex and Claude Code converged on the same inner-loop primitives — automations, worktrees, skills, connectors, sub-agents, a state file. The outer-loop jobs — the ones that only exist once an agent runs unattended — are mostly blank in both. That blank is where hyperloops lives.
Five of these six are outer-loop plumbing that is built and verified. The sixth — Evolve — is the open frontier: gated, dormant, and under test, because whether a loop can safely improve its own loops is exactly the question BOBOS is built to answer — and the result isn't in yet.
Start with a loop.
Open, forkable agent loops — nothing's for sale. Each one is a reusable operating pattern (trigger, steps, model, tools, checks, budget, governance), and each card carries its governance level and an evidence-tagged metric, not a copy button.
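As a rough picture of what "a reusable operating pattern" could look like as data, here is a minimal sketch. The field names mirror the list above (trigger, steps, model, tools, checks, budget, governance); the `Loop` class, the `fork` helper, the trigger strings, and the example values are all hypothetical and are not the project's actual schema.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Loop:
    """Hypothetical loop definition; fields follow the list in the text."""
    name: str
    trigger: str                      # assumed format, e.g. "event:ci.failed"
    steps: Tuple[str, ...]
    model: str
    tools: Tuple[str, ...] = ()
    checks: Tuple[str, ...] = ()
    budget_cents: int = 0
    governance: str = "human-review"  # assumed label for a governance level
    forked_from: Optional[str] = None


def fork(loop: Loop, new_name: str, **overrides) -> Loop:
    """Copy a loop under a new name, recording where it came from."""
    return replace(loop, name=new_name, forked_from=loop.name, **overrides)


base = Loop(
    name="ci-triage",
    trigger="event:ci.failed",
    steps=("collect logs", "rank root causes", "draft patch plan"),
    model="example-model",
)
strict = fork(base, "ci-triage-strict", budget_cents=50)
```

Because the dataclass is frozen, a fork is always a new object and the original definition is left untouched.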
A governed run in five minutes. No key required.
A cold install opens on three keyless agent-coding demo loops. Press run and watch a real run finish — verified against a frozen output contract — without connecting an account. When you're ready, connect a provider and point loops at your own work.
A failed pipeline comes back with a ranked root cause and a minimal patch plan.
A diff is checked against a frozen review contract — findings are graded, not vibed.
Merged work becomes a release candidate: changelog, smoke checks, rollback notes.
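To make "verified against a frozen output contract" concrete, here is a minimal sketch of a form check. The contract fields (`root_cause`, `confidence`, `patch_plan`) are invented for illustration and are not the real contract. The check looks only at the shape of the output, not at whether the root cause is right.

```python
from types import MappingProxyType

# Hypothetical contract: required keys and their types. MappingProxyType
# gives a read-only view, so the contract cannot be edited in place.
CONTRACT = MappingProxyType({
    "root_cause": str,
    "confidence": float,
    "patch_plan": list,
})


def grade_form(output: dict) -> list:
    """Return a list of form problems; an empty list means the form passes."""
    problems = []
    for key, expected in CONTRACT.items():
        if key not in output:
            problems.append(f"missing field: {key}")
        elif not isinstance(output[key], expected):
            problems.append(f"wrong type for {key}: expected {expected.__name__}")
    for key in output:
        if key not in CONTRACT:
            problems.append(f"unexpected field: {key}")
    return problems


good = {"root_cause": "flaky network mock", "confidence": 0.8, "patch_plan": ["pin mock"]}
bad = {"root_cause": "flaky network mock", "confidence": "high"}
```

A well-formed but wrong answer still passes this check, which is the limit of any form-only grader.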
Your loop survives the crash.
A hyperloop is a durable workflow, not a chat. It is enqueued, claimed, executed, and finalized — and if the machine dies mid-run, it picks up exactly where it left off.
Crash between execute and finalize? The next worker reclaims the run from its step cursor and continues — it does not start over.
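The resume behaviour described above can be sketched with a persisted step cursor. This is an illustration under simplifying assumptions, not the actual runtime: state lives in a local JSON file, there is a single worker, and all names (`run_durably`, `fetch`, `patch`) are hypothetical.

```python
import json
import os
import tempfile


def _save(state, path):
    """Write state to a temp file, then atomically swap it into place."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)


def run_durably(steps, state_path):
    """Run steps in order, committing the cursor after each completed step."""
    state = {"cursor": 0, "outputs": []}
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)          # resume from the saved cursor
    while state["cursor"] < len(steps):
        state["outputs"].append(steps[state["cursor"]]())
        state["cursor"] += 1
        _save(state, state_path)
    return state["outputs"]


# Demo: the second step crashes once, then the run is picked up again.
calls = []
crash_once = {"armed": True}


def fetch():
    calls.append("fetch")
    return "logs"


def patch():
    if crash_once["armed"]:
        crash_once["armed"] = False
        raise RuntimeError("simulated crash")
    calls.append("patch")
    return "plan"


state_path = os.path.join(tempfile.mkdtemp(), "run.json")
try:
    run_durably([fetch, patch], state_path)
except RuntimeError:
    pass
outputs = run_durably([fetch, patch], state_path)
```

In this sketch a crash after a step finishes but before its cursor is saved would re-run that step, so "no duplicated work" additionally needs idempotent steps or a cursor commit that shares a transaction with the step's side effects.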
A run resumes from its step cursor after a crash — no duplicated work.
Cron, interval, or event triggers. The loop runs whether or not you are watching.
Step outcomes stream live over the wire — tail a run as it happens.
A cost ceiling and form-contract gates can pause a run before it spends more than it should.
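A cost ceiling that pauses before overspending can be sketched as a check made ahead of each step. The per-step estimates, the `run_with_ceiling` name, and the returned status dictionary are assumptions for illustration only.

```python
def run_with_ceiling(steps, ceiling_cents):
    """Run (estimate_cents, callable) pairs until the next one would exceed the ceiling."""
    spent = 0
    for index, (estimate_cents, step) in enumerate(steps):
        if spent + estimate_cents > ceiling_cents:
            # Pause before spending; a later resume can start at next_step.
            return {"status": "paused", "next_step": index, "spent_cents": spent}
        step()
        spent += estimate_cents
    return {"status": "finished", "next_step": len(steps), "spent_cents": spent}


def noop():
    return None


plan = [(30, noop), (30, noop), (30, noop)]
capped = run_with_ceiling(plan, ceiling_cents=70)
funded = run_with_ceiling(plan, ceiling_cents=90)
```

The check happens before the step runs, so the run stops under the ceiling rather than discovering the overrun afterwards.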
The loop rewrites the loop.
A meta-loop proposes a change to a loop's steps or prompts. The change passes through real gates and is either held for a human or applied in a controlled sandbox. Every revision is diffed and attributable.
Two separate switches, default-deny. Self-modification ships Labs-gated and dormant by default — it never runs unless an operator turns it on.
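One way to read "two separate switches, default-deny" is a gate that opens only when both are explicitly set. The switch names below are hypothetical, as is the choice of environment variables. The source does not say what the two switches are or where they live.

```python
import os

# Hypothetical switch names; the real ones are not given in the text.
LABS_SWITCH = "LOOPS_LABS_ENABLED"
EVOLVE_SWITCH = "LOOPS_EVOLVE_ENABLED"


def self_modification_enabled(env=None) -> bool:
    """True only if both switches are explicitly set to "1"; anything else denies."""
    env = os.environ if env is None else env
    return env.get(LABS_SWITCH) == "1" and env.get(EVOLVE_SWITCH) == "1"
```

Missing, empty, or misspelled values all fall through to "off", which is what default-deny means in practice.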
It can't file down its own brakes.
The optimizer's action space cannot name the kill switch, the budget, the audit key, or its own grader — not because a runtime check forbids it, but because a change that tries simply fails to parse. The cage is a type boundary, and a build-time test fails the build if anyone ever widens it.
The floor is small and load-bearing. Everything above it can change; it cannot.
Not expressible in any diff the proposer can emit.
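The idea that a forbidden change "simply fails to parse" can be approximated with a closed set of editable fields. This is a sketch, not the project's code: the field names are invented, and because Python does not enforce types at build time, a closed `Enum` plus a pinned test stands in for the type boundary described above.

```python
from dataclasses import dataclass
from enum import Enum


class EditableField(Enum):
    """Hypothetical action space: the only things a proposal can name."""
    STEP_PROMPT = "step_prompt"
    STEP_ORDER = "step_order"
    TOOL_HINT = "tool_hint"


# The floor named in the text; none of these may ever appear in EditableField.
PROTECTED = frozenset({"kill_switch", "budget", "audit_key", "grader"})


@dataclass(frozen=True)
class Proposal:
    target: EditableField
    new_value: str


def parse_proposal(raw: dict) -> Proposal:
    """Raises ValueError for any target outside the closed action space."""
    return Proposal(target=EditableField(raw["target"]), new_value=str(raw["new_value"]))


def test_cage_not_widened():
    """Meant to run in CI: fails if a protected name is added to the action space."""
    assert {f.value for f in EditableField}.isdisjoint(PROTECTED)
```

A proposal that names `budget` never becomes a `Proposal` object, so no downstream policy check is needed to reject it.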
Kill, pause, and quarantine are always one click away. Counts shown are illustrative.
Every decision leaves a signed trace.
Reliability, latency, and cost — measured on every run, not estimated.
An output-contract grader the loop cannot rewrite checks form, not correctness.
Every decision links to the previous one. Tamper with a block and the chain breaks.
Did the cage cost more than it saved? The ledger answers in cents.
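A linked, signed decision trace of the kind described above can be sketched as a hash chain with an HMAC over each block. The block layout, the all-zero genesis value, and the shared-key signing are assumptions for illustration. The source does not specify the actual scheme.

```python
import hashlib
import hmac
import json

GENESIS = "0" * 64  # assumed placeholder for "no previous block"


def _digest(prev: str, decision: dict) -> str:
    body = json.dumps({"prev": prev, "decision": decision}, sort_keys=True)
    return hashlib.sha256(body.encode()).hexdigest()


def _sign(block_hash: str, key: bytes) -> str:
    return hmac.new(key, block_hash.encode(), hashlib.sha256).hexdigest()


def append(chain: list, decision: dict, key: bytes) -> None:
    """Add a block whose hash covers the previous block's hash."""
    prev = chain[-1]["hash"] if chain else GENESIS
    block_hash = _digest(prev, decision)
    chain.append({"prev": prev, "decision": decision,
                  "hash": block_hash, "sig": _sign(block_hash, key)})


def verify(chain: list, key: bytes) -> bool:
    """Recompute every hash and signature; any edit to a block breaks the chain."""
    prev = GENESIS
    for block in chain:
        block_hash = _digest(block["prev"], block["decision"])
        if (block["prev"] != prev or block["hash"] != block_hash
                or not hmac.compare_digest(block["sig"], _sign(block_hash, key))):
            return False
        prev = block_hash
    return True
```

Changing one recorded decision changes its recomputed hash, so it no longer matches the stored hash and verification fails. An attacker without the key cannot re-sign a rewritten chain.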
The cheap mechanical signals measure reliability, cost, and output form — not correctness. We do not claim the machine knows “good.” That is why the next section exists.
We're running it hot — and trying to break our own thesis.
A blind optimizer — it never sees the answer key — evolves a data-QA classifier under a cheap, label-free form signal. Then, out of band and behind a firewall, we measure whether that signal actually drags real accuracy up, or whether quality stays flat while the proxy looks perfect. That second outcome has a name: Goodhart.
The sweep is still running. We publish the verdict when the numbers settle — including if the answer is no.
Form and held-out quality are scored independently. The whole point is to catch them diverging. Values shown are placeholders until the run is wired in.
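The divergence being tested for can be stated as a small rule over two independently scored series. The function name, the threshold, and the three verdict labels are hypothetical, and the numbers in the usage lines are made up for illustration. They are not results from the sweep.

```python
def goodhart_verdict(form_scores, quality_scores, min_gain=0.02):
    """Compare first-to-last gain in the form proxy against held-out quality."""
    form_gain = form_scores[-1] - form_scores[0]
    quality_gain = quality_scores[-1] - quality_scores[0]
    if form_gain < min_gain:
        return "no-signal"   # the proxy itself did not improve
    if quality_gain < min_gain:
        return "goodhart"    # proxy improved, held-out quality did not
    return "aligned"         # both improved


# Synthetic series, one score per generation (illustrative only).
diverging = goodhart_verdict([0.60, 0.80, 0.95], [0.70, 0.71, 0.70])
tracking = goodhart_verdict([0.60, 0.80, 0.95], [0.70, 0.76, 0.82])
```

A real analysis would need more than endpoints (noise, multiple seeds, confidence intervals); the sketch only shows which two quantities are being compared.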
Few infrastructure projects will show you their own falsification test. We built one because the self-improvement claim is a bet — and a bet you can't lose isn't worth making.
Run it where your agents already work.
Everything the dashboard does, the CLI and MCP do too. Install the bridge once and your agents can list, fork, schedule, run, tail, cancel, and audit loops without leaving their workspace.
today, list, run, tail, trace, deploy. Your loops in the terminal, with live event streams.
Expose your fleet to Claude Code, Cursor, Codex, or Continue. Loops become tools any agent can call.
REST for state, events for streams, a typed client for everything. Scoped, bearer-auth keys.
