We used to write runbooks. Most of them lived in a wiki, labeled by what they fixed. “If the deploy hangs, run this.” “If the queue depth spikes, do that.” The format was familiar. A heading, a short description, a numbered list of steps, and a note at the bottom about who to call if it didn’t work.
That format assumed something we didn’t notice at first. It assumed the person reading the runbook could think. They could see step three say “restart the worker” and decide, on their own, that they should also flush the lock file the worker had left behind. They could see step five say “wait for the queue to drain” and improvise when the queue refused to drain. They could read the words and interpolate the gaps.
When the operator stopped being a person and started being an agent, the gaps stopped getting interpolated.
What the runbook assumed
Runbooks were never executable. They were a hybrid of documentation and instruction, written for a reader who would read them and act with judgment. The author and the reader were both engineers, working from the same loose model of how the system behaved. When the runbook said “if you see X, do Y”, it meant “if you see X or something close to X, do Y or something close to Y, and reason about whether the result looks right.”
An agent following the same runbook does not reason about whether the result looks right. It does exactly what step three says. If step three says restart the worker, the worker gets restarted. If a stale lock file blocks the restart, the agent reports the error and stops. The runbook did not say “also remove the lock file”, because the author of the runbook expected the reader to notice the lock file and handle it.
This is not an agent limitation. It is a contract problem. The runbook was always a partial specification, and the rest of the specification lived in the reader’s head. We had been pretending the runbook was the whole thing.
What broke when we tried
The first time we asked an agent to follow a deploy runbook step by step, it got two steps in and got stuck on an ambiguity. The runbook said “tag the release”. There was a version field somewhere that needed a value, and the value depended on whether this was a hotfix or a regular release, and the runbook didn’t say which. A human reading it would have shrugged and looked at the calendar. The agent had to ask.
We rewrote that step three times. Each rewrite added more conditionals, more checks, more explicit branching for the cases the original author had silently handled. By the fourth pass, the runbook was a procedure with branching logic, error handling, and validation. It was no longer a document. It was a script that happened to be written in English.
That was the moment we admitted we were doing two jobs at once. We were trying to write a document that humans could skim and an executable spec that an agent could follow exactly. Those are different artifacts. We were paying twice for both and getting an unreliable version of each.
What replaced it
We started writing the script directly. Each runbook became a checked-in executable: a shell script, sometimes a small program, occasionally a workflow definition. Every branch the runbook had implied became an explicit conditional. Every preflight check the human reader had performed by instinct became an assertion at the top of the script. Every “make sure X before Y” turned into a verifier the script called.
The shift had three effects we did not see coming.
The first is that scripts are easier to test than documents. A runbook step that says “restart the worker and confirm it picks up jobs” can only be tested by reading it carefully. A script with the same intent can be run against a fixture and either passes or fails.
The second is that scripts are honest about what they do not handle. A runbook can omit a case and still feel complete. A script that omits a case will crash the moment that case shows up in production. The pain is faster, but the pain is also pointing at the right thing. We have fewer runbook-shaped illusions and more runbook-shaped holes that we know about.
The third is that scripts can be safely re-entered. A heartbeat that dies halfway through a deploy can be picked up by a fresh heartbeat that re-runs the same script and skips the steps that already finished. We could not do that with a document. With a document, “where did the last operator stop reading” was always a guess.
What we kept from the runbook
We did not throw the prose away. We kept the narrative parts, the why, the things a future maintainer would need to know about how this script came to exist and what it is protecting against. Those moved into comments at the top of each script, and into the commit messages that accompany changes. The script knows the steps. The comment knows why.
The split has been clarifying. The procedural detail belongs to the operator. The reasoning belongs to whoever has to change the procedure later. When we mixed them, both got muddled. When we separated them, both got sharper.
What this changes about operator-readable infra
The mental shift is small but it changes how we plan operations work. We stop asking “what would the on-call person do” and start asking “what would the runtime do, with no operator in the loop and no improvisation available.” That second question forces the failure modes into the open earlier.
We still write prose, and we still talk about the system in human terms. The change is that the words we write to instruct are no longer the words we write to explain. The script is the instruction. The prose is the explanation. The two artifacts can finally do their separate jobs well.