The last security boundary is the budget

Most of the security work we do on agents happens upstream: permission scoping, runtime authorization, credential isolation, and validation of arguments derived from outside text. These are the layers that should prevent an agent from doing the wrong thing in the first place. The layer that catches everything those miss is downstream and much less interesting to talk about. It is the monthly spending cap.

We give every agent a fixed amount of compute cost it can incur before the system pauses it. The number is small for most agents, often a few cents to a few dollars per month, depending on what role it holds. When the agent reaches the cap, the runtime stops accepting new work for that identity until the next cycle or until a human raises the ceiling.
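A minimal sketch of that pause behavior, under stated assumptions: the class and method names below are hypothetical and are not the runtime's actual API, and costs are held as `Decimal` so that cent-level amounts add up exactly.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class SpendLedger:
    """Tracks one agent identity's spend for the current cycle (hypothetical)."""

    cap: Decimal                   # hard ceiling for the cycle
    spent: Decimal = Decimal("0")  # metered cost recorded so far

    def record(self, cost: Decimal) -> None:
        # Called after each unit of work with its metered cost.
        if cost < 0:
            raise ValueError("cost must be non-negative")
        self.spent += cost

    def accepts_new_work(self) -> bool:
        # New work is refused once recorded spend has reached the cap.
        return self.spent < self.cap

    def start_new_cycle(self) -> None:
        # The next cycle resets spend; the cap itself is unchanged.
        self.spent = Decimal("0")


ledger = SpendLedger(cap=Decimal("0.50"))
ledger.record(Decimal("0.30"))
still_open = ledger.accepts_new_work()   # under the cap
ledger.record(Decimal("0.20"))
now_paused = not ledger.accepts_new_work()  # cap reached
```

The check runs before work is accepted, not after, so the most an identity can overshoot is the cost of the work already in flight.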

We have come to think of this number as the blast radius. Not the budget. The blast radius.

Why the upstream layers eventually fail

The security model we describe elsewhere assumes that authorization checks, scope-limited credentials, and structural compartmentalization will catch most misbehavior. They do. What they cannot catch is a misbehavior that fits inside what the agent is legitimately allowed to do.

An agent with permission to read all issues in a project can read all issues in a project at a rate determined by its loop. An agent with permission to post comments can post comments at the rate determined by its retry behavior. An agent with permission to scrape pages can scrape pages until the network or a quota stops it. Each operation is individually authorized. Each is bounded only by whatever rate limit we built around it.

We have watched agents loop. We have watched them call the same API in a tight retry sequence because some intermediate error did not surface in a way they understood as terminal. We have watched them re-create work that already existed because their context did not include the recent change. None of these are security incidents in the classical sense. None of them are unauthorized. They are an authorized agent doing an authorized thing too many times in a row.

The economic ceiling is the thing that catches them. Whether the agent is malfunctioning, compromised, or stuck, the loop eventually costs money. Eventually it costs more than the cap. Then it stops.
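To make the arithmetic concrete, here is a small illustration of how many iterations a stuck loop gets before it runs into the ceiling. The cap and per-iteration cost are invented numbers chosen for the example, not measurements from any real agent.

```python
from decimal import Decimal


def iterations_before_pause(cap: Decimal, cost_per_iteration: Decimal) -> int:
    """Count loop iterations that run before recorded spend reaches the cap."""
    if cost_per_iteration <= 0:
        raise ValueError("a loop that costs nothing is never stopped by a cap")
    spent = Decimal("0")
    iterations = 0
    while spent < cap:
        spent += cost_per_iteration
        iterations += 1
    return iterations


# Illustrative only: a 2.00 cap and a loop costing 0.002 per pass.
runaway_passes = iterations_before_pause(Decimal("2.00"), Decimal("0.002"))
```

With these made-up figures the loop is cut off after 1000 passes, whatever the reason it was looping.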

The argument for setting the cap deliberately

Many systems have a budget mechanism but treat it as a billing concern, not a security concern. The cap is set by whoever is paying, sized to “what we expect to spend.” That framing misses what the cap is actually for.

The cap is for the situation where the agent is not behaving the way we expect. If the cap is sized to expected use, it absorbs unexpected use until it exhausts the entire monthly allocation, at which point the agent goes silent without any earlier warning. We started sizing the cap to roughly twice the normal monthly spend for a given role, and putting a separate alert at the expected number. The cap is the wall. The alert is the warning that we are approaching it from an unexpected direction.
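The sizing rule can be sketched as two thresholds derived from one number. This is an illustration under assumptions: the function and enum names are hypothetical, and the multiplier of 2 stands in for the "roughly twice" described above.

```python
from decimal import Decimal
from enum import Enum


class BudgetState(Enum):
    NORMAL = "normal"  # below expected monthly spend
    ALERT = "alert"    # at or above expected spend, still below the cap
    PAUSED = "paused"  # cap reached, no new work accepted


def budget_state(
    spent: Decimal,
    expected_monthly: Decimal,
    cap_multiplier: Decimal = Decimal("2"),
) -> BudgetState:
    """Classify spend against the alert line and the cap."""
    cap = expected_monthly * cap_multiplier
    if spent >= cap:
        return BudgetState.PAUSED
    if spent >= expected_monthly:
        return BudgetState.ALERT
    return BudgetState.NORMAL


expected = Decimal("1.00")
states = [budget_state(Decimal(s), expected) for s in ("0.40", "1.00", "2.00")]
```

Here `states` walks through normal, alert, and paused in that order. Deriving both thresholds from the expected figure keeps the alert and the wall from drifting apart when a role's normal spend is revised.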

For some roles, we go further. An agent that primarily reviews work has a much smaller cap than an agent that primarily writes code. A short-lived utility agent has a cap that allows perhaps two runs per month at expected cost. The asymmetry is intentional. The agents with the most reach should have the most constrained ceilings, because the cost of those agents looping or being subverted is the highest.

How the cap composes with run-level credentials

The cap composes with another defense we rely on, which is that each run gets its own short-lived credential. A run cannot persist past its allocated window. Even if some adversarial input convinces the agent to schedule future work, the credential to execute that work is already gone by the time the future arrives.
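A rough sketch of a run-scoped credential follows. Everything here is assumed for illustration: the type, the function names, and the ten-minute window are hypothetical, and the clock is passed in explicitly so the example stays deterministic.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class RunCredential:
    """A credential bound to one run and one time window (hypothetical)."""

    run_id: str
    token: str
    expires_at: float  # seconds, on the same clock the verifier uses


def issue_credential(run_id: str, now: float, window_seconds: float) -> RunCredential:
    # Each run gets a fresh random token that expires with its window.
    return RunCredential(run_id, secrets.token_hex(16), now + window_seconds)


def is_valid(cred: RunCredential, run_id: str, now: float) -> bool:
    # Valid only for the run it was issued to, and only inside the window.
    return cred.run_id == run_id and now < cred.expires_at


cred = issue_credential("run-1", now=1000.0, window_seconds=600.0)
during_run = is_valid(cred, "run-1", now=1300.0)      # inside the window
scheduled_later = is_valid(cred, "run-1", now=1600.0)  # window has closed
other_run = is_valid(cred, "run-2", now=1300.0)        # wrong run
```

Work that the agent was talked into deferring arrives at `now=1600.0` holding a credential that no longer verifies.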

Combined with the spend cap, this produces a pattern we have come to like. The agent can do at most N dollars of work in a month. Any single attempt by the agent is bounded to a single run. The runs are independently authenticated. The compromise of one run does not extend to the next. The compromise of an agent identity is bounded by the cap.

We are aware that this is a defense against carelessness more than malice. A determined attacker who has obtained an agent’s long-lived configuration could in theory raise the cap before the cap stops them. The runtime makes that difficult by requiring a separate, more privileged identity to change the cap, but that is one more authorization decision, and authorization decisions can fail.
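The separation between the identity that spends and the identity that can change the cap might look something like the following. This is a sketch, not the runtime's code: `raise_cap`, `cap_admins`, and the identifiers are all hypothetical.

```python
from decimal import Decimal


def raise_cap(
    caps: dict[str, Decimal],
    agent_id: str,
    new_cap: Decimal,
    actor_id: str,
    cap_admins: set[str],
) -> None:
    """Change an agent's cap only on behalf of a separate, privileged actor."""
    if actor_id == agent_id:
        # Checked first, so an agent listed as an admin by mistake still fails.
        raise PermissionError("an agent may not change its own cap")
    if actor_id not in cap_admins:
        raise PermissionError("actor is not allowed to change caps")
    caps[agent_id] = new_cap


caps = {"agent-a": Decimal("1.00")}
admins = {"ops-admin"}

raise_cap(caps, "agent-a", Decimal("5.00"), actor_id="ops-admin", cap_admins=admins)

try:
    raise_cap(caps, "agent-a", Decimal("500.00"), actor_id="agent-a", cap_admins=admins)
    self_raise_blocked = False
except PermissionError:
    self_raise_blocked = True
```

The self-service check is itself an authorization decision, which is exactly the caveat above: it narrows the path to raising the cap without closing it.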

What the cap reliably protects against is the much more common failure mode, which is an agent that wanders. A model that has been wandering for two weeks costs the same as a model that has been working for two weeks. We notice the difference by reading the issue feed, not by reading the bill. The cap is what limits the cost of not noticing immediately.

What we think the cap actually is

It is the layer that does not require anyone to be paying attention. Every other layer assumes that someone, somewhere, is doing their job. The validation logic was written correctly. The permission check was applied. The credential was scoped. The review happened. All of those are good assumptions in steady state. None of them survive a single careless deploy.

The cap survives the careless deploy. It survives the validation logic having a bug. It survives the permission check being too generous. It does not require any of those things to be working. It requires only that the billing system count.

We design for the cap before we design for the rest. Knowing how much an agent can cost while broken is the first decision. Everything else is a refinement on that number.
