The Philosophical Ledger · Operating Models · June 2026

The Levels, Revisited for Agents

The four questions diagnosed whether your organisation was AI-native. The agentic era forces two more — and they decide whether your agents can be trusted to run at all.

A while ago I wrote that “AI-native” had become a personality trait — a phrase companies stretched to mean whatever served the press release — and I borrowed the autonomous-vehicle levels to force some precision back into it. Four diagnostic questions: what can AI see, what can it do, who can extend it, and how has the organisation actually changed. Your real level was your weakest answer. The asymmetry was the diagnostic.

Those four questions still hold. But I wrote them for a world of AI usage — copilots, summarisers, retrieval. Agents change the question. An agent does not just answer; it plans, acts on systems of record, hands work to other agents, and — if you let it — takes consequential action without a human pressing go. The four questions measure how much capability you have wired in. They say almost nothing about whether you should trust any of it in production. In the agentic era, that silence is the whole problem.

The four questions measure capability. The agentic era is decided by trust.

Authority is not the same as trust

Look again at the second question — what can AI do? It rewards a company for letting the system act: open the PR, update the CRM, reconcile the invoice. A year ago that was the frontier. Today it is the easy part. The hard question is the one hiding underneath it: would you let it do those things without watching?

Most “advanced” deployments fail this quietly. The agent can act, so the org checks the “agents in production” box — while a human reviews every action before it lands, because nobody can actually verify what the agent did or why. That is not autonomy. That is a very expensive intern with a fast typing speed. Authority you cannot verify is not capability; it is liability you have not priced yet.

This is why I keep saying the next layer of AI is not model capability. It is the operating discipline around agents — evaluation, traces, approvals, recovery — that lets you grant authority because you can prove how the system behaved. It is also why I build Ninja Harness and Agent OS: not to make agents more capable, but to make them trustworthy enough to actually run.

The fifth question: Trust

So the fifth question, the one the agentic era forces: would you let it act unwatched?

Not “can it act.” Whether you can let it act without a human in the loop — because you can verify the trace, not just the answer; because there are evals that gate a weak run before it ships; because there is a kill-switch and an audit trail an examiner would accept. The honest answers run from “no, we don’t trust it” to “it acts unsupervised because we can prove how it behaved.” Most organisations with impressive agent demos sit, answered honestly, near the bottom of this axis. That is not a failure. It is the truth the demo was built to hide.
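A minimal sketch of what "would you let it act unwatched" could look like as a gate, assuming a hypothetical `TrustGate` that checks a run's trace and eval score before letting it land. Every name here (`Step`, `Run`, `TrustGate`, the 0.9 threshold) is an illustrative assumption, not how any particular product works.

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One recorded action in an agent's trace (hypothetical shape)."""
    tool: str
    args: dict
    result: str


@dataclass
class Run:
    """An agent run: its trace plus the score an eval suite gave it."""
    steps: list = field(default_factory=list)
    eval_score: float = 0.0


@dataclass
class TrustGate:
    """Decides whether a run may land without a human in the loop."""
    min_eval_score: float = 0.9   # assumed threshold, tune per workflow
    kill_switch: bool = False     # flip to True to stop all unsupervised action
    audit_log: list = field(default_factory=list)

    def decide(self, run: Run) -> str:
        if self.kill_switch:
            decision, reason = "halted", "kill switch engaged"
        elif not run.steps:
            # No trace means nothing to verify, so a human has to look.
            decision, reason = "needs_human", "no trace recorded"
        elif run.eval_score < self.min_eval_score:
            decision, reason = "needs_human", "eval score below threshold"
        else:
            decision, reason = "auto_approved", "trace present and evals passed"

        # Every decision is recorded, including the ones that blocked the run.
        self.audit_log.append({
            "decision": decision,
            "reason": reason,
            "eval_score": run.eval_score,
            "tools_used": [step.tool for step in run.steps],
        })
        return decision


gate = TrustGate()
run = Run(steps=[Step("update_crm", {"account": "acme"}, "ok")], eval_score=0.95)
print(gate.decide(run))  # auto_approved
```

The point of the sketch is the ordering: the kill-switch wins, a missing trace is treated as a failure rather than a pass, and the audit log grows whether or not the run ships.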

The sixth question: Memory

The sixth follows from the old “compounding OS” level, but the original four never tested it directly: does the system compound, or start cold?

An agent that begins every run from zero cannot become an operating system, no matter how capable each run is. The difference between a stateless assistant and a compounding one is whether context, outcomes, and learning persist — whether the system gets better at your business over time or just gets invoked again. Memory is what turns a tool into an institution. Without it, the higher levels are unreachable; you are just running the lowest level faster.

An agent that forgets every run can run forever and never compound.
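To make the cold-start distinction concrete, here is a small sketch, assuming a hypothetical `RunMemory` store backed by a JSON file. The `run_agent` function is a stand-in: a real agent would feed prior lessons into its context rather than just report whether any exist.

```python
import json
import tempfile
from pathlib import Path


class RunMemory:
    """Persists outcomes across runs in a JSON file (hypothetical store)."""

    def __init__(self, path):
        self.path = Path(path)

    def load(self):
        if not self.path.exists():
            return []
        return json.loads(self.path.read_text())

    def record(self, task, outcome, lesson):
        history = self.load()
        history.append({"task": task, "outcome": outcome, "lesson": lesson})
        self.path.write_text(json.dumps(history, indent=2))

    def lessons_for(self, task):
        return [entry["lesson"] for entry in self.load() if entry["task"] == task]


def run_agent(task, memory):
    """Stand-in for an agent run: reports whether it started with prior context."""
    prior = memory.lessons_for(task)
    outcome = "warm start" if prior else "cold start"
    memory.record(task, outcome, lesson=f"run {len(prior) + 1} of {task}")
    return outcome


with tempfile.TemporaryDirectory() as tmp:
    memory = RunMemory(Path(tmp) / "memory.json")
    print(run_agent("reconcile_invoice", memory))  # cold start
    print(run_agent("reconcile_invoice", memory))  # warm start
```

Delete the file between runs and every invocation reports a cold start, which is the stateless assistant described above.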

The asymmetry, again

So the diagnostic now has six axes. Four are about your organisation: substrate (what AI can see), authority (what it can do), extensibility (who can extend it), and structure (how the organisation has actually changed). Two are about your agents: trust and memory. The rule is unchanged. Your real level is your weakest answer. A real company rarely sits at the same level across all six, and the asymmetry is the point.

What changes is where the asymmetry now hides. A year ago the bottleneck was substrate: the work simply wasn’t legible to a machine. Today, for the organisations that did the substrate work, the bottleneck has moved to trust. They built agents that can act and skipped the layer that lets those agents be trusted to act. On the six-axis chart the shape caves in at trust, and a company that calls itself level four is, by its weakest axis, sitting at level zero.
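The weakest-answer rule is simple enough to write down. A sketch, assuming each axis is scored as an integer level from 0 to 4; the scores below are hypothetical, chosen to mirror the level-four company that collapses at trust.

```python
AXES = ("substrate", "authority", "trust", "memory", "extensibility", "structure")


def real_level(scores):
    """Return (level, weakest_axes): the level is the minimum across all six axes."""
    missing = [axis for axis in AXES if axis not in scores]
    if missing:
        raise ValueError(f"unscored axes: {missing}")
    level = min(scores[axis] for axis in AXES)
    weakest = [axis for axis in AXES if scores[axis] == level]
    return level, weakest


# Hypothetical self-assessment: strong everywhere except trust.
claimed = {
    "substrate": 4,
    "authority": 4,
    "trust": 0,
    "memory": 2,
    "extensibility": 3,
    "structure": 3,
}

level, weakest = real_level(claimed)
print(level, weakest)  # 0 ['trust']
```

Taking the minimum rather than the mean is the whole design choice: an average would let four strong axes hide one that makes the rest unusable.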

Capability was never the bottleneck

The uncomfortable conclusion is the one the radar tracks every day: the constraint on enterprise AI was never how capable the models are. It is whether institutions trust what they already have well enough to let it run. Capability is abundant and getting cheaper. Trust is scarce, and you have to build it: in traces, in evaluation, in governance, in memory that compounds.

The levels are how you find out where you actually are. The four questions tell you whether your organisation is AI-native. The two new ones tell you whether your agents are anything more than a demo. Answer all six honestly. The asymmetry is still the diagnostic — it is just measuring the thing that finally matters.