The Philosophical Ledger · Operating Models · May 2, 2026
The Levels of an AI-Pilled Organization
A field guide: six levels, four diagnostic questions, and how to climb them.
For years, every conversation in autonomous vehicles got more honest the moment someone said the word “Level.” Cruise control was not autonomy. Lane keeping was not autonomy. Driver assistance was not the same thing as self-driving. The levels mattered because they forced precision on a word that companies were quietly stretching to mean whatever served the press release.
We are now living through the same moment with AI-native organizations. The phrase has become a personality trait. Founders wear it. CEOs invoke it on earnings calls. Job descriptions demand it. And like “self-driving” in 2018, it means almost nothing.
It is being used to describe a company where employees use ChatGPT to summarize meetings and a company where agents query systems of record, take bounded action, and propagate workflows across teams. Both call themselves AI-forward. They are not operating at the same level.
This essay borrows the autonomous vehicle playbook to define six levels of an AI-pilled organization, anchored to four diagnostic questions you can actually answer about your own company. It grounds each level in public behavior from companies that have walked this path — Ramp, Klarna, Shopify, JPMorgan, and Goldman Sachs — including where they succeeded and where they reversed course.
I write this as someone who has spent the last two years building agentic infrastructure inside a regulated bank, advising executives on AI strategy, and watching from the inside how companies actually transition — or fail to transition — between levels. The reader who treats it as a mirror will find it uncomfortable. The reader who treats it as a map will find it useful.
The Numbers That Frame The Conversation
- …% of agent pilots never reach production (Anaconda / Forrester, 2026).
- …% of organizations have one agent in production (2026 State of AI Agents).
- …% cite integration as the primary blocker (Anthropic / Arcade, 2026).
- …% of successful agent teams have a named owner (Digital Applied, 2026).

The numbers tell the story before the framework does. Agent capability is no longer the constraint. Integration, governance, and ownership are. The companies crossing the chasm have, almost without exception, made deliberate choices about substrate before they made bold choices about org charts.
The Diagnostic
The four questions that actually tell you where you are.
What can AI see?
Is the work of your company legible to a machine, or does it live in someone’s head, in undocumented meetings, and in SaaS tools the AI cannot read? This is the question of substrate.
What can AI do?
Can it act on systems of record — open PRs, update CRMs, reconcile invoices, draft customer communications — or can it only summarize what humans already wrote down?
Who can extend the system?
Are non-engineers shipping production internal tools, or is every workflow held together by a few power users whose work walks out the door when they leave?
How has the organization changed?
Or are you running 2023’s org chart with better autocomplete? This is the question of structural commitment.
Most companies cannot answer these honestly because they have never tried. They speak in aggregates — “80% of our employees use AI weekly” — that are simultaneously true and meaningless. The four questions force you down to behavior.
A real company rarely answers all four questions at the same level. The asymmetry is the diagnostic.
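The four questions can be turned into a blunt instrument. Here is a minimal Python sketch: scores on a hypothetical 0–5 scale, one per axis, with the class and field names mine rather than any real tool's:

```python
from dataclasses import dataclass

@dataclass
class Diagnostic:
    see: int        # What can AI see?          (substrate)
    do: int         # What can AI do?           (authority)
    extend: int     # Who can extend it?        (distribution)
    structure: int  # How has the org changed?  (structural commitment)

    def scores(self) -> tuple[int, ...]:
        return (self.see, self.do, self.extend, self.structure)

    def level(self) -> int:
        # A company operates at its weakest axis, not its loudest one.
        return min(self.scores())

    def asymmetry(self) -> int:
        # The spread between strongest and weakest axis is the diagnostic.
        return max(self.scores()) - min(self.scores())

# A common profile: strong tooling, no structural change.
d = Diagnostic(see=3, do=2, extend=1, structure=0)
print(d.level(), d.asymmetry())  # 0 3
```

The `min()` is deliberate: you are the level of your weakest axis, and the spread between axes is exactly the asymmetry the diagnostic looks for.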
The Six Levels
L0 · AI as theater
The CEO gives an excellent speech. There is a slide. There is a Head of AI. There is a Slack channel where people post impressive ChatGPT screenshots. The company still runs on the same meetings, reporting lines, and hiring plans it ran on in 2022.
Hard test: can AI complete any recurring business process end-to-end?

L1 · Personal productivity
AI is a private tool: saved prompts, scratch files, a workflow that lives on one person's laptop. The work can be brilliant, but when the brilliant person leaves, their work walks out the door.
Hard test: if your best AI user resigned tomorrow, would their workflow remain?

L2 · Team workflow
Sales has prospecting agents. Support has tier-1 triage. Engineering has automated code review. Each function is more productive, and each has built its own private stack.
Hard test: does the workflow cross team boundaries?

L3 · Organizational infrastructure
The organization itself is queryable. Core systems of record are exposed through well-defined APIs and MCP servers. Agents can act on them, not just observe them.
Hard test: can an agent answer across systems without convening a meeting?

L4 · Compounding operating system
The organization stops being a place where humans work with AI tools and starts being a system that learns. Skills propagate. Duplicate efforts disappear. Internal tools become agentic workflows.
Hard test: show a workflow that improved because the system learned from prior runs.

L5 · Self-driving organization
L5 does not exist yet. It is an organization where core operating loops sense reality, diagnose issues, initiate work, execute within delegated authority, update shared memory, and improve future behavior.
Hard test: what did the company notice, decide, act on, and learn without a human initiating it?

The Field Cases
What the climb actually looks like.
Ramp: what an actual climb looks like
Ramp is instructive because the climb was not theatrical. Eric Glyman and Karim Atiyeh did not announce a transformation. They built one. Ramp runs more than 300 internal Notion Agents in production, including a Product Q&A Oracle, Sales Feedback Categorizer, and Referral Bonus Roy.
In 2025, Ramp shipped over 500 features, hit $1B in revenue, and did it with 25 PMs. Its internal coding agent, Ramp Inspect, now accounts for over 50% of production PRs merged weekly. Its Procurement fleet can triage requests, source vendors, review contract terms, and handle compliance.
But the climb is not the agent count. In 2024, Ramp made the deliberate decision to consolidate work into a single legible substrate before deploying agents. By early 2026, after hundreds of specialized agents, it consolidated around one unified agent with thousands of skills through Omnichat.
What to copy: substrate before agents, named owner before announcements, public usage data, and the discipline to consolidate when sprawl becomes the bottleneck.
Klarna: the cautionary tale every board now references
Klarna’s OpenAI-powered customer service assistant handled 2.3 million conversations in its first month — work the company described as equivalent to 700 full-time agents. Headcount fell, revenue grew, compensation rose, and Klarna IPO’d in September 2025 at a $19.65B valuation.
Then, within weeks of the IPO, Sebastian Siemiatkowski admitted the AI-first strategy had produced lower quality outcomes. Klarna began rehiring human agents in an “Uber-type setup.” Customer satisfaction had deteriorated on complex interactions.
The AI handled volume and the average case. It failed on the tail: emotionally charged, multi-step, edge-case interactions where complex resolution matters. The reversal cost more than the original transformation.
What to avoid: confusing high-volume capability with full-spectrum capability, and announcing the climb before you have finished it.
Shopify: culture as the substrate
In April 2025, Tobi Lütke published the memo: “Reflexive AI usage is now a baseline expectation at Shopify.” The more important rule came later: before asking for more headcount, teams must demonstrate why they cannot get the work done using AI.
Shopify gave everyone access to every tool, not just engineering. Cursor adoption grew fastest in support and revenue. AI usage became part of peer feedback, performance reviews, hiring decisions, and 360s.
Shopify sits between L2 and L3 on many technical axes, but its cultural substrate is closer to L4. That asymmetry is the instructive part: the cultural readiness is ahead of the technical readiness. Most enterprises have it the other way around.
JPMorgan & Goldman: what L3 looks like at scale in regulated finance
JPMorgan has over 450 AI use cases in production and a target of 1,000 by the end of 2026. The bank is spending approximately $19.8B on technology this year. Internally, the strategy is described as building an AI factory: lowering the marginal cost of the next AI application toward zero.
Goldman moved aggressively on a different axis by deploying Devin across its 12,000-strong developer workforce. CIO Marco Argenti described the path plainly: first the assistant talks like another Goldman employee; then it starts to do things like a Goldman employee.
For regulated banks, the unique constraint is governance. The L3 leap in banking happens at the speed of the security and audit substrate, not at the speed of the model.
What to copy: the AI factory frame. Treat infrastructure as the unit of investment, not use cases. Make the marginal cost of the 451st application near zero.
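The AI factory frame is concrete enough to sketch. The idea is a single registry through which every agent-addressable tool is exposed, with identity, scope, and an audit trail enforced at one door. What follows is a generic Python sketch of that pattern, not a real MCP SDK or any bank's actual system; every name in it is illustrative:

```python
import time
from typing import Callable

AUDIT_LOG: list[dict] = []    # in production: an append-only audit store
REGISTRY: dict[str, dict] = {}

def tool(name: str, scopes: set[str]):
    """Register a callable as an agent-addressable tool with required scopes."""
    def wrap(fn: Callable):
        REGISTRY[name] = {"fn": fn, "scopes": scopes}
        return fn
    return wrap

def invoke(agent_id: str, agent_scopes: set[str], name: str, **kwargs):
    """Every agent call passes one door: identity, scope check, audit entry."""
    entry = REGISTRY[name]
    allowed = entry["scopes"] <= agent_scopes  # required scopes must be held
    AUDIT_LOG.append({"ts": time.time(), "agent": agent_id,
                      "tool": name, "args": kwargs, "allowed": allowed})
    if not allowed:
        raise PermissionError(f"{agent_id} lacks scopes for {name}")
    return entry["fn"](**kwargs)

@tool("crm.lookup", scopes={"crm:read"})
def crm_lookup(account: str) -> dict:
    return {"account": account, "arr": 120_000}  # stub system of record

invoke("support-triage-01", {"crm:read"}, "crm.lookup", account="acme")
```

Once the door exists, the 451st tool costs a decorator, not a project: that is the marginal-cost-toward-zero claim in miniature.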
The Climb
Where companies stall — and how to leap.
L0 → L1
The stall: procurement and security treat AI as a vendor decision.
The leap: centralize the model and identity layer. Let high-value use cases come from anywhere.
L1 → L2
The stall: each team buys its own tool and knowledge stays personal.
The leap: centralize the plumbing, decentralize the workflows, and make adoption visible.
L2 → L3
The stall: systems of record are not addressable; no identity, audit, or policy layer exists.
The leap: treat substrate as infrastructure to acquire, not invent. Name an owner with budget authority.
L3 → L4
The stall: agents act, but the organization does not learn. Every run starts from zero.
The leap: take one end-to-end loop to L4 before broadening. Then consolidate.
L4 → L5
The stall: humans still do all the noticing.
The leap: define authority by reversibility, not hierarchy. Escalation, not approval, becomes control.
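Authority by reversibility reduces to a routing rule: reversible actions execute autonomously, irreversible or unknown ones escalate to a human with full context. A minimal sketch, with hypothetical action names:

```python
# Hypothetical action catalogs; a real one would carry blast radius and cost.
REVERSIBLE = {"draft_email", "open_pr", "update_crm_note"}
IRREVERSIBLE = {"send_wire", "delete_account", "sign_contract"}

def authorize(action: str) -> str:
    """Route by reversibility, not hierarchy."""
    if action in REVERSIBLE:
        return "execute"   # autonomous, logged, rollback available
    if action in IRREVERSIBLE:
        return "escalate"  # hand to a human with full context
    return "escalate"      # unknown actions default to the safe path

assert authorize("open_pr") == "execute"
assert authorize("send_wire") == "escalate"
```

The shift is that escalation, not approval, is the control: nothing reversible waits in a queue, and nothing irreversible runs unattended.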
The Leapfrog Moves That Work
Skip L1 with shared infrastructure
Team workflows and shared context avoid the private prompt-library debt of personal productivity.
Buy the substrate, build the workflows
MCP-compatible vendors, identity, observability, and policy layers should be infrastructure, not bespoke art projects.
Hire for L3
Promote people who build tools that automate other people’s work, not only their own.
Instrument before you automate
L4 is impossible without observability of agent behavior, cost, quality, and failure modes.
Take one loop deep
Depth in one workflow teaches more than breadth across ten shallow pilots.
Treat governance as the unlock
What AI is allowed to do matters as much as what it can do.
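"Instrument before you automate" can start as small as a decorator that records latency, outcome, and failure mode for every agent run. A minimal Python sketch, with hypothetical workflow names; in production the list would be a metrics or trace store:

```python
import functools
import time

RUNS: list[dict] = []  # stand-in for a metrics / trace store

def instrumented(workflow: str):
    """Record latency, outcome, and failure mode for every run of a workflow."""
    def deco(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            record = {"workflow": workflow, "ok": True, "error": None}
            try:
                return fn(*args, **kwargs)
            except Exception as e:
                record.update(ok=False, error=type(e).__name__)
                raise
            finally:
                record["seconds"] = time.perf_counter() - start
                RUNS.append(record)
        return wrapper
    return deco

@instrumented("invoice-triage")
def triage(invoice: dict) -> str:
    return "approve" if invoice["amount"] < 500 else "review"

triage({"amount": 120})
print(RUNS[-1]["workflow"], RUNS[-1]["ok"])  # invoice-triage True
```

Only with this kind of record in place can anyone later show that a workflow improved because the system learned, which is the L4 test.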
The Checklist
A practical diagnostic for your next leadership offsite.
Substrate: what can AI see?
- Core systems of record are MCP-accessible.
- There is a single, queryable knowledge layer.
- Meeting, document, and code capture flow into the same substrate.
Authority: what can AI do?
- One recurring process runs end-to-end through agents in production.
- Every agent has defined identity, scope, and audit trail.
- The organization can stop an agent instantly.
- Authority is calibrated by reversibility.
Distribution: who can extend it?
- A non-engineer shipped a production internal tool last quarter.
- An internal skills marketplace lets workflows move across functions.
- AI proficiency is part of hiring, reviews, and promotion.
Structural commitment: how has the org changed?
- There is a named, budgeted owner for agentic infrastructure.
- Hiring requires showing why AI cannot do the work.
- The org shape has visibly changed in the last 18 months.
- Compensation rewards AI-leveraged output, not headcount.
The L4 tests
- One workflow improved because the system learned, not because a human improved it.
- There is unified observability across production agents.
- There is compaction discipline: retire and consolidate agents, not only create them.
The Closing
Steve Blank once said that a startup is not a small version of a large company. It is something different in kind — built around discovery, not execution. The same is true here. An AI-pilled company is not a small AI-assisted version of an old company. It is an organization rebuilt around a new operating model.
What I want you to take from this is not a verdict on your company. It is a diagnostic and a direction. Use the four questions to find your asymmetry. Use the stall map to name your specific gravity well. Use the checklist to make the next intervention concrete.
Then pick one end-to-end loop, take it all the way to L4, and let the lessons compound.
If you have read this far, you already know which level you are at. The honest answer is rarely the flattering one. The good news is that the climb is mappable, the moves are known, and the playbook is now in the open.