The Amnesia Tax — The Philosophical Ledger

Every evening, your best credit officer goes home. The exception she granted this afternoon — the borrower the model wanted to decline, the one she approved anyway because she has seen this pattern survive three downturns — goes home with her. Tomorrow she will make forty more calls like it. Your AI will have learned nothing from any of them. It captured the answer she typed. It did not capture the judgment behind it, and it will never see whether she was right.

That is the real problem, and it is not the one we keep talking about. We argue about which model is smartest. Meanwhile the only asset that could actually set a firm apart is being generated all day, by people, under real stakes — and then quietly discarded at close of business.

Call it what it is. Every failure mode we worry about — commoditised models, stalled pilots, work migrating to whoever owns the data — is one mistake wearing different masks: treating intelligence as a rented capability instead of an institutional learning system. Rent it and you get exactly what rent buys — access, not ownership; capability, not memory. You pay full price for the intelligence and discard the judgment that would have made it yours. That recurring, invisible loss is the amnesia tax, and almost every enterprise is paying it in full. Everything below ladders back to that one error — and to its fix.

The whole idea in 60 seconds

Models get cheaper and more alike every week, so rented intelligence can't be your edge — your competitor rents the same thing tomorrow. The one thing that's truly yours is the judgment your people apply to your decisions, with outcomes only you can see. Most firms throw it away: the system records the answer and forgets the reasoning, so nothing accumulates. That waste is the amnesia tax.

The fix is a loop that captures each decision — the call, the reason, the result — and feeds it back, so the system gets better at your work. And in a bank there's a gift hiding in the catch: the exact record a regulator demands is the exact record a model can learn from. So governance isn't the brake on this. It's what makes the loop run. Build the loop, govern it, and your AI compounds judgment instead of renting intelligence.

Nadella's essay last weekend got close to this. His argument is right, and I have been making a rougher version of it for two years. The shift we are in is not another layer of tooling over human work; it is the first time we can close a genuine loop between people and systems. Every firm must build two kinds of capital — human (the judgment, relationships and pattern recognition of its people) and token (the AI capability it builds and owns) — and the two compound each other. Without human direction, he writes, you have "compute running in circles."

When the person who sells more frontier-model access than almost anyone alive tells you the model is not the moat, the argument is settled. So take this as a friendly amendment. Because the essay describes the two capitals and asserts they compound — but it leaves out the mechanism. What, exactly, converts a person's judgment into a capability the firm owns? Name that mechanism and the whole picture sharpens, the architecture follows, and the missing word — the one I will get to — stops sounding like compliance and starts sounding like the drivetrain.

The model melts — and the whole layer can migrate

Start with why this is urgent, because the numbers are no longer arguable.

The one thing everyone wants to compete on, access to the best model, is the thing that loses value fastest. Stanford's 2025 AI Index found that the cost of a fixed capability (GPT-3.5-grade) fell more than 280-fold between November 2022 and October 2024; J.P. Morgan tracked a 99.7 percent collapse on the same basis, from $37.50 per million tokens in early 2023 to fourteen cents two years on. Epoch AI puts the rate anywhere from nine to nine hundred times a year, depending heavily on the task. The frontier itself isn't getting cheaper so neatly, because inference-time compute means the best answers can cost more, not less. But yesterday's frontier becomes today's commodity on a brutal schedule, and that is the curve your moat cannot sit on.

99.7%: the fall in the price of a fixed capability tier in two years (J.P. Morgan). Anything you can rent is a depreciating asset. Your edge has to live somewhere the price curve can't reach it.

Read that as a strategist, not an engineer. Anything you can rent from an API is shared intelligence: your competitor can rent the identical thing tomorrow, cheaper. Models commoditise precisely because they train on the world's shared knowledge. So your durable edge cannot come from shared knowledge. It can only come from knowledge that exists nowhere but inside your own operations: the decisions your people make, under your stakes, with outcomes only you can see. That knowledge is private by construction, because no provider has it.

And there is a sharper version of the threat than falling prices, one the investors have named: layer migration. In software you were killed by a competitor. In AI the work itself moves — into the model, into an open-weight alternative, into the customer's data platform, into an agent runtime, into the device — the moment any variable shifts enough. But notice what migrates. What gets absorbed are the prosthetics: the retrieval pipelines, the parsers, the prompt scaffolds that existed only to paper over a model's weaknesses. When the model improves, it reclaims them.

Captured judgment is the one asset that is not a prosthetic. The model cannot reach up and absorb it, because it cannot generate it — the credit officer's exception, with her reason and the repayment outcome six months later, is worth more than a million pages of the open web precisely because it is true, it is yours, and it is scored by reality. With one caveat worth stating plainly: a provider can still capture you sideways — through platform lock-in, hosted memory, telemetry, eval tooling, the agent runtime — if you let your judgment accumulate on someone else's layer. No provider can absorb your judgment unless you hand them the memory it lives in. Everything else on your stack is rented from a curve that's moving; the governed knowledge layer is the only floor that doesn't migrate. The melting asset is the model; the appreciating one is captured judgment.

The 95 percent is judgment evaporating

Now the number everyone misreads. MIT's NANDA initiative drew on an analysis of 300 public AI deployments, 150 executive interviews and 350 employee surveys. The finding that travelled: 95 percent of enterprise generative-AI pilots delivered no measurable impact on the P&L, against $30–40 billion in spend. Only about 5 percent produced real value.

That got read as proof AI is overhyped. It is the opposite. MIT was explicit that the divide was not driven by model quality and not by regulation. It was driven by integration: the systems that failed were never embedded in the workflow and never built to learn from it — they didn't retain feedback, adapt to context, or improve over time. Put plainly, they bought tools that sit beside the work and forget it, so judgment had nowhere to accumulate. The 95 percent did not fail at AI; they failed to turn their own operations into a learning system — they paid the amnesia tax in full. The 5 percent built a loop. Tellingly, systems built with external partners reached production about twice as often as internal builds — not because the partners had better models, but because they brought the discipline of building the loop instead of the demo.

The missing word, and why it isn't a brake

Here the operator's view parts from the platform's. Nadella describes the loop in clean terms: private evaluations against outcomes that matter; private environments where models grow stronger on real traces; a knowledge base that makes memory queryable. Architecturally, exactly right. But read "real traces from inside the organization" as the Chief Risk Officer of a bank, and the abstraction stops being clean.

Inside a regulated institution, a trace is not neutral telemetry. It is customer data with residency and consent constraints; a credit decision carrying a legal duty to be explained; an action a named human is accountable for. You cannot just let a model improve on that in the background. An ungoverned learning loop in a bank is not an asset — it is an incident waiting for an audit. The market already prices this: Gartner forecasts that more than 40 percent of agentic AI projects will be cancelled by the end of 2027, and names inadequate risk controls as one of the three causes. The ungoverned loop doesn't compound slowly. It gets switched off before it compounds at all.

So the missing word is governance — and here is the turn that makes it compelling instead of dull. Ask what you must capture to satisfy a regulator: for every consequential action, which agent acted, under whose authority, on what data, with what outcome, and why. Now ask what you must capture to make the loop learn: for every action, the context, the decision, and the outcome, tied to the metric that matters. It is the same event. Not, to be precise, the identical record — audit asks whether an action was authorised, explainable and compliant; learning asks whether it was useful, predictive and generalisable. But with the right schema, the audit trail becomes the governed raw material from which the training signal is safely derived. The discipline a regulator forces on you — log everything, attribute everything, tie every action to an identity, a policy and an outcome — is exactly the discipline that yields high-grade learning signal. The 95 percent failed the regulator and starved the loop in the same motion, for the same reason: they captured nothing worth keeping.

This is the inversion. Governance is not the brake on the machine. It is the drivetrain — the thing that transmits a human's judgment into the firm's capability, and the only thing that lets the loop run on regulated data without being shut down. A governor, in the mechanical sense, was the device Watt put on the steam engine: not a brake on the Industrial Revolution, but the part that let an engine run hot, continuously, without tearing itself apart. Token capital needs a governor in exactly that sense.

And there is a deeper reason it cannot be a brake bolted on at the end. A learning loop compounds whatever it captures — including bias, bad habits, informal exceptions, and institutional politics. Capture indiscriminately and you will industrialise your prejudices as efficiently as your wisdom. Governance is what decides whether a firm compounds judgment or prejudice. It isn't the brake. It's the steering.

Double-entry. One captured event, posted to two ledgers — the audit record the regulator needs, and the training signal derived from it under consent and policy. Capture once; derive both. That is why governance and compounding are the same asset, not a trade-off.

The capital stack

Nadella named two capitals. There are five.

Human and token capital are the two everyone is discussing. But the firms that compound have quietly built a fuller stack — and the layer nobody names is the one that decides whether the rest are allowed to compound at all.

Capital	Old world	AI-native firm
Financial	money	compute budget
Human	expertise	expert judgment
Data	records	outcome-labelled traces
Token	model usage	reusable AI capability
Governance	controls	permission to compound

In regulated AI, governance capital is what gives token capital permission to exist.

How to actually build the loop

Philosophy is cheap. Here is the whole loop as one picture — then as seven moves a team could start on Monday, each with the concrete build, a worked example from the same credit exception, and a test you can hold yourself to. Together they make one asset worth naming: a sovereign memory layer you control — your traces, schemas, outcome labels, evals and lineage, kept portable across models. The tooling is generic on purpose; what matters is the role each part plays, not the brand.

One machine, not seven separate ideas. A decision is captured, remembered, scored against what actually happened, and fed back so the next one is better; that return path is the compounding. The seven moves below are simply the spokes of this wheel. Governance is the field it all runs inside: the same record that satisfies the regulator is the one the loop learns from.

Capture the decision, not just the answer

Most stacks log the output — the field the system filled in — and throw away the reasoning and, fatally, the result. The atomic unit you should be storing is the whole decision: the case, who or what decided it, the authority it ran under, the data it drew on, the action taken, and — when it lands — the outcome. The reason and the outcome are the judgment. Everything else is just the receipt.

Build

Instrument capture at the orchestration layer, not inside each app, so every consequential action emits one structured decision record. The outcome field is written later, when reality reports back, and joined to the original record by a stable id.

case LN-44192 · SME term loan
decider officer:a.sokhn (model recommended: DECLINE)
action GRANT exception · authority: credit-policy §7.2
reason "cyclical sector; 3-yr cash-flow recovery pattern"
outcome [t+6mo] performing · 0 days past due

Worked example. Today the officer's override vanishes into a status field. Captured properly, the override and her reason are recorded now, and six months later the repayment outcome is stitched on. That single row is the most valuable thing your AI saw all day.
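As a rough illustration, the decision record could be modelled like this. It is a minimal sketch, not a reference design: the class names, field names and the in-memory store are all assumptions made for the example, and the dates are invented. The point it shows is that the outcome arrives later and is joined to the original record by a stable id.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Outcome:
    """What reality reported back, written later than the decision."""
    observed_on: date
    status: str          # e.g. "performing" or "defaulted"
    days_past_due: int


@dataclass(frozen=True)
class DecisionRecord:
    """One consequential action: the call, the reason and, eventually, the result."""
    case_id: str                 # stable id used to join the outcome later
    product: str
    decider: str                 # who or what made the final call
    model_recommendation: str    # what the system wanted to do
    action: str                  # what was actually done
    authority: str               # the policy the action ran under
    reason: str                  # the judgment, in the decider's own words
    decided_on: date
    outcome: Optional[Outcome] = None

    @property
    def is_override(self) -> bool:
        """True when the final action differs from the model's recommendation."""
        return self.action != self.model_recommendation


def attach_outcome(record: DecisionRecord, outcome: Outcome) -> DecisionRecord:
    """Return a copy of the record with the outcome stitched on."""
    if outcome.observed_on < record.decided_on:
        raise ValueError("an outcome cannot predate its decision")
    return replace(record, outcome=outcome)


# The credit exception from the example; the dates are made up for illustration.
store: dict[str, DecisionRecord] = {}
record = DecisionRecord(
    case_id="LN-44192",
    product="SME term loan",
    decider="officer:a.sokhn",
    model_recommendation="DECLINE",
    action="GRANT",
    authority="credit-policy §7.2",
    reason="cyclical sector; 3-yr cash-flow recovery pattern",
    decided_on=date(2025, 1, 15),
)
store[record.case_id] = record

# Six months on, reality reports back and is joined by case_id.
store["LN-44192"] = attach_outcome(
    store["LN-44192"], Outcome(date(2025, 7, 15), "performing", 0)
)
```

Keeping the record immutable and producing a new copy when the outcome lands is one way to stay friendly to append-only storage; a real system would persist both versions rather than overwrite a dictionary entry.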

Test: pick any decision from last week. Can you see why it was made and how it turned out, from one record? If not, you are logging answers and discarding judgment.

Make one record serve audit and learning

This is the inversion, turned into a build instruction. Don't run two pipelines, one for compliance and one for data science. Run one. The record from the first move holds the raw material for both an auditor and a model; the failure mode is letting two teams design two half-records that satisfy neither, or pretending an audit log can be poured into training untouched. What stands between those two mistakes is a question most firms never make explicit: learning rights. Not who owns the data, but who is permitted to learn from a decision: the customer, the bank, the vendor, the regulator, the model provider. Answer it in the schema and the loop is yours; leave it implicit and you have built, in your CRO's words, a beautiful illegal one.

Build

Agree a single decision-record schema that your risk function and your data-science function both sign off on — and make it carry the fields a CRO will demand before any model touches it: consent basis, purpose limitation, retention window, a training-eligibility flag, PII tokenisation, residency, and erasure/suppression logic. Write it once to tamper-evident, append-only storage. The audit trail is not automatically the training signal — but with that schema, it becomes the governed raw material the signal is safely derived from, with training use gated by consent and eligibility rather than bolted on afterwards. One capture; two readers.

One schema, two readers. Compliance stops being a tax the moment its output is also your training set.
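To make "one capture, two readers" concrete, here is a hedged sketch of a single governed record with two derived views. Everything named here is hypothetical: the field names, the consent values, the key, and the use of a keyed hash as a stand-in for a proper PII tokenisation service. The audit view sees the full record; the training view is derived only when the learning-rights checks pass, and never carries the raw customer identifier.

```python
from __future__ import annotations

import hashlib
import hmac
from dataclasses import asdict, dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class GovernedDecision:
    case_id: str
    customer_id: str          # raw identifier, never exposed to training
    decider: str
    authority: str
    action: str
    reason: str
    outcome: Optional[str]    # filled in when reality reports back
    consent_basis: str        # e.g. "contract"; "none" means no basis recorded
    purpose: str              # purpose limitation for the data
    residency: str            # where the record must stay
    retention_until: date
    training_eligible: bool   # the explicit learning-rights flag


def tokenise(value: str, key: bytes) -> str:
    """Keyed hash standing in for a real PII tokenisation service."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


def audit_view(d: GovernedDecision) -> dict:
    """Reader one: the full record, as an auditor would need it."""
    return asdict(d)


def training_view(d: GovernedDecision, today: date, key: bytes) -> Optional[dict]:
    """Reader two: a derived, de-identified example, or None if learning is not permitted."""
    permitted = (
        d.training_eligible
        and d.consent_basis != "none"
        and today <= d.retention_until
        and d.outcome is not None      # no label, no learning signal
    )
    if not permitted:
        return None
    return {
        "subject": tokenise(d.customer_id, key),
        "action": d.action,
        "reason": d.reason,
        "label": d.outcome,
    }


KEY = b"demo-key-not-for-production"
decision = GovernedDecision(
    case_id="LN-44192", customer_id="CUST-0091", decider="officer:a.sokhn",
    authority="credit-policy §7.2", action="GRANT",
    reason="cyclical sector; 3-yr cash-flow recovery pattern",
    outcome="performing", consent_basis="contract", purpose="credit-decisioning",
    residency="EU", retention_until=date(2032, 1, 1), training_eligible=True,
)

audit_row = audit_view(decision)
training_row = training_view(decision, today=date(2025, 7, 15), key=KEY)
blocked = training_view(replace(decision, training_eligible=False), date(2025, 7, 15), KEY)
```

The design choice worth noticing is that training eligibility is a property of the record itself, checked at derivation time, so a change in consent or retention changes what the model may see without a second pipeline.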

Test: can the same captured event satisfy a regulator and feed your eval pipeline, with training use gated by consent and eligibility rather than bolted on afterwards? If the two needs live in two systems, you are paying the cost of governance twice and collecting the benefit once.

Turn every correction into a test

Public benchmarks measure whether a model is good at being a model. They say nothing about whether your system is getting better at the decisions that move your money. The fix is a private evaluation set built from your own graded cases — and the cheapest source of new cases is the corrections your experts already make.

Build

Have domain experts assemble a few hundred real, outcome-labelled cases as a starting "golden set." Define metrics that map to money and risk: accuracy against expert ground truth, override rate, rework rate, time-to-decision. Wire the suite into your release pipeline so every model, prompt, or retrieval change must clear it before promotion. Then close the loop: every human override becomes a new graded case automatically.

Corrections compound. The credit officer's exception becomes a permanent test case. Your evaluation set is your firm's judgment, frozen and runnable — and it grows every time someone disagrees with the machine.
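A small sketch of how an override might become a test and how the suite might gate a release. It assumes a toy setting: the feature names (a debt-service coverage ratio called dscr and a cyclical_recovery flag), the two decision rules and the 0.9 accuracy floor are all invented for illustration, and accuracy is the only metric shown.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class GoldenCase:
    case_id: str
    features: dict
    expected_action: str      # expert ground truth, ideally confirmed by outcome


@dataclass
class EvalSuite:
    cases: list[GoldenCase] = field(default_factory=list)

    def add_override(self, case_id: str, features: dict, human_action: str) -> None:
        """Every human override becomes a permanent graded case."""
        self.cases.append(GoldenCase(case_id, features, human_action))

    def accuracy(self, candidate: Callable[[dict], str]) -> float:
        if not self.cases:
            raise ValueError("the golden set is empty")
        hits = sum(candidate(c.features) == c.expected_action for c in self.cases)
        return hits / len(self.cases)

    def may_promote(self, candidate: Callable[[dict], str],
                    baseline: Callable[[dict], str], min_accuracy: float) -> bool:
        """Release gate: clear the floor and do not regress against the current system."""
        score = self.accuracy(candidate)
        return score >= min_accuracy and score >= self.accuracy(baseline)


def baseline(f: dict) -> str:
    # Hypothetical current rule: coverage ratio only.
    return "GRANT" if f["dscr"] >= 1.2 else "DECLINE"


def candidate(f: dict) -> str:
    # Hypothetical new rule: also recognises the recovery pattern.
    return "GRANT" if f["dscr"] >= 1.2 or f["cyclical_recovery"] else "DECLINE"


suite = EvalSuite([
    GoldenCase("G-1", {"dscr": 1.5, "cyclical_recovery": False}, "GRANT"),
    GoldenCase("G-2", {"dscr": 0.8, "cyclical_recovery": False}, "DECLINE"),
])
before = suite.accuracy(baseline)

# The officer's exception is added as a graded case the moment it is made.
suite.add_override("LN-44192", {"dscr": 1.0, "cyclical_recovery": True}, "GRANT")
after = suite.accuracy(baseline)

ship_candidate = suite.may_promote(candidate, baseline, min_accuracy=0.9)
ship_baseline = suite.may_promote(baseline, candidate, min_accuracy=0.9)
```

In this toy run the baseline looks perfect until the override is added, at which point the suite exposes the gap. That is the mechanism in miniature: the disagreement is what makes the test set sharper.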

Test: a model that climbs public leaderboards while your real-outcome scores stay flat should never ship. Does anything in your release process actually stop it? If the vendor's benchmark is your gate, you don't have one.

Keep the model swappable

If swapping your model provider would erase your accumulated expertise, you don't have a knowledge layer — you have a dependency, on a melting asset. The sharpest test of ownership is whether you can change the engine and keep the veteran.

Build

Put a single model-access layer between every application and any provider — nothing calls a vendor SDK directly. Keep prompts, evaluation sets, memory and knowledge in your stores, outside the weights. The model becomes a stateless, replaceable component; everything that encodes your judgment sits beneath it and survives the swap.

The engine is replaceable; the veteran is not. When model B beats model A next quarter — and it will — you switch in an afternoon and lose nothing.
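One way to picture the model-access layer, sketched under loud assumptions: the gateway, the provider protocol and the stub providers are hypothetical, and no real vendor SDK is called. What the sketch shows is that prompts and memory live in the firm's own stores, so swapping the provider changes the engine and nothing else.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


class ModelProvider(Protocol):
    """Anything that can turn a prompt into text; the only contract apps rely on."""
    def complete(self, prompt: str) -> str: ...


@dataclass
class StubProvider:
    """Stand-in for a vendor adapter; a real one would wrap that vendor's SDK."""
    name: str

    def complete(self, prompt: str) -> str:
        return f"[{self.name}] received {len(prompt)} characters"


@dataclass
class ModelGateway:
    """Single access layer: applications call this, never a provider directly."""
    provider: ModelProvider
    prompts: dict[str, str] = field(default_factory=dict)   # owned by the firm
    memory: list[str] = field(default_factory=list)         # owned by the firm

    def swap(self, provider: ModelProvider) -> None:
        """Change the engine; prompts and memory are untouched."""
        self.provider = provider

    def ask(self, prompt_name: str, **fields: str) -> str:
        context = "\n".join(self.memory)
        prompt = self.prompts[prompt_name].format(context=context, **fields)
        return self.provider.complete(prompt)


gateway = ModelGateway(
    provider=StubProvider("model-a"),
    prompts={
        "credit_exception":
            "Precedents:\n{context}\n\nCase: {case}\nRecommend GRANT or DECLINE."
    },
    memory=["LN-44192: GRANT exception, cyclical sector, performing at 6 months"],
)

first = gateway.ask("credit_exception", case="LN-50017 SME term loan")
gateway.swap(StubProvider("model-b"))
second = gateway.ask("credit_exception", case="LN-50017 SME term loan")
```

Both calls send the identical prompt, built from the same stored precedent; only the component that answers it has changed.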

Test: run your full private eval suite against two providers this quarter. If outcome scores hold and no accumulated context is lost, you own the asset. If they crater, you've been renting your moat.

Give the memory a structure

Captured judgment that piles up unstructured is a swamp, not an asset. To be queryable — and, in a regulated firm, explainable — it needs a spine: shared meaning, defined relationships, traceable lineage. Otherwise retrieval is just nearest-neighbour guessing, and you can't tell a regulator why the system said what it said.

Build

Structure the captured record into a knowledge layer: a controlled business vocabulary so terms mean one thing, a domain ontology so relationships are explicit, and a knowledge graph with lineage so every answer traces to an authoritative source. Ground retrieval in that graph, not in raw text similarity, so answers arrive with citations.

From swamp to spine. Structure is what makes institutional memory queryable and an answer defensible.
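A deliberately tiny sketch of the spine: a controlled vocabulary, facts stored as subject, predicate and object, and a source attached to every fact so an answer arrives with its citation. The class, the vocabulary entries and the source labels are all hypothetical, and a production knowledge graph would use a proper graph store and ontology rather than a Python list.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Fact:
    subject: str
    predicate: str
    obj: str
    source: str     # lineage: the authoritative document or record


class KnowledgeLayer:
    def __init__(self, vocabulary: dict[str, str]) -> None:
        # Controlled vocabulary: many surface terms, one canonical meaning.
        self._vocab = {term.lower(): canon for term, canon in vocabulary.items()}
        self._facts: list[Fact] = []

    def canonical(self, term: str) -> str:
        return self._vocab.get(term.lower(), term)

    def add(self, subject: str, predicate: str, obj: str, source: str) -> None:
        if not source:
            raise ValueError("every fact needs an authoritative source")
        self._facts.append(
            Fact(self.canonical(subject), predicate, self.canonical(obj), source)
        )

    def ask(self, subject: str, predicate: str) -> list[tuple[str, str]]:
        """Return (answer, citation) pairs for an exact relationship lookup."""
        s = self.canonical(subject)
        return [(f.obj, f.source) for f in self._facts
                if f.subject == s and f.predicate == predicate]


kb = KnowledgeLayer({
    "SME loan": "sme_term_loan",
    "small-business term loan": "sme_term_loan",
})
kb.add("SME loan", "exception_authority", "credit-policy §7.2",
       source="credit policy manual, section 7.2")
kb.add("small-business term loan", "precedent", "LN-44192",
       source="decision record LN-44192")

answers = kb.ask("sme loan", "precedent")
```

Because both phrasings resolve to the same canonical term, the lookup finds the precedent whichever wording the new agent uses, and the citation comes back with it.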

Test: can a brand-new agent answer a domain question and cite the authoritative source? If the only place that knowledge lives is the model's weights, it is neither queryable nor auditable — it is exhaust.

Gate the action; earn the autonomy

"Let the model grow stronger on real traces" is only safe if the model can't act on the world without passing a checkpoint, and if autonomy is something a task earns rather than something you switch on. This is what lets you train on real, regulated data without the loop becoming the incident.

Build

Check every action against policy before it executes, at a single control point. Give each agent a verifiable identity and a scoped mandate — which tools, which data, which actions, what limits — enforced at runtime, not just at design time. Grant autonomy per task and risk tier, and widen it only as the measured override rate for that task class falls below threshold. Version every model and policy so you can roll back in seconds.

Autonomy is measured, not declared. The gate makes the loop safe to run; the override rate decides how much rope each task has earned.
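The gate and the earned-autonomy rule can be sketched in a few lines. This is an illustration under assumed numbers: the mandate fields, the 5 percent override threshold and the 200-decision minimum are hypothetical, and a real control point would also log each verdict and check a versioned policy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Mandate:
    """Scoped authority for one agent, enforced at runtime."""
    agent_id: str
    allowed_actions: frozenset[str]
    max_amount: float


@dataclass
class TaskStats:
    """Measured history for one task class."""
    decisions: int = 0
    overrides: int = 0

    @property
    def override_rate(self) -> float:
        # With no evidence, assume the worst rather than the best.
        return self.overrides / self.decisions if self.decisions else 1.0


def autonomy_earned(stats: TaskStats, threshold: float, min_decisions: int) -> bool:
    return stats.decisions >= min_decisions and stats.override_rate < threshold


def gate(mandate: Mandate, action: str, amount: float, stats: TaskStats,
         threshold: float = 0.05, min_decisions: int = 200) -> str:
    """Single control point: returns 'deny', 'human_review' or 'execute'."""
    if action not in mandate.allowed_actions or amount > mandate.max_amount:
        return "deny"                      # outside the mandate, whatever the history
    if not autonomy_earned(stats, threshold, min_decisions):
        return "human_review"              # inside the mandate, autonomy not yet earned
    return "execute"


mandate = Mandate("agent:renewals", frozenset({"RENEW"}), max_amount=50_000)

trusted = gate(mandate, "RENEW", 20_000, TaskStats(decisions=400, overrides=8))
unproven = gate(mandate, "RENEW", 20_000, TaskStats(decisions=400, overrides=40))
out_of_scope = gate(mandate, "GRANT_EXCEPTION", 20_000, TaskStats(400, 8))
```

The order of the checks matters: the mandate is tested first, so a strong track record can never widen what an agent is permitted to do, only how often a human has to look.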

Test: for any task the system runs autonomously, can you state its risk tier, the override rate that earned its autonomy, and the switch that retires it? Autonomy you can't justify with a number is autonomy you haven't earned.

Put people where judgment is scarce — and capture what they do

The loop captures judgment; it does not replace the people who supply it. Gartner's own framing of its cancellation prediction is that today's models can't pursue complex goals autonomously over time — which makes good judgment more indispensable, not less. The job is to spend it where it's scarce, and to stop letting it evaporate.

Build

Map each workflow into pattern-execution (automate it) and genuine judgment (route it to a human). Then make the human-in-the-loop the labelling step: every expert call feeds the capture and the test set, so your best people generate training signal simply by doing their work. Name a loop owner accountable for outcomes, not uptime — and track judgment yield, the reusable improvement you get per expert correction, so human-in-the-loop reads as capital formation rather than cost.

The human-in-the-loop is the labelling step. Route judgment to people — then capture it, so the same call doesn't have to be made from scratch tomorrow.
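A hedged sketch of the routing rule and of one possible definition of judgment yield. The essay does not give a formula, so the one used here (gain in private eval score divided by the number of expert corrections in the period) is an assumption, as are the confidence cut-off, the risk tiers and the second case id.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ExpertCall:
    """One human decision, captured as a label for the loop."""
    case_id: str
    system_action: str
    expert_action: str
    reason: str

    @property
    def is_correction(self) -> bool:
        return self.expert_action != self.system_action


def route(confidence: float, risk_tier: str, min_confidence: float = 0.9) -> str:
    """Send pattern execution to automation and genuine judgment to a person."""
    if risk_tier == "high" or confidence < min_confidence:
        return "human"
    return "automate"


def judgment_yield(score_before: float, score_after: float,
                   calls: list[ExpertCall]) -> float:
    """Assumed metric: eval-score improvement per expert correction."""
    corrections = sum(c.is_correction for c in calls)
    if corrections == 0:
        raise ValueError("no corrections in this period")
    return (score_after - score_before) / corrections


calls = [
    ExpertCall("LN-44192", "DECLINE", "GRANT",
               "cyclical sector; 3-yr cash-flow recovery pattern"),
    ExpertCall("LN-44205", "GRANT", "GRANT", "agrees with the recommendation"),
]

yield_this_period = judgment_yield(0.80, 0.84, calls)
```

Whatever definition a firm settles on, the useful property is the denominator: it prices each correction as an input to the asset rather than as a cost of supervision.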

Test: when your best analyst corrects the system, does that correction make it permanently better — or does it evaporate? If it evaporates, you are paying experts to babysit amnesia.

Start Monday

You don't need all seven moves to begin. You need one captured decision.

Pick one decision your experts make every day — a credit exception, a pricing override, a fraud call. Capture three things each time: what the system recommended, what the human decided and why, and what happened in the end. Do nothing else for a month. Then sit down and read fifty of them in one go. You will see your experts' judgment written down for the first time — and you'll know exactly which spoke of the wheel to build next. The loop starts with one decision, not a platform.
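For the month-end read-through, a few lines are enough to tally what was captured. This sketch assumes each record is a plain dictionary with hypothetical keys for the three things named above, and that the outcome may still be missing.

```python
from collections import Counter


def review(records: list[dict]) -> Counter:
    """Tally (agree or override, outcome) pairs across the captured decisions."""
    tally: Counter = Counter()
    for r in records:
        overrode = r["human_decision"] != r["system_recommendation"]
        outcome = r.get("outcome") or "pending"
        tally[("override" if overrode else "agree", outcome)] += 1
    return tally


records = [
    {"system_recommendation": "DECLINE", "human_decision": "GRANT",
     "reason": "cyclical sector; 3-yr cash-flow recovery pattern",
     "outcome": "performing"},
    {"system_recommendation": "GRANT", "human_decision": "GRANT",
     "reason": "within policy", "outcome": "performing"},
    {"system_recommendation": "GRANT", "human_decision": "DECLINE",
     "reason": "concentration risk", "outcome": None},
]

tally = review(records)
```

The counts are only a way in. The reasons are the part worth reading line by line.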

Running a trading book

The one position that's long every curve

The investors have a metaphor for all this: building in AI is less like underwriting a software company and more like running a trading book — long some curves, short others, exposed to correlations that break exactly when they matter most. Most firms will read that as a mandate to stay fast and treat everything as disposable. Read it the other way. If every position you hold is short-dated, you don't have a business; you have a series of trades.

The governed knowledge layer is the one position that's long every curve at once. It appreciates whether capability rises, cost falls, latency drops, or work migrates — because it isn't a bet on any single variable. It's a bet on the accumulation of your own judgment. In a book full of expiring options, it's the only thing you get to hold.

That is the part the other two vantage points miss. The platform that sells the models and the capital that funds the layer have both arrived at the same verdict — the model is not the moat. Neither finishes the sentence. The operator does: the moat is captured judgment, the mechanism that captures it is governance, and it is the only asset migration cannot relocate.

The stable equilibrium

The firms that win will have captured what others threw away

Nadella closes on political economy, and he is right to. The world nobody should want is one where a few models absorb every industry's knowledge and capture all the returns: the AI re-run of the first globalisation, where the GDP line held while whole regions were hollowed out. There is no societal permission for that.

But notice the mechanism that prevents it. What stops an industry's knowledge from being commoditised out from under it is not the goodwill of model providers. It is whether each firm owns a governed, sovereign learning loop: a knowledge layer that encodes its judgment in a form the firm controls, that survives a change of model, and that no provider can absorb. Sovereignty, in practice, is governance. The frontier ecosystem Nadella wants is, mechanically, a world full of governed knowledge layers.

So the optimistic ending and the governance argument are the same argument. The firms that reach the 5 percent won't be the ones renting the smartest model. They'll be the ones who, for years, captured the judgment their competitors let walk out the door each evening — and who governed that capture well enough to be allowed to keep running it.

The model is the engine. Human direction is the throttle. The governor is what lets the thing run hot for years without flying apart. Build that, and the loop is yours.

— G.S. · The Philosophical Ledger