The Knowledge Nobody Writes Down

Martin Šrubař · April 8, 2026

Illustration: a professional cyclist racing confidently alongside someone wobbling on a bike while reading a bike manual.
I’ve been using both Claude and Gemini for coding work — building an app that processes Microsoft Word documents. In chat mode, the two models performed comparably on general coding tasks. One was better at some things, the other at others. But one step required sophisticated manipulation of raw OOXML — the underlying markup inside a Word document — and Gemini was consistently better at it. Not marginally. Noticeably. Across a range of related tasks, it handled the well-defined, specialist-knowledge challenge more precisely. I’ve since seen the same pattern with other complex but tightly scoped problems. When the task is clearly defined and the domain knowledge is what matters, Gemini often has the edge.
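To make concrete what “manipulation of raw OOXML” means, here is a minimal sketch using only Python’s standard library. The paragraph fragment and the bold-toggling edit are my illustration, not code from the app described above: a Word paragraph is a `w:p` element containing runs (`w:r`), and formatting lives in a `w:rPr` child that must appear first inside the run.

```python
import xml.etree.ElementTree as ET

# The WordprocessingML namespace used throughout word/document.xml in a .docx.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)

# A single paragraph as it might appear inside a Word document's XML.
fragment = (
    f'<w:p xmlns:w="{W}">'
    "<w:r><w:t>Hello, OOXML</w:t></w:r>"
    "</w:p>"
)

para = ET.fromstring(fragment)
run = para.find(f"{{{W}}}r")

# Make the run bold: run properties (<w:rPr>) must come first inside <w:r>.
rpr = ET.Element(f"{{{W}}}rPr")
rpr.append(ET.Element(f"{{{W}}}b"))
run.insert(0, rpr)

out = ET.tostring(para, encoding="unicode")
print(out)
```

Getting details like the required ordering of `w:rPr` inside a run right, across many such elements, is exactly the kind of well-defined specialist knowledge the comparison above turns on.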

Then I switched to Claude Code and the picture inverted. Claude could successfully complete far larger, more ambitious tasks than Gemini with similar tooling. Multi-file projects, ambiguous requirements, error recovery, the orchestration of many steps toward a goal. Not because it was smarter at any individual step, but because it could hold the whole thing together.

This gap kept nagging at me. If Gemini can be more precisely correct on a specialist task, why does Claude Code pull ahead on larger work? The obvious answer is better infrastructure — tool use, file access, iterative execution. But that’s descriptive, not explanatory. What is that infrastructure actually providing?


To answer that, I think we need to distinguish between three kinds of knowledge that are routinely conflated.

Layer 1: Domain knowledge. The OOXML specification. The anatomy of the human knee. The case law around contract disputes. The syntax of Python. This is the knowledge that lives in textbooks, documentation, and reference materials. It transfers well in written form — you can study it, memorise it, look it up. LLMs are very good at this. It’s well-represented in training data, and it’s the kind of knowledge where a well-defined question has a documentable answer. This is what formal education primarily provides, and it’s where Gemini’s advantage on specialist tasks comes from: precise domain knowledge, deployed accurately.

Layer 2: Practitioner knowledge. This is what separates the fresh graduate from the person with thirty years of experience — and it’s not thirty years of additional reading.

A newly qualified surgeon and a veteran with three decades of practice both know anatomy. They both passed the same exams. The difference is that the veteran knows which tissue feels wrong under their fingers before there’s any visible indication of a problem. They know when a situation is about to deteriorate — not from the monitors, but from a pattern they recognise across thousands of previous cases that they couldn’t fully describe if you asked them. They know how to adapt a procedure mid-operation when the textbook approach meets the specific reality of this patient’s body, which is never quite what the textbook assumed.

This is the knowledge that apprentices learn from masters. Not the domain knowledge — that’s in the curriculum — but the practical wisdom accumulated through years of doing, watching what works, noticing patterns, and internalising lessons that were never explicitly articulated. It’s what a junior developer acquires after years of working alongside senior colleagues: not how JavaScript works (that’s Layer 1), but when to refactor and when to leave it alone, which architectural patterns will cause pain at scale, how to read a codebase and know where the problems are hiding.

And here’s the critical point: this knowledge mostly isn’t in books. Not simply because it’s hard to write down, but for two deeper reasons.

The first is a translation problem. Some knowledge exists in one representation — embodied, sensory, motor — and cannot be losslessly converted into another — propositional, textual. The mathematical formulas that describe bicycle balance are accurate. A control engineer could derive them and build a balancing robot. But you cannot internalise those formulas as the ability to ride. And the experienced cyclist cannot derive the formulas from their riding. The two representations are incommensurable — not because either is incomplete, but because they serve different cognitive functions. The formula describes the physics. The skill is the physics, enacted through a body. Writing down the formula doesn’t transfer the skill, and possessing the skill doesn’t surface the formula.

The second is what I’d call the instinct problem. Much of expert judgment is pattern recognition trained on high-noise, low-signal data over long periods. The experienced surgeon’s “this doesn’t feel right” isn’t one learnable rule. It’s the output of their actual neural network — the biological one — trained across thousands of cases on subtle indicators that individually mean almost nothing but collectively form a signal. You can’t write it down because there’s no single “it” to write. It’s a statistical pattern distributed across a career’s worth of noisy observations, and it may not decompose into articulable rules any more than a trained neural network’s weights decompose into human-readable logic.

These two mechanisms — knowledge that doesn’t translate across representations, and pattern recognition that doesn’t decompose into rules — explain why codifying practitioner knowledge gets steeply harder toward the tail. The first 80% of practitioner knowledge consists of things that can be stated propositionally, even if nobody has bothered to. The remaining 20% is genuinely hard, split between translation-resistant embodied knowledge and instinct-like pattern recognition built from noisy signals over time.

There is a whole field — Cognitive Task Analysis — dedicated to extracting this kind of knowledge from experts. It works, imperfectly. Research has found that experts make roughly 70% of their decisions unconsciously, and that even when experts deliberately try to teach what they know, they omit a substantial proportion of both the steps and the reasoning. The knowledge is there — they use it every time — but it doesn’t surface through normal explanation. Structured elicitation can recover some of it, but the process is expensive, time-consuming, and inevitably lossy.

Layer 3: Meta-procedural knowledge. This is the layer beneath both domain expertise and practitioner wisdom: the basic ability to approach complex, unfamiliar tasks effectively. How to break something down into manageable parts. How to recognise which dependencies matter. How to sequence actions so that earlier steps don’t undermine later ones. How to estimate effort, allocate attention, and adapt when reality diverges from plan. Cognitive science has terms for aspects of this — executive function, metacognition, heuristic reasoning — but these are studied primarily in developmental and clinical contexts (how children acquire planning ability, what happens when brain injury disrupts it), not in the context of systematically encoding these capabilities for AI.

This isn’t a specialist skill. Everyone who functions competently as an adult possesses it to some degree. And almost none of it has been captured or documented, because nobody thinks of it as knowledge at all.

Some practitioner knowledge was indeed once conscious — the surgeon who deliberately learned a new technique, the developer who intentionally studied a new framework. But a great deal of it was acquired through the same unconscious absorption that characterises language learning. The surgeon didn’t consciously decide to learn what abnormal tissue feels like — that sensitivity accumulated through thousands of procedures without ever passing through a deliberate learning phase. The senior developer didn’t intentionally develop an instinct for bad architecture — it compiled from exposure to failures that were never explicitly catalogued.

The difference between Layer 2 and Layer 3 is not that one was consciously learned and the other wasn’t. Much of Layer 2 was just as unconsciously acquired. The difference is scope: Layer 2 is domain-specific — tied to surgery, or coding, or carpentry. Layer 3 is universal in the sense that it applies across all domains — but it is not evenly distributed. Some people have a much stronger Layer 3 than others, and that variation may matter more than we typically acknowledge. The ability to approach unfamiliar complex tasks effectively — to decompose, sequence, prioritise, adapt — is arguably what distinguishes highly effective people from less effective ones, regardless of intelligence. Much like the Gemini vs Claude Code observation: raw cognitive power on well-defined tasks is one thing; the ability to orchestrate that power into reliable complex outcomes is another, and the two don’t always go together. But Layer 3 has no established domain, no credentialed expert class, and no systematic body of practice — which makes it uniquely difficult to study or capture.


Now look at what Claude Code actually has that makes it effective, and you can see both Layer 2 and Layer 3 being provided simultaneously, though the distinction isn’t made explicit.

What Claude Code’s procedures are actually encoding, across both layers, is a particular kind of knowledge: things with obvious solutions once you think about them, where the trick is knowing to think about them in the first place. Validate after creating a file. Read the documentation before starting. Start with the hardest part, not the easiest. These aren’t insights — any developer who hears them would say “obviously.” But “obviously” is exactly the problem. The knowledge feels trivial once stated, which is why nobody states it, which is why AI doesn’t have it.
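A sketch of what one such “obvious” rule looks like once made explicit: the validate-after-creating-a-file habit, written as a hypothetical helper. The function name and the JSON round-trip check are illustrative assumptions, not Claude Code’s actual mechanism.

```python
import json
import tempfile
from pathlib import Path

def write_json_validated(path: Path, data: dict) -> dict:
    """Write `data` as JSON, then re-read and parse it rather than
    assuming the write worked."""
    path.write_text(json.dumps(data, indent=2))
    # The "obvious" step: round-trip the file before trusting it.
    round_tripped = json.loads(path.read_text())
    if round_tripped != data:
        raise ValueError(f"validation failed for {path}")
    return round_tripped

with tempfile.TemporaryDirectory() as tmp:
    result = write_json_validated(Path(tmp) / "config.json", {"retries": 3})
    print(result)  # → {'retries': 3}
```

No experienced developer would bother writing this rule down for a colleague, which is precisely why it has to be spelled out for an agent.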

In humans, this knowledge compiles through experience into something that looks like instinct. You don’t think “I should validate” — you just validate. It was trained by years of feedback loops: you skipped validation, something broke, you learned. Hundreds of small corrections from low-signal, high-noise experience, accumulated over time into reliable patterns that fire below conscious awareness. What’s remarkable is that this kind of knowledge — built in humans through experiential neural-network training that nobody could teach because each lesson is too small and too obvious to articulate — turns out to be substantially encodable in text. Not perfectly. The propositional part — knowledge that can be stated as rules and steps — transfers well. The translation-resistant part — embodied intuition, pattern recognition trained on noisy signals over years — remains with the human. That gap explains many of the moments where AI produces work that is technically correct but somehow wrong in ways any experienced person would immediately recognise. But the codifiable portion is larger than you’d expect, and codifying it produces genuine capability gains.


What strikes me about the current AI landscape is that the Layer 2 and Layer 3 problem is being worked on intensively — but under a framing that may be missing the deeper issue.

There is no shortage of research on agentic AI. The computer science literature is full of work on planning, chain-of-thought reasoning, tool-use orchestration, self-reflection — hundreds of papers on how to get LLMs to string actions together effectively. Frameworks like ReAct, SWE-agent, and countless scaffolding architectures all address, in engineering terms, what I’m calling the Layer 3 problem. And inside AI companies, the work goes further: every team building agentic systems is, in effect, surfacing and codifying the invisible competence layer that separates “can do the individual steps” from “can get the whole thing done.”

But almost all of this work treats the problem as an engineering challenge: how do we prompt or scaffold the model to produce better action sequences? What’s largely missing is the epistemological question underneath: what is the nature of the knowledge that competent humans deploy when they approach complex tasks, why does it resist articulation, and what are the actual prerequisites for capturing it? The engineering approach builds scaffolding that works without fully understanding what it’s scaffolding around. The result is solutions that are effective but unprincipled — we know that certain prompting patterns improve planning performance, but we don’t have a clear theory of why, grounded in how this knowledge works in humans.

This matters because without that conceptual framework, the scaffolding work is largely ad hoc. Each team discovers Layer 2 and Layer 3 knowledge independently, encodes it in their own way, and has no systematic basis for knowing what they’ve missed. Understanding this problem clearly — as a knowledge capture problem with identifiable layers and distinct prerequisites — could accelerate not just AI development but the broader question of how human competence works, how it transfers, and why so much of it remains invisible to the people who possess it.


There’s a question underneath all of this that I think is more fundamental than any specific question about AI architecture or training methods.

Is all knowledge codifiable in some form?

The three layers suggest the answer is yes — in principle — but that the prerequisites for capturing each layer are radically different.

Layer 1 requires only language. Write it down, it transfers. The OOXML specification is a document. Anatomy is in textbooks. This is the easy case, and it’s why LLMs are good at domain knowledge: it was already in their native medium.

Layer 2 requires the right elicitation methods — and sometimes the right sensory inputs. You can interview the experienced surgeon and recover some of their decision-making through Cognitive Task Analysis. But the knowledge of what abnormal tissue feels like under their fingers isn’t a failure of description — it’s a different modality entirely. To capture it, you wouldn’t need better questions. You’d need instrumented gloves, force sensors, haptic recording. The prerequisite isn’t finding the right words. It’s having the right sensory apparatus to observe what’s actually happening. The bike-riding robot makes this concrete: the physics of balance is codifiable from first principles, but the experienced cyclist’s ability to handle a cobblestone street in crosswind is built from vestibular, proprioceptive, and haptic feedback loops that were never represented as propositions in the cyclist’s mind. You could capture them — but you’d need to instrument the cyclist, not interview them.

Layer 3 requires something we don’t yet have: a method for making visible the cognitive scaffolding that underlies effective action across all domains. The knowledge is demonstrably real — people use it successfully every day, and the variation in how well they use it is one of the strongest predictors of who gets things done and who doesn’t. But we lack even the basic vocabulary for describing it, let alone systematic methods for capturing it. We’re in the position of pre-scientific medicine: practitioners who can heal but can’t explain how, because the conceptual framework for understanding what they’re doing hasn’t been developed yet.

In the meantime, my homework is simpler. I have two sons — five and eleven — and I’m going to start watching more carefully how they approach problems that are difficult for them. How they break things down, where they get stuck, what kind of help actually helps. Because if the argument in this post is right, then somewhere in the gap between my five-year-old’s struggles and my eleven-year-old’s growing competence is the knowledge that AI most needs and that nobody has figured out how to capture yet.

You can send comments to this email.
