The Structure Problem

Pawel Zimoch · ~12 min read

Aviation uses rigid phraseology. Surgical teams use checklists. Nuclear plant operators follow written procedures and confirm each step out loud.

This looks like bureaucracy. It's actually error correction.

These fields learned—through accidents, deaths, disasters—that unstructured processes fail unpredictably when stakes are high. The structure they developed isn't arbitrary overhead. It's what makes errors visible before they cascade into catastrophe.

AI agents are learning the same lesson now.

The Core Problem: Errors That Compound

AI models are remarkably capable. They reason, plan, write code, handle ambiguity. But most agent deployments fail—not dramatically, but quietly. The agent handles 80% of cases well enough, then silently corrupts state, makes decisions that compound into problems, or confidently does the wrong thing in ways that surface weeks later.

Impressive demos, disappointing production deployments. This pattern keeps repeating.

The reason: agents operate at machine speed, but the systems they operate on were designed for human-speed error correction. The weekly sync where someone catches a discrepancy. The senior person who reviews work. The customer who complains. These mechanisms are slow—but that's fine when the underlying process is also slow.

Agents break this equilibrium. An agent making a thousand decisions per hour outpaces any human reviewer. Errors compound faster than anyone can catch them.

Shannon figured this out in 1948. His foundational insight: reliable communication over noisy channels requires encoding messages into discrete symbols with redundancy. Discretization creates structure that makes errors detectable. If a signal gets corrupted, you can still identify which symbol was intended. Without structure, errors are invisible until something breaks.

This applies beyond communication theory. It's the fundamental requirement for any reliable system operating in a noisy environment. High-reliability domains discovered it empirically. AI agents are discovering it the hard way.

For the theoretical foundation, see Why Reliable Systems Look the Way They Do.

What Structure Actually Means

When I say structure enables error correction, I mean something specific: structure constrains what states are reachable. Only certain configurations are valid. This constraint is what makes errors detectable.

Consider an agent that tries to cancel an order that's already shipped. With structure, the operation fails immediately—a shipped order can't be cancelled, that transition isn't defined. The agent made an error, but it got caught at the boundary.

Without structure, the cancellation goes through. A refund gets issued. The package still arrives. The customer has their money back and the product. Someone notices weeks later during an inventory audit, but by then the loss is real.

Both cases detect the error eventually. The difference is when: at the boundary before damage, or downstream after it's compounded.
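As a sketch, the shipped-order example can be encoded as an explicit transition table. The state names and actions here are illustrative, not a real API:

```python
from enum import Enum

class OrderState(Enum):
    PLACED = "placed"
    SHIPPED = "shipped"
    DELIVERED = "delivered"
    CANCELLED = "cancelled"

# Only these transitions exist; every other (state, action) pair is invalid.
TRANSITIONS = {
    (OrderState.PLACED, "cancel"): OrderState.CANCELLED,
    (OrderState.PLACED, "ship"): OrderState.SHIPPED,
    (OrderState.SHIPPED, "deliver"): OrderState.DELIVERED,
}

def apply_action(state: OrderState, action: str) -> OrderState:
    """Reject undefined transitions at the boundary."""
    key = (state, action)
    if key not in TRANSITIONS:
        raise ValueError(f"invalid transition: {action!r} from {state.name}")
    return TRANSITIONS[key]

apply_action(OrderState.PLACED, "cancel")  # fine: returns OrderState.CANCELLED

try:
    apply_action(OrderState.SHIPPED, "cancel")
except ValueError:
    pass  # caught here, before any refund is issued
```

The table is the structure: "cancel a shipped order" isn't a rejected request so much as an operation that simply doesn't exist.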

Shannon discovered the same principle for communication. If all signals are valid, corruption looks like a different valid signal—you can't tell anything went wrong. But if you constrain to a subset, invalid signals stand out. The constraint creates detection capability.

Aviation applies this to speech. "Cleared for takeoff, runway two-seven left" exists in a constrained vocabulary. The readback must match. Pilots can't ad-lib because ad-libbing escapes into unconstrained space where errors blend in.

For system state, structure means defining what's meaningful: which entities exist, which states each entity can occupy, which transitions between states are allowed, and which invariants must always hold.

These constraints shrink the space of reachable states. The smaller that space, the more an invalid operation stands out. Errors get caught at the boundary—not downstream in an audit.

The Architecture That Follows

This suggests a specific architecture, borrowed from operating systems: the kernel pattern.

An OS kernel doesn't trust user programs. It can't—user code is arbitrary and unverifiable. Instead, the kernel provides a controlled interface. Programs request operations; the kernel validates and executes only what's permitted. The program might be buggy or malicious—doesn't matter. It can't corrupt the system because it never has direct access.

Agent systems should work the same way:

```mermaid
flowchart TB
    Agent["<b>Agent</b><br/>Interprets situation<br/>Proposes actions<br/>Fundamentally unverifiable"]
    Validation["<b>Validation Layer</b><br/>Checks against schema<br/>Enforces invariants<br/>Rejects invalid operations<br/>Small, auditable, verifiable"]
    State["<b>System State</b>"]

    Agent -->|"Proposed operations"| Validation
    Validation -->|"Validated operations"| State
```

The validation layer is where error detection happens. It's the structure that makes agent errors visible before they cause harm. The agent proposes; the system validates; only valid operations execute.
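A minimal sketch of that validation layer. The operation names and limits (`ALLOWED_OPS`, `MAX_REFUND`) are invented for illustration, not a real API:

```python
# Illustrative operation vocabulary and limit; assumptions, not a real system.
ALLOWED_OPS = {"refund", "cancel", "escalate"}
MAX_REFUND = 500

def validate(op: dict) -> tuple[bool, str]:
    """Small, auditable checks; the agent never touches state directly."""
    if op.get("type") not in ALLOWED_OPS:
        return False, f"unknown operation: {op.get('type')!r}"
    if op.get("type") == "refund" and op.get("amount", 0) > MAX_REFUND:
        return False, "refund exceeds limit"
    return True, "ok"

def execute(op: dict, state: list) -> None:
    ok, reason = validate(op)
    if not ok:
        raise PermissionError(reason)  # rejected at the boundary
    state.append(op)                   # only validated operations reach state

ledger: list = []
execute({"type": "refund", "amount": 40}, ledger)
```

Note the asymmetry: the agent can be arbitrarily complex and wrong, but `validate` stays small enough to read, test, and trust.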

Why can't agents verify themselves? To check if an action is correct, the agent needs some specification to check against. If that specification is explicit and external—a schema, rules, a state machine—we're back to explicit structure. If the specification is implicit in the agent's reasoning, the agent uses the same process to verify that it used to generate. Errors in generation can replicate in verification.

External structure is how you get error detection. There's no shortcut.

For implementation details, see Building the Interface.

What Agents Actually Are

This reframes what agents do. They're interfaces to structure, not replacements for it.

Before agents, humans did translation work. A customer sends a rambling email; a support rep reads it, figures out the actual issue, categorizes it, and enters it into the system. The human translates unstructured input into structured data.

This translation is what agents excel at. They can read the rambling email and extract the issue. They can look at a receipt and identify the category. They can take natural language requests and map them to system operations.

But the system still needs to exist. The agent translates into structure. An agent without structure operates in a void where actions can't be validated, errors can't be detected, and state corrupts invisibly.

When the structure is rich enough—not just categories to translate into, but operations that compose, preconditions that validate, feedback when something fails—translation becomes programming. The agent reasons about which operation to invoke. It checks whether preconditions are met. It composes sequences to accomplish goals. It handles errors and adjusts.

This is exactly why coding agents work. Code already has structure: syntax that can be validated, types that constrain what's possible, tests that signal when something broke. The agent writes code, runs it, sees the error, fixes it. Structure plus feedback. Building structure for your domain is building what code already has—a language the agent can program in.

For more on this framing, see What Agents Are Actually For.

Discretization: How Error Detection Works

Shannon's insight applies directly to agent outputs.

If an agent produces unstructured output—free-form text, arbitrary JSON—errors are hard to detect. What makes a response "wrong"? How do you measure accuracy on mush?

But if the output must be one of a defined set—{APPROVE, REJECT, ESCALATE, REQUEST_INFO}—error detection becomes mechanical: any output outside the set is flagged immediately, and every output inside it can be counted, compared against labeled cases, and tracked over time.

Discretization doesn't tell you whether APPROVE was the right decision for this case. But it creates the structure needed to measure, evaluate, and improve. You can't systematically evaluate mush. You can only evaluate structure.
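A sketch of what "mechanical" means here, assuming the four-decision set above:

```python
VALID_DECISIONS = {"APPROVE", "REJECT", "ESCALATE", "REQUEST_INFO"}

def parse_decision(raw: str) -> str:
    """Map agent output onto the defined set, or fail loudly."""
    decision = raw.strip().upper()
    if decision not in VALID_DECISIONS:
        # A malformed output is itself a detectable, countable error,
        # rather than mush that silently flows downstream.
        raise ValueError(f"not a valid decision: {raw!r}")
    return decision
```

Every call either yields one of four comparable values or raises; there is no fifth, ambiguous outcome to audit later.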

This is why long-running agent tasks fail even when individual decisions are mostly correct. If each decision has 99% accuracy, a 100-step task has only 37% chance of being entirely correct (0.99^100 ≈ 0.37). Errors compound. The only solution is error detection and correction at each step—which requires structure.
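The compounding arithmetic is easy to check directly (per-step accuracies here are illustrative):

```python
# Per-step accuracy compounds multiplicatively over a long task:
# the chance that all n independent steps are correct is p ** n.
for p in (0.99, 0.999):
    print(f"p={p}: chance all 100 steps correct = {p ** 100:.2f}")
```

At 99% per step, a 100-step task succeeds about 37% of the time; pushing per-step reliability to 99.9%—which is what boundary-level error correction buys—lifts that to roughly 90%.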

See Long-Running Agents for the mathematics of error compounding.

Starting Small: The Boundary Model

This might sound like heavy upfront investment. Define all entities, enumerate all states, implement validation for everything—only then deploy an agent.

That's backwards. Structure emerges through operation, not upfront design. You can't encode judgment you haven't exercised.

Start with agents as translators. The agent reads, interprets, proposes actions. You approve everything. In expense processing: the agent extracts amount and vendor, guesses a category, drafts a decision. You review. Approve, reject, or correct.

This feels slow. That's fine. You're learning what operations actually exist in your domain. What the agent keeps proposing. What you keep approving. Where it gets confused.

Turn patterns into validation. Your approvals have patterns. "Under $100 and category is meals—I always approve these." Turn that into a validation rule. Now cases matching the pattern flow through automatically. Cases outside the pattern still come to you.
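That "under $100 and meals" pattern might be encoded as something like this (field names and thresholds are illustrative):

```python
def auto_decision(expense: dict):
    """Encode one observed approval pattern; None routes to a human."""
    if expense.get("amount", 0) < 100 and expense.get("category") == "meals":
        return "APPROVE"
    return None  # outside the pattern: human review

auto_decision({"amount": 42, "category": "meals"})   # handled automatically
auto_decision({"amount": 42, "category": "travel"})  # None: goes to a person
```

Each rule like this is one piece of exercised judgment made explicit; the `None` path is what keeps the boundary honest while the rule set is still small.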

The interface grows from observation: each validation rule encodes something you learned. Each structured error type started as confusion you clarified. Each tool started as information you provided manually.

Expand the boundary incrementally. More operations, more categories, more state transitions. Each addition comes from noticing "I keep handling this case manually, and there's a pattern." The UNKNOWN escape hatch—routing uncertain cases to human review—tells you where to expand next.

Over time, one person supervises multiple agents. Not because agents are fully autonomous, but because the interface handles the routine. Human attention goes to what actually needs judgment.

For the complete playbook, see The Boundary Model.

Structure Doesn't Mean Rigid

A reasonable worry: if everything is explicit rules and defined states, don't you lose flexibility?

This conflates structure with rigidity. A well-designed structure handles exceptions explicitly rather than pretending they don't exist.

The agent can approve exceptions—but within constraints: the amounts stay bounded, every exception leaves a logged rationale, and anything outside the envelope escalates to a human.

This is different from either rigid rules or unstructured "use your judgment." The agent exercises judgment, but within structure that makes judgment visible, bounded, and improvable.
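One way to sketch a constrained exception, with invented thresholds and names:

```python
# Illustrative policy: the agent may exceed the limit by a bounded amount,
# but only with a recorded rationale. All numbers are assumptions.
POLICY_LIMIT = 100
EXCEPTION_HEADROOM = 150  # how far past policy the agent may go on its own

def review_exception(amount: float, rationale: str) -> str:
    if amount <= POLICY_LIMIT:
        return "APPROVE"
    if amount <= POLICY_LIMIT + EXCEPTION_HEADROOM and rationale:
        # Judgment is exercised, but bounded and visible in the record.
        return "APPROVE_EXCEPTION"
    return "ESCALATE"  # outside the envelope: a human decides
```

The exception path is a first-class, reviewable outcome—not an off-the-books override—so the policy itself can be tightened or loosened as the record accumulates.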

See The Exception Problem for handling flexibility within structure.

Why This Is Hard

If structure enables reliable automation, why doesn't it already exist?

Because humans fill gaps. The structure never needed to be explicit—it lived in people's heads, passed through training and tribal knowledge. That was fine when humans operated the systems. Humans naturally correct errors, fill in missing context, notice when something seems off.

Agents can't access implicit knowledge. They operate on what's explicit. Everything else is a gap where errors go undetected.

Building explicit structure has real costs: it surfaces disagreements, creates visible accountability, threatens autonomy. Organizations avoided this work because humans made it unnecessary.

Now agents change the economics. The structure that was never worth building for humans becomes essential for reliable automation.

See The Missing Scaffolding for the organizational dynamics.

The Evidence

This isn't just theory. Look at where agents actually work.

Coding agents succeed because code has inherent structure—syntax, type systems, compilation, tests. The environment gives immediate, unambiguous feedback on errors. The structure exists.

General office assistants fail ("Clippy" syndrome) because they operate on unstructured domains—vague intent, unconstrained action spaces, no clear error feedback. There's nothing to detect errors against.

Vertical AI companies that build domain-specific structure are succeeding. Those treating structure as secondary keep hitting the same wall: impressive capability, unreliable outcomes.

Structure enables error detection; error detection enables reliability; reliability enables production deployment.

See Why Coding Agents Work for detailed analysis.

What This Means

The models are good enough. The structured systems—the validation layers and domain languages that enable error detection—mostly don't exist yet. That's the work.

Successful products have always been structure-providers. Jira structures project management. Salesforce structures sales processes. QuickBooks structures accounting. These products succeeded not because of brilliant engineering but because they made complex domains explicit and navigable. Structure is what products do.

Agents don't create this need. They reveal it. They expose the difference between structure that works at human pace (fuzzy, implicit, compensated for by human judgment) and structure that must work at machine speed (explicit, validated, error-detecting).

For engineers: The valuable skill is defining structure that enables error detection. Understanding domains well enough to enumerate states, specify invariants, design operations that compose cleanly. This is harder than writing prompts and more durable than chasing the latest model.

See What Software Engineers Actually Do.

For product managers: Before asking "can we build an agent for this?", ask "do we have structure that would make agent errors detectable?" If not, that's the prerequisite work.

For investors: Look for domain-specific structure as competitive moat. Companies that accumulate encoded judgment—refined rules, validated schemas, error-detecting boundaries—have defensible advantages.

For entrepreneurs: The opportunity is building structure that doesn't exist. Find domains where experts carry implicit knowledge that could be made explicit. That structure becomes the foundation for reliable automation.


Continue Reading

This essay provides the overview. The detailed essays explore each concept in depth:

The Framework

Building It

Context & Implications