Every high-reliability domain has discovered the same thing independently.
Aviation has rigid phraseology. Surgery has checklists with verbal confirmation. Nuclear plants have procedure read-and-verify protocols. Submarines require orders to be repeated back verbatim. While there has been some cross-pollination - the WHO Surgical Safety Checklist was explicitly inspired by aviation - each field largely evolved its structure through painful experience: incidents investigated, ambiguities identified, procedures tightened.
This convergence isn't coincidence. It reflects something fundamental about how reliable systems work.
Structure isn't bureaucracy for its own sake. It's the mechanism by which errors become detectable and correctable. Understanding why this is true - not just that it's true - changes how you think about building systems that work.
The Core Problem: Noise
Any channel through which information flows is noisy.
In aviation, it's literal noise - static, overlapping transmissions, accents, similar-sounding call signs. In a hospital, it's the chaos of an emergency room, the fatigue of a long shift, the cognitive load of tracking multiple patients. In a business, it's ambiguous emails, assumptions that don't match reality, context that didn't get passed along.
Noise corrupts information. What was sent isn't exactly what was received. What was intended isn't exactly what was understood.
In informal communication, humans handle this by relying on shared context. If you mishear something, you fill in the gap with what makes sense. If an instruction is ambiguous, you interpret it based on what you know about the situation. Most of the time, this works well enough.
But "well enough" breaks down when stakes are high or scale is large. Small error rates compound into big problems. The interpretation that seemed reasonable turns out to be wrong in exactly the case where it matters most. The gap that got filled in incorrectly leads to consequences that can't be undone.
High-reliability domains learned this the hard way. Every piece of structure in aviation traces back to some accident where informal communication failed. The rules aren't arbitrary - they're scar tissue.
How Structure Enables Error Correction
Shannon's foundational insight in information theory was this: reliable communication over a noisy channel requires encoding messages into discrete symbols with redundancy.
Why discrete symbols? Because discretization creates what you might call "basins of attraction." If you're sending one of four possible messages - A, B, C, or D - and noise corrupts A slightly, the receiver can still identify that A was intended, because the corruption didn't move it all the way to B, C, or D. The discrete structure lets you detect and correct errors.
Continuous signals don't have this property. If you're sending a voltage that could be any value, and noise shifts it from 1.0 to 0.97, you can't know whether that's corruption or whether 0.97 was the intended value.
Why redundancy? Because redundancy creates constraints. If valid messages must satisfy certain patterns, then corrupted messages that violate those patterns are detectably wrong. You can catch the error, request retransmission, or apply correction.
This is how digital communication works. It's why your internet connection doesn't garble data even though the physical channel is noisy. Discrete symbols plus redundancy plus error-correction protocols equals reliability.
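To make the principle concrete, here is a minimal sketch of discrete symbols plus redundancy: a triple-repetition code, the simplest error-correcting code. Each bit is sent three times, and the decoder takes a majority vote, so any single flipped bit per triple is corrected. (Real channels use far more efficient codes; this is illustrative only.)

```python
# Repetition code: redundancy makes a single bit-flip per triple correctable.

def encode(bits):
    """Send each bit three times."""
    return [b for b in bits for _ in range(3)]

def decode(coded):
    """Majority vote over each triple recovers the intended bit."""
    out = []
    for i in range(0, len(coded), 3):
        triple = coded[i:i + 3]
        out.append(1 if sum(triple) >= 2 else 0)
    return out

message = [1, 0, 1, 1]
sent = encode(message)          # [1,1,1, 0,0,0, 1,1,1, 1,1,1]
sent[4] = 1                     # noise flips one bit in the second triple
assert decode(sent) == message  # the corruption is detected and corrected
```

The "basins of attraction" from above are visible here: a corrupted triple like `[0, 1, 0]` is still closer to `000` than to `111`, so it falls back into the right basin.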
The same principle applies to human systems, discovered empirically rather than mathematically.
How High-Reliability Domains Apply This
Aviation's readback protocol
When a controller issues an instruction - "United 452, descend and maintain flight level 280" - the pilot doesn't just say "okay." They read it back: "Descend and maintain flight level 280, United 452."
This is redundancy. The same information is transmitted twice - once by the controller, once by the pilot. If the pilot reads back "270" instead of "280," the controller catches the discrepancy immediately. The error is detected before it causes a problem.
The protocol also uses discrete formats. Altitudes are stated in specific ways. Call signs follow patterns. If someone uses non-standard phraseology, it stands out - the deviation from expected structure is itself a signal that something might be wrong.
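The readback loop can be sketched as a comparison of the safety-critical fields in two redundant transmissions. This is a toy illustration, not actual ATC software; the field extraction is deliberately simplistic.

```python
# Readback as error detection: the clearance is the message, the readback
# is the redundant copy, and a mismatch triggers retransmission.
import re

def extract_flight_level(transmission):
    """Pull the altitude field out of a clearance or readback."""
    match = re.search(r"flight level (\d+)", transmission.lower())
    return match.group(1) if match else None

def readback_ok(clearance, readback):
    return extract_flight_level(clearance) == extract_flight_level(readback)

clearance = "United 452, descend and maintain flight level 280"
assert readback_ok(clearance, "Descend and maintain flight level 280, United 452")
# The "270" error from the example above is caught before it causes a problem:
assert not readback_ok(clearance, "Descend and maintain flight level 270, United 452")
```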
Surgical checklists
Before an operation, the team runs through a checklist. The surgeon states what procedure they're performing. The anesthesiologist confirms. The nursing staff confirms the equipment. Each statement is acknowledged.

"Scalpel." "Scalpel." The repetition isn't ceremony - it's error correction. It creates a checkpoint where wrong-site surgery, missing equipment, or miscommunication about the procedure gets caught before the incision.
Studies show surgical checklists reduce complications and mortality significantly. Not because surgeons don't know what they're doing, but because explicit structure catches the errors that slip through informal communication.
Nuclear plant procedures
Operators don't act from memory or judgment. They read procedures aloud and verify each step. Another operator confirms. Actions are logged.
The structured protocol means errors get caught at multiple points. The logging means you can reconstruct what happened. The verification means no single person's mistake propagates uncaught.
The Invisible Error Correction in Normal Organizations
Most organizations don't have this level of structure. They don't need it - the stakes aren't life and death, and errors are usually recoverable.
But that doesn't mean they lack error correction. They just have informal error correction that nobody recognizes as such.
The senior person who "just knows." They catch errors because they've seen everything. Experience functions as error detection. They review things and say "that doesn't look right" - pattern-matching based on years of accumulated cases.
The weekly sync meeting. People surface discrepancies verbally. "Wait, I thought we were doing X?" This catches cases where different people had different understandings. It's slow, but it works.
The "check with Sarah" phenomenon. Organizations develop informal routing - certain people who understand how things actually work. When something seems off, you check with them. Social networks function as error-correction infrastructure.
Pre-deadline scrambles. Before an audit, a launch, a board meeting - everyone scrambles to reconcile things that have drifted. Batch error correction, expensive and stressful, but functional.
These mechanisms share certain properties: they're slow, they depend on specific people, and they don't scale. That's fine when the underlying process is also slow. Human-speed error correction is adequate for human-speed processes.
Why Speed Changes Everything
There's a pattern in history: when processes speed up, error correction has to become more explicit.
In craft production, a single craftsman made each item. Error correction was simple - the craftsman caught their own mistakes as they went. Slow, but the pace matched the error-correction capacity.
Assembly lines changed this. No single person sees the whole product. The craftsman's holistic error correction doesn't work. So you add quality checkpoints - inspection stations at defined points in the process. The structure becomes explicit because the speed requires it.
Automated production pushed further. Statistical process control, automated inspection, real-time monitoring. The error-correction mechanisms had to become as fast as the production process itself.
The same pattern applies to AI agents.
Human-speed processes had error correction, but it was informal, slow, and people-dependent. That was fine because the process itself was slow. There was time for someone to notice something was off. There was time for the weekly sync to catch discrepancies.
Agent-speed processes outrun this informal error correction. An agent can take a thousand actions per hour. The senior person who reviews everything can't keep up. The weekly sync can't surface discrepancies that compound by the minute.
This is why "just add agents to existing processes" produces marginal gains at best. The process was designed for a different error-correction regime. The informal mechanisms that kept it working don't function at agent speed.
To get reliability at agent speed, you need explicit structure - defined states, validated transitions, enforced invariants, discrete outputs. The structure that high-reliability domains evolved for safety reasons, you need for speed reasons.
Syntactic vs. Semantic Errors
Structure catches a specific category of errors: what you might call syntactic errors. Malformed outputs, invalid states, violated constraints. The agent outputs a JSON object missing a required field - caught. The agent tries to ship a cancelled order - blocked. The agent proposes an operation that doesn't exist - rejected.
Structure doesn't automatically catch semantic errors - outputs that are well-formed but wrong. The agent says "APPROVE" when the right answer was "REJECT." The output matches the schema; it's just incorrect.
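The distinction can be shown in a few lines. Below is a hedged sketch of syntactic checking against a hypothetical refund-decision schema: the field names and categories are illustrative assumptions, not a real API. Structure catches the malformed output mechanically; it cannot tell us whether an APPROVE was the right call.

```python
# Syntactic validation: discrete categories plus required fields make
# malformed outputs mechanically detectable. Schema is hypothetical.

VALID_DECISIONS = {"APPROVE", "REJECT", "UNKNOWN"}
REQUIRED_FIELDS = {"decision", "reason"}

def validate(output: dict):
    """Return a list of syntactic errors; an empty list means well-formed."""
    errors = [f"missing field: {f}" for f in REQUIRED_FIELDS - output.keys()]
    if output.get("decision") not in VALID_DECISIONS:
        errors.append(f"invalid decision: {output.get('decision')!r}")
    return errors

assert validate({"decision": "APPROVE", "reason": "within policy"}) == []
assert validate({"decision": "MAYBE"}) != []   # caught mechanically
# Note: an APPROVE on a case that deserved REJECT passes validation -
# that's a semantic error, which structure alone cannot catch.
```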
This distinction matters. Structure is necessary but not sufficient for reliability.
But you can't even begin to address semantic errors without structure. You can measure "what percentage of APPROVE decisions were correct" only if APPROVE is a discrete category. You can identify patterns in errors only if the errors are classifiable. You can build evaluation loops only if you have something evaluable.
Structure enables measurement. Measurement enables improvement. Without structure, you're not just unreliable - you're unreliable in ways you can't even quantify.
For the semantic errors that do slip through, the answer is recoverability - designing operations to be reversible, adding delays before irreversible actions, building in review checkpoints. You can't prevent all semantic errors, but you can ensure they remain correctable rather than catastrophic.
Structure Creates Observable Systems
The deeper point is about observability.
A well-structured system is observable. You can see what state it's in. You can see what operations are being attempted. You can see what's being approved and rejected. You can audit, sample, measure, compare.
An unstructured system is opaque. The agent does... things. They seem to work? Sometimes they don't? It's hard to tell why.
Observability is what enables everything else:
- Debugging: When something goes wrong, you can trace what happened
- Evaluation: You can measure accuracy, track trends, compare approaches
- Improvement: You can identify patterns in failures and address them
- Trust: You can demonstrate that the system works, with evidence
- Compliance: You can show auditors what the system does and doesn't do
High-reliability domains understood this. Aviation doesn't just have protocols - it has black boxes, incident reporting, systematic investigation. The structure creates the observability that enables continuous improvement.
If you want reliable agent systems, you need the same thing. Not just structure that constrains behavior, but structure that makes behavior visible.
Why Structure Isn't About AI's Limitations
A natural interpretation: AI needs structure because AI is limited. Sufficiently advanced AI wouldn't need these constraints.
But reliable processes need structure because communication through noisy channels requires discretization for error correction. This is true whether the actors are humans, machines, or both.
High-reliability human organizations discovered this. They didn't add structure because humans are limited. They added structure because reliable operation requires it, regardless of how capable the individuals are.
AI makes this more visible for a few reasons:
The channel is especially noisy. Natural language is ambiguous. Context gets lost. Intent doesn't transfer cleanly. The gap between what you want and what the agent understands can be large, and varies unpredictably.
AI doesn't share your background. Humans fill gaps using shared context - culture, training, relationships. AI doesn't have this. The gaps that humans would fill implicitly stay unfilled or get filled incorrectly.
AI operates at scale. Small error rates compound when you're processing thousands of cases. The error that happened once a week in a human process happens hundreds of times a day in an automated one.
But the principle - that structure enables error correction - isn't about AI's limitations. It's about the fundamental requirements of reliable systems.
AI doesn't change this principle. It just removes the option of relying on humans to fill gaps. It forces you to make explicit the structure that high-reliability organizations discovered they needed all along.
What This Means for Building Agent Systems
If structure exists for error correction, then agent system design is fundamentally about error-correction regimes. Different error types require different responses.
Malformed outputs are caught through schema validation and discrete output types. If the agent produces something unparseable, the system rejects it immediately. This type of error - syntactically invalid output - can be caught mechanically.
Invalid operations are prevented through state machines that enforce valid transitions. An agent can't move an order from "cancelled" to "shipped" because the structure prohibits it. The rules are encoded, not left to judgment.
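A state machine like this is a small lookup table. The sketch below uses hypothetical order states; transitions not in the table are rejected, so an agent cannot move a cancelled order to shipped no matter what it outputs.

```python
# State machine for order lifecycle: only transitions listed in the table
# are permitted. States and transitions are illustrative assumptions.

TRANSITIONS = {
    "pending":   {"paid", "cancelled"},
    "paid":      {"shipped", "refunded", "cancelled"},
    "shipped":   {"delivered"},
    "cancelled": set(),   # terminal: nothing ships from here
    "delivered": set(),   # terminal
    "refunded":  set(),   # terminal
}

def transition(state, target):
    """Apply a transition, or raise if the structure prohibits it."""
    if target not in TRANSITIONS.get(state, set()):
        raise ValueError(f"invalid transition: {state} -> {target}")
    return target

assert transition("paid", "shipped") == "shipped"
try:
    transition("cancelled", "shipped")   # blocked by structure, not judgment
except ValueError:
    pass
```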
Business rule violations are caught through invariant checks. Refunds can't exceed original payments. Resolved tickets must have resolution notes. The rules are enforced rather than relied upon.
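The refund invariant mentioned above can be sketched as a precondition check that runs before execution, so a violating operation never reaches the payment system. Function and field names are illustrative.

```python
# Invariant enforcement: the business rule is encoded as a check that
# runs before the operation executes.

def check_refund(refund_amount, original_payment):
    """Raise if the proposed refund violates the business invariant."""
    if refund_amount <= 0:
        raise ValueError("refund must be positive")
    if refund_amount > original_payment:
        raise ValueError(
            f"refund {refund_amount} exceeds original payment {original_payment}"
        )

check_refund(20.00, 50.00)       # within the invariant: proceeds
try:
    check_refund(80.00, 50.00)   # exceeds the original payment: blocked
except ValueError:
    pass
```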
Wrong judgments are addressed through sampling and evaluation. Random decisions are audited by humans to measure accuracy. This type of error - syntactically valid but semantically incorrect output - is harder to catch automatically, but because discrete outputs can be evaluated, it becomes visible through systematic review.
Escalation pathways route cases when errors are detected or uncertainty is high. UNKNOWN becomes a valid output category. Cases route to human review queues. The escape hatch is infrastructure, part of the error-correction system.
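A minimal routing sketch, assuming a hypothetical confidence signal and threshold: UNKNOWN and low-confidence cases go to a human queue instead of being forced into a decision.

```python
# Escalation routing: UNKNOWN is a valid output, and uncertain cases
# land in a review queue. Threshold and queue are illustrative.

CONFIDENCE_THRESHOLD = 0.8
human_review_queue = []

def route(case_id, decision, confidence):
    """Send a case to auto-execution or to human review."""
    if decision == "UNKNOWN" or confidence < CONFIDENCE_THRESHOLD:
        human_review_queue.append((case_id, decision, confidence))
        return "escalated"
    return "auto"

assert route("c1", "APPROVE", 0.95) == "auto"
assert route("c2", "UNKNOWN", 0.99) == "escalated"   # valid output, routed
assert route("c3", "APPROVE", 0.40) == "escalated"   # uncertain, routed
```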
Observability is built in through logging. Operations are recorded. Decisions preserve their inputs. Approvals, rejections, escalations are tracked. This data structure creates visibility.
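A decision record can be as simple as the sketch below: each decision is logged with its inputs and confidence so any case can be reconstructed later. Fields are illustrative; a real system would persist this to an append-only store rather than a list in memory.

```python
# Decision logging: every decision is recorded with its inputs, so the
# question "what happened and why?" has an answer for any case.
import json
import time

decision_log = []

def log_decision(case_id, inputs, decision, confidence):
    record = {
        "timestamp": time.time(),
        "case_id": case_id,
        "inputs": inputs,
        "decision": decision,
        "confidence": confidence,
    }
    decision_log.append(record)
    return json.dumps(record)   # line-oriented JSON suits an append-only log

log_decision("c7", {"amount": 20.0}, "APPROVE", 0.92)
assert decision_log[-1]["case_id"] == "c7"   # reconstructable after the fact
```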
Evolution happens through failure analysis. Patterns in errors reveal structural gaps. UNKNOWN spikes indicate categories that don't fit reality. Error-correction systems improve through this learning - that's the pattern that took aviation from where it started to where it is now.
The goal is structure that makes errors visible and correctable, and that improves as learning accumulates.
Components of Error-Correction Systems
Effective error-correction regimes in agent systems typically include these elements:
Discretization: Agent outputs are constrained to defined categories or schemas rather than free-form. Invalid outputs are mechanically detectable, not subjectively flagged. Each output type has clear, measurable success criteria.
Validation: Operations are checked against preconditions before execution. State transitions are explicitly defined and enforced rather than implicit. Business rule violations are caught mechanically before they reach production.
Observability: Every agent decision is logged with its inputs, outputs, and confidence signals. You can reconstruct what happened and why for any case. Metrics exist for accuracy, error rates, and escalation patterns.
Escalation: UNKNOWN and uncertain cases route to human review rather than forcing a choice. Escalation queues are monitored with explicit SLAs. Human decisions feed back into system improvement.
Recoverability: Errors can be detected after the fact, not just prevented upfront. Mistakes can be reversed or corrected without cascading damage. There's an operational path from "we found a problem" to "we fixed it."
Evolution: The system has mechanisms for identifying patterns in failures. Structure can be updated as learning reveals what doesn't fit reality. Changes are versioned and tested before deployment.
Systems lacking these elements tend to have gaps where failures compound. The most reliable systems explicitly incorporate all of them.
Conclusion
Structure isn't a constraint imposed on reliable systems. It's what makes reliable systems possible.
High-reliability domains discovered this empirically, through accidents and incidents and hard-won learning. Information theory discovered it mathematically, through Shannon's work on noisy channels. The principle is the same: discretization and redundancy enable error detection and correction.
AI agents don't change this principle. They intensify it. The speed at which agents operate, the scale at which they work, the noisiness of natural language as an interface - all of these make explicit structure more necessary, not less.
The organizations that understand this will build agent systems that actually work. They'll invest in structure not as overhead but as infrastructure. They'll design for observability, build escalation paths, expect to iterate.
The organizations that don't will keep experiencing the same pattern: impressive demos, disappointing deployments, errors that compound faster than anyone can catch them.
Structure enables agent deployment. It's the infrastructure that makes everything else work.
This essay is part of a series on building reliable AI agent systems.
Overview: The Structure Problem
Previous: What Agents Are Actually For
Next: Long-Running Agents — Error compounding and correction mechanisms