TO2D

Architecture Lab


Coverage Selection Boundary

Selecting a valid coverage option for a healthcare claim.

This example illustrates how a reliability boundary turns an automation failure into a useful domain signal.

Context

A developer builds automation for a healthcare portal form that selects a coverage option before submitting a claim.

The form originally contains two options:

Verified Coverage
Self-Pay

The automation logic is simple:

  • select a valid option
  • ensure exactly one option is selected
  • submit the form

Later, the healthcare portal introduces a third option:

Pending Insurance Review

The new option is preselected when the form loads.

The automation now fails.

Naive Interpretation

A naive interpretation would treat this as a general automation failure:

  • maybe the DOM changed
  • maybe the prompt needs improvement
  • maybe the browser interaction failed

This widens the problem space too early.

Engineers start investigating multiple layers of the system even though most of them may not have contributed to the failure. Time gets spent adjusting infrastructure, prompts, or selectors without improving the underlying workflow.

The result is effort spent on layers that were never at fault, work that adds nothing to business value.

Reliability Boundary Interpretation

A reliability-boundary interpretation is different.

A reliability boundary considers the path between where the business goal is defined and where the error occurs.

Instead of treating the event as a generic system failure, the system asks:

  • what assumptions were made along this path
  • which components remained valid
  • which invariant was violated

In this case the business goal is:

Select a valid coverage option and submit the claim form.

The automation assumes the admissible option set is:

{ Verified Coverage, Self-Pay }

The reliability boundary enforces that assumption before submission.
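As a minimal sketch, assuming the form can be read into a list of option labels, the admissible set can be held as data and compared against what the page actually presents. `ADMISSIBLE_OPTIONS` and `unexpectedOptions` are hypothetical names, not part of any portal or library API.

```typescript
// Hypothetical: the option set the automation assumes is admissible.
const ADMISSIBLE_OPTIONS: ReadonlySet<string> = new Set([
  "Verified Coverage",
  "Self-Pay",
]);

// Return every option the page presents that falls outside the admissible set.
function unexpectedOptions(
  presented: readonly string[],
  admissible: ReadonlySet<string> = ADMISSIBLE_OPTIONS,
): string[] {
  return presented.filter((label) => !admissible.has(label));
}

// Original form: nothing outside the assumption.
const before = unexpectedOptions(["Verified Coverage", "Self-Pay"]);

// Updated form: the new option falls outside it.
const after = unexpectedOptions([
  "Verified Coverage",
  "Self-Pay",
  "Pending Insurance Review",
]);
```

Keeping the set as data, rather than scattering string comparisons through the automation, makes the assumption explicit and reviewable in one place.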

Developer Logic

Example implementation:

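// Read the current form: which options are present and which is selected
const state = inspectCoverageForm()

// Enforce the boundary: stop here if the form is outside the known configuration
assertAdmissible(state, {
  allowedOptions: ["Verified Coverage", "Self-Pay"],
  requireSingleSelection: true
})

// Reached only when both invariants hold
await submit(state)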

This logic encodes two invariants:

option in KnownCoverageOptions
count(selected) = 1

These invariants define the reliability boundary.
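The page does not show how `assertAdmissible` is implemented. One minimal sketch, assuming the inspected state is a list of options each carrying a `selected` flag, checks both invariants and collects every violation before throwing, so the error names each broken assumption. The `CoverageState` shape and the violation strings are illustrative assumptions. This sketch reads the first invariant as applying to every presented option, which matches the "unexpected option detected" result shown below; since the new option is also the one preselected, a reading that checks only the selected option would flag the same case.

```typescript
// Assumed shape of what a form inspector might return.
interface CoverageOption {
  label: string;
  selected: boolean;
}
interface CoverageState {
  options: CoverageOption[];
}
interface AdmissibilityRules {
  allowedOptions: string[];
  requireSingleSelection: boolean;
}

// Pure check: list every invariant the observed state violates.
function checkAdmissible(state: CoverageState, rules: AdmissibilityRules): string[] {
  const allowed = new Set(rules.allowedOptions);
  const violations: string[] = [];

  // Invariant 1: every presented option belongs to the known set.
  for (const option of state.options) {
    if (!allowed.has(option.label)) {
      violations.push(`unexpected option detected: ${option.label}`);
    }
  }

  // Invariant 2: exactly one option is selected.
  const selectedCount = state.options.filter((o) => o.selected).length;
  if (rules.requireSingleSelection && selectedCount !== 1) {
    violations.push(`expected 1 selected option, found ${selectedCount}`);
  }
  return violations;
}

// Throwing wrapper with the same call shape as the example above.
function assertAdmissible(state: CoverageState, rules: AdmissibilityRules): void {
  const violations = checkAdmissible(state, rules);
  if (violations.length > 0) {
    throw new Error(violations.join("; "));
  }
}
```

Collecting violations rather than stopping at the first one keeps the error informative when a form change breaks several assumptions at once.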

Observed Website State

The portal now presents:

Coverage Eligibility

[ ] Verified Coverage
[ ] Self-Pay
[x] Pending Insurance Review

The system detects that the observed state is outside the known admissible configuration.

Boundary Result

CoverageSelectionBoundaryError

unexpected option detected:
Pending Insurance Review

known option set violated
submission blocked

The system stops before submission.
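A sketch of how the boundary result above could be produced, assuming a dedicated error class and a submit step passed in as a callback. The class name follows the output above; its `unexpected` field, the `guardedSubmit` helper, and the exact message layout are assumptions for illustration.

```typescript
// Hypothetical error type that carries the structured cause, not just a message.
class CoverageSelectionBoundaryError extends Error {
  constructor(public readonly unexpected: string[]) {
    super(
      [
        "unexpected option detected:",
        ...unexpected,
        "",
        "known option set violated",
        "submission blocked",
      ].join("\n"),
    );
    this.name = "CoverageSelectionBoundaryError";
  }
}

// Check the presented options first; call submit only if the boundary holds.
async function guardedSubmit(
  presented: string[],
  known: ReadonlySet<string>,
  submit: () => Promise<void>,
): Promise<void> {
  const unexpected = presented.filter((label) => !known.has(label));
  if (unexpected.length > 0) {
    // Rejects the returned promise; submit is never invoked.
    throw new CoverageSelectionBoundaryError(unexpected);
  }
  await submit();
}
```

Because the check runs before `submit` is called, a violating form never reaches the portal, and the error object keeps the unexpected options as data that can be recorded afterwards.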

Why This Is a Reliability Boundary

The error localizes the failure along the path between the business goal and the observed state.

The system can now determine that several components did not contribute to the failure:

  • the browser successfully loaded the page
  • the DOM was correctly inspected
  • the automation reached the submission stage
  • the interaction logic executed correctly

The failure occurred because the domain interface changed.

The decision space of the form expanded.

Positive Measurement

Once the boundary exists, components that did not cause the failure start contributing positive certainty: each one can be confirmed as having worked correctly.

Instead of:

automation failed
everything is suspect

the system can determine:

browser execution: valid
page load: valid
DOM inspection: valid
automation path: valid
domain interface: changed

The boundary therefore does two things:

  1. contains the error
  2. measures which parts of the system remained correct
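Both roles can be sketched together, assuming each stage along the path reports whether its own check passed. Walking the stages in order, everything before the first failure is confirmed valid, the first failure is the localized cause, and later stages are marked as never reached. The stage names mirror the listing above; `StageOutcome` and `localize` are hypothetical. Here the domain-interface stage is reported as `violated`, which the listing above calls `changed`.

```typescript
type Status = "valid" | "violated" | "not reached";

interface StageOutcome {
  stage: string;
  passed: boolean;
}

// Walk the path in order. Stages before the first failure are confirmed valid,
// the first failing stage is the localized cause, and later stages never ran.
function localize(path: StageOutcome[]): Map<string, Status> {
  const report = new Map<string, Status>();
  let violated = false;
  for (const { stage, passed } of path) {
    if (violated) {
      report.set(stage, "not reached");
    } else if (passed) {
      report.set(stage, "valid");
    } else {
      report.set(stage, "violated");
      violated = true;
    }
  }
  return report;
}

const report = localize([
  { stage: "browser execution", passed: true },
  { stage: "page load", passed: true },
  { stage: "DOM inspection", passed: true },
  { stage: "automation path", passed: true },
  { stage: "domain interface", passed: false }, // boundary check failed
  { stage: "submission", passed: false }, // never executed; flag is ignored
]);
```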

At that point reliability stops being a vague property of the automation stack and becomes something the system can measure, component by component.

Each component along the path between the business goal and the observed error can now be evaluated independently. If a component consistently remains valid across runs, it contributes measurable reliability to the system. If a component contributes to boundary violations, it becomes the focus of improvement.

Over time this turns reliability into something observable and cumulative rather than something inferred from occasional failures.

Instead of asking:

Is the automation reliable?

the system can measure:

page load reliability
DOM inspection reliability
interaction path reliability
domain interface stability
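A minimal sketch of accumulating these measurements across runs, assuming each run produces a map from component name to a status of `valid`, `violated`, or `not reached`. The tally counts how often each component was confirmed valid out of the runs in which it was actually checked. `ReliabilityLedger` is a hypothetical name, and a production system might weight recent runs more heavily than this plain ratio does.

```typescript
type Status = "valid" | "violated" | "not reached";

// Running tally per component across many runs.
class ReliabilityLedger {
  private counts = new Map<string, { valid: number; checked: number }>();

  record(run: Map<string, Status>): void {
    run.forEach((status, component) => {
      // A component that was never reached gives no evidence either way.
      if (status === "not reached") return;
      const c = this.counts.get(component) ?? { valid: 0, checked: 0 };
      c.checked += 1;
      if (status === "valid") c.valid += 1;
      this.counts.set(component, c);
    });
  }

  // Fraction of checked runs in which the component held; undefined if never checked.
  rate(component: string): number | undefined {
    const c = this.counts.get(component);
    return c ? c.valid / c.checked : undefined;
  }
}
```

Skipping "not reached" stages keeps a failure early in the path from lowering the measured reliability of components that never got the chance to run.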

This is another reason reliability boundaries are powerful: they transform reliability from a general perception into something the system can measure and improve systematically.

Capitalizing on the Error

Because the failure is localized, the system can immediately produce useful information even before the automation is fixed.

For example, the system can:

  • capture the new form state
  • record the unexpected option
  • route the event to the operations team responsible for the workflow
  • update the internal domain specification for the form
  • flag the automation assumption that was violated
  • identify similar failures in future runs
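Several of these actions can start from a single structured record. A sketch, assuming the observed option labels and the known set are available when the boundary fires: the record captures the form state, names the violated assumption, and derives a stable signature so that the same change seen in a later run can be grouped with this one. `BoundaryEvent`, `toBoundaryEvent`, and the signature format are hypothetical, and routing to the operations team is left out.

```typescript
interface BoundaryEvent {
  portal: string;
  observedOptions: string[];
  unexpectedOptions: string[];
  violatedAssumption: string;
  signature: string; // groups repeat occurrences of the same change
  observedAt: string; // ISO-8601 timestamp
}

function toBoundaryEvent(
  portal: string,
  observedOptions: string[],
  known: ReadonlySet<string>,
  now: Date = new Date(),
): BoundaryEvent {
  const unexpectedOptions = observedOptions.filter((o) => !known.has(o));
  return {
    portal,
    observedOptions: [...observedOptions], // snapshot of the form state
    unexpectedOptions,
    violatedAssumption: "option in KnownCoverageOptions",
    // Sorting makes the signature independent of on-page option order.
    signature: `${portal}|${[...unexpectedOptions].sort().join(",")}`,
    observedAt: now.toISOString(),
  };
}
```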

Once the event reaches operations, the investigation can go further.

In many domains, especially regulated ones like healthcare, websites tend to converge on similar workflows. A change observed on one portal may indicate a broader shift across the domain.

The investigation might expand to questions like:

  • did this portal introduce a new coverage state across the system?
  • are other providers beginning to support the same option?
  • should the internal form specification be expanded?
  • should automation be updated across the entire category of sites?
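Whether a change is local or domain-wide can be estimated by counting how many distinct portals report the same new option. A minimal sketch, assuming boundary events from many portals are collected in one place; `portalsPerOption`, `candidatesForSpecReview`, and the default `threshold` (a share of monitored portals) are hypothetical, and the value 0.3 is arbitrary rather than a recommendation.

```typescript
interface ObservedChange {
  portal: string;
  unexpectedOption: string;
}

// For each new option, collect the distinct portals that presented it.
function portalsPerOption(events: ObservedChange[]): Map<string, Set<string>> {
  const byOption = new Map<string, Set<string>>();
  for (const { portal, unexpectedOption } of events) {
    const portals = byOption.get(unexpectedOption) ?? new Set<string>();
    portals.add(portal); // repeat reports from one portal count once
    byOption.set(unexpectedOption, portals);
  }
  return byOption;
}

// Options seen on at least `threshold` (a fraction) of the monitored portals.
function candidatesForSpecReview(
  events: ObservedChange[],
  totalPortals: number,
  threshold = 0.3,
): string[] {
  const candidates: string[] = [];
  portalsPerOption(events).forEach((portals, option) => {
    if (portals.size / totalPortals >= threshold) candidates.push(option);
  });
  return candidates;
}
```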

In this way the error becomes a signal not only about a single website, but about the evolving interface of the domain itself.

The investigation can go as far as is useful for the organization: from updating a single automation rule to refining how an entire category of websites is handled.

What This Example Shows

Automation systems do not only interact with software interfaces. They interact with domain interfaces: workflows, decisions, and rules that evolve over time.

When systems operate without clear boundaries, failures appear as generic automation errors. Engineers must investigate many layers of the stack at once: prompts, DOM structure, browser behavior, infrastructure, or model output.

A reliability boundary changes that dynamic.

By encoding invariants around the business objective, the system creates a clear interface between developer logic and domain behavior. When that interface is violated, the failure is localized. The system can determine which components remained valid and which assumption no longer holds.

This transforms the role of automation errors.

Instead of representing brittle infrastructure failures, they become signals that the external system or workflow has moved outside the known operating region.

Once that signal exists, organizations can respond productively:

  • engineers refine the invariants that define valid behavior
  • operations teams update domain specifications
  • new workflow states are recorded and classified
  • automation improves across future runs
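Refining the invariants can be treated as publishing a new version of the domain specification in which each recorded state is explicitly classified. A sketch under stated assumptions: options are classified either as admissible for automated submission or as needing human review, and each update produces a new spec version while leaving the previous one intact. `CoverageSpec`, `classifyOption`, and `allowedOptions` are hypothetical; which classification "Pending Insurance Review" deserves is a domain decision the code cannot make.

```typescript
type Classification = "admissible" | "needs human review";

interface CoverageSpec {
  version: number;
  options: ReadonlyMap<string, Classification>;
}

// Return a new spec version; the previous version is not modified.
function classifyOption(
  spec: CoverageSpec,
  label: string,
  classification: Classification,
): CoverageSpec {
  const options = new Map(spec.options);
  options.set(label, classification);
  return { version: spec.version + 1, options };
}

// The boundary's allowed set is derived from the spec instead of being hard-coded.
function allowedOptions(spec: CoverageSpec): string[] {
  const allowed: string[] = [];
  spec.options.forEach((c, label) => {
    if (c === "admissible") allowed.push(label);
  });
  return allowed;
}
```

Keeping old versions makes it possible to tell, for any past run, which assumptions the boundary was enforcing at the time.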

Over time, these boundaries accumulate into a more accurate representation of how the domain actually behaves.

The result is not just more reliable automation.

It is a system that continuously converts unexpected events into structured knowledge about the domain it operates within.

Key Principle

A reliability boundary turns an automation error into a domain signal.

Instead of widening the problem space, it narrows it. Instead of obscuring the cause, it reveals the interface that changed. And instead of producing only fixes, it produces knowledge that improves the system over time.
