
Where Reliability Boundaries Appear

Automation systems operate across multiple layers. Each layer introduces assumptions about how the system behaves.

Reliability breaks when the real cause of failure sits outside the boundary of the system that is trying to handle it.

What a Reliability Boundary Is

In practice, a reliability boundary usually appears as an interface defined by parameters and constraints.

At that interface, the system declares what it assumes to be true and what must be validated before execution continues.

Typical boundary parameters include schema constraints on model output, expected DOM structure for extraction, authentication state for session reuse, and environment properties such as IP location or device identity.

When these parameters are explicit and enforced, failures become diagnosable. When they are implicit or undefined, failures leak through and appear random.
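One way to make boundary parameters explicit is to declare them as data, so every check has a name that can be reported when it fails. The following is a minimal Python sketch, not a prescribed design. The `BoundaryParameter` type, the `failed_parameters` helper, the parameter names, and the context keys are all hypothetical. They stand in for one check from each layer listed above.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Mapping


@dataclass(frozen=True)
class BoundaryParameter:
    """A named assumption that must hold before execution continues (hypothetical type)."""
    name: str
    check: Callable[[Mapping[str, Any]], bool]


def failed_parameters(params: List[BoundaryParameter], context: Mapping[str, Any]) -> List[str]:
    """Return the names of every parameter whose check does not pass."""
    return [p.name for p in params if not p.check(context)]


# Hypothetical parameters, one per layer: model output, DOM, session, environment.
BOUNDARY = [
    BoundaryParameter("model_output_has_action",
                      lambda ctx: isinstance(ctx.get("model_output"), dict) and "action" in ctx["model_output"]),
    BoundaryParameter("submit_button_unique",
                      lambda ctx: ctx.get("submit_button_count") == 1),
    BoundaryParameter("session_cookie_present",
                      lambda ctx: "session_id" in ctx.get("cookies", {})),
    # Missing data must fail the check, so None == None is not treated as a match.
    BoundaryParameter("ip_country_matches",
                      lambda ctx: ctx.get("ip_country") is not None
                      and ctx.get("ip_country") == ctx.get("expected_country")),
]

context = {
    "model_output": {"action": "click", "target": "login_button"},
    "submit_button_count": 1,
    "cookies": {"session_id": "abc"},
    "ip_country": "DE",
    "expected_country": "US",
}
print(failed_parameters(BOUNDARY, context))  # ['ip_country_matches']
```

Because each failure carries a name, a log line can say which assumption broke instead of reporting a generic error.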

Model Output Boundary

Many automation systems rely on language models to generate structured actions or data, for example:

{
  "action": "click",
  "target": "login_button"
}

Sketch (boundary location and boundary movement): in a default LLM system, the model produces probabilistic output, and a reliability boundary decides whether to accept, repair, retry, or reject the result.

If the model output is malformed or inconsistent, the system must decide how to handle it.

In many systems today, the reliability boundary sits at the prompt. Developers try to improve reliability with better instructions, more examples, and stronger formatting hints. But when the output breaks, the system often has limited ability to explain why.

More robust systems move the boundary outward by enforcing structure after generation rather than relying only on prompt quality. That can include schema validation, structured outputs, repair mechanisms, and retry strategies.

This is the direction behind llm-contract, one of my other projects. The idea is simple: model output should pass through an explicit typed contract, so the system can validate it, repair it, retry it, or fail clearly instead of silently accepting malformed structure.

https://github.com/operatorstack/llm-contract
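The accept, repair, retry, or reject decision can be sketched without any particular library. What follows is a minimal Python illustration, not llm-contract's API. Its assumptions:

- `generate` is a hypothetical stand-in for a model call.
- `ALLOWED_ACTIONS` is an invented action vocabulary.
- The schema covers only the `action`/`target` example above.
- The only repair attempted is stripping a Markdown code fence wrapped around the JSON.

```python
import json
from typing import Callable, List

ALLOWED_ACTIONS = {"click", "type", "navigate"}  # hypothetical action vocabulary
FENCE = "`" * 3  # a Markdown code fence, built here so it is not a literal in the source


class ContractError(Exception):
    """Raised when output still violates the contract after every retry."""


def validate(data: object) -> dict:
    """Accept only {"action": <allowed action>, "target": <non-empty string>}."""
    if not isinstance(data, dict):
        raise ValueError("output is not a JSON object")
    action = data.get("action")
    if not isinstance(action, str) or action not in ALLOWED_ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    target = data.get("target")
    if not isinstance(target, str) or not target:
        raise ValueError("target must be a non-empty string")
    return data


def repair(raw: str) -> str:
    """Deterministic repair: strip a code fence the model may have wrapped around the JSON."""
    text = raw.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    return text.strip()


def parse_action(generate: Callable[[], str], max_attempts: int = 3) -> dict:
    """Accept valid output, repair it if possible, retry, and finally reject with reasons."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        raw = generate()
        repaired = repair(raw)
        candidates = [raw] if repaired == raw else [raw, repaired]
        for candidate in candidates:
            try:
                return validate(json.loads(candidate))
            except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
                errors.append(f"attempt {attempt}: {exc}")
    # Reject: fail clearly instead of passing malformed structure downstream.
    raise ContractError("; ".join(errors))


# Usage: the first output is missing "target"; the second is wrapped in a code fence.
outputs = iter([
    '{"action": "click"}',
    FENCE + 'json\n{"action": "click", "target": "login_button"}\n' + FENCE,
])
print(parse_action(lambda: next(outputs)))  # {'action': 'click', 'target': 'login_button'}
```

Here a retry simply calls the model again. A fuller version would likely feed the collected validation errors back into the retry prompt.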

Page Interpretation Boundary

Automation depends on assumptions about selectors, DOM structure, and layout patterns. When pages change, these assumptions break.

Sketch: page interpretation boundary movement

Boundary at selectors: Website DOM -> Screenshot capture -> Element lookup by selectors -> Click action. Example assumption: button[type="submit"]. Problem introduced: click behavior is brittle, because when DOM assumptions break, failures leak through and clicks become unreliable.

Boundary at interpretation layer: Website DOM -> Screenshot capture -> Page Interpretation Boundary (selector resolution, iframe detection, ambiguity checks for duplicate targets) -> Click action or boundary error. When assumptions fail, the system returns a boundary error instead of silently clicking the wrong element. Problem introduced: if these checks are missing, the system still cannot explain boundary failures.

Page interpretation boundary: automation depends on DOM assumptions. The boundary decides whether the system silently acts on incorrect structure or detects that the page no longer matches expectations.

Pages commonly break these assumptions through:

duplicate elements
iframe nesting
dynamic rendering
layout changes

If the developer did not anticipate these changes, the system fails. Systems that move this boundary outward introduce semantic element detection, DOM inspection, and validation of extracted data.
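The interpretation-layer checks from the sketch can be expressed as a function that returns either exactly one target or a named boundary error. Those checks are selector resolution, iframe detection, and ambiguity checks. The following is a minimal Python sketch over a simplified, hypothetical element model. A real system would fill `Candidate` from whatever browser automation tool it uses. The `resolve_target` and `BoundaryError` names are illustrative only.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass(frozen=True)
class Candidate:
    """Simplified, hypothetical view of one element that a selector matched."""
    element_id: str
    frame: Optional[str]  # None = main document; otherwise the name of the containing iframe
    visible: bool


@dataclass(frozen=True)
class BoundaryError:
    """Returned instead of clicking when a page assumption does not hold."""
    parameter: str  # which check failed
    detail: str


def resolve_target(
    selector: str,
    matches: List[Candidate],
    expected_frame: Optional[str] = None,
) -> Union[Candidate, BoundaryError]:
    # Selector resolution: the selector must still match something.
    if not matches:
        return BoundaryError("selector_resolution", f"{selector} matched nothing; the DOM may have changed")
    # Iframe detection: matches inside an unexpected frame are not the intended target.
    in_frame = [m for m in matches if m.frame == expected_frame]
    if not in_frame:
        frames = sorted({str(m.frame) for m in matches})
        return BoundaryError("iframe_detection", f"{selector} only matched inside frames {frames}")
    visible = [m for m in in_frame if m.visible]
    if not visible:
        return BoundaryError("selector_resolution", f"{selector} matched only hidden elements")
    # Ambiguity check: duplicate visible targets mean the click could land on the wrong one.
    if len(visible) > 1:
        return BoundaryError("ambiguity_check", f"{selector} matched {len(visible)} visible elements")
    return visible[0]


result = resolve_target('button[type="submit"]', [
    Candidate("btn-1", None, True),
    Candidate("btn-2", None, True),
])
print(result.parameter)  # ambiguity_check
```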

Session and Authentication Boundary

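Many workflows rely on session reuse: log in once, save the resulting browser state, and restore it in later runs instead of logging in again.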

Sketch (login state through a session boundary): a password login moves through HTML -> Authentication / Web Storage Tech -> Backend Authentication. On the browser side, the session state stack holds cookies (partially persisted), session cookies (volatile), and IndexedDB state. At the session interface, which is the reliability boundary, this browser state is translated into backend authentication context. On the service backend, the auth service applies session validation and trust checks.

This may work locally but fail in production due to IP changes, device trust checks, or additional security verification. The visible error is often a generic login failure, while the real cause is that the backend does not trust the environment.
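One way to keep environment trust from surfacing as a generic login failure is to check a saved session before reusing it. The check compares the session against the environment it was created in. The following is a minimal Python sketch. The `SavedSession` fields, the `SessionState` names, and the exact-match trust rule are hypothetical assumptions. A backend can still reject a session for reasons the client cannot see.

```python
import time
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class SessionState(Enum):
    REUSABLE = "reusable"
    MISSING = "missing"                          # nothing saved; a fresh login is needed
    EXPIRED = "expired"                          # saved state is past its expiry
    ENVIRONMENT_CHANGED = "environment_changed"  # likely to trigger backend trust checks


@dataclass(frozen=True)
class SavedSession:
    """Hypothetical record of the browser state saved after a successful login."""
    cookies: Dict[str, str]
    expires_at: float  # Unix timestamp
    ip_address: str    # IP the session was created from
    device_id: str     # identity of the browser profile that logged in


def check_session(
    saved: Optional[SavedSession],
    current_ip: str,
    current_device: str,
    now: Optional[float] = None,
) -> SessionState:
    now = time.time() if now is None else now
    if saved is None or not saved.cookies:
        return SessionState.MISSING
    if now >= saved.expires_at:
        return SessionState.EXPIRED
    # Exact match is a simplifying assumption; see the note after this block.
    if (saved.ip_address, saved.device_id) != (current_ip, current_device):
        return SessionState.ENVIRONMENT_CHANGED
    return SessionState.REUSABLE


saved = SavedSession({"sid": "abc"}, expires_at=2_000_000_000,
                     ip_address="203.0.113.7", device_id="profile-1")
print(check_session(saved, "198.51.100.20", "profile-1", now=1_700_000_000))
# SessionState.ENVIRONMENT_CHANGED
```

Exact IP matching is deliberately strict here. A real system would likely compare coarser signals, such as country or network, because IPs can change legitimately.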

Environment and Network Boundary

Some failures originate outside automation logic, in infrastructure variables that are treated as static configuration even though they can change:

proxy location
IP stability
browser fingerprint
device identity

Expanding this boundary requires diagnostics, proxy viability checks, and security challenge detection so systems can identify when environment state is the root cause.
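Proxy viability checks and security challenge detection can be sketched as a classifier that runs before the automation blames its own logic. The following minimal Python sketch rests on several assumptions:

- The `Observation` fields are hypothetical.
- The challenge markers are illustrative strings, not a reliable detection method.
- 403 and 429 status codes are treated only as hints of blocking or rate limiting.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Observation:
    """Hypothetical facts gathered about one page load."""
    proxy_reachable: bool
    exit_country: Optional[str]  # country reported for the proxy exit IP, if known
    status_code: int
    body_text: str


# Illustrative markers only; real challenge pages vary widely.
CHALLENGE_MARKERS = ("captcha", "verify you are human", "unusual traffic")


def diagnose_environment(obs: Observation, expected_country: str) -> Optional[str]:
    """Return the environment parameter that looks like the root cause, or None."""
    if not obs.proxy_reachable:
        return "proxy_unreachable"
    if obs.exit_country is not None and obs.exit_country != expected_country:
        return "proxy_location_mismatch"
    body = obs.body_text.lower()
    if any(marker in body for marker in CHALLENGE_MARKERS):
        return "security_challenge"
    if obs.status_code in (403, 429):
        return "blocked_or_rate_limited"
    return None  # environment looks fine; investigate the automation logic next


obs = Observation(True, "US", 200, "Please verify you are human")
print(diagnose_environment(obs, expected_country="US"))  # security_challenge
```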

The Pattern

Reliability boundaries are the minimum structure needed to keep a system solving problems in the right direction. This matters especially with AI: because a model can reason over many automation signals at once, a system can appear powerful while still drifting.

The value of the boundary is that it gives that intelligence structure. It creates a path for AI not only to act, but also to resolve, classify, and explore failures in ways that compound into better system behavior over time.

Across all of these examples, failures occur when the true cause lies outside the system's reliability boundary.

When boundary parameters are explicit, the system can diagnose failures. When they are implicit, failures turn into guesswork.

In practice, this is also a system design choice. Boundaries can be defined in ways that are observable, reportable, and classifiable.

Customer-facing errors can map to explicit boundary states.
Internal telemetry can identify which boundary parameter failed.
Recovery paths can be associated with boundary failure categories.
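These three mappings can live in one table keyed by boundary failure category, which keeps customer messages, telemetry, and recovery paths in step. The following is a minimal Python sketch. The categories, messages, recovery names, and telemetry fields are all hypothetical examples.

```python
import json
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class BoundaryFailure(Enum):
    MODEL_OUTPUT_INVALID = "model_output_invalid"
    PAGE_STRUCTURE_CHANGED = "page_structure_changed"
    SESSION_REJECTED = "session_rejected"
    ENVIRONMENT_UNTRUSTED = "environment_untrusted"


@dataclass(frozen=True)
class FailurePolicy:
    customer_message: str
    recovery: str  # name of a hypothetical recovery routine


POLICIES = {
    BoundaryFailure.MODEL_OUTPUT_INVALID:
        FailurePolicy("The task could not be planned. Retrying.", "retry_with_errors"),
    BoundaryFailure.PAGE_STRUCTURE_CHANGED:
        FailurePolicy("The website layout changed. This task needs review.", "flag_for_maintenance"),
    BoundaryFailure.SESSION_REJECTED:
        FailurePolicy("Your login session is no longer valid. Please sign in again.", "request_reauth"),
    BoundaryFailure.ENVIRONMENT_UNTRUSTED:
        FailurePolicy("The website asked for extra verification.", "rotate_environment"),
}


def report(failure: BoundaryFailure, parameter: str) -> Tuple[str, str, str]:
    """Return (customer message, telemetry JSON, recovery name) for one boundary failure."""
    policy = POLICIES[failure]
    telemetry = json.dumps({"boundary": failure.value, "parameter": parameter})
    return policy.customer_message, telemetry, policy.recovery


message, telemetry, recovery = report(BoundaryFailure.SESSION_REJECTED, "ip_country_matches")
print(telemetry)  # {"boundary": "session_rejected", "parameter": "ip_country_matches"}
print(recovery)   # request_reauth
```

Keying everything by an enum means a new failure category cannot be added without deciding its message and recovery path, which a test can enforce.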

Inside the boundary

failure -> diagnosis -> recovery

Outside the boundary

failure -> guesswork

The next section looks at three concrete examples from browser automation where these boundaries become visible in practice.

A Note on Boundary Choice

One reason I was initially drawn to computer-use style agents is that they let a single builder place the boundary almost anywhere.

If the system is understood deeply enough, operational knowledge can be translated directly into automation behavior. For example, an operations rule can become a direct action in the agent, with very little translation layer between operations teams and automation.

In that setup the boundary is highly flexible. The tradeoff is that this approach requires high skill from the system designer: domain understanding and automation stack understanding both need to be strong enough to choose boundaries and encode assumptions correctly.

This can work well for a single builder or a small team with deep context. It becomes harder as systems are used by larger teams or external developers.

Libraries and frameworks often choose more structured boundaries. They trade some flexibility for systems that are easier to reason about, operate, and extend.

This does not make computer-use models the wrong approach. It reflects a different boundary choice.

In practice, the design question is not whether one approach is universally better, but where the reliability boundary should sit for the people building and operating the system.

Prompt Guidance as Boundary Guidance

One thing that still needs better treatment in automation infrastructure docs is prompt guidance. Prompt changes can affect end-to-end system behavior, but guidance is often too loose, usually framed as a generic suggestion to keep adjusting the prompt.

A stronger approach is to connect prompt changes to specific failure signals and boundary conditions. That makes it clearer when prompt edits are the right tool, which part of the system they are likely to affect, and how to make those edits without introducing new instability.
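One way to tie prompt edits to failure signals is to tally failures by boundary. The prompt is then treated as a candidate fix only when failures concentrate at the model output boundary. The following is a minimal Python sketch. It assumes each failed run is already labeled with a boundary name. The labels and the 50% threshold are hypothetical.

```python
from collections import Counter
from typing import List

# Hypothetical: boundaries where a prompt edit can plausibly change the outcome.
PROMPT_SENSITIVE = {"model_output"}


def prompt_edit_advice(failures: List[str], threshold: float = 0.5) -> str:
    """Given one boundary label per failed run, say whether a prompt edit is the right tool."""
    if not failures:
        return "no failures recorded"
    counts = Counter(failures)
    share = sum(counts[b] for b in PROMPT_SENSITIVE) / len(failures)
    if share >= threshold:
        return f"prompt edit is a reasonable first step ({share:.0%} of failures at the model output boundary)"
    top = counts.most_common(1)[0][0]
    return f"look outside the prompt first; most failures are at the {top} boundary"


print(prompt_edit_advice(["session", "session", "model_output"]))
# look outside the prompt first; most failures are at the session boundary
```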

Computer Use and Observability

Computer use also has an extremely high skill ceiling because it creates a path from an idea to direct execution, and each path introduces its own boundary choices.

At the same time, one advantage of computer use is that it creates a direct boundary from input to page action, which improves observability. Agent logs and action traces are often human-readable.

That can be a useful reason to give customers or end users access to those logs, since it makes diagnosis faster when failures occur.

Human-readable action traces can act as a boundary interface between the automation system and the people trying to understand it.
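A human-readable trace can be as simple as one line per action. Each line records what was attempted, on what, and which boundary check decided the outcome. The following is a minimal Python sketch. The `TraceStep` fields and the line format are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TraceStep:
    """One action taken by the agent (hypothetical record)."""
    action: str
    target: str
    outcome: str                            # "ok" or "boundary_error"
    failed_parameter: Optional[str] = None  # which boundary check failed, if any


def render_trace(steps: List[TraceStep]) -> str:
    """Render steps as numbered, plain-language lines a customer could read."""
    lines = []
    for i, step in enumerate(steps, start=1):
        line = f"{i}. {step.action} {step.target}: {step.outcome}"
        if step.failed_parameter:
            line += f" ({step.failed_parameter})"
        lines.append(line)
    return "\n".join(lines)


print(render_trace([
    TraceStep("navigate", "https://example.com/login", "ok"),
    TraceStep("click", "login_button", "boundary_error", "ambiguity_check"),
]))
# 1. navigate https://example.com/login: ok
# 2. click login_button: boundary_error (ambiguity_check)
```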

More generally, this shows that observability does not have to be added only after the fact. It can be designed through the boundary itself. When boundaries are chosen well, they do not just constrain execution. They create surfaces through which systems can be observed, explained, and improved.

In that sense, one of the things boundaries make possible is observability itself.

Boundary Choice as Product Advantage

A simple interface can hide a very deep reliability boundary.

What looks like a prompt is often really a maintained boundary.

Boundaries can also be a source of product advantage. For end users looking for real leverage, one useful place to look is the boundary a product has chosen to build around the problem.

A system may look simple on the surface. A product might expose what appears to be a short prompt that automates work across many websites. If evaluation focuses only on that interface, the value can look easy to replicate.

In many cases, the harder-to-reproduce value is what the boundary enables around that interface: maintenance, debugging, observability, recovery, and continued reliability over time.

Another team may be able to write a similar prompt. Reproducing the surrounding system with the same maintained correctness and operational support is typically much harder.

That is one reason boundary choice can become a product differentiator. It shapes not only what a product does, but how well it continues to work as real users, real failures, and real operational complexity enter the system.
