Hierarchical Control for Browser Automation

Applying supervisory control and distributed systems theory to agentic architectures

December 9, 2025

Abstract

Browser automation agents face a fundamental architectural constraint: single-page state binding forces sequential, destructive navigation patterns that mirror the limitations of single-loop control systems. This paper presents multi-context browser control through the lens of control systems engineering, demonstrating that isolated page spawning enables a transition from monolithic to hierarchical control architectures. We draw parallels to supervisory control, distributed model predictive control (DMPC), and sensor fusion strategies, showing how concepts from aerospace, chemical process control, and autonomous vehicles inform the design of robust agentic systems.

1. The Control Problem in Browser Automation

1.1 Agents as Feedback Controllers

At its core, a browser automation agent is a feedback control system:

  • Reference signal: The task specification ("Fill out this form and submit")
  • Controller: The LLM reasoning about actions to take
  • Plant: The browser environment being manipulated
  • Sensor/Observer: Screenshots and DOM state returned to the agent
  • Actuation: Mouse clicks, keyboard input, navigation commands
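
The loop formed by these five components can be sketched in a few lines. This is only an illustration under stated assumptions: `ToyPlant`, `rule_based_controller`, and `run_loop` are hypothetical names, the toy plant stands in for a real browser, and a rule-based function stands in for the LLM.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyPlant:
    """Stand-in for the browser: a form with named fields and a submit flag."""
    fields: dict = field(default_factory=lambda: {"name": "", "email": ""})
    submitted: bool = False

    def observe(self) -> dict:
        # Sensor/observer: a real agent would receive a screenshot or DOM snapshot.
        return {"fields": dict(self.fields), "submitted": self.submitted}

    def actuate(self, action: tuple) -> None:
        # Actuation: a real agent would send clicks and keystrokes.
        kind, *args = action
        if kind == "fill":
            name, value = args
            self.fields[name] = value
        elif kind == "submit":
            self.submitted = True


def rule_based_controller(reference: dict, observation: dict) -> Optional[tuple]:
    """Stand-in for the LLM: choose the next action from reference vs. observation."""
    for name, wanted in reference.items():
        if observation["fields"].get(name) != wanted:
            return ("fill", name, wanted)
    if not observation["submitted"]:
        return ("submit",)
    return None  # reference reached, nothing left to do


def run_loop(plant, controller, reference, max_steps=10) -> int:
    """Closed loop: observe, decide, actuate, until done or the step budget is spent."""
    for step in range(max_steps):
        action = controller(reference, plant.observe())
        if action is None:
            return step
        plant.actuate(action)
    return max_steps
```

Calling `run_loop(ToyPlant(), rule_based_controller, {"name": "Ada", "email": "ada@example.com"})` takes three actuation steps (two fills and a submit) before the controller reports that the reference has been reached. The step budget plays the role of a safety limit on an otherwise open-ended loop.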

1.2 The Single-Loop Limitation

Traditional browser agents operate as single-input single-output (SISO) controllers with one observation channel (the current page) and one actuation channel (commands to that page).

Control Theory Concept           | Browser Agent Manifestation
---------------------------------|----------------------------------------------------------------
Limited bandwidth                | Can only observe/actuate one page
No parallel estimation           | Cannot gather information from multiple sources simultaneously
State destruction on mode switch | Navigation overwrites current context
No hierarchical decomposition    | All tasks handled by single control loop
Coupled disturbance response     | External changes affect entire system

Industrial control systems solved these problems decades ago through multi-loop architectures, supervisory control, and distributed controllers.

2. State-Space Representation

In control theory, a system is described by its state-space representation:

ẋ = f(x, u) // state dynamics

y = g(x) // observation equation

For browser automation:

  • State vector x: Complete browser state (DOM, cookies, localStorage, network state)
  • Input vector u: Agent actions (click, type, navigate, scroll)
  • Output vector y: Observable state (screenshot, accessibility tree)

The critical insight: the state space is partitioned across pages, but certain state components are shared across the partition.

x = [x_shared, x_page1, x_page2, ..., x_pageN]

where:

x_shared = [cookies, localStorage, proxy_state, session_tokens]

x_page_i = [DOM_i, render_state_i, JS_runtime_i, network_i]   for i = 1, ..., N

2.1 Multi-Context as State-Space Decomposition

With page spawning capability, the architecture supports parallel state partitions:

y₁ = g(x_shared, x_page1) // Main agent observes page 1

y₂ = g(x_shared, x_page2) // Tool observes page 2

y₃ = g(x_shared, x_page3) // Another tool observes page 3

u₁ affects x_page1 // Main agent actuates page 1

u₂ affects x_page2 // Tool actuates page 2

u₃ affects x_page3 // Tool actuates page 3

Critically, x_shared is accessible to all controllers, enabling authenticated operations across all pages.
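
The partition can be made concrete with a small data model. This is a sketch under assumptions, not a description of any real browser API: `SharedState`, `PageState`, and `ContextState` are hypothetical names, and their fields are simplified stand-ins for real browser state.

```python
from dataclasses import dataclass, field


@dataclass
class SharedState:
    """x_shared: visible to every page in the context."""
    cookies: dict = field(default_factory=dict)
    local_storage: dict = field(default_factory=dict)


@dataclass
class PageState:
    """x_page_i: private to one page."""
    url: str = "about:blank"
    dom: str = ""


@dataclass
class ContextState:
    """x = [x_shared, x_page1, ..., x_pageN]."""
    shared: SharedState = field(default_factory=SharedState)
    pages: dict = field(default_factory=dict)

    def spawn_page(self, page_id: str) -> PageState:
        # Adds a new partition without touching existing ones.
        self.pages[page_id] = PageState()
        return self.pages[page_id]

    def navigate(self, page_id: str, url: str) -> None:
        # u_i affects only x_page_i.
        page = self.pages[page_id]
        page.url = url
        page.dom = f"<html>content of {url}</html>"

    def observe(self, page_id: str) -> dict:
        # y_i = g(x_shared, x_page_i)
        page = self.pages[page_id]
        return {
            "url": page.url,
            "dom": page.dom,
            "authenticated": "session" in self.shared.cookies,
        }
```

Navigating a spawned page changes only that page's entry in `pages`, while a session cookie stored in `shared` shows up in the observation of every page. That is the property the decomposition relies on.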

3. Hierarchical Control Architecture

3.1 Supervisory Control Framework

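The multi-context architecture naturally maps to supervisory control, a well-established pattern in industrial automation in which a supervisory layer sets objectives for lower-level control loops instead of driving the actuators directly.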

3.2 Control Hierarchy Levels

Level 0 — Plant

The browser context containing all pages and shared state

Level 1 — Local Controllers

Tools that operate on specific pages:

  • Receive high-level objectives from supervisory layer
  • Execute local control loops to achieve objectives
  • Return structured results to supervisory layer
  • Manage their own page lifecycle

Level 2 — Supervisory Controller

The primary agent:

  • Decomposes complex tasks into subtasks
  • Dispatches subtasks to appropriate local controllers
  • Aggregates results and maintains global state estimate
  • Handles exceptions and coordinates recovery
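
The division of labor between Level 1 and Level 2 might look like the following sketch. All names (`ToolResult`, `run_local_controller`, `supervise`) are hypothetical, tools are modeled as plain functions, and dispatch is sequential for simplicity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolResult:
    ok: bool
    data: dict
    error: str = ""


def run_local_controller(tool: Callable[[str], dict], objective: str) -> ToolResult:
    """Level 1: run one tool and always return a structured result."""
    try:
        return ToolResult(ok=True, data=tool(objective))
    except Exception as exc:  # the failure stays inside this controller
        return ToolResult(ok=False, data={}, error=str(exc))


def supervise(subtasks: list, tools: dict) -> dict:
    """Level 2: dispatch subtasks, aggregate results, record failures."""
    estimate, failures = {}, []
    for tool_name, objective in subtasks:
        result = run_local_controller(tools[tool_name], objective)
        if result.ok:
            estimate.update(result.data)
        else:
            failures.append((tool_name, result.error))
    return {"estimate": estimate, "failures": failures}
```

The supervisor never sees a raw exception from a tool, only a `ToolResult`. A failed subtask therefore appears as an entry in `failures` that the supervisor can retry or route around, while results from the other subtasks are still aggregated.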

4. Parallel Observation and Sensor Fusion

4.1 Aerospace Analogy: Multiple Kalman Filters

In aerospace systems, multiple observers process different sensor streams. Each filter processes a specific sensor modality, has its own noise characteristics, and provides a state estimate. The fusion layer combines estimates weighted by confidence.

4.2 Browser Automation Parallel

In browser automation, multiple pages function as independent observers:

  • Each page provides a different "sensor reading" from a different source
  • Each observer has reliability weight (site availability, selector stability)
  • The supervisory agent fuses these observations
  • Confidence-weighted aggregation reduces the influence of any single unreliable source

Aerospace Concept | Browser Automation Equivalent
------------------|----------------------------------------------
Sensor modality   | Web source (news, social, market, regulatory)
Measurement noise | Site reliability, selector stability
State estimate    | Extracted data with confidence
Kalman filter     | Page-specific extraction logic
Sensor fusion     | Confidence-weighted aggregation
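
For numeric readings, confidence-weighted aggregation can be as simple as a weighted mean. The sketch below assumes each page returns a `(value, weight)` pair; the `fuse` function and the way weights are assigned are illustrative assumptions. Weighting by the inverse of each source's variance, as Kalman-style fusion does, would be the more principled choice when variances are known.

```python
def fuse(readings):
    """Confidence-weighted mean of (value, weight) pairs with non-negative weights."""
    if any(weight < 0 for _, weight in readings):
        raise ValueError("weights must be non-negative")
    total_weight = sum(weight for _, weight in readings)
    if total_weight <= 0:
        raise ValueError("at least one reading needs a positive weight")
    return sum(value * weight for value, weight in readings) / total_weight
```

For example, a value of 100.0 from a source with weight 3 and a value of 110.0 from a source with weight 1 fuse to 102.5, closer to the more trusted source.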

5. Stability and Failure Mode Analysis

5.1 Failure Mode Isolation

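A key property of the multi-context architecture is failure isolation: a crash, hang, or navigation error in a spawned page is confined to that page and does not destroy the state of the main page. In control theory, this is analogous to fault-tolerant control, where subsystem failures do not cascade into global failure.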

5.2 BIBO Stability Properties

A system is Bounded-Input Bounded-Output (BIBO) stable if every bounded input produces a bounded output. Applied loosely to tools, this requires:

  • Bounded tool inputs (valid URLs, reasonable parameters)
  • Bounded outputs (structured results, error messages)
  • Tools should never hang indefinitely or crash the browser context
  • Timeouts and cleanup ensure bounded execution time
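
One way to enforce bounded execution time is to wrap every tool call in a timeout with guaranteed cleanup. The sketch below uses the standard library's `asyncio.wait_for`; `run_bounded` and its parameters are hypothetical names, and `cleanup` stands in for whatever closes the spawned page.

```python
import asyncio


async def run_bounded(tool, timeout_s: float, cleanup):
    """Run an async tool with a hard time limit; always run cleanup afterwards."""
    try:
        data = await asyncio.wait_for(tool(), timeout=timeout_s)
        return {"ok": True, "data": data}
    except asyncio.TimeoutError:
        return {"ok": False, "error": f"timed out after {timeout_s}s"}
    except Exception as exc:
        return {"ok": False, "error": str(exc)}
    finally:
        await cleanup()  # e.g. close the spawned page
```

Whatever the tool does, the caller gets back a small dictionary within roughly `timeout_s` seconds, and the cleanup coroutine runs on success, failure, and timeout alike.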

5.3 Disturbance Rejection

Robust tools implement disturbance rejection patterns:

1. Sensor Redundancy

Multiple extraction strategies (CSS selectors, ARIA labels, text matching)

2. Retry with Backoff

Exponential backoff for transient network failures

3. Graceful Degradation

Return partial results with confidence indicators
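
The three patterns can be sketched together as follows. The function names, the dictionary-based `page`, and the confidence formula (1 divided by the strategy's position in the list) are illustrative assumptions, not values taken from a real system.

```python
import time


def extract_with_fallbacks(strategies, page):
    """Sensor redundancy: try strategies in order and report which one worked."""
    for index, (name, strategy) in enumerate(strategies):
        value = strategy(page)
        if value is not None:
            # Graceful degradation: later fallbacks get lower confidence.
            return {"value": value, "strategy": name, "confidence": 1.0 / (index + 1)}
    return {"value": None, "strategy": None, "confidence": 0.0}


def retry_with_backoff(operation, attempts=4, base_delay_s=0.5, sleep=time.sleep):
    """Retry on ConnectionError, doubling the delay after each failed attempt."""
    for attempt in range(attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay_s * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the backoff schedule testable without real waiting. With the defaults, the delays between attempts are 0.5 s, 1 s, and 2 s.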

6. Chemical Process Control Analogy

Chemical plants employ a well-established control hierarchy that directly maps to browser automation:

Process Control Level             | Browser Automation Equivalent
----------------------------------|------------------------------------------
Level 3: Supervisory/Optimization | Primary Agent (LLM reasoning)
Level 2: Advanced Process Control | Tool invocations and coordination
Level 1: Regulatory Control (PID) | Tool internal logic (click, fill, extract)
Level 0: Sensors and Actuators    | Playwright commands

7. Architectural Constraints as Control-Theoretic Boundaries

Browser Limitation                | Control Theory Analog
----------------------------------|----------------------------------------------
No cross-page communication       | No direct inter-controller coupling
Sequential sub-agent execution    | Time-sliced control loops
Page lifetime = tool call duration | Controllers without persistent internal state
Shared cookies/storage only       | Limited shared state space
Single browser context            | Single plant instance
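
The "page lifetime = tool call duration" constraint is easy to enforce with a context manager. The sketch below uses stand-in classes, `FakeContext` and `FakePage`, which are hypothetical and exist only to make the example self-contained. With Playwright's Python API, the equivalent calls would be `context.new_page()` and `page.close()`.

```python
from contextlib import contextmanager


class FakePage:
    """Stand-in for a browser page."""
    def __init__(self, context):
        self.context = context

    def close(self):
        self.context.pages.remove(self)


class FakeContext:
    """Stand-in for a browser context that tracks its open pages."""
    def __init__(self):
        self.pages = []

    def new_page(self):
        page = FakePage(self)
        self.pages.append(page)
        return page


@contextmanager
def spawned_page(context):
    """Open a page on entry and close it on exit, even if the tool raises."""
    page = context.new_page()
    try:
        yield page
    finally:
        page.close()
```

A tool that does its work inside `with spawned_page(context) as page:` cannot leak a page or carry page state into the next call, which is exactly the "no persistent internal state" property in the table.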

8. Conclusion

Multi-context browser control represents a fundamental architectural evolution that has clear parallels in decades of control systems engineering. By framing the browser context as a shared plant, the primary agent as a supervisory controller, and spawned pages as local controllers, we can apply well-established principles from hierarchical control, distributed systems, and sensor fusion.

Key insights from the control-theoretic perspective:

  1. State-Space Decomposition: Multi-context enables parallel observation and actuation on partitioned state while maintaining coupling through shared session state.
  2. Hierarchical Control: The architecture naturally supports supervisory patterns where high-level reasoning is separated from low-level actuation.
  3. Parallel Observation: Multiple pages functioning as independent observers improve information gathering, analogous to sensor fusion in aerospace systems.
  4. Failure Isolation: Page-level isolation confines faults to the page where they occur, which makes the overall system more robust.

The page spawning primitive is small in implementation but significant in architectural impact. It transforms browser automation from single-loop control to distributed hierarchical control, unlocking patterns that industrial control systems have relied on for decades.

For implementation details and code examples, see "Multi-Context Browser Control for Agentic Systems".