Why LLMs Fail in the Real World: Design-Time Optimization vs Real-Time Stability

December 1, 2025

Why modern LLM systems overfit the setup phase and collapse during execution

[Figure: design-time preparation vs. real-time adaptation in control systems]

In classical control engineering, every system is shaped by two forces:

  • How well it is designed before operation
  • How well it stabilizes itself during operation

These are not interchangeable.

They solve fundamentally different problems.

Today's AI systems invest almost entirely in the first category — design-time optimization — while largely ignoring the second — real-time stability.

This chapter explains why that imbalance exists, how it manifests in LLM agents and browser automation, and why the future of AI requires shifting from feed-forward thinking to control-theoretic architectures.

1. The Two Forces That Shape Intelligent Systems

In manufacturing, robotics, aerospace, and even biology, the distinction is universal:

Design-time optimization prepares the system.

Real-time stability keeps the system alive.

Both matter.

But only one determines whether the system continues to function under uncertainty.

AI engineers are extremely good at design-time work: models, prompts, evals, fine-tuning, retrieval, and planning.

But the moment their systems touch the real world — the browser, APIs, timing, partial observability, corrupted states — everything becomes unstable.

Why?

Because design-time optimization cannot substitute for real-time control.

2. What AI Engineers Call "Performance" Is Mostly Design-Time Optimization

In LLM systems, design-time optimization includes:

  • prompt engineering
  • chain-of-thought templates
  • RAG tuning
  • dataset curation
  • fine-tuning / LoRA
  • planning mechanisms
  • deterministic agent scripts
  • evaluation-driven iteration
  • tool schemas and guardrails

These techniques modify the initial feed-forward policy of the model.

They improve the start of the trajectory — not the behavior of the trajectory itself.

Example:

A browser agent's YAML spec makes it start with a better plan.

But once the DOM shifts?
Once a button moves?
Once a flow times out?

The plan becomes irrelevant.
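
To make that concrete, here is a minimal sketch of an open-loop browser step, written against a Playwright-style API (the URL, selector, and flow are hypothetical):

```python
# Open-loop execution: the plan is fixed before the run starts.
from playwright.sync_api import sync_playwright

with sync_playwright() as p:
    page = p.chromium.launch().new_page()
    page.goto("https://example.com/checkout")  # hypothetical flow
    page.click("#submit-button")               # hypothetical selector
    # If the button was renamed, this times out; if a different element
    # now matches, it clicks the wrong thing. Either way, no later step
    # verifies that the click produced the intended state.
```

Every failure mode in that snippet is a runtime event, and polishing the script beforehand removes none of them.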

Design-time improvements can only decrease the initial error.

They cannot correct error that accumulates at runtime.

This limitation is universal — you can prompt your way to a better launch, but not to a stable flight.

3. Why Real-Time Stability Is a Different Category of Intelligence

Real-time stability is not about better prompts or better models.

It's about continuous sensing, comparison, correction, and convergence.

In control engineering, this is called closed-loop control:

  1. Observe the world
  2. Compare expected vs. actual
  3. Compute error
  4. Correct immediately
  5. Repeat
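
As a sketch, those five steps in code, where `observe`, `corrective_action`, and the scalar error are placeholders for whatever sensing and actuation a real system has:

```python
# A minimal closed-loop skeleton: sense, compare, correct, repeat.
def control_loop(target, observe, corrective_action,
                 tolerance=0.01, max_steps=1000):
    for _ in range(max_steps):
        actual = observe()            # 1. observe the world
        error = target - actual       # 2. and 3. compare, compute error
        if abs(error) < tolerance:
            return actual             # converged on the target state
        corrective_action(error)      # 4. correct immediately
    raise RuntimeError("failed to converge")  # 5. repeat, with a bound
```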

This loop is what stabilizes:

  • drones
  • rockets
  • autonomous vehicles
  • industrial robotics
  • biological systems

And, increasingly, LLM agents.

When an LLM agent interacts with the browser, the filesystem, or an API, it needs to:

  • detect when it misclicked
  • validate that it reached the correct state
  • handle dynamic changes
  • re-plan mid-run
  • retry based on observed reality
  • recover from drift
  • correct its own hallucinations
  • enforce invariants
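
A sketch of what just one of those capabilities might look like: a click helper that re-resolves its target and validates the resulting state on every attempt (Playwright-style API; the predicate and retry policy are illustrative assumptions, not a prescribed design):

```python
import time

def click_and_verify(page, selector, reached_expected_state,
                     retries=3, settle_seconds=1.0):
    """Closed-loop click: act, observe the result, retry on divergence."""
    for _ in range(retries):
        target = page.query_selector(selector)  # re-bind: never trust a stale handle
        if target is None:
            time.sleep(settle_seconds)          # the DOM may still be settling
            continue
        target.click()
        time.sleep(settle_seconds)
        if reached_expected_state(page):        # validate, never assume
            return True
    return False                                # surface failure for re-planning
```

The point is not this particular helper but its shape: act, observe, compare, and only then proceed.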

This is online adaptation, not offline tuning.

It is what separates "smart models" from "reliable systems."

4. The Core Distinction: Preparation vs. Survival

Let's frame this concretely.

Design-time optimization is tuning an autopilot controller before takeoff.

Real-time stability is adjusting thrust and pitch 200 times per second as the aircraft descends through turbulence.

Both are valuable.

But they solve fundamentally different problems.

In AI terms:

  • prompt engineering ≈ adjusting parameters before takeoff
  • fine-tuning ≈ improving the default controller before leaving the ground
  • planning ≈ optimizing the initial flight path

These all happen before the system touches uncertainty.

In contrast:

  • DOM validation
  • state checking
  • corrective loops
  • dynamic re-binding of elements
  • detecting divergence
  • re-planning mid-run

These happen inside uncertainty.
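
Even the crudest version of "detecting divergence" has to run inside that uncertainty, comparing what the plan assumes against what the page actually shows. A hedged sketch, again Playwright-style with hypothetical expectations:

```python
# Divergence check: compare the state the plan assumes against reality.
def diverged(page, expected_url_prefix, expected_marker):
    # A real check would compare far richer state than a URL and one element.
    on_track = page.url.startswith(expected_url_prefix)
    marker_present = page.query_selector(expected_marker) is not None
    return not (on_track and marker_present)
```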

This is why browser automation reveals the truth faster than any other domain:

An LLM without real-time stability is a rocket without sensors.

It might launch beautifully, but it cannot land.

5. Why Fine-Tuning Feels Like Magic — But Isn't Enough

Fine-tuning looks powerful because it introduces offline feedback:

  • The model sees its past failure
  • It adjusts its internal policy
  • It improves the first step next time

This gives the illusion of closed-loop behavior.

But it is still design-time correction, not runtime correction.

During execution:

  • fine-tuning cannot detect drift
  • fine-tuning cannot apply corrective action
  • fine-tuning cannot recover from invalid states
  • fine-tuning cannot handle timing sensitivities
  • fine-tuning cannot adapt to real-world invariants

It only prepares the model.

It does not enable the model to stabilize itself.

A tuned policy is still open-loop.

Closed-loop architecture is where stability actually emerges.
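
The difference fits in a few lines. In this sketch, `tuned_policy`, `execute`, `observe`, and `task` are all placeholders, and the weights are identical in both versions; only the architecture around the call changes:

```python
# Open loop: a fine-tuned policy still executes blind.
def run_open_loop(tuned_policy, execute, task):
    for step in tuned_policy(task):   # all the intelligence is spent up front
        execute(step)                 # nothing checks whether the step worked

# Closed loop: the same tuned policy, now inside a feedback path.
def run_closed_loop(tuned_policy, execute, observe, task, max_steps=50):
    for _ in range(max_steps):
        state = observe()                   # sense actual, not assumed, state
        if task.is_done(state):
            return state
        execute(tuned_policy(task, state))  # decide from observed reality
    raise RuntimeError("did not converge")  # divergence surfaces instead of hiding
```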

6. The Control-Theoretic Reframe for AI Engineers

The key insight:

The missing piece isn't in the model.

It's in the architecture around the model.

LLMs generate intelligent actions.

Control systems ensure those actions work in the real world.

Today's systems are:

  • intelligent without stability
  • powerful without robustness
  • capable without awareness
  • high-performance without self-correction

This is why every serious agent engineering team is converging toward the same realization:

We don't need smarter models.

We need architectures that keep models stable during execution.

Closed-loop systems are the only known solution to this problem in every other field.

AI is no exception.
