Agents fail not because models are weak, but because the input domain is wrong.
Domain extractors and problem rewriting are the machinery that converts a messy real-world situation into a solvable representation: one that places the model in the correct manifold region and eliminates ambiguity before any operator is applied.
This section formalizes the design of domain extractors, the theory behind representation rewriting, and includes concrete examples that show exactly how this works in production.
1. Why domain extraction is necessary
Real-world inputs contain:
- irrelevant data,
- inconsistent structures,
- mixed modalities,
- conflicting cues,
- high entropy,
- ambiguous semantics.
LLMs cannot reliably disentangle this on their own. They treat the entire input as one collapsed context.
Domain extractors isolate only what matters.
They reduce the problem to a tractable slice with minimal ambiguity.
2. Formal definition
Define the system state:
xₜ ∈ S
A domain extractor is a projection:
zₜ = Pₛ(xₜ)
Where zₜ is the canonical form used by the operator.
Properties:
- idempotent (extracting again does not change the result),
- structure-preserving,
- domain-purifying,
- entropy-reducing,
- invariant-respecting.
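These properties can be made concrete in a few lines. The sketch below is illustrative only: the function name `extract_payment_fields`, the dict-based system state, and the field names are assumptions, not a fixed API. It shows a projection Pₛ that keeps one domain slice and is idempotent by construction.

```python
def extract_payment_fields(state: dict) -> dict:
    """P_s: project the raw system state onto the payment-domain slice."""
    RELEVANT = {"invoice_id", "amount", "currency", "due_date"}
    return {k: v for k, v in state.items() if k in RELEVANT}

raw_state = {
    "invoice_id": "INV-7",
    "amount": 120.0,
    "currency": "USD",
    "due_date": "2025-01-31",
    "ui_theme": "dark",          # irrelevant attribute, dropped
    "session_token": "abc123",   # noise, dropped
}

z = extract_payment_fields(raw_state)

# Idempotence: extracting again does not change the result.
assert extract_payment_fields(z) == z
```

Because the projection only filters, it is automatically idempotent and structure-preserving; the entropy reduction comes from everything it refuses to pass through.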
3. Problem rewriting
After extraction, the canonical form may still not match a representation the model can solve reliably.
Problem rewriting transforms the slice into a representation that aligns with a stable manifold region:
rₜ = R(zₜ)
Where R:
- normalizes structure,
- enforces schemas,
- removes ambiguity,
- simplifies the objective,
- clarifies constraints.
This step converts an otherwise impossible prompt into a solvable one.
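A minimal sketch of a rewriter R, continuing the payment-slice example above. The target schema, unit normalization, and field names are assumptions chosen for illustration:

```python
def rewrite_to_schema(z: dict) -> dict:
    """R: normalize structure and enforce a fixed output schema."""
    return {
        "id": str(z["invoice_id"]),
        "amount_cents": int(round(z["amount"] * 100)),  # normalize units
        "currency": z["currency"].upper(),              # canonical casing
        "due": z["due_date"],
    }

z = {"invoice_id": "INV-7", "amount": 120.0,
     "currency": "usd", "due_date": "2025-01-31"}
r = rewrite_to_schema(z)
# r == {"id": "INV-7", "amount_cents": 12000,
#       "currency": "USD", "due": "2025-01-31"}
```

The operator then receives rₜ, never zₜ: every field is present, typed, and in one canonical form, so there is nothing left to disambiguate.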
4. Example: Browser automation (the most intuitive case)
Raw input
A full DOM tree with:
- hidden nodes,
- inconsistent structure,
- irrelevant sections,
- noise from scripts.
Domain extractor
zₜ = visible_interaction_region(DOM)
Examples:
- the form currently displayed,
- the active modal,
- the table row that changed.
Rewriter
rₜ = canonical_DOM_structure(zₜ)
This may:
- flatten tables into lists,
- normalize forms,
- remove dynamic attributes,
- simplify selectors,
- preserve only actionable elements.
Only then is the operator called:
actions = f(rₜ)
This eliminates the bulk of browser-agent instability.
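The browser pipeline above can be sketched over a toy dict-based DOM. Everything here is an assumption for illustration (the node shape, the `hidden` flag, the set of actionable tags); a real implementation would sit on top of an actual DOM or accessibility tree:

```python
ACTIONABLE = {"input", "button", "select", "a"}

def visible_interaction_region(node: dict) -> list:
    """Extractor: collect visible, actionable nodes from the DOM tree."""
    if node.get("hidden"):
        return []
    found = [node] if node["tag"] in ACTIONABLE else []
    for child in node.get("children", []):
        found.extend(visible_interaction_region(child))
    return found

def canonical_dom_structure(nodes: list) -> list:
    """Rewriter: keep only stable, actionable attributes per element."""
    return [{"tag": n["tag"],
             "label": n.get("label", ""),
             "selector": n.get("id", "")} for n in nodes]

dom = {"tag": "body", "children": [
    {"tag": "script", "hidden": True, "children": []},   # noise, pruned
    {"tag": "form", "children": [
        {"tag": "input", "id": "email", "label": "Email"},
        {"tag": "button", "id": "submit", "label": "Sign up"},
    ]},
]}

r = canonical_dom_structure(visible_interaction_region(dom))
# Only the two actionable elements survive, with stable selectors.
```

The operator sees two elements instead of a full DOM tree; dynamic attributes, scripts, and hidden nodes never reach it.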
5. Example: Document extraction
Raw input
A 50-page PDF with layout noise, mixed styles, and irrelevant sections.
Domain extractor
zₜ = extract_section(document, target_field)
Examples:
- only the "Payment Information" table,
- only the "Employment Start Date" line,
- only the W-4 box the workflow needs.
Rewriter
rₜ = normalize_table(zₜ)
Or:
rₜ = canonical_text_form(zₜ)
Now the operator extracts fields reliably.
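As a text-only sketch of `extract_section` and `canonical_text_form`: a real pipeline would use a PDF or layout parser, so the regex, the sample document, and the output keys below are assumptions made purely to show the shape of the two stages.

```python
import re

def extract_section(document: str, target_field: str) -> str:
    """Extractor: isolate the single line carrying the target field."""
    match = re.search(rf"{re.escape(target_field)}\s*:\s*(.+)", document)
    return match.group(0) if match else ""

def canonical_text_form(z: str) -> dict:
    """Rewriter: normalize the slice into a key-value record."""
    key, _, value = z.partition(":")
    return {key.strip().lower().replace(" ", "_"): value.strip()}

doc = """Employee Handbook ... pages of layout noise ...
Employment Start Date: 2023-04-01
... more irrelevant sections ..."""

r = canonical_text_form(extract_section(doc, "Employment Start Date"))
# r == {"employment_start_date": "2023-04-01"}
```

The operator never sees the 50 pages; it sees one normalized record per target field.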
6. Example: Compliance / HR workflows
Raw input
Employee data across:
- multiple systems,
- different formats,
- optional fields,
- irrelevant attributes,
- location-dependent rules.
Domain extractor
zₜ = jurisdiction_relevant_subset(employee_state)
Rewriter
rₜ = schema_align(zₜ, compliance_rule_schema)
Now the operator produces:
steps = f(rₜ)
And each step is correct because the domain is correct.
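The compliance pair can be sketched the same way. This is not a real compliance engine; the schema tuple, field names, and sample employee record are assumptions that only demonstrate the extractor-then-rewriter shape:

```python
COMPLIANCE_RULE_SCHEMA = ("employee_id", "state", "hire_date")

def jurisdiction_relevant_subset(employee_state: dict) -> dict:
    """Extractor: drop attributes the jurisdiction's rules never reference."""
    return {k: v for k, v in employee_state.items()
            if k in COMPLIANCE_RULE_SCHEMA}

def schema_align(z: dict, schema: tuple) -> dict:
    """Rewriter: emit every schema field in a fixed order, gaps made explicit."""
    return {field: z.get(field) for field in schema}

employee = {"employee_id": "E-9", "state": "CA", "hire_date": "2022-06-01",
            "favorite_color": "teal",   # irrelevant attribute
            "slack_handle": "@e9"}      # irrelevant attribute

r = schema_align(jurisdiction_relevant_subset(employee),
                 COMPLIANCE_RULE_SCHEMA)
# r == {"employee_id": "E-9", "state": "CA", "hire_date": "2022-06-01"}
```

Note that `schema_align` emits a `None` for any missing schema field rather than silently omitting it, so the operator can distinguish "absent" from "not asked for".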
7. Why domain extractors stabilize LLMs
Domain extractors:
- contract the manifold region,
- remove high-entropy elements,
- eliminate mixed-domain patterns,
- prevent attractor drift,
- align the input with the model's strongest internal structure.
Rewriting ensures that the operator sees only the representation it can reliably transform.
This is the hidden source of high reliability in well-designed agent systems.
8. Representation rewriting patterns
1. Canonicalization
Convert multiple possible input forms into a single standard form.
- tables → lists
- messy paragraphs → key-value blocks
- raw DOM → interaction graph
2. Constraint encoding
Bake constraints into the representation instead of describing them.
3. Goal specification through structure
Use structure, not text, to express the objective.
4. Noise pruning
Delete everything irrelevant.
5. Semantic flattening
Simplify concepts into machine-stable forms.
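Pattern 1 (canonicalization) can be sketched in a few lines. The two accepted input shapes here, a plain dict and a (headers, row) table slice, are assumptions; the point is that several surface forms collapse to one standard form:

```python
def canonicalize(record) -> dict:
    """Map a dict or a (headers, row) table slice to one canonical dict."""
    if isinstance(record, dict):
        return dict(sorted(record.items()))
    headers, row = record  # assume a (headers, row) pair otherwise
    return dict(sorted(zip(headers, row)))

a = canonicalize({"name": "Ada", "role": "eng"})
b = canonicalize((["role", "name"], ["eng", "Ada"]))
assert a == b  # two surface forms, one canonical representation
```

Downstream prompts and operators only ever need to handle the canonical form, which is what makes the other four patterns cheap to apply on top of it.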
9. Why most agent frameworks fail
They skip this entire step.
They give the model raw:
- HTML,
- user history,
- entire conversations,
- full documents,
- mixed tasks.
The model collapses all of this into one latent state → unstable trajectories → hallucination → failed workflows.
Domain extraction + rewriting solves this.
10. Link to 0-Context Architecture
0-context is essentially domain extraction plus strict rewriting with zero residue.
You isolate:
- one domain,
- one structure,
- one objective,
- one representation.
And you present only that to the operator.
This is why 0-context outperforms long-context systems on real automation tasks.
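The whole 0-context step is just the composition described above. In this sketch, `operator` stands in for the model call, and the lambdas are toy stand-ins for a real extractor and rewriter; all names are illustrative assumptions:

```python
def run_step(x: dict, extract, rewrite, operator):
    """One 0-context step: the operator sees only r_t = R(P_s(x_t))."""
    z = extract(x)      # z_t = P_s(x_t): one domain, nothing else
    r = rewrite(z)      # r_t = R(z_t): one structure, one representation
    return operator(r)  # actions = f(r_t)

actions = run_step(
    {"field": "amount", "value": " 12 ", "debug": True},
    extract=lambda x: {k: x[k] for k in ("field", "value")},
    rewrite=lambda z: {z["field"]: z["value"].strip()},
    operator=lambda r: [("set", k, v) for k, v in r.items()],
)
# actions == [("set", "amount", "12")]
```

The operator's input carries no history, no residue, and no second domain: exactly the isolation the four bullets above describe.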
11. Research directions
- automated canonicalization of enterprise schemas,
- stability analysis under different rewriting strategies,
- manifold-region mapping using domain-extracted embeddings,
- designing robust domain-projection languages,
- cross-domain extraction for multi-operator pipelines.
The extractor–rewriter architecture is the backbone of verifiable agents.