A2UI Part 2: Taming State in Streamed Interfaces

When an agent continuously streams new UI JSON, how do you keep the user's client-side state from being wiped out? A deep dive into Semantic Diffing and DOM Reconciliation.

Generative UI feels like magic right up until the exact moment a user tries to type into a text input while an LLM is still streaming the layout. Suddenly, focus is lost, keystrokes are dropped, and the UI flickers wildly.

This is the "Flashing Input" Problem, and it is the single biggest hurdle in moving Generative UI from a party trick to a production-grade paradigm. In Part 1, we introduced the A2UI flat adjacency list. Now, we'll dive into the mechanics of State Reconciliation.

The Re-render Nightmare

When an LLM streams a UI definition, it usually sends chunks of JSON. If your React frontend naively re-parses this JSON and re-renders on every chunk, the whole tree is re-rendered every 50 milliseconds or so, and any structural change in the stream forces React to destroy and recreate the affected part of the DOM.

React's internal reconciler tries to help by matching elements based on their type and position in the tree. But LLMs are non-deterministic; an agent might decide to wrap an input box in a new <fieldset> mid-stream because it suddenly realized it needs a grouped layout. When the element type at a given position changes, React unmounts the old subtree, including the input (destroying the user's typed text and focus), and mounts a new one.

The Solution: Decoupling Layout from State

To solve this, we must enforce a strict separation of concerns:

  1. The Agent owns the Layout (What components exist and where they go).
  2. The Client owns the State (What the user has typed or toggled).

We achieve this by no longer relying on React's positional reconciliation to hold user state, and implementing a Global Semantic Key Registry instead.
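As a rough illustration of what such a registry could look like, here is a minimal sketch of a client-side store keyed by semantic ID. The class name `SemanticStateRegistry` and its methods are hypothetical, not part of any A2UI specification; the `subscribe` method simply follows the subscribe/unsubscribe shape that React's `useSyncExternalStore` expects.

```typescript
// Hypothetical sketch: these names are illustrative, not an A2UI API.
type Listener = () => void;

class SemanticStateRegistry {
  // User-owned values, keyed by semantic ID rather than tree position.
  private values = new Map<string, string>();
  private listeners = new Set<Listener>();

  get(semanticId: string): string {
    // Unknown IDs read as an empty string.
    return this.values.get(semanticId) ?? "";
  }

  set(semanticId: string, value: string): void {
    this.values.set(semanticId, value);
    this.listeners.forEach((notify) => notify());
  }

  // Returns an unsubscribe function, matching the contract of the
  // subscribe argument to React's useSyncExternalStore.
  subscribe(listener: Listener): () => void {
    this.listeners.add(listener);
    return () => {
      this.listeners.delete(listener);
    };
  }
}

const registry = new SemanticStateRegistry();
registry.set("input_billing_zipcode", "94107");
// The layout can be torn down and rebuilt any number of times here;
// rendering never writes to the registry, so the value is still there.
const restored = registry.get("input_billing_zipcode");
```

Because the store lives outside the component tree, a remounted input can read its previous value back as soon as it mounts.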

The Flawed Approach (Positional)

If the client keys streamed UI elements by array index, state is lost or attached to the wrong element as soon as the order changes.

TS.SNIPPET
// ❌ BAD: Index keys misattach state when the array order changes
function StreamingForm({ jsonStream }) {
  return (
    <form>
      {jsonStream.map((component, index) => (
        // React matches on 'index', meaning if a new component is prepended,
        // every component after it ends up holding state that belongs
        // to a different component!
        <DynamicInput key={index} {...component} />
      ))}
    </form>
  );
}
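To make the failure concrete without pulling in React, the following sketch simulates state that is looked up by key on each render. It is a simplified model, not React's actual reconciler, and the `render` helper and field names are made up for the illustration.

```typescript
// Simplified model: state is looked up by whatever key an element has,
// similar to how React keeps component state attached to a key.
type Field = { id: string; label: string };

function render(
  fields: Field[],
  state: Map<string, string>,
  keyOf: (field: Field, index: number) => string,
): string[] {
  return fields.map((field, index) => {
    const typed = state.get(keyOf(field, index)) ?? "";
    return `${field.label}=${typed}`;
  });
}

const streamedFields: Field[] = [
  { id: "input_name", label: "Name" }, // prepended mid-stream
  { id: "input_email", label: "Email" },
];

// Index keys: the user typed into slot "0", which used to be Email.
const byIndex = new Map<string, string>([["0", "ada@example.com"]]);
const indexKeyed = render(streamedFields, byIndex, (_field, i) => String(i));
// indexKeyed is ["Name=ada@example.com", "Email="]

// Semantic keys: the value follows the field, wherever it lands.
const bySemanticId = new Map<string, string>([["input_email", "ada@example.com"]]);
const semanticKeyed = render(streamedFields, bySemanticId, (field) => field.id);
// semanticKeyed is ["Name=", "Email=ada@example.com"]
```

With index keys the email address shows up in the Name field after the prepend; with semantic keys it stays with the Email field.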

The A2UI Approach (Semantic Keying)

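In A2UI, the agent is forced (via its system prompt and JSON schema constraints) to generate a deterministic, semantic ID for every interactive element based on its purpose, not its position. Because user values are stored under this ID in a client-side store rather than in component state, they survive even when a layout change makes React remount the element.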
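Prompts and schemas reduce the chance of a bad ID but cannot rule it out, so a client may also want to check incoming elements before rendering them. The sketch below is one possible check; the snake_case pattern, the `StreamedElement` shape, and the function name are assumptions made for the example, not rules taken from A2UI.

```typescript
// Assumed convention (not from an A2UI spec): lowercase snake_case
// starting with a letter, e.g. "input_billing_zipcode".
const SEMANTIC_ID_PATTERN = /^[a-z][a-z0-9]*(_[a-z0-9]+)*$/;

// Hypothetical element shape for the illustration.
type StreamedElement = { semanticId?: string; type: string };

function findSemanticIdErrors(elements: StreamedElement[]): string[] {
  const errors: string[] = [];
  const seen = new Set<string>();
  elements.forEach((el, index) => {
    const id = el.semanticId;
    if (id === undefined || !SEMANTIC_ID_PATTERN.test(id)) {
      errors.push(`element ${index} (${el.type}): missing or malformed ID`);
      return;
    }
    // Two elements sharing one ID would silently share one state slot.
    if (seen.has(id)) {
      errors.push(`element ${index} (${el.type}): duplicate ID "${id}"`);
      return;
    }
    seen.add(id);
  });
  return errors;
}

const idErrors = findSemanticIdErrors([
  { semanticId: "input_billing_zipcode", type: "text_input" },
  { semanticId: "input_billing_zipcode", type: "text_input" },
  { type: "checkbox" },
]);
// idErrors has two entries: one duplicate, one missing ID.
```

What the client does with a non-empty result (skip the element, ask the agent to retry, and so on) is a design choice this sketch leaves open.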

TS.SNIPPET
// ✅ GOOD: A2UI Semantic Keying
function StreamingForm({ a2uiStream }) {
  // We maintain an external state store that survives tree destruction
  const [globalState, setGlobalState] = useGlobalStore();
  return (
    <form>
      {a2uiStream.elements.map((el) => {
        // IDs are semantic: e.g., "input_billing_zipcode"
        // Even if the LLM wraps this in 4 new divs mid-stream,
        // the ID remains stable.
        return (
          <DynamicComponent
            key={el.semanticId}
            config={el}
            // Bind value directly from the protected client store
            value={globalState[el.semanticId]}
            onChange={(val) => setGlobalState(el.semanticId, val)}
          />
        );
      })}
    </form>
  );
}

Handling Latency: Optimistic UI

What happens if the user interacts with a button that tells the agent to regenerate the UI? There is often a 2-second LLM processing delay.

Instead of freezing the UI, A2UI uses Optimistic Shadowing. When the user clicks "Add another shipping address", the client immediately injects a placeholder shadow-node into the flat adjacency list:

json.SNIPPET
[ { "id": "addr_1", "type": "address_block" }, { "id": "addr_2", "type": "address_block", "isOptimistic": true } ]

When the agent finally streams the actual new layout, the A2UI reconciler sees that the agent generated an addr_2 equivalent. It silently swaps the optimistic shadow node with the agent's generated node, preserving any text the user managed to type into the shadow node during the 2-second wait.
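The swap described above can be sketched as a small merge over the flat list. This version assumes the agent reuses the exact ID of the shadow node, which is the simplest reading of "equivalent"; the `UiNode` type and `reconcile` function are hypothetical names for the illustration.

```typescript
// Hypothetical node shape, mirroring the JSON example above.
type UiNode = { id: string; type: string; isOptimistic?: boolean };

function reconcile(clientNodes: UiNode[], agentNodes: UiNode[]): UiNode[] {
  const agentIds = new Set(agentNodes.map((node) => node.id));
  // Keep only shadows the agent has not yet produced a node for.
  const pendingShadows = clientNodes.filter(
    (node) => node.isOptimistic === true && !agentIds.has(node.id),
  );
  // Agent nodes define the layout; unconfirmed shadows trail behind them.
  return agentNodes.concat(pendingShadows);
}

const clientNodes: UiNode[] = [
  { id: "addr_1", type: "address_block" },
  { id: "addr_2", type: "address_block", isOptimistic: true },
];
// Text typed during the wait is stored by ID, outside the node list.
const typedWhileWaiting = new Map<string, string>([
  ["addr_2", "221B Baker Street"],
]);

// Mid-stream: the agent has only re-sent addr_1, so the shadow stays.
const midStream = reconcile(clientNodes, [
  { id: "addr_1", type: "address_block" },
]);

// Stream complete: the agent's addr_2 replaces the shadow node.
const settled = reconcile(clientNodes, [
  { id: "addr_1", type: "address_block" },
  { id: "addr_2", type: "address_block" },
]);
// typedWhileWaiting.get("addr_2") is untouched by either call.
```

Since the merge only touches the node list, the typed text never has to be copied from the shadow node to the real one: both resolve to the same ID in the client store.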

Executive Conclusion

You cannot build robust Generative UI by treating the LLM as a traditional backend database. The stream is volatile, non-deterministic, and constantly shifting.

By utilizing flat adjacency lists, enforcing semantic keys, and completely decoupling the user's state from the agent's layout tree, you can achieve a generative interface that feels as stable and responsive as a hard-coded React app.
