Streaming Tokens Without Layout Thrash

9 min read · May 31, 2026 ★ Flagship

60 tokens per second, 500 layout recalculations

You're building an AI chat interface. The LLM streams its response token by token. Every token fires a React state update, which triggers a re-render, which can force a new layout calculation. For a typical 200-word response arriving at 50 tokens/second, that's 50 DOM mutations per second for several seconds. On a phone, the page stutters, the CPU is maxed, and the battery drains faster than it should.

The fix is a render buffer: accumulate tokens in memory, then flush to the DOM at most once per display frame via requestAnimationFrame. Same tokens, smoother experience, and far less CPU spent on rendering work.

The mismatch: LLMs emit 30–80 tokens/second, and the network often delivers them in bursts of several tokens at once. Displays typically refresh at 60fps (about 16.7ms per frame). Without buffering, every token causes its own DOM mutation, even though the display can show at most one update per frame. Any mutation beyond the first in a frame is wasted CPU that never produces visible output: it is overwritten by the next mutation before the frame is painted.

The performance model: each DOM mutation triggers a potential style recalculation and layout reflow. In streaming text rendering, appending a character to a text node forces the browser to recalculate the width of the container and potentially reflow siblings. At 60 mutations/second, you're triggering 60 potential reflows/second — the browser's style recalculation budget is 16ms/frame, shared across all work. You're burning most of it on work that has no visible output.
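To make the mismatch concrete, here is a small sketch that counts DOM writes for a list of token arrival times, with and without per-frame batching. The helper name countDomWrites, the 60fps frame length, and the burst timings are illustrative assumptions, not measurements.

```javascript
// Hypothetical helper: compares one-write-per-token with one-write-per-frame.
// arrivalsMs: token arrival times in milliseconds, in ascending order.
function countDomWrites(arrivalsMs, frameMs = 1000 / 60) {
  const framesWithTokens = new Set();
  for (const t of arrivalsMs) {
    // Every token that lands in the same frame interval shares one flush.
    framesWithTokens.add(Math.floor(t / frameMs));
  }
  return { unbuffered: arrivalsMs.length, buffered: framesWithTokens.size };
}

// Assumed traffic shape: 60 tokens in about one second, delivered as
// 20 network bursts of 3 tokens each, with tokens in a burst 1ms apart.
const arrivals = [];
for (let burst = 0; burst < 20; burst++) {
  for (let i = 0; i < 3; i++) arrivals.push(burst * 50 + 20 + i);
}

const writes = countDomWrites(arrivals);
console.log(writes); // { unbuffered: 60, buffered: 20 }
```

When tokens arrive evenly and slower than the refresh rate, the two counts match. The savings come from bursts and from token rates above the refresh rate.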

The requestAnimationFrame buffer pattern

The idea: collect all tokens that arrive between frames into a string, then flush the whole string to the DOM in one update at the start of the next frame. One DOM update per frame instead of one per token.

// ❌ One DOM update per token — thrashes layout
for await (const chunk of stream) {
  setContent(prev => prev + chunk); // re-renders on every token
}

// ✅ Buffer tokens, flush at 60fps
let buffer = '';
let rafId = null;

function flushBuffer(setContent) {
  // Copy first: React may run the updater later, after buffer is cleared
  const text = buffer;
  buffer = '';
  rafId = null;
  setContent(prev => prev + text);
}

for await (const chunk of stream) {
  buffer += chunk;
  if (!rafId) {
    rafId = requestAnimationFrame(() => flushBuffer(setContent));
  }
}

The word-boundary variant flushes at word boundaries (space or punctuation) instead of frame boundaries — this makes streaming feel more natural, since full words appear at once rather than character by character:

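let wordBuffer = ''; // tokens not yet handed to a frame
let pending = '';    // complete words waiting for the next frame
let rafId = null;

for await (const chunk of stream) {
  wordBuffer += chunk;
  // Hand text over once it ends on whitespace or punctuation
  if (/[\s.,!?;:]$/.test(wordBuffer)) {
    pending += wordBuffer;
    wordBuffer = '';
    if (!rafId) {
      rafId = requestAnimationFrame(() => {
        const toFlush = pending;
        pending = '';
        rafId = null;
        setContent(prev => prev + toFlush);
      });
    }
  }
}
// Stream end: cancel any queued frame, then flush everything in order
if (rafId) cancelAnimationFrame(rafId);
rafId = null;
const rest = pending + wordBuffer;
if (rest) setContent(prev => prev + rest);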
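One possible refinement, sketched under assumptions: rather than waiting until the buffer happens to end on a boundary, split it at the last boundary and hold back only the trailing partial word. splitAtLastBoundary is a hypothetical helper name, and the boundary set mirrors the regular expression used above.

```javascript
// Hypothetical helper: split text into a part that ends on a word boundary
// (safe to show now) and a trailing partial word (held for the next chunk).
function splitAtLastBoundary(text) {
  let cut = -1;
  for (let i = text.length - 1; i >= 0; i--) {
    if (/[\s.,!?;:]/.test(text[i])) { cut = i; break; }
  }
  return { ready: text.slice(0, cut + 1), held: text.slice(cut + 1) };
}

// Example: feed chunks that split words at arbitrary points.
let held = '';
const shown = [];
for (const chunk of ['Hel', 'lo wor', 'ld, bye']) {
  const parts = splitAtLastBoundary(held + chunk);
  if (parts.ready) shown.push(parts.ready); // would go to the frame buffer
  held = parts.held;
}
console.log(shown, JSON.stringify(held)); // [ 'Hello ', 'world, ' ] "bye"
```

The held remainder still has to be flushed when the stream ends, exactly as in the loop above.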

Framework integration notes:

  • Vercel AI SDK: useChat and useCompletion manage the streamed text and loading state for you, but throttling renders is your responsibility. Check your SDK version for the exact fields and callbacks, then apply the rAF pattern at the point where chunks are appended.
  • React 18 concurrent mode: startTransition can mark streaming updates as non-urgent, letting React batch them with other state updates. Combine with the rAF buffer for optimal scheduling.
  • Virtualized lists: if streaming into a long conversation, ensure the scroll-to-bottom behavior uses scrollIntoView inside a useEffect keyed to content length, not on every token.

Cleanup: cancel the rAF handle when the component unmounts or the stream ends, to avoid a dangling animation frame attempting to update unmounted state.
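A minimal sketch of that cleanup, written without React so it can run anywhere. createFrameBuffer and its method names are hypothetical. The schedule and cancel arguments default to requestAnimationFrame and cancelAnimationFrame and are injectable only so the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper: batches chunks and flushes at most once per frame.
function createFrameBuffer(
  onFlush,
  schedule = (cb) => requestAnimationFrame(cb),
  cancel = (id) => cancelAnimationFrame(id),
) {
  let buffer = '';
  let frameId = null;

  function flush() {
    frameId = null;
    if (!buffer) return;
    const text = buffer; // copy before clearing so onFlush sees the text
    buffer = '';
    onFlush(text);
  }

  return {
    push(chunk) {
      buffer += chunk;
      if (frameId === null) frameId = schedule(flush); // one frame at a time
    },
    // Stream finished normally: cancel the queued frame and flush now.
    end() {
      if (frameId !== null) cancel(frameId);
      flush();
    },
    // Component unmounted: cancel the queued frame and drop buffered text.
    dispose() {
      if (frameId !== null) cancel(frameId);
      frameId = null;
      buffer = '';
    },
  };
}

// Demo with a fake frame scheduler (an assumption made for illustration).
const queued = new Map();
let nextId = 1;
const fakeSchedule = (cb) => { queued.set(nextId, cb); return nextId++; };
const fakeCancel = (id) => queued.delete(id);
const runFrame = () => {
  const cbs = [...queued.values()];
  queued.clear();
  cbs.forEach((cb) => cb());
};

const flushed = [];
const fb = createFrameBuffer((t) => flushed.push(t), fakeSchedule, fakeCancel);
fb.push('Hel'); fb.push('lo'); // two chunks, one queued frame
runFrame();                    // flushes "Hello"
fb.push(' world');
fb.dispose();                  // unmount: nothing more is flushed
runFrame();
console.log(flushed); // [ 'Hello' ]
```

In a React component, the buffer object would typically live in a ref, with dispose called from the useEffect cleanup function and end called when the stream completes.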

A chat that stutters at 50 tokens per second

Priya built a chat interface for a GPT-4 integration. It worked — the AI responded, text appeared. But on fast connections the page flickered. On mobile, it stuttered. Users on slower laptops saw the browser tab go unresponsive mid-stream. Scroll position jumped as new tokens pushed content down. She hadn't changed anything about the AI — just how she displayed its output.

The implementation was simple: every SSE message called element.textContent += token. On a fast model at 60 tokens/second, that was 60 DOM writes per second. The browser tried to reflow the page layout on every single one.

The technical cause: each write to textContent invalidates the element's layout box. The browser schedules a style recalculation and potential reflow. At 60 mutations/second, that's 60 scheduled reflows per second against a 16ms/frame budget, which can consume most of the main thread's time on layout work and leave little for user input handling, scroll, or other animations. On a fast desktop CPU this is survivable; on a low-end phone, it jank-locks the main thread. The frame rate collapses from 60fps to under 20fps during streaming.

Systemic impact: this pattern scales badly across product surfaces. In a customer-facing AI product serving thousands of concurrent streams, 60 mutations/second per user means you are competing with the browser's own rendering pipeline on every device, including the weakest ones in your user base. Perceived quality gets worse as token rate rises: the faster the model, the worse the experience, which is precisely backwards. The faster the model, the smoother the display should feel. Token-rate DOM mutations invert this relationship.

One DOM write per frame, not per token

Priya added a two-line buffer: accumulate incoming tokens in a string variable, schedule a requestAnimationFrame callback if one isn't already scheduled, and in that callback flush the buffered string to the DOM. Frame fires, DOM updates once, buffer clears. The next batch of tokens accumulates until the following frame.

The stutter disappeared. Scroll stopped jumping. Mobile felt smooth. The content of the stream was identical — only the display rhythm changed. One DOM mutation per frame instead of one per token.

Implementation changes: the critical detail is the if (!rafId) guard — only one rAF is queued at a time, regardless of how many tokens arrive between frames. Without the guard, you queue one rAF per token and recreate the original problem. The buffer itself is a module-scoped string variable (or a React ref), not state — writing to a ref doesn't trigger a re-render, which is the point. The rAF callback is the only place that writes to state, gating re-renders to 60fps maximum.

Measurement: instrument with a PerformanceObserver watching longtask entries during a stream. Before the fix: multiple long tasks (>50ms) per second during active streaming. After: zero or near-zero long tasks during streaming. Then use Chrome DevTools → Performance to record a short stream and count "Recalculate Style" events in the flame chart. Before: roughly one per token. After: at most one per frame, and fewer whenever several tokens land in the same frame. The difference is directly measurable in the flame chart before any user testing.
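A sketch of the long-task instrumentation, assuming a browser that supports the longtask entry type. The helper names summarizeLongTasks and watchLongTasks are hypothetical. Where longtask is unsupported, watchLongTasks returns null instead of throwing.

```javascript
// Hypothetical helper: count and total duration of long tasks (> 50ms).
function summarizeLongTasks(entries) {
  const long = entries.filter((e) => e.duration > 50);
  const totalMs = long.reduce((sum, e) => sum + e.duration, 0);
  return { count: long.length, totalMs };
}

// Starts observing long tasks. Returns a stop function that reports a
// summary, or null where the longtask entry type is not supported.
function watchLongTasks() {
  const supported =
    typeof PerformanceObserver !== 'undefined' &&
    (PerformanceObserver.supportedEntryTypes || []).includes('longtask');
  if (!supported) return null;

  const seen = [];
  const observer = new PerformanceObserver((list) => {
    seen.push(...list.getEntries());
  });
  observer.observe({ type: 'longtask', buffered: true });
  return () => {
    observer.disconnect();
    return summarizeLongTasks(seen);
  };
}

// Usage: start before the stream, stop after it ends.
const stop = watchLongTasks();
// ... run the stream here ...
if (stop) console.log(stop());
```

Running this once against the unbuffered version and once against the buffered version gives two summaries to compare directly.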

Pattern at a glance

Annotated example: token streaming render strategies

❌ TOKEN-PER-RENDER

for await (const chunk of stream) {
  // DOM write on every token
  el.textContent += chunk;
  // 60 reflows/sec at 60 tok/s
}

Each token triggers layout recalculation; 60+ reflows/second on fast models causes jank and scroll jumps

✅ RAF BUFFER FLUSH

let buf = '', rafId = null;
for await (const chunk of stream) {
  buf += chunk;
  if (!rafId) rafId =
    requestAnimationFrame(() => {
      el.textContent = buf;
      rafId = null;
    });
}

Tokens batch between frames; single DOM write per 16ms frame; smooth at any token rate

Watch it: unbuffered vs buffered token streaming

The "Unbuffered" animation shows tokens arriving one at a time, each causing a visible reflow. Notice the jank. The "Buffered" animation shows the same token stream but flushes to the DOM once per animation frame — smooth, no jank.

Open DevTools → Performance and record while each mode streams a 150-token response. In unbuffered mode, you'll see about 150 "Recalculate Style" tasks. In buffered mode, you'll see at most one per frame: no more than about 120 for a 2-second stream at 60fps, and fewer when tokens arrive in bursts.

The third mode in the demo shows the word-boundary pattern: flush on whitespace, not on every token. Compare the visual rhythm — words popping in feels more natural than characters trickling, even at the same underlying stream rate.

SSE parsing, rAF buffers, and streaming markdown

Two ways to consume a streaming LLM response in the browser. The simplest is EventSource — built-in SSE support, auto-reconnects, no setup. The limitation: GET requests only, no custom headers (so no Authorization Bearer token).

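// EventSource: simplest, GET only
// Assumes buffer, rafId, and flush(setContent) from the rAF buffer pattern
const es = new EventSource('/api/stream');
es.onmessage = (e) => {
  buffer += e.data;
  // rAF passes a timestamp to its callback, so wrap flush explicitly
  if (!rafId) rafId = requestAnimationFrame(() => flush(setContent));
};

// Key pitfall: always close the EventSource when done, or it reconnects.
// 'done' is a custom event name that the server has to send.
es.addEventListener('done', () => es.close());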

If you need POST or custom headers (the common case with API keys), use fetch with ReadableStream instead.

The fetch + ReadableStream pattern for authenticated streaming:

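const res = await fetch('/api/stream', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${token}`,
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({ prompt }),
});
const reader = res.body.getReader();
const decoder = new TextDecoder();

let buffer = '', rafId = null;
const flush = (setContent) => {
  // Copy first: React may run the updater after buffer is cleared
  const text = buffer;
  buffer = ''; rafId = null;
  setContent(prev => prev + text);
};

let partial = ''; // incomplete SSE line carried over between reads
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  // Parse SSE manually: lines starting with "data: "
  // stream: true keeps multi-byte characters split across reads intact
  partial += decoder.decode(value, { stream: true });
  const lines = partial.split('\n');
  partial = lines.pop(); // last piece may be an unfinished line
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      // Assumes each data line is a JSON object with a delta field
      const chunk = JSON.parse(line.slice(6))?.delta ?? '';
      buffer += chunk;
      if (!rafId) rafId = requestAnimationFrame(() => flush(setContent));
    }
  }
}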
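Network reads do not line up with SSE line boundaries, so a single event can arrive split across two reads. The sketch below isolates the line handling so it can be tested on its own. createSseDataParser is a hypothetical name, and the payload shape (a JSON object with a delta field) is the same assumption the example above makes.

```javascript
// Hypothetical helper: turns arbitrary text chunks into complete
// SSE data payloads, carrying unfinished lines over to the next call.
function createSseDataParser() {
  let partial = ''; // unfinished line left over from the previous chunk
  return function feed(text) {
    partial += text;
    const lines = partial.split('\n');
    partial = lines.pop(); // the last piece may be an incomplete line
    const payloads = [];
    for (const raw of lines) {
      const line = raw.endsWith('\r') ? raw.slice(0, -1) : raw; // tolerate CRLF
      if (line.startsWith('data:')) {
        // The SSE format allows one optional space after the colon.
        payloads.push(line.slice(5).replace(/^ /, ''));
      }
    }
    return payloads;
  };
}

// Example: one event split across two network reads.
const feed = createSseDataParser();
const first = feed('data: {"delta":"Hel');
const second = feed('lo"}\n\ndata: {"delta":" world"}\n\n');
const sseText = second.map((p) => JSON.parse(p).delta).join('');
console.log(first.length, sseText); // 0 Hello world
```

This sketch treats every data line as its own payload. Events that spread one payload over several data lines would need the lines joined before parsing.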

Markdown consideration: rendering markdown mid-stream can produce invalid intermediate states (e.g., a half-rendered code fence). Use a markdown parser designed for incremental input, or buffer until a complete "block" (double newline) before rendering markdown. Rendering raw text until the stream ends, then switching to rendered HTML, is often the simplest correct approach.
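One way to approximate the "complete block" rule, as a sketch only: render everything up to the last blank line that falls outside an open code fence, and keep the rest as raw text. splitCompleteBlocks is a hypothetical helper, and it makes simplifying assumptions: any line that starts with three backticks toggles a fence, and tilde fences or nested fences are ignored.

```javascript
const FENCE = '`'.repeat(3); // three backticks, the code fence marker

// Hypothetical helper: split streamed markdown into text that is safe to
// render and a tail that should wait for more input.
function splitCompleteBlocks(text) {
  let cut = 0;         // index just past the last safe block boundary
  let inFence = false; // true while inside an unclosed code fence
  let pos = 0;
  const lines = text.split('\n');
  // The final piece has no trailing newline yet, so it is never examined.
  for (let i = 0; i < lines.length - 1; i++) {
    const line = lines[i];
    pos += line.length + 1; // +1 for the newline consumed by split
    if (line.trimStart().startsWith(FENCE)) inFence = !inFence;
    // A blank line outside a fence ends a block.
    if (!inFence && line.trim() === '') cut = pos;
  }
  return { complete: text.slice(0, cut), pending: text.slice(cut) };
}

// Example: the stream has paused in the middle of a code fence.
const streamed = 'Intro paragraph.\n\n' + FENCE + 'js\nconst a = 1;\n\nconst b';
const blocks = splitCompleteBlocks(streamed);
console.log(JSON.stringify(blocks.complete)); // "Intro paragraph.\n\n"
```

Only blocks.complete would go through the markdown renderer. blocks.pending stays as plain text until a later flush completes it.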

At product scale, the rAF buffer pattern needs a few additions. First, cleanup: store the rAF ID in a ref and call cancelAnimationFrame in the component's cleanup function to prevent state updates on unmounted components. Second, scroll anchoring: use CSS scroll anchoring (the overflow-anchor property), or a useEffect keyed to content length that calls scrollIntoView once per flush rather than once per token, to prevent scroll thrash. Third, React 18 concurrent rendering: wrapping the buffer flush in startTransition marks the streaming update as non-urgent, letting React defer it during user interactions (typing, scrolling) and batch it with other low-priority updates. This suits production chat UIs where the stream should never block user input. Combine them: call startTransition(() => setContent(...)) inside the rAF callback, not instead of it.
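The combination of the two mechanisms can be sketched as a small factory for the function that runs inside the rAF callback. makeFrameFlush and takeBuffered are hypothetical names. In the demo, setContent and startTransition are stand-ins so the sketch runs outside React; in a component they would come from useState and from React's startTransition.

```javascript
// Hypothetical helper: builds the function to run inside the rAF callback.
// takeBuffered returns the buffered text and clears the buffer.
function makeFrameFlush({ takeBuffered, setContent, startTransition }) {
  return function onFrame() {
    const text = takeBuffered();
    if (!text) return;
    // Mark the append as non-urgent so React can prioritize user input.
    startTransition(() => {
      setContent((prev) => prev + text);
    });
  };
}

// Demo with stand-ins for React's setContent and startTransition
// (assumptions for illustration only).
let content = '';
let buffered = 'Hello';
let transitions = 0;
const onFrame = makeFrameFlush({
  takeBuffered: () => { const t = buffered; buffered = ''; return t; },
  setContent: (update) => { content = update(content); },
  startTransition: (fn) => { transitions++; fn(); },
});
onFrame(); // flushes "Hello" inside one transition
onFrame(); // buffer is empty, so no transition is started
console.log(content, transitions); // Hello 1
```

The rAF guard still decides when a flush happens. startTransition only changes how React prioritizes the resulting render.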

Remember

Key takeaways

  • Don't append each token directly to the DOM. Buffer tokens in a string and flush to the DOM once per animation frame using requestAnimationFrame.
    The mismatch: LLMs emit 30–80 tokens/second; displays refresh at 60fps. Extra mutations between frames are wasted CPU — they never produce visible output. Buffer to frame rate.
    Each token-level DOM mutation triggers a potential style recalculation. At 60 mutations/second, you can spend most of the browser's roughly 16ms frame budget on invisible work. One rAF flush per frame caps this at one recalculation per frame.
  • Clean up the rAF handle when the stream ends or the component unmounts — use the return value of requestAnimationFrame with cancelAnimationFrame in cleanup.
    Word-boundary flushing (flush on space/punctuation) feels more natural than frame-boundary flushing. Combine both: flush at word boundaries but only once per rAF to cap at 60fps.
    React 18 startTransition + concurrent mode can handle buffering at the framework level for streaming updates — mark streaming state updates as non-urgent transitions to let React batch them. This is complementary to, not a replacement for, the rAF buffer.
