WebSocket Market Data Performance C++

Reducing WebSocket Tick-to-Screen Latency in the Trading UI

The market data interface I built streams live order book updates over WebSocket. Here's how I tracked down a 3ms latency spike to a JSON serialisation bottleneck and fixed it.

When I launched the initial version of trading.jordandouglaswhite.com, the order book updates looked fine most of the time - but every few seconds there’d be a visible stutter. Tick-to-screen latency would spike from ~1ms to 3-4ms.

This post is about tracking that down.

Baseline measurement

First step: instrument the path. I added timestamps at three points:

  1. The server receives a raw UDP datagram from the exchange feed
  2. The server sends the processed update over WebSocket
  3. The browser’s WebSocket onmessage handler fires

The gaps told the story:

  • Network (feed → server): steady ~0.2ms
  • Server processing (receive → send): steady ~0.15ms
  • WebSocket delivery (send → onmessage): normally 0.6ms, spiking to 2.8ms

The spikes were in the transport, not my processing. Two candidates: Nagle’s algorithm, or the browser’s event loop.
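
As a rough illustration of the instrumentation above, here is a minimal sketch of turning the three timestamps into per-stage gaps. The names `TickStamps`, `StageGaps` and `stage_gaps` are hypothetical, not taken from the real code, and it assumes all three timestamps are in nanoseconds on a common clock (the browser timestamp would first need converting to the server's clock, or both ends would need to run on the same machine).

```cpp
#include <cstdint>

// Hypothetical record: one timestamp per instrumentation point, in
// nanoseconds. Assumes all three share a common clock.
struct TickStamps {
    std::int64_t feed_recv_ns;   // 1. server received the UDP datagram
    std::int64_t ws_send_ns;     // 2. server sent the update over WebSocket
    std::int64_t on_message_ns;  // 3. browser onmessage handler fired
};

// Per-stage latency in milliseconds.
struct StageGaps {
    double processing_ms;  // receive -> send
    double delivery_ms;    // send -> onmessage
};

inline StageGaps stage_gaps(const TickStamps& t) {
    constexpr double kNsPerMs = 1e6;
    return {
        static_cast<double>(t.ws_send_ns - t.feed_recv_ns) / kNsPerMs,
        static_cast<double>(t.on_message_ns - t.ws_send_ns) / kNsPerMs,
    };
}
```

The feed → server gap needs a fourth timestamp from the exchange side (for example one carried in the datagram), so it is left out of this sketch.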

Nagle’s algorithm

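Nagle’s algorithm coalesces small TCP writes into larger segments to reduce packet count, holding small writes back while earlier data is still unacknowledged. For a latency-sensitive stream of small messages, that added delay is exactly what you don’t want. I was running the WebSocket server in C++ using a library that didn’t set TCP_NODELAY by default.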

// fd: the client connection's socket descriptor; a non-zero flag disables Nagle on it.
int flag = 1;
setsockopt(fd, IPPROTO_TCP, TCP_NODELAY,
           reinterpret_cast<char*>(&flag), sizeof(flag));

That cut the spike rate roughly in half but didn’t eliminate the spikes.
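
Because the library didn’t set the option itself, it can be worth checking that it actually took effect on each connection. A minimal POSIX sketch, with hypothetical helper names `set_tcp_nodelay` and `tcp_nodelay_enabled` (not part of any library), that checks the return value and reads the option back:

```cpp
#include <netinet/in.h>   // IPPROTO_TCP
#include <netinet/tcp.h>  // TCP_NODELAY
#include <sys/socket.h>   // setsockopt, getsockopt, socklen_t

// Disable Nagle on fd. Returns false if setsockopt fails
// (for example, fd is not a valid TCP socket).
inline bool set_tcp_nodelay(int fd) {
    int flag = 1;
    return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, sizeof(flag)) == 0;
}

// Read the option back so a silent failure doesn't go unnoticed.
inline bool tcp_nodelay_enabled(int fd) {
    int flag = 0;
    socklen_t len = sizeof(flag);
    if (getsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &flag, &len) != 0) {
        return false;
    }
    return flag != 0;
}
```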

JSON serialisation as the actual bottleneck

The remaining spikes correlated with order book snapshots rather than incremental updates. Snapshots happen when a new client connects or after a sequence gap - they contain the full book, ~200 price levels.

I was serialising to JSON using a reflection-based library. Profiling showed it was allocating for every field name and value string during serialisation, which put heavy pressure on the server’s allocator at high message rates. On the client side, the very large JSON strings occasionally caused a GC pause in the JavaScript runtime when they landed.

The fix was two-part:

On the server: Switch to a pre-computed, template-filled binary format (custom simple protocol, not FIX - overkill here) for incremental updates, and only send JSON for the initial snapshot. Binary ticks are ~40 bytes vs ~400 bytes for JSON equivalents.

On the client: Parse incremental updates in a Web Worker, so the parsing work and the garbage it produces stay off the main thread and rendering isn’t blocked when the JS engine decides to collect.

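// main.ts
const worker = new Worker('/tick-worker.js');
// Deliver binary frames as ArrayBuffer (the WebSocket default is Blob).
ws.binaryType = 'arraybuffer';
// Forward raw frames to the worker so parsing stays off the main thread.
ws.onmessage = (e) => worker.postMessage(e.data);
// Parsed ticks come back as plain objects, ready to render.
worker.onmessage = (e) => updateOrderBook(e.data);

// tick-worker.js
self.onmessage = (e) => {
  const tick = parseBinaryTick(e.data); // e.data: the bytes of one binary tick
  self.postMessage(tick);
};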
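
For the server half of the fix, here is a minimal sketch of what a fixed-layout binary tick could look like. The field set, the offsets and the names `Tick`, `encode_tick` and `decode_tick` are all assumptions made for illustration, not the actual wire format. The only details taken from the real protocol are the ~40-byte size and the idea of filling a fixed template rather than building strings. Bytes are written explicitly little-endian, so the client can read them at fixed offsets regardless of host byte order.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical tick; the real field set is not described in this post.
struct Tick {
    std::uint8_t  type;         // assumed: 0 = add, 1 = modify, 2 = delete
    std::uint8_t  side;         // assumed: 0 = bid, 1 = ask
    std::uint32_t symbol_id;
    std::uint64_t sequence;
    std::uint64_t exchange_ns;
    std::int64_t  price_ticks;  // fixed-point price, avoids floats on the wire
    std::uint64_t quantity;
};

constexpr std::size_t kTickSize = 40;

// Write the low `bytes` bytes of value, least significant first.
inline void put_le(std::uint8_t* out, std::uint64_t value, std::size_t bytes) {
    for (std::size_t i = 0; i < bytes; ++i) {
        out[i] = static_cast<std::uint8_t>(value >> (8 * i));
    }
}

inline std::uint64_t get_le(const std::uint8_t* in, std::size_t bytes) {
    std::uint64_t value = 0;
    for (std::size_t i = 0; i < bytes; ++i) {
        value |= static_cast<std::uint64_t>(in[i]) << (8 * i);
    }
    return value;
}

// Fill a fixed 40-byte template on the stack: no heap allocation, no strings.
inline std::array<std::uint8_t, kTickSize> encode_tick(const Tick& t) {
    std::array<std::uint8_t, kTickSize> buf{};  // zero-filled
    buf[0] = t.type;
    buf[1] = t.side;
    // bytes 2-3 are reserved padding so symbol_id starts at offset 4
    put_le(&buf[4],  t.symbol_id, 4);
    put_le(&buf[8],  t.sequence, 8);
    put_le(&buf[16], t.exchange_ns, 8);
    put_le(&buf[24], static_cast<std::uint64_t>(t.price_ticks), 8);
    put_le(&buf[32], t.quantity, 8);
    return buf;
}

// Inverse of encode_tick, handy for round-trip tests on the server.
inline Tick decode_tick(const std::array<std::uint8_t, kTickSize>& buf) {
    Tick t{};
    t.type        = buf[0];
    t.side        = buf[1];
    t.symbol_id   = static_cast<std::uint32_t>(get_le(&buf[4], 4));
    t.sequence    = get_le(&buf[8], 8);
    t.exchange_ns = get_le(&buf[16], 8);
    t.price_ticks = static_cast<std::int64_t>(get_le(&buf[24], 8));
    t.quantity    = get_le(&buf[32], 8);
    return t;
}
```

With a layout like this, a client-side parser only has to read fixed offsets from the received buffer, which is what keeps per-tick parsing cheap compared with JSON.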

Results

After both changes:

Metric              Before   After
p50 tick-to-screen  0.9ms    0.7ms
p99 tick-to-screen  3.2ms    0.9ms
Max observed        8.1ms    1.4ms

The stutter is gone. The p99 improvement is the meaningful one - that’s what makes the UI feel responsive during volatile markets when message rate is highest.
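
For reference, percentiles like these can be computed from raw latency samples along the following lines. This is a sketch using the nearest-rank definition, which is one of several common ones and not necessarily the one behind the figures above. The function name `percentile` is hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Nearest-rank percentile of latency samples (p in 0..100).
// Takes the vector by value so the caller's data is left unsorted.
// Returns 0.0 for an empty input.
inline double percentile(std::vector<double> samples, unsigned p) {
    if (samples.empty()) {
        return 0.0;
    }
    std::sort(samples.begin(), samples.end());
    // rank = ceil(p/100 * n), computed in integers to avoid rounding surprises
    std::size_t rank = (static_cast<std::size_t>(p) * samples.size() + 99) / 100;
    if (rank < 1) rank = 1;
    if (rank > samples.size()) rank = samples.size();
    return samples[rank - 1];
}
```

Sorting a copy on every call is fine for offline analysis of a capture. A live dashboard would more likely keep a histogram.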

What I’d do differently at lower latency budgets

For a proper trading terminal where you’re making decisions based on the book, you’d want:

  • Kernel bypass networking (DPDK or RDMA) from the exchange
  • Shared memory between the feed handler and the display process instead of sockets
  • A fixed-size ring buffer on the server side to avoid any allocation on the hot path
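
To make the last point concrete, here is a minimal sketch of a fixed-size ring buffer, assuming exactly one producer thread (the feed handler) and one consumer thread (the sender) and C++17. `SpscRing` is a hypothetical name. All storage is reserved up front, so pushing and popping never touch the allocator.

```cpp
#include <array>
#include <atomic>
#include <cstddef>
#include <optional>

// Single-producer / single-consumer ring with capacity N (a power of two).
// T must be default-constructible and copyable.
template <typename T, std::size_t N>
class SpscRing {
    static_assert(N >= 2 && (N & (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side. Returns false when the ring is full; the caller
    // decides whether to drop, retry, or count the overflow.
    bool try_push(const T& item) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        if (head - tail == N) {
            return false;  // full
        }
        slots_[head & (N - 1)] = item;
        head_.store(head + 1, std::memory_order_release);  // publish the slot
        return true;
    }

    // Consumer side. Returns std::nullopt when the ring is empty.
    std::optional<T> try_pop() {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        if (head == tail) {
            return std::nullopt;  // empty
        }
        T item = slots_[tail & (N - 1)];
        tail_.store(tail + 1, std::memory_order_release);  // free the slot
        return item;
    }

private:
    std::array<T, N> slots_{};           // storage reserved once, up front
    std::atomic<std::size_t> head_{0};   // next write position (producer-owned)
    std::atomic<std::size_t> tail_{0};   // next read position (consumer-owned)
};
```

The indices grow monotonically and are masked on access, which is why the capacity has to be a power of two. A production version would typically also pad `head_` and `tail_` onto separate cache lines to avoid false sharing.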

The WebSocket path is fine for a demo/monitoring context, but for actual HFT the display is the wrong place to be optimising - you’re not trading on what you see, you’re trading on what the co-located system sees.