Server telemetry dashboards usually fall into one of two camps: a sprawling Grafana stack with TimescaleDB and a separate metrics agent, or a single endpoint that returns last-second values and gets polled every few seconds. The Server Metrics tool aims at neither — it is a dashboard that tells you what one Node.js process is doing right now, no agent, no time-series database, no extra services. Just an SSE stream and a ring buffer.
Server-Sent Events over WebSockets. Telemetry is one-way: the server pushes, the client never sends application data back. WebSockets would solve the same problem but introduce a duplex protocol where one direction is unused. SSE rides on plain HTTP/1.1 or HTTP/2, works through every proxy that handles HTTP, and reconnects automatically via the browser's built-in EventSource. The route handler at /api/metrics/stream returns a ReadableStream<Uint8Array> with Content-Type: text/event-stream, a 2-second setInterval, and an abort listener that cleans up when the client disconnects. There is no library — the entire stream is roughly 30 lines of route handler code.
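A minimal sketch of such a handler in Web Fetch API style (e.g. a Next.js route handler). The 2-second interval, the `text/event-stream` content type, and the abort cleanup follow the text; `collectSnapshot` and the exact header set are assumptions:

```typescript
const encoder = new TextEncoder();

// Stand-in for the real metrics collector (assumed name).
function collectSnapshot(): Record<string, number> {
  return { rps: 0, p95: 0, errorRate: 0 };
}

export function GET(request: Request): Response {
  const stream = new ReadableStream<Uint8Array>({
    start(controller) {
      const push = () => {
        // SSE framing: "data: <json>\n\n" per event.
        const payload = `data: ${JSON.stringify(collectSnapshot())}\n\n`;
        controller.enqueue(encoder.encode(payload));
      };
      push(); // emit immediately, then every 2 seconds
      const timer = setInterval(push, 2000);
      // Clean up when the client disconnects.
      request.signal.addEventListener("abort", () => {
        clearInterval(timer);
        controller.close();
      });
    },
  });
  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
      Connection: "keep-alive",
    },
  });
}
```

On the client, the browser's built-in `EventSource` consumes this with no library either.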
Singleton request tracker with a rolling window. RPS, p95 latency, and error rate are all derived from one in-memory object: a singleton requestTracker that records {status, durationMs, timestamp} on every request via a withTracking route wrapper. The tracker keeps entries in an array and prunes anything older than 60 seconds on every read. RPS is entries.length / 60, error rate is 4xx+5xx / total, p95 is the 95th percentile of durations sorted ascending. Single-process, no Redis, no Prometheus exporter — fine for a single-VPS deployment, and replaceable with an external store the day that stops being true.
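A sketch of the tracker. The record shape and the three derived numbers follow the text; the class layout and the `now` parameter (injected for testability) are assumptions:

```typescript
interface TrackedRequest {
  status: number;
  durationMs: number;
  timestamp: number; // epoch ms
}

const WINDOW_MS = 60_000;

class RequestTracker {
  private entries: TrackedRequest[] = [];

  record(status: number, durationMs: number, now = Date.now()): void {
    this.entries.push({ status, durationMs, timestamp: now });
  }

  // Prune on read, then derive all three metrics from the same window.
  read(now = Date.now()) {
    this.entries = this.entries.filter((e) => now - e.timestamp <= WINDOW_MS);
    const total = this.entries.length;
    const errors = this.entries.filter((e) => e.status >= 400).length;
    const sorted = this.entries.map((e) => e.durationMs).sort((a, b) => a - b);
    const p95Index = Math.max(0, Math.ceil(sorted.length * 0.95) - 1);
    return {
      rps: total / 60,
      errorRate: total ? errors / total : 0,
      p95: sorted.length ? sorted[p95Index] : 0,
    };
  }
}

// Module-level singleton shared by withTracking and the stream handler.
export const requestTracker = new RequestTracker();
```

Pruning on read rather than on a timer means an idle server does zero background work.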
Exponential-backoff reconnect with a hard cap. The client hook useMetricsStream opens an EventSource and listens for error. On error it closes the stream and schedules a retry with delays of 1s, 2s, 4s, 8s, 16s — five attempts totalling about 31 seconds. After the fifth failure it stops trying and the connection badge switches to OFFLINE · click retry. This is a deliberate choice: silent infinite retry burns battery and bandwidth on mobile, and a user with an active page is better positioned to decide "something is wrong" than the script is. Manual retry via the badge resets the counter and reopens the stream from scratch.
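The schedule can be expressed as a framework-free policy function that the real hook would wire into React state and an EventSource. The 1s–16s doubling, the five-attempt cap, and the ~31-second budget follow the text; the constant names are assumptions:

```typescript
const MAX_ATTEMPTS = 5;
const BASE_DELAY_MS = 1_000;

// attempt is zero-based: 0 → 1s, 1 → 2s, ... 4 → 16s.
// null means "stop retrying, show OFFLINE · click retry".
export function retryDelay(attempt: number): number | null {
  return attempt < MAX_ATTEMPTS ? BASE_DELAY_MS * 2 ** attempt : null;
}

// Total back-off budget before giving up: 1 + 2 + 4 + 8 + 16 = 31 s.
export function totalBackoffMs(): number {
  let sum = 0;
  for (let i = 0; retryDelay(i) !== null; i++) sum += retryDelay(i)!;
  return sum;
}
```

The manual-retry path simply resets the attempt counter to zero and reopens the EventSource.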
Snapshot bootstrap before the stream takes over. When the page first renders, the SSE stream takes up to 2 seconds to deliver the first event — that would be 2 seconds of collecting… placeholders. Instead, the hook fires a one-shot fetch('/api/metrics/snapshot') in parallel with opening the stream, and uses whichever arrives first. The snapshot endpoint returns the same MetricsSnapshot shape the stream emits, so the StatsBar and SparklineCard components cannot tell the difference. First paint shows real data within roughly 100ms instead of 2s.
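The race can be sketched as a small helper: both sources run in parallel, and only the first arrival paints the initial state. `bootstrapSnapshot`, the `MetricsSnapshot` fields, and the `setSnapshot` callback (a stand-in for React state) are all assumptions:

```typescript
interface MetricsSnapshot {
  rps: number;
  p95: number;
  errorRate: number;
}

// Applies whichever source delivers first; the slower source's initial
// result is ignored, after which the stream takes over normally.
export function bootstrapSnapshot(
  fetchSnapshot: () => Promise<MetricsSnapshot>,
  firstStreamEvent: Promise<MetricsSnapshot>,
  setSnapshot: (s: MetricsSnapshot) => void,
): Promise<void> {
  let settled = false;
  const applyOnce = (s: MetricsSnapshot) => {
    if (!settled) {
      settled = true;
      setSnapshot(s);
    }
  };
  return Promise.all([
    fetchSnapshot().then(applyOnce).catch(() => {}), // snapshot is best-effort
    firstStreamEvent.then(applyOnce),
  ]).then(() => undefined);
}
```

Because both endpoints emit the same shape, the components downstream never branch on where the data came from.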
Threshold-based color coding for at-a-glance triage. CPU, memory and error-rate cells in the stats bar use three thresholds: under 50% is green, 50–80% is amber, over 80% is red. Any other coloring scheme — gradients, percentile-based, dynamic — adds cognitive load without adding signal. The threshold values come from Google's SRE error-budget conventions and are good enough for a personal-server dashboard. They are constants in the component file and not configurable, because configurability is overhead for a single-instance tool.
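The mapping is one pure function; the boundary handling (50 and 80 land in amber, matching "50–80%") follows the text, while the function and type names are assumptions:

```typescript
type Severity = "green" | "amber" | "red";

// Under 50% green, 50-80% amber, over 80% red.
export function severityFor(percent: number): Severity {
  if (percent > 80) return "red";
  if (percent >= 50) return "amber";
  return "green";
}
```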
Sparklines as a 60-second visual buffer. Each sparkline card receives a 30-point ring buffer (the last 60 seconds at a 2-second tick) and renders an SVG line with a faint area fill, a 50% reference gridline, and a dot on the latest value. The 60-second window is the same as the request tracker's rolling window — they are deliberately aligned so the sparkline you see and the RPS number you read are derived from the exact same data. Beyond 60 seconds, no history is kept; this is a real-time tool, not a historical one. Anyone who needs longer windows belongs in Grafana with a real time-series database, and that boundary is documented rather than hidden.
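A sketch of the two pieces behind each card: the fixed-capacity buffer and the sample-to-SVG-point mapping. The 30-point capacity follows the text; the viewport dimensions, the 0-100 value range, and anchoring partial buffers to the left edge are assumptions:

```typescript
const CAPACITY = 30; // 60 s of history at one sample per 2-second tick

// Append a sample, dropping the oldest once the window is full.
export function pushSample(buffer: number[], value: number): number[] {
  const next = [...buffer, value];
  return next.length > CAPACITY ? next.slice(next.length - CAPACITY) : next;
}

// Maps samples (0-100) to "x,y" pairs for an SVG <polyline>.
// x spacing is fixed to the full window, so a partial buffer
// grows rightward instead of stretching to fill the card.
export function toPoints(samples: number[], width = 120, height = 32): string {
  if (samples.length < 2) return "";
  const stepX = width / (CAPACITY - 1);
  return samples
    .map((v, i) => {
      const x = (i * stepX).toFixed(1);
      const y = (height - (v / 100) * height).toFixed(1); // SVG y grows downward
      return `${x},${y}`;
    })
    .join(" ");
}
```

The latest-value dot is just a `<circle>` placed at the last pair in the string.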
What the dashboard does not do. It does not alert — there is no threshold-triggered notification. It does not persist — restart the server and the buffer is empty. It does not multi-instance — every process has its own tracker, so a load-balanced fleet would need external aggregation. These omissions are not bugs; they are the line between "single-process observability you can ship in an evening" and "production telemetry stack." The dashboard is the first; anything past that is a different tool.