P2 · Apr 15, 2026
OS Load Spike + Backend Event-Loop Stalls — Two Problems, Causation Unproven
data_node_1 + data_node_4 hit a load.1 of 5.3 (CPU 60–87%) for ~10min; the backend Node.js event loop stalled 1–3s in the same window AND independently at 14:41Z. Two separate problems with a real correlation; causation is NOT proven without OS metrics. PR #1246 enabled the runtime metrics that made the event-loop problem visible.
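The entry does not say what PR #1246 actually instrumented. Below is a minimal sketch of one way stalls of this size can be surfaced in Node.js: read the built-in perf_hooks.monitorEventLoopDelay() histogram on a fixed window, then reset it. The 1000 ms threshold, the 10 s window, the 20 ms resolution, and the nsToMs/isStall helpers are assumptions. They are not the backend's actual configuration.

```typescript
import { monitorEventLoopDelay } from "node:perf_hooks";

// Assumed stall threshold; the incident reported 1–3s stalls.
const STALL_THRESHOLD_MS = 1000;

// Sample event-loop delay every 20 ms (resolution is in milliseconds).
const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

// Histogram values are reported in nanoseconds; convert to milliseconds.
export function nsToMs(ns: number): number {
  return ns / 1e6;
}

// Hypothetical helper: true when the worst delay in a window meets the threshold.
export function isStall(maxDelayMs: number, thresholdMs = STALL_THRESHOLD_MS): boolean {
  return maxDelayMs >= thresholdMs;
}

// Every 10 s, check max and p99 delay for the window, then start a fresh window.
const timer = setInterval(() => {
  const p99Ms = nsToMs(histogram.percentile(99));
  const maxMs = nsToMs(histogram.max);
  if (isStall(maxMs)) {
    console.warn(`event-loop stall: max=${maxMs.toFixed(0)}ms p99=${p99Ms.toFixed(0)}ms`);
  }
  histogram.reset();
}, 10_000);
timer.unref(); // do not keep the process alive just for monitoring
```

Tracking max (and p99) per window, rather than the mean, is deliberate. A single multi-second stall is one sample among hundreds in a 10 s window and barely moves the mean.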
P1 · Apr 14, 2026
OpenSearch Data Node Load Spike — Post-Deploy
2 of 4 data nodes breached load.norm > 1.0 after the marketplace-aware rollout; the new bulk engine (2 parallel workers + refresh-window) removed the natural pacing that per-chunk refresh: true had provided; backpressure fix (PR #114) pending
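This sketch assumes load.norm follows the Metricbeat convention (system.load.norm.1), which is the 1-minute load average divided by the number of CPU cores. Under that convention, a value above 1.0 means there is more runnable work than cores. The NodeLoad shape and the helper names are hypothetical.

```typescript
// Hypothetical per-node sample: 1-minute load average plus CPU core count.
export interface NodeLoad {
  name: string;
  load1: number;
  cores: number;
}

// Normalized load: 1-minute load average divided by CPU cores.
export function loadNorm(load1: number, cores: number): number {
  if (cores <= 0) throw new RangeError("cores must be positive");
  return load1 / cores;
}

// Names of nodes whose normalized load is strictly above the threshold (default 1.0).
export function breachingNodes(nodes: NodeLoad[], threshold = 1.0): string[] {
  return nodes
    .filter((n) => loadNorm(n.load1, n.cores) > threshold)
    .map((n) => n.name);
}
```

As an example, assume a data node has 4 cores (the entries do not state core counts). The load.1 of 5.3 reported on Apr 15 would then give a load.norm of about 1.33.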
P1 · Apr 14, 2026
Slow Site Load — 10s TTFB
RDS steamarbitrage at 96% CPU with 3,400 connections, plus sequential OpenSearch queries, causing 10s page loads
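When queries run one after another, their latencies add up. The sketch below shows the general alternative, assuming the page's queries do not depend on each other: start them together and await them with Promise.all. Wall time then approaches the slowest query rather than the sum. Query, runSequential, runConcurrent, and fakeQuery are hypothetical names, and no OpenSearch client API is shown.

```typescript
// A hypothetical independent query: any function that returns a promise.
export type Query<T> = () => Promise<T>;

// Sequential: each query starts only after the previous one resolves,
// so total latency is roughly the sum of all query latencies.
export async function runSequential<T>(queries: Query<T>[]): Promise<T[]> {
  const results: T[] = [];
  for (const q of queries) {
    results.push(await q());
  }
  return results;
}

// Concurrent: all queries start immediately, so total latency is roughly
// the slowest query. Promise.all keeps results in input order and rejects
// on the first failure.
export function runConcurrent<T>(queries: Query<T>[]): Promise<T[]> {
  return Promise.all(queries.map((q) => q()));
}

// Simulated query that resolves with `value` after `ms` milliseconds.
export function fakeQuery<T>(value: T, ms: number): Query<T> {
  return () => new Promise<T>((resolve) => setTimeout(() => resolve(value), ms));
}
```

Running queries concurrently concentrates their load on the data nodes at the same moment, which matters given the data-node load spikes in the other entries. OpenSearch's _msearch (multi-search) API is another way to send independent searches in one request.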
P1 · Apr 5, 2026
CPU & Load Spike — Two-Wave
Nuxt frontend nodes hit 92% CPU and OpenSearch data nodes reached 95% CPU; one node crashed
P1 · Apr 4, 2026
Backend CPU & OpenSearch Degradation
OpenSearch data_node_4 at 60%+ avg CPU; two error bursts (~193 errors/5min); backend CPU peaked at 52%