tradeit.gg

Week 18 · 2026

Engineering Operations

April 20 – 26

Highlight of the Week

Dependency hygiene system live across 12 repos · Socket-server 503s resolved · 935 vulnerabilities triaged

What We Shipped

Delivered This Week

Security visibility, automated dep upgrades, and a Slack-driven warden bot

npm Security Audit — All Repos
Full scan of 25 Node.js repos. 935 vulnerabilities found (67 critical, 383 high). Each repo got a Linear ticket listing affected packages, which fixes are automatic vs. manual, and an effort estimate. Top 4 repos broken into subtasks.
First time we have full visibility · ~102h of fix work pre-planned
Ehud (via Claude Code) · Apr 20
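A minimal sketch of how per-repo audit results could be rolled up into one summary. It assumes each repo's report comes from `npm audit --json` (npm 7+), which exposes severity counts under `metadata.vulnerabilities`. The repo names and counts below are hypothetical sample data, not figures from the actual scan.

```typescript
// Severity counts as found under `metadata.vulnerabilities`
// in the JSON report from `npm audit --json` (npm 7+).
interface SeverityCounts {
  info: number;
  low: number;
  moderate: number;
  high: number;
  critical: number;
  total: number;
}

// One parsed report per repo; `repo` is a label attached by the caller.
interface RepoAudit {
  repo: string;
  counts: SeverityCounts;
}

// Sum the per-repo counts into a single cross-repo summary.
function summarize(audits: RepoAudit[]): SeverityCounts {
  const sum: SeverityCounts = { info: 0, low: 0, moderate: 0, high: 0, critical: 0, total: 0 };
  for (const audit of audits) {
    sum.info += audit.counts.info;
    sum.low += audit.counts.low;
    sum.moderate += audit.counts.moderate;
    sum.high += audit.counts.high;
    sum.critical += audit.counts.critical;
    sum.total += audit.counts.total;
  }
  return sum;
}

// Hypothetical sample data (not the real scan results).
const sampleAudits: RepoAudit[] = [
  { repo: "repo-a", counts: { info: 0, low: 2, moderate: 5, high: 3, critical: 1, total: 11 } },
  { repo: "repo-b", counts: { info: 0, low: 0, moderate: 1, high: 4, critical: 0, total: 5 } },
];

const auditSummary = summarize(sampleAudits);
console.log(auditSummary);
```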
Renovate Rolled Out — 12 Repos
Automated dependency-update bot live across all 12 engineering repos via shared preset (zengamingx/renovate-config). Security patches fast-tracked + auto-merged; regular updates grouped + scheduled.
Self-healing pipeline · security patches now hours, not weeks
Ehud (via Claude Code) · Apr 22
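For illustration, a shared preset with this behaviour might look roughly like the object below. The option names (`vulnerabilityAlerts`, `packageRules`, `groupName`, `schedule`, `automerge`) are standard Renovate options, but the values, label, and group name are assumptions; the actual contents of zengamingx/renovate-config are not shown in this report.

```typescript
// Minimal subset of Renovate config fields used in this sketch.
interface PackageRule {
  matchUpdateTypes: string[];
  groupName: string;
  schedule: string[];
}

interface RenovatePreset {
  extends: string[];
  vulnerabilityAlerts: { labels: string[]; automerge: boolean; schedule: string[] };
  packageRules: PackageRule[];
}

// Hypothetical values: security fixes bypass the schedule and auto-merge,
// regular minor/patch bumps are grouped into one scheduled PR.
const preset: RenovatePreset = {
  extends: ["config:recommended"],
  vulnerabilityAlerts: {
    labels: ["security"],
    automerge: true,
    schedule: ["at any time"],
  },
  packageRules: [
    {
      matchUpdateTypes: ["minor", "patch"],
      groupName: "non-major dependencies",
      schedule: ["before 4am on monday"],
    },
  ],
};

// Renovate reads JSON, so the object would be serialized like this.
const presetJson = JSON.stringify(preset, null, 2);
console.log(presetJson);
```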
Warden Bot + CF Dashboard
Slack warden bot with 4-button flow (approve / merge / snooze / claim) and a Cloudflare Pages dashboard at ops.tradeit.gg/deps. 16-slide launch deck shipped to the team. 4 PRs auto-merged in the first 7 days.
Triage moves to Slack · no one logs into GitHub for routine bumps
Ehud · Apr 23–27
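A sketch of what the 4-button row could look like as a Slack Block Kit `actions` block. The block and button shapes follow Block Kit; the `warden_*` action IDs, the `wardenButtons` helper, and the PR reference format are hypothetical names for illustration, not the bot's real identifiers.

```typescript
// Block Kit button element (only the fields used here).
interface ButtonElement {
  type: "button";
  text: { type: "plain_text"; text: string };
  action_id: string;
  value: string;
  style?: "primary" | "danger";
}

interface ActionsBlock {
  type: "actions";
  elements: ButtonElement[];
}

// Build one button; the PR reference travels in `value` so the
// interaction handler knows which PR the click refers to.
function button(label: string, action: string, prRef: string, style?: "primary" | "danger"): ButtonElement {
  const base: ButtonElement = {
    type: "button",
    text: { type: "plain_text", text: label },
    action_id: `warden_${action}`, // hypothetical naming scheme
    value: prRef,
  };
  return style ? { ...base, style } : base;
}

// The four actions named in the report: approve / merge / snooze / claim.
function wardenButtons(prRef: string): ActionsBlock {
  return {
    type: "actions",
    elements: [
      button("Approve", "approve", prRef, "primary"),
      button("Merge", "merge", prRef),
      button("Snooze", "snooze", prRef),
      button("Claim", "claim", prRef),
    ],
  };
}

const exampleBlock = wardenButtons("example-org/example-repo#1");
console.log(JSON.stringify(exampleBlock, null, 2));
```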

Incident Resolved

tradeit-socket-server — 503 Errors Cleared

WebSocket handshakes were being blocked at two layers — DB pool exhaustion and an over-eager nginx rate-limit

Symptom
Clients saw intermittent 503s on Socket.IO endpoints. WebSocket upgrades failing. Real-time trade events dropping for some users.
Reported Apr 26
Fix (PR #23)
  • MySQL pool sized 10 → 25 (pool exhaustion under burst)
  • Nginx Socket.IO upstream config tuned
  • Removed /socket.io4 rate-limit blocking WS handshakes
Deployed Apr 26 · 503s gone, handshakes clean
Ehud · PR #23
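To illustrate why pool size matters under burst, here is a toy model, not a measurement of the real service. It assumes a burst of requests arriving at once, each holding a connection for a fixed time, with callers giving up after an acquire timeout. Only the pool sizes 10 and 25 come from the fix; the burst size, hold time, and timeout are hypothetical.

```typescript
// Toy model: `burst` requests arrive together, each holds a pooled
// connection for `holdMs`, and a waiting request gives up once it has
// waited longer than `acquireTimeoutMs`. Returns how many give up.
function timedOutRequests(
  burst: number,
  poolSize: number,
  holdMs: number,
  acquireTimeoutMs: number
): number {
  let timedOut = 0;
  for (let i = 0; i < burst; i++) {
    // Requests are served in waves of `poolSize`; wave k starts at k * holdMs.
    const wave = Math.floor(i / poolSize);
    const waitMs = wave * holdMs;
    if (waitMs > acquireTimeoutMs) {
      timedOut++;
    }
  }
  return timedOut;
}

// Hypothetical burst: 30 handshakes, 200 ms per query, 300 ms acquire timeout.
const failuresWithPool10 = timedOutRequests(30, 10, 200, 300);
const failuresWithPool25 = timedOutRequests(30, 25, 200, 300);
console.log({ failuresWithPool10, failuresWithPool25 });
```

In this model the larger pool absorbs the same burst in fewer waves, so fewer callers hit the timeout. Real behaviour also depends on query latency and how the server reacts to a failed acquire.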

Dependency Hygiene

End-to-End: From Audit to Auto-Merge

Visibility
npm audit across 25 repos · 935 vulns indexed in Linear · effort + risk per repo · CF dashboard at ops.tradeit.gg/deps shows live state.
Layer 1
Automation
Renovate runs against zengamingx/renovate-config · rangeStrategy, hostRules, ignoreDeps tuned this week · security PRs auto-merge, others queue for triage.
Layer 2
Triage
Slack warden bot posts grouped digest · 4-button flow keeps decisions in Slack · warden-guide.html with screenshots onboards the team in 5 min.
Layer 3
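A minimal sketch of the "grouped digest" idea: pending PRs are bucketed per repo and rendered as one text summary. The `PendingPr` shape, the `renderDigest` helper, and the sample PRs are hypothetical; the real digest format is not described in this report.

```typescript
// Hypothetical shape of a pending dependency PR.
interface PendingPr {
  repo: string;
  title: string;
  security: boolean;
}

// Bucket PRs by repo, keeping first-seen repo order.
function groupByRepo(prs: PendingPr[]): Record<string, PendingPr[]> {
  const groups: Record<string, PendingPr[]> = {};
  for (const pr of prs) {
    const list = groups[pr.repo] ?? [];
    list.push(pr);
    groups[pr.repo] = list;
  }
  return groups;
}

// Render one header line per repo followed by its PR titles.
function renderDigest(prs: PendingPr[]): string {
  const groups = groupByRepo(prs);
  const lines: string[] = [];
  for (const repo of Object.keys(groups)) {
    const items = groups[repo] ?? [];
    lines.push(`${repo}: ${items.length} pending`);
    for (const pr of items) {
      lines.push(`  - ${pr.title}${pr.security ? " [security]" : ""}`);
    }
  }
  return lines.join("\n");
}

// Hypothetical sample data.
const digest = renderDigest([
  { repo: "repo-a", title: "Update lodash", security: false },
  { repo: "repo-b", title: "Update express", security: true },
  { repo: "repo-a", title: "Update axios", security: false },
]);
console.log(digest);
```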
Open: Monday Digest Workflow Silent
The 08:45 ICT Monday digest workflow did not fire on Apr 27. Debug list: schedule mis-set / GH Actions permissions / token scope. Not blocking: a manual digest was sent instead.
Investigating · Apr 27
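One item worth checking under "schedule mis-set": GitHub Actions evaluates `schedule` cron expressions in UTC, and ICT is UTC+7, so 08:45 ICT on Monday corresponds to 01:45 UTC on Monday. The helper below is a sketch for converting a weekly local time to a UTC cron string. `toUtcCron` is a hypothetical name, it handles whole-hour offsets only, and nothing here confirms that the schedule is the actual cause.

```typescript
// Convert a weekly local time to a five-field cron expression in UTC.
// dayOfWeek: 0 = Sunday ... 6 = Saturday (cron convention).
// utcOffsetHours: local offset from UTC in whole hours (ICT = 7).
function toUtcCron(dayOfWeek: number, hour: number, minute: number, utcOffsetHours: number): string {
  let utcHour = hour - utcOffsetHours;
  let utcDay = dayOfWeek;
  if (utcHour < 0) {
    // Crossed midnight backwards: previous day in UTC.
    utcHour += 24;
    utcDay = (utcDay + 6) % 7;
  } else if (utcHour >= 24) {
    // Crossed midnight forwards: next day in UTC.
    utcHour -= 24;
    utcDay = (utcDay + 1) % 7;
  }
  return `${minute} ${utcHour} * * ${utcDay}`;
}

// Monday 08:45 ICT (UTC+7)
const digestCron = toUtcCron(1, 8, 45, 7);
console.log(digestCron); // "45 1 * * 1"
```

A local time earlier than 07:00 ICT would roll back to Sunday in UTC, which is the kind of off-by-one-day mistake the day adjustment guards against.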

Internal Tooling

ops.tradeit.gg Sharpened

The internal ops portal got real navigation, real guides, and a cleaner data layer

Nav Reorg
Sections color-coded: Guides purple · Research amber. Predictable scanning when the portal grows.
Apr 23
warden-guide.html
Walkthrough doc with screenshots covering the warden bot flow end-to-end. New devs can self-onboard without a 1:1.
Apr 23
Data Layer Fixes
Fixed search filters in deps.json.js + post-warden-digest.js — both were missing creator=renovate[bot] matches, dropping a chunk of the dataset.
Apr 23
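A sketch of the kind of filter involved, assuming the scripts query a GitHub REST listing that accepts a `creator` parameter (the "list repository issues" endpoint does, and it returns pull requests as well). Which endpoint deps.json.js and post-warden-digest.js actually call is not stated here, and `buildQuery` is a hypothetical helper. The point shown is that the bot login contains brackets, which need URL encoding.

```typescript
// Build a URL query string from ordered key/value pairs,
// percent-encoding both keys and values.
function buildQuery(params: Array<[string, string]>): string {
  return params
    .map(([key, value]) => `${encodeURIComponent(key)}=${encodeURIComponent(value)}`)
    .join("&");
}

// Renovate PRs are authored by the bot account "renovate[bot]".
const renovateQuery = buildQuery([
  ["state", "open"],
  ["creator", "renovate[bot]"],
]);
console.log(renovateQuery); // "state=open&creator=renovate%5Bbot%5D"
```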
Renovate Config Tightened
rangeStrategy for package.json, hostRules for private-registry auth, ignoreDeps to suppress noisy lookups. Cleaner PR stream.
Apr 23
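For reference, the three options named above could be combined roughly as follows. `rangeStrategy`, `hostRules`, and `ignoreDeps` are real Renovate options, but every value here (the strategy chosen, the registry host, the token placeholder, the ignored package) is a hypothetical stand-in, since the tuned values are not listed in this report.

```typescript
// Subset of Renovate's hostRules fields used in this sketch.
interface HostRule {
  matchHost: string;
  hostType: string;
  token: string;
}

interface RenovateTuning {
  rangeStrategy: "auto" | "pin" | "bump" | "replace" | "widen" | "update-lockfile" | "in-range-only";
  hostRules: HostRule[];
  ignoreDeps: string[];
}

const tuning: RenovateTuning = {
  // How version ranges in package.json are rewritten on update (assumed value).
  rangeStrategy: "bump",
  // Credentials for a private registry; host and token are placeholders.
  hostRules: [
    { matchHost: "npm.example.com", hostType: "npm", token: "<token from secret store>" },
  ],
  // Packages Renovate should skip entirely (placeholder name).
  ignoreDeps: ["some-noisy-package"],
};

const tuningJson = JSON.stringify(tuning, null, 2);
console.log(tuningJson);
```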

Reliability

System Health

All Systems Healthy

Site stable post-W17 keepalive fix · socket-server 503s resolved Apr 26

No P1 incidents in W18. The one incident raised, the socket-server 503s, was reported and fixed on Apr 26. Real-time event delivery is back to baseline.

P1 Incidents
0

across the week

Resolved This Week
1

socket-server 503s (PR #23)

Running Tally

Cumulative Infra Savings

No new W18 line items — focus shifted to security & automation. Tally from prior weeks holds.

Monthly
$2,987

cumulative monthly run-rate reduction

Annualized
~$35.8K

/year locked in across 6 line items

Time Span
4 wks

Mar 30 → Apr 19 sweep window

Top contributors: Redis same-AZ ($1.3K) · ElastiCache replicas ($746) · EC2/EBS/EIP sweep ($385) · OpenSearch right-size ($390)
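The tally arithmetic can be checked with a few lines. The monthly figure and the four named items come from the numbers above; note that $1.3K is a rounded value, so the named-items total is approximate.

```typescript
// Monthly run-rate reduction from the tally.
const monthlySavingsUsd = 2987;

// Annualized figure: 12 months at the same run-rate.
const annualizedUsd = monthlySavingsUsd * 12;

// The four named contributors; the first is rounded ($1.3K) in the report.
const namedItemsUsd = [1300, 746, 385, 390];
const namedTotalUsd = namedItemsUsd.reduce((sum, item) => sum + item, 0);

// Approximate share left for the remaining line items.
const remainderUsd = monthlySavingsUsd - namedTotalUsd;

console.log({ annualizedUsd, namedTotalUsd, remainderUsd });
```

The annualized result of $35,844 matches the ~$35.8K shown above.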

What's Next

Coming Up — Week 19

Monday Digest Cron — Debug
Get the 08:45 ICT warden digest workflow firing reliably. Inspect schedule, GH Actions perms, token scope. Add a heartbeat alert so silent failures are caught next time.
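A heartbeat check can be as small as comparing the last successful run against the expected interval plus a grace period. The sketch below assumes the last-success timestamp is recorded somewhere the checker can read; `isDigestOverdue`, the one-hour grace, and the sample timestamps are hypothetical.

```typescript
// True when the last successful digest is older than one interval plus grace.
function isDigestOverdue(
  lastSuccessMs: number,
  nowMs: number,
  intervalMs: number,
  graceMs: number
): boolean {
  return nowMs - lastSuccessMs > intervalMs + graceMs;
}

const HOUR_MS = 60 * 60 * 1000;
const WEEK_MS = 7 * 24 * HOUR_MS;

// Hypothetical timestamps: a run at Monday 01:45 UTC, checked a week later.
const lastSuccess = Date.UTC(2026, 3, 20, 1, 45); // month index 3 = April
const shortlyAfterDue = Date.UTC(2026, 3, 27, 2, 0);
const wellAfterDue = Date.UTC(2026, 3, 27, 3, 0);

const overdueEarly = isDigestOverdue(lastSuccess, shortlyAfterDue, WEEK_MS, HOUR_MS);
const overdueLate = isDigestOverdue(lastSuccess, wellAfterDue, WEEK_MS, HOUR_MS);
console.log({ overdueEarly, overdueLate });
```

Running the check from a system other than the workflow itself matters: a heartbeat that lives in the same scheduled workflow goes silent together with it.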
Vuln Backlog — Start Burning Down
Begin work on the 935-vuln backlog. Priority: tradeit-tradebot-server + pricempire-pricing. Renovate auto-merging the easy wins; humans on the breaking-change ones.
Data Nodes Perf Tweaks
Continue ticket #26 — data-nodes performance work. Targeting OpenSearch query latency on the heavier marketplace shards.

Week 18 in Numbers

Hygiene Shipped. Sockets Stable. Backlog Mapped.

12
repos on Renovate
935
vulns triaged (67 critical)
4
PRs auto-merged in 7d
1
incident resolved (PR #23)

Dependency upgrades now self-driving. Sockets clean. Security backlog visible for the first time — and shrinking.