AS-BUILT · tradeit-analytics · revised 2026-05-05
All phases (1-5) complete. Live in production.
Aurora Serverless v2 merged target. DMS for steamarbitrage bulk load. Fargate scheduled task for 7 pricing tables across 4 strategy groups. analytics-mcp Fargate service live at analytics-mcp.tradeit.gg.
SOLUTION
Aurora Serverless v2 merged cluster (writer-only) · DMS on-demand for steamarbitrage · Fargate every 5–15 min for pricing · ~$115–160/mo
2026-04-29 design review — all implemented and live
Persistent Aurora Serverless v2 cluster, writer-only — operator access only, not website traffic. 0.5-2 ACU range, idles toward 0.5 ACU when nobody's querying.
Fargate tasks are VPC/IAM-attached with unlimited runtime — they support bulk loads beyond the 15-min ceiling that constrains Lambda. Two tasks: pricing-replicator (0.25 vCPU / 0.5 GB, every 15 min) and analytics-mcp (ARM64, 0.25 vCPU / 512 MB) with a cloudflared sidecar.
bot_trade_item_logs can mutate up to 8 days post-creation (Steam revert). Source DDL fix applied (Phase 0 / OPS-208): ON UPDATE CURRENT_TIMESTAMP + updated_at_idx, INPLACE, no write blocking. Live.
DMS full-load completed 2026-04-30. 167 tables, zero errors. Gap-fill DMS reload-target (OPS-225) completed 2026-05-05: 41 tables, ~196M rows, ~1h47m wall-clock. DMS daily scheduler cancelled (cost-prohibitive) — on-demand only per OPS-feedback.
reserved_items (OPS-230) — LIVE. New strategy: INSERT side uses WHERE reserved_at > watermark with a 5-min lookback. DELETE side reads reserved_transactions for type IN (WITHDRAW, DELETE, EXPIRED, FAIL). Cold start seeds the CDC watermark to MAX(tx_id) to skip the historical sweep. DELETE-before-INSERT order handles the asset_id reuse race correctly.
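As a sketch of one reserved_items sync cycle under the OPS-230 strategy — assuming the Node 22 + mysql2 replicator described below; `q` is a stand-in async query helper, and the function name, state shape, and REPLACE-based upsert load are illustrative assumptions, not the production code:

```javascript
// Sketch: one reserved_items cycle (OPS-230). `q(sql, params)` is a
// hypothetical async query helper standing in for a mysql2 connection.
const DELETE_TYPES = ["WITHDRAW", "DELETE", "EXPIRED", "FAIL"];
const LOOKBACK_MIN = 5;

async function syncReservedItems(q, state) {
  // 1. DELETE side first: CDC sweep over reserved_transactions past the
  //    tx_id watermark. Deleting before inserting means an asset_id that was
  //    deleted and re-reserved inside the same window ends up present.
  const txs = await q(
    "SELECT tx_id, asset_id FROM pricing_source.reserved_transactions " +
      "WHERE tx_id > ? AND type IN (?, ?, ?, ?) ORDER BY tx_id",
    [state.lastTxId, ...DELETE_TYPES]
  );
  for (const t of txs) {
    await q("DELETE FROM pricing.reserved_items WHERE asset_id = ?", [t.asset_id]);
  }
  // 2. INSERT side: reserved_at watermark with a 5-minute lookback, loaded
  //    as an upsert so replaying the overlap is idempotent.
  await q(
    "REPLACE INTO pricing.reserved_items " +
      "SELECT * FROM pricing_source.reserved_items " +
      "WHERE reserved_at > ? - INTERVAL " + LOOKBACK_MIN + " MINUTE",
    [state.lastReservedAt]
  );
  // 3. Advance the CDC watermark (cold start seeds lastTxId to MAX(tx_id)
  //    elsewhere, which is what skips the historical sweep on first run).
  if (txs.length) state.lastTxId = txs[txs.length - 1].tx_id;
  return state;
}
```

The ordering in step 1 is the load-bearing part: it is what makes the asset_id reuse race resolve to "row present" rather than "row missing".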
What this spec covers
steamarbitrage.* via AWS DMS · pricing.* via Fargate · i-0adec4087a63a67b5 (EC2 retirement) · Inherited from project decision doc — all hard constraints
Aurora merged cluster fed by two independent pipelines — as-built 2026-05-05
Fargate ARM64 service (0.25 vCPU / 512 MB) running the tradeit-analytics MCP server with a cloudflared sidecar. Cloudflare Worker OAuth proxy fronts it at analytics-mcp.tradeit.gg, providing Claude AI with authenticated read-only access to the merged Aurora cluster via plain MySQL queries. Deployed Phase 4 cutover 2026-05-04. ~$8-15/mo.
15-min cadence · per-table strategy dispatch
All in eu-west-1c, all VPC-internal — no EC2-MySQL
Aurora MySQL 8 Serverless v2, 0.5-2 ACU range. Writer-only — operator access only, no website-traffic SLA. Native PITR, automatic backups.
~$50-90/mo · idles toward 0.5 ACU when no operators are querying
db.t4g.medium reader on pricing cluster. Reuse if cluster already has a reader. Storage-layer replication, no binlog needed, no writer impact.
~$50/mo (or $0 if reusing)
dms.t3.medium, ephemeral. Started by daily Lambda or EventBridge → runs full-load task → terminates. Used for steamarbitrage daily refresh and Group C/D bootstrap.
~$5-10/mo (ephemeral) or ~$30/mo (always-on)
pricing-replicator: 0.25 vCPU / 0.5 GB. Container: Node 22 + mysql2. Triggered by EventBridge every 5-15 min. VPC-attached. No time limit.
~$5/mo
pricing._repl_state(table_name PK, last_id, last_updated_at, last_run_at, rows_last_run, status, error_message)
Single source of truth per source table
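A minimal sketch of advancing that watermark row after a successful run — the column names come from the `_repl_state` definition above, but the helper name, parameter shape, and 'ok' status convention are illustrative assumptions:

```javascript
// Sketch: build the idempotent watermark upsert for pricing._repl_state.
// Column names match the _repl_state DDL; everything else is illustrative.
function replStateUpsert(table, lastId, lastUpdatedAt, rowsLastRun) {
  const sql =
    "INSERT INTO pricing._repl_state " +
    "(table_name, last_id, last_updated_at, last_run_at, rows_last_run, status) " +
    "VALUES (?, ?, ?, NOW(), ?, 'ok') " +
    "ON DUPLICATE KEY UPDATE " +
    "last_id = VALUES(last_id), " +
    "last_updated_at = VALUES(last_updated_at), " +
    "last_run_at = NOW(), " +
    "rows_last_run = VALUES(rows_last_run), " +
    "status = 'ok'";
  // Returned in the { sql, params } shape mysql2's execute() expects.
  return { sql, params: [table, lastId, lastUpdatedAt, rowsLastRun] };
}
```

Because table_name is the primary key, the same statement covers both the first run (INSERT) and every later run (UPDATE) for a table.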
New: tradeit-analytics/pricing-reader-ro, tradeit-analytics/merged-aurora-rw. DMS endpoint roles, Fargate task role.
No KMS grants needed (Aurora-native encryption)
Why each table got its strategy — size × write-frequency
banned_users
full-refresh every run
trade_revert_reserved_items
full-refresh every run
user_favorite_items
full-refresh every run
pricing_groups
39 rows · static · full-refresh every run
Strategy: full-table refresh on every Fargate run. Negligible cost; eliminates drift risk on tiny reference data.
reserved_items
incremental-with-cdc-deletes · 15-min cadence · OPS-230
Strategy: incremental-with-cdc-deletes (OPS-230) — previously full-refresh at 4h cadence, now incremental at 15 min.
INSERT side: WHERE reserved_at > watermark with a 5-min lookback. DELETE side: reserved_transactions for type IN (WITHDRAW, DELETE, EXPIRED, FAIL). Cold start seeds the watermark to MAX(tx_id) — the first run skips the historical sweep. DELETE-before-INSERT handles the asset_id reuse race correctly.
item_prices_raw
36k · updates every 10 min
item_price_adjust_percent
36k · updates every 10 min
store_item_stats
36k · updates every 5 min
trade_item_stats
36k · updates every 5 min
Strategy: full-table refresh every 5 min. 36k rows is sub-second to load. Eliminates incremental complexity entirely — no updated_at tracking, no watermarks, no UPSERT race conditions.
REPLACE INTO pricing.item_prices_raw
SELECT * FROM pricing_source.item_prices_raw;
-- same shape × 4 tables in this group
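Since the four Group B tables share one statement shape, the replicator can stamp it out from the group list. A sketch under that assumption — the helper name is illustrative, the table list comes from this section:

```javascript
// Sketch: generate the Group B full-refresh statement per table.
// Table names come from the Group B list above; fullRefreshSql is illustrative.
const GROUP_B = [
  "item_prices_raw",
  "item_price_adjust_percent",
  "store_item_stats",
  "trade_item_stats",
];

function fullRefreshSql(table) {
  // REPLACE INTO keeps the run idempotent: re-running overwrites rows in place.
  return `REPLACE INTO pricing.${table} SELECT * FROM pricing_source.${table}`;
}

const statements = GROUP_B.map(fullRefreshSql);
```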
item_eod
millions · daily inserts only, no updates
Strategy: incremental by id, daily cadence. No need for the 5-15 min cycle since rows only land once a day.
SELECT <columns>
FROM pricing_source.item_eod
WHERE id > :last_id
ORDER BY id LIMIT 50000;
-- paginate, INSERT into target, advance watermark
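The paginate-and-advance loop above can be sketched as follows — assuming the Node 22 + mysql2 replicator, with `q` as a stand-in async query helper and `insertPage` a hypothetical batch-insert helper:

```javascript
// Sketch: Group C keyset pagination for item_eod. Advance the id watermark
// only after a page lands, so a crash mid-run just replays the last page.
const PAGE = 50000;

async function syncItemEod(q, lastId) {
  for (;;) {
    const rows = await q(
      "SELECT * FROM pricing_source.item_eod WHERE id > ? ORDER BY id LIMIT ?",
      [lastId, PAGE]
    );
    if (rows.length === 0) break;        // caught up — nothing newer than lastId
    await insertPage(q, rows);           // hypothetical batch-insert helper
    lastId = rows[rows.length - 1].id;   // watermark moves past this page only now
  }
  return lastId;                          // caller persists this to _repl_state
}

async function insertPage(q, rows) {
  // Real code would build a multi-row INSERT; elided in this sketch.
  await q("INSERT INTO pricing.item_eod ...", rows);
}
```

Keyset pagination (`WHERE id > ?`) rather than OFFSET keeps each page an index range scan regardless of how deep into the millions of rows the run is.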
bot_trade_item_logs
118M+ rows · updates+inserts continuously · incremental-updated-at (OPS-208)
guess_questions
15-min cadence · 2ms steady-state · OPS-226
guess_questions is now on 15-min cadence after the source-side idx_updated_at_id covering index landed (OPS-226, 2026-05-05). Steady-state query time: ~2ms.
bot_trade_item_logs: Activity log: every transaction on the site. Most mutations land within 15 min (trade auto-cancel). Steam's trade-revert feature can mutate a row up to 8 days after creation. Uses updated_at_idx from OPS-208.
Strategy: incremental by updated_at with UPSERT, every 5-15 min. The watermark predicate catches reverts AT ANY AGE because the Phase 0 source DDL fix makes updated_at auto-bump on every mutation at the DB layer (independent of app code).
SELECT <columns>
FROM pricing_source.bot_trade_item_logs
WHERE updated_at >= :last_run_at - INTERVAL 5 MINUTE
ORDER BY updated_at, id LIMIT 50000;
-- uses updated_at_idx (added in Phase 0)
-- watermark catches an 8-day-old revert as easily as a 5-min-old insert
Loaded with INSERT ... ON DUPLICATE KEY UPDATE for idempotent replay.
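The watermark arithmetic behind that predicate is small enough to show directly — a sketch with illustrative names, where the 5-minute lookback matches the SQL above:

```javascript
// Sketch of the updated_at watermark math: the lower bound overlaps the
// previous run by 5 minutes so rows committed around the last read are never
// missed, and the ON DUPLICATE KEY UPDATE load makes re-reading them harmless.
const LOOKBACK_MS = 5 * 60 * 1000;

function watermarkLowerBound(lastRunAt) {
  return new Date(lastRunAt.getTime() - LOOKBACK_MS);
}

// An 8-day-old revert still matches: the DB-layer ON UPDATE CURRENT_TIMESTAMP
// bumps updated_at at mutation time, pulling the row back into the window.
function rowInWindow(rowUpdatedAt, lastRunAt) {
  return rowUpdatedAt >= watermarkLowerBound(lastRunAt);
}
```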
Phase 0 DDL (in scope): add ON UPDATE CURRENT_TIMESTAMP + updated_at_idx on source. INPLACE + LOCK=NONE on Aurora MySQL 8 — ~30-90 min on 118M rows, no write blocking, ~1.5-3 GB extra storage.
Bootstrap via DMS, not Fargate paginated. 118M rows full-load in ~30-90 min on parallel streams. See Phase 3.
Log scale — the gap between 39 and 118M rows is 6.5 orders of magnitude
Lag targets per group — alert thresholds drive Slack pages
Bar width = relative lag tolerance per cycle. Full bar = sole table on its own cadence; shorter bar = tighter SLO for that group's run cycle.
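A sketch of the per-group lag check that would sit behind those Slack pages — the threshold values here are illustrative placeholders scaled to each group's cadence, not the real SLOs, and the group keys are invented for the example:

```javascript
// Sketch: flag groups whose LagSeconds metric exceeds its SLO. Thresholds
// and group keys are illustrative, not the production alerting config.
const LAG_SLO_SECONDS = {
  groupA_fullRefresh: 30 * 60,      // 15-min cadence: alert at two missed cycles
  groupB_fullRefresh: 15 * 60,      // 5-min cadence: tighter tolerance
  groupC_itemEod: 26 * 60 * 60,     // daily cadence: alert after one missed run
  groupD_botTradeItemLogs: 30 * 60,
};

function lagBreaches(lagSecondsByGroup) {
  return Object.entries(lagSecondsByGroup)
    .filter(([group, lag]) => lag > (LAG_SLO_SECONDS[group] ?? Infinity))
    .map(([group]) => group);
}
```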
Metrics namespace tradeit-analytics/PricingReplicator: RowsReplicated, RunDurationMs, LagSeconds, Errors per {table}. DMS tasks: FullLoadProgressPercent, throughput, validation. All 5 phases complete — live in production as of 2026-05-05.
On the production pricing cluster, run one ALTER (two changes) on bot_trade_item_logs before any sync runs:
ALTER TABLE bot_trade_item_logs
MODIFY COLUMN updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP,
ADD INDEX updated_at_idx (updated_at),
ALGORITHM=INPLACE, LOCK=NONE;
~30-90 min on 118M rows. INPLACE + LOCK=NONE means no write blocking. ~1.5-3 GB additional index storage (Aurora auto-scales). Coordinate with the team that owns the pricing cluster. Verify with one synthetic UPDATE that updated_at bumps as expected before declaring Phase 0 complete.
steamarbitrage + pricing schemas (empty) · pricing._repl_state watermark table · pricing_staging.* schema · RENAME TABLE pricing_staging.* TO pricing.* (2026-04-30) — preserves watermarks, no data movement.
DMS full-load of steamarbitrage completed 2026-04-30: 167 tables, zero errors. Ephemeral restore cluster used (zero prod impact). Scope reduced: pricing C/D and Group A/B bootstrap were moot — pricing.* already populated via the Phase 2 cutover.
Gap-fill DMS reload-target (OPS-225) completed 2026-05-05: 41 tables, ~196M rows backfilled, ~1h47m total wall-clock.
EventBridge schedules: pricing-replicator-schedule (plus daily pricing-replicator-daily for Group C) · analytics-mcp.tradeit.gg · DSN cutover (steamtrade.* → pricing.*/steamarbitrage.*) · i-0adec4087a63a67b5 terminated · vol-08d3b49fbcf44f43a (500 GB) deleted · tradeit-analytics-merged-mysql deleted · tradeit-analytics-restore-pipeline deleted · alias/rds-restore scheduled deletion 2026-05-07.
Unit: predicate builders per group, REPLACE/UPSERT SQL gen, watermark math, pagination loop termination, full-refresh idempotency.
Integration: ephemeral MySQL via docker-compose. Each group's strategy verified against synthetic data. Late-arriving update (8-day-old row) caught by updated_at predicate.
Staging E2E (Phase 2): ≥24h soak against pricing_staging.*. Row-count and MAX(id) / MAX(updated_at) parity per table.
Aurora load test: verify Serverless v2 ACU range (0.5-2) handles peak Fargate write rate without throttling.
Observed monthly steady-state — as of 2026-05-05
| Item | $/mo |
|---|---|
| Aurora merged cluster (Serverless v2, writer-only, 0.5-2 ACU, briefly 8 ACU max during heavy DMS reloads) | ~$30-60 |
| Aurora pricing reader (db.t4g.medium, same-AZ) | ~$50 |
| DMS replication instance (dms.t3.medium, on-demand only) | ~$50 |
| Fargate pricing-replicator (0.25 vCPU / 0.5 GB, every 15 min) | ~$15-25 |
| Fargate analytics-mcp (ARM64, 0.25 vCPU / 512 MB + cloudflared sidecar) | ~$8-15 |
| Cloudflare Worker (OAuth proxy) | free tier |
| CloudWatch Logs + Metrics | ~$3 |
| Data transfer (same-AZ VPC) | $0 |
| Total monthly steady-state | ~$110–160 |
Stays under P2 ceiling. Aurora Serverless v2 observed at 0.5 ACU in steady state, with brief spikes to 8 ACU max during heavy DMS reloads. DMS instance is on-demand (not always-on) — the ~$50/mo figure assumes occasional bulk operations. Cloudflare Worker on free tier.
updated_at — DONE (Phase 0 / OPS-208). ON UPDATE CURRENT_TIMESTAMP + updated_at_idx applied INPLACE on the 118M-row table. Live.
idx_updated_at_id on guess_questions — DONE (OPS-226, 2026-05-05). Covering index landed on the source cluster. guess_questions now on 15-min cadence at ~2ms steady-state query time.
Gap-fill DMS reload-target — DONE (OPS-225, 2026-05-05). 41 tables, ~196M rows backfilled, ~1h47m total wall-clock.
reserved_items — DONE (OPS-230). New strategy live at 15-min cadence. Replaces the prior full-refresh at 4h cadence. See the Group A section for full strategy details.
analytics-mcp Cloudflare Worker OAuth proxy. Low urgency; token rotation and expiry hygiene improvements.
Row-level security / redacted views for PII fields. Not blocking current MCP usage (analytics_ro already scoped).
APM traces for the Fargate replicator task. CloudWatch metrics cover current observability needs.
Not auto-migratable. Requires manually updating each saved question DSN + schema references from steamtrade.* → pricing.* / steamarbitrage.*. No timeline set.
Currently on-demand full-load only. For sub-day freshness, evaluate DMS CDC (requires binlog_format=ROW on writer). Cost sign-off required before any recurring DMS task for steamarbitrage.
All implementation-time questions resolved — see "Resolved" line below
No open questions remain. Ongoing operational concerns (DMS daily refresh schedule, Metabase + MCP DSN cutover, EC2 retirement) are tracked in Linear (OPS-205, OPS-206, OPS-207).
Table name (bot_trade_item_logs, plural — confirmed by DDL) · updated_at column status (folded into Phase 0) · Aurora sizing (Serverless v2 writer-only) · bootstrap path for 118M rows (DMS, not Fargate) · pricing reader instance provisioned (pricing-test-reader-analytics) · Group B refresh cadence locked to 15 min (5-min freshness sufficient given 10-min source updates) · Phase 0 DDL ran 2026-04-29 outside any specific window · DMS source for steamarbitrage: snapshot-restored ephemeral cluster (NOT live reader). P1 zero-prod-impact makes this non-negotiable; a live reader endpoint puts read load on the source writer when no replicas exist.
Generated 2026-04-26 · visualised 2026-04-29 · as-built 2026-05-05 · tradeit-analytics · branch main
This HTML is the canonical source of truth for the architecture. All phases complete. Live in production.