Architecture · ops.tradeit.gg
flowchart TD
Users[Users / Clients]
subgraph Cloudflare
CF[Cloudflare DNS + CDN + WAF]
end
subgraph AWS["AWS Account"]
subgraph PROD["Production Environment"]
EC2P[EC2 Instance]
DockerP[Docker Containers]
Redis[Redis]
end
subgraph STAGE["Staging Environment"]
RedisS[Redis]
EC2S[EC2 Instance]
DockerS[Docker Containers]
end
RDS[(RDS Database)]
DD[Datadog Agent]
end
Users --> CF
CF --> EC2P
CF --> EC2S
EC2P --> DockerP
EC2S --> DockerS
DockerP <--> Redis
DockerP --> RDS
DockerS <--> RedisS
DockerS --> RDS
DockerP --> DD
DockerS --> DD
flowchart LR
subgraph AWS["AWS eu-west-1"]
subgraph VPC["Default VPC 172.31.0.0/16"]
IGW[Internet Gateway]
RT[Main Route Table]
subgraph AZA["eu-west-1a"]
SA[Subnet subnet-5af85e3d]
end
subgraph AZB["eu-west-1b"]
SB[Subnet subnet-f98f23b0]
end
subgraph AZC["eu-west-1c"]
SC[Subnet subnet-5f977504]
end
end
end
IGW --> RT
RT --> SA
RT --> SB
RT --> SC
flowchart LR
EC2A[EC2 - AZ A]
EC2B[EC2 - AZ B]
RDSC[RDS - AZ C]
CacheC[ElastiCache - AZ C]
EC2A -->|Cross-AZ $| RDSC
EC2B -->|Cross-AZ $| RDSC
EC2A -->|Cross-AZ $| CacheC
EC2B -->|Cross-AZ $| CacheC
Purpose
This document describes the current and target infrastructure architecture of Zengaming (TradeIT). It is intended for:
Zengaming (TradeIT) runs primarily in AWS eu-west-1 (Ireland) on the Default VPC model (public subnets). Workloads are deployed as Docker containers on EC2 using SSH + docker run (no ECS).
Key observations (validated via AWS + Datadog):
1a: 5, 1b: 14, 1c: 18)
A target architecture is defined to reduce cost, improve scalability, and minimize public exposure.
Recommendation: In the future, split prod / non-prod into separate accounts for blast-radius and governance.
| Key | Value |
|---|---|
| Primary region | eu-west-1 (Ireland) |
| Availability Zones used | eu-west-1a, eu-west-1b, eu-west-1c |
| AZ strategy | One subnet per AZ (Default VPC) |
| High availability | Depends on EC2 / target distribution |
Notes:
1a: 5, 1b: 14, 1c: 18)

| Key | Value |
|---|---|
| VPC type | Default VPC |
| CIDR | 172.31.0.0/16 |
| DNS resolution | Enabled |
| DNS hostnames | Enabled |
| Internet Gateway | Attached |
| Availability Zone | Subnet ID |
|---|---|
| eu-west-1a | subnet-5af85e3d |
| eu-west-1b | subnet-f98f23b0 |
| eu-west-1c | subnet-5f977504 |
| Key | Value |
|---|---|
| Subnet type | Public |
| NAT Gateways | None |
| Outbound access | Direct via Internet Gateway |
| Private subnets | Not in use |
Implications:
The platform runs application workloads directly on EC2 instances using Docker, without a container orchestrator (ECS/EKS).
deploy.sh)
This design places EC2 as a primary operational and cost factor, requiring explicit documentation of placement, exposure, and ownership.
| Metric | Value |
|---|---|
| Total running instances | 595 |
| AZs in use | eu-west-1a, eu-west-1c |
| Dominant AZ | eu-west-1c |
| Publicly accessible | Majority (public IPv4) |
| Instance age range | 2017 → 2025 |
Key observation:
There is no meaningful workload distribution strategy across AZs.
Most compute (especially bots and inventory workloads) resides in eu-west-1c, creating:
- AZ blast-radius risk
- Cross-AZ data transfer costs when dependencies live elsewhere
eu-west-1a
eu-west-1c (dominant)
Risk:
Inventory, bots, and backend traffic are heavily concentrated in 1c, while some dependencies (historically OpenSearch / RDS endpoints) span other AZs → cross-AZ traffic amplification.
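The concentration risk described above can be made measurable. The following is a minimal sketch, assuming instance placement is available as a plain list of AZ names; the function names, the 50% threshold, and the sample data are illustrative assumptions, not the real fleet inventory.

```python
from collections import Counter


def az_shares(instance_azs):
    """Return {az: fraction of instances} for a list of AZ names."""
    counts = Counter(instance_azs)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {az: n / total for az, n in counts.items()}


def dominant_az(instance_azs, threshold=0.5):
    """Return the AZ holding more than `threshold` of all instances, or None.

    The 0.5 default is a hypothetical cut-off, not a documented policy.
    """
    shares = az_shares(instance_azs)
    if not shares:
        return None
    az, share = max(shares.items(), key=lambda kv: kv[1])
    return az if share > threshold else None


# Illustrative input only.
sample = ["eu-west-1c"] * 7 + ["eu-west-1a"] * 3
print(az_shares(sample))
print(dominant_az(sample))
```

A report like this could be run periodically so that AZ skew is tracked as a number rather than an observation.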
| Role | Example Instances | Instance Types | Notes |
|---|---|---|---|
| Edge / Entry | Login, Socket | c7a.large | Public-facing |
| Backend API | tradeit backend | t3.large | Public IPv4 |
| Inventory | Inventory | c5.4xlarge | High CPU / IO |
| Bots | bot-* | t3.micro | Large fleet |
| Legacy | old.tradeit.gg | c5.xlarge | Long-lived |
| Test / Staging | stg / test | t2.small / t3.small | Aged |
Implications
- `docker pull`
- `docker rm`
- `docker run`
Container networking patterns
- `-p <hostPort>:3000`
- `--network host` (notably inventory)
Risk:
`--network host` bypasses container isolation and couples service behavior directly to the host’s network stack.
- `--restart unless-stopped`
Security concern:
Lack of IAM roles prevents least-privilege access to AWS services and complicates auditability.
- Service
- Environment
- Owner
- Criticality

| Layer | Current | Target |
|---|---|---|
| Compute | EC2 + Docker | ECS (EC2) |
| Placement | Manual | AZ-aware |
| Scaling | Manual | Service-based |
| Networking | Public IPv4 | Private + LB |
| Identity | SSH / static | IAM roles |
| Resilience | Host-bound | Task rescheduling |
This section intentionally documents reality, not aspiration.
Migration paths are defined elsewhere in the document.
The platform does not use AWS ALB or NLB.
All ingress traffic is terminated on NGINX running directly on EC2 instances, which act as:
Cloudflare functions strictly as an external CDN + WAF, not as an internal AWS load balancer.
Client
↓
Cloudflare (DNS, CDN, WAF)
↓ HTTPS (443)
Elastic IP (EC2 instance)
↓
NGINX (EC2)
↓
Upstream services (Docker containers / other EC2 hosts)
| Layer | Component | Responsibility |
|---|---|---|
| Edge | Cloudflare | DNS, CDN, WAF, basic rate limiting |
| AWS L7 | ❌ ALB | NOT USED |
| AWS L4 | ❌ NLB | NOT USED |
| Application LB | NGINX on EC2 | Primary load balancer |
NGINX performs:
- `least_conn` upstream balancing
- Path-based routing (`/api`, `/socket.io`, `/blog`, `/news`, etc.)
- Rate limiting (`limit_req`)

| Layer | TLS Terminated? | Certificate Location |
|---|---|---|
| Cloudflare | Optional (Flexible / Full) | Cloudflare-managed |
| EC2 (NGINX) | YES (Authoritative) | /etc/nginx/*.crt |
| ALB | ❌ | Not applicable |
Source of truth: TLS is terminated inside AWS on NGINX, using local certificate files.
| Port | Purpose | Exposure |
|---|---|---|
| 443 | HTTPS | Public (via EIP) |
| 80 | HTTP → internal routing / health | Public |
| 3333 | Internal API proxy | Restricted |
| 3000–400x | Backend services | Internal only |
Public exposure is controlled by Security Groups, not subnet isolation.
| Aspect | Current State |
|---|---|
| Allocation model | One or more EIPs per EC2 |
| Count | High (hundreds observed historically) |
| Attachment | Directly to EC2 ENIs |
| Risk | Cost, quota exhaustion, ops complexity |
Important implications:
From nginx.conf:
- `upstream backend { least_conn; ... }`
- `limit_req_zone`
- `/healthcheck`
This confirms:
NGINX is the actual L7 load balancer
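To illustrate what `least_conn` balancing means for the `backend` upstream, here is a simplified sketch. It only models the core idea (send the next request to the server with the fewest active connections); NGINX itself also takes server weights into account and rotates among tied servers. The connection counts below are made-up values for illustration.

```python
def pick_least_conn(active):
    """Return the upstream with the fewest active connections.

    `active` maps server name -> current active connection count.
    Ties go to the server listed first; this is a simplification.
    """
    return min(active, key=active.get)


def route(active):
    """Pick a server and account for the new connection."""
    server = pick_least_conn(active)
    active[server] += 1
    return server


# Upstream names taken from the nginx.conf excerpt; counts are illustrative.
connections = {
    "ip-172-31-42-95.eu-west-1.compute.internal:3000": 12,
    "ip-172-31-44-148.eu-west-1.compute.internal:3000": 9,
}
print(route(connections))
print(connections)
```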
❌ Incorrect (previous diagrams):
✅ Correct representation:
[ Internet ]
↓
[ Cloudflare ]
↓
[ Elastic IP ]
↓
[ EC2 + NGINX ]
↓
[ Backend EC2 / Containers ]
ALB must not appear in diagrams unless it actually exists.
| Area | Risk |
|---|---|
| Scalability | Manual EC2 + EIP scaling |
| Availability | No managed LB failover |
| Security | Public EC2 exposure |
| Operations | SSH-managed infra |
| Cost | EIP sprawl + cross-AZ traffic |
| Component | Target |
|---|---|
| Ingress | AWS NLB (static EIPs) |
| TLS | ACM |
| Compute | Private EC2 / ECS |
| NGINX | Internal only |
| Cloudflare | DNS + WAF only |
This would:
Cloudflare is the edge; NGINX on EC2 is the real load balancer; AWS ALB is not used.
Cloudflare is the authoritative edge and DNS layer for all public-facing traffic for the tradeit.gg platform.
It provides:
Cloudflare is external to AWS and always sits in front of AWS infrastructure.
| Item | Value |
|---|---|
| Account | TRADEIT.GG |
| Managed zones | tradeit.gg, steam-trade.com, connect-tradeit.com, vmarket.gg |
| Nameservers | boyd.ns.cloudflare.com, leah.ns.cloudflare.com |
| DNS mode | Full (Cloudflare authoritative DNS) |
| Plan | Enterprise (for tradeit.gg) |
Client
↓ HTTPS (443)
Cloudflare Edge
- TLS termination
- WAF / rate limits
- Bot protection
- CDN
↓ HTTPS
AWS Origin (NGINX on EC2)
Cloudflare is the only internet-facing entrypoint. AWS origins are never exposed directly unless explicitly documented.
DNS & Routing Strategy
Cloudflare DNS is the single source of truth for traffic routing.
Routing patterns in use:
Production Hostnames
| Hostname | Target | Proxy | Notes |
|---|---|---|---|
| tradeit.gg | nginx alb | Proxied | Primary production entry |
| www.tradeit.gg | tradeit.gg | DNS only | Alias |
| api.tradeit.gg | EC2 hostname | Proxied | Legacy direct-to-instance |
| inventory.tradeit.gg | EC2 IPv4 | Proxied | Service-specific endpoint |
Staging Hostnames
| Hostname | Target | Proxy | Notes |
|---|---|---|---|
| stg-lb.tradeit.gg | nginx alb | Proxied | Main staging |
| stg-banana.tradeit.gg | nginx alb | Proxied | Legacy / parallel |
| stg-tunneled.tradeit.gg | Cloudflare Tunnel | Proxied | No public origin |
Proxy Status Policy
| Record type | Proxy status |
|---|---|
| Public web / API | Always proxied |
| ALB origins | Always proxied |
| Email (MX, DKIM, SPF) | DNS only |
| Verification records | DNS only |
| SaaS integrations | Case-by-case |
Any A or CNAME record pointing to AWS infrastructure must be proxied unless explicitly documented.
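The proxy rule above lends itself to an automated check. The sketch below assumes DNS records have been exported as dictionaries with `type`, `name`, `content`, and `proxied` keys; that shape, the function name, and the exception set are assumptions for illustration. It cannot tell whether a target is actually AWS infrastructure, so it flags every unproxied A/CNAME record that is not listed as a documented exception.

```python
def unproxied_violations(records, documented_exceptions=frozenset()):
    """Return names of A/CNAME records that are DNS-only and not documented."""
    return [
        r["name"]
        for r in records
        if r["type"] in ("A", "CNAME")
        and not r["proxied"]
        and r["name"] not in documented_exceptions
    ]


# Hostnames come from the tables in this document; the IP is a placeholder
# from the documentation range (192.0.2.0/24), not a real origin.
records = [
    {"type": "A", "name": "tradeit.gg", "content": "192.0.2.10", "proxied": True},
    {"type": "CNAME", "name": "www.tradeit.gg", "content": "tradeit.gg", "proxied": False},
    {"type": "MX", "name": "tradeit.gg", "content": "mail.example.com", "proxied": False},
]
print(unproxied_violations(records))
```

Passing the hostnames of intentional DNS-only records as `documented_exceptions` keeps the report limited to unexplained cases.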
TLS & Certificates
TLS terminates at Cloudflare Edge.
Client ↔ Cloudflare:
Cloudflare ↔ Origin:
TLS mode:
No plaintext HTTP traffic is intentionally exposed.
Security Controls
Cloudflare enforces:
High-value domains are fully proxied by default.
Email & Non-HTTP DNS Records
Cloudflare DNS hosts records for:
These records are DNS-only by design.
| Key | Value |
|---|---|
| Engine | Aurora MySQL |
| Deployment | Aurora cluster |
| Subnet group | Default (public) |
| Public endpoint | Enabled |
| Inbound access | Restricted via Security Groups |
| Internet-wide access | Blocked |
Notes:
| Key | Value |
|---|---|
| Engine | Redis |
| Public access | No |
| Inbound access | EC2 security groups |
| Port | 6379 |
--add-host ...).

| Role | Name | Private IP | AZ | Instance type |
|---|---|---|---|---|
| Client / Coordinating | OpenSearch Cluster Client Coordinating Node | 172.31.25.126 | eu-west-1b | t3.medium |
| Index Node | OpenSearch Cluster Dedicated Index Node | 172.31.36.183 | eu-west-1c | c5.2xlarge |
| Staging backend host | TI Backend Stg | 172.31.45.74 | eu-west-1c | t3.large |
Key finding (cost + latency):
DISABLE_QUEUE=1)
secrets_list.json)
ssm_parameters_list.json)

| Key | Value |
|---|---|
| Firewall type | Stateful |
| Default inbound | Deny all |
| Default outbound | Allow all |
| Scope | EC2, ALB, RDS, Redis, EIPs, OpenSearch nodes |
| Component | Allowed Source |
|---|---|
| ALB | Cloudflare IP ranges |
| EC2 app hosts | ALB SG and/or trusted IPs |
| Redis | EC2 application SGs |
| RDS | EC2 SGs / approved IPs |
| OpenSearch | EC2 application SGs (recommended) |
Hard rules:
Cost note:
RTO / RPO to be defined.
The platform relies heavily on EC2-attached EBS volumes, primarily used as root disks for Docker-based services running directly on EC2.
There is no centralized backup or lifecycle policy currently enforced.
| Attribute | Value |
|---|---|
| Volume type | gp2 |
| Typical size | 8 GiB |
| Typical IOPS | 100 |
| Encryption | Mostly disabled |
| Attachment | Single-instance |
| Delete on termination | Enabled (root volumes) |
| AZ distribution | Mostly eu-west-1c |
Observations:
| Item | Status |
|---|---|
| Automated snapshots | ❌ Not configured |
| DLM policies | ❌ None |
| Recent backups | ❌ 0 / 598 volumes |
| Snapshot origin | Mostly AMI creation |
| Snapshot age | Some dating back to 2018 |
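A backup-coverage figure such as the "0 / 598 volumes" above could be derived along the following lines. This is a sketch under stated assumptions: snapshots are given as dictionaries with `VolumeId` and `StartTime` keys (the shape returned by the EC2 snapshot listing), the 7-day freshness window is a hypothetical policy, and the volume IDs are invented examples.

```python
from datetime import datetime, timedelta, timezone


def volumes_without_recent_snapshot(volume_ids, snapshots, now,
                                    max_age=timedelta(days=7)):
    """Return volume IDs whose newest snapshot is missing or older than max_age."""
    latest = {}
    for snap in snapshots:
        vid = snap["VolumeId"]
        if vid not in latest or snap["StartTime"] > latest[vid]:
            latest[vid] = snap["StartTime"]
    cutoff = now - max_age
    return [v for v in volume_ids if v not in latest or latest[v] < cutoff]


# Invented sample data for illustration.
now = datetime(2025, 1, 15, tzinfo=timezone.utc)
volumes = ["vol-aaa", "vol-bbb", "vol-ccc"]
snapshots = [
    {"VolumeId": "vol-aaa", "StartTime": datetime(2025, 1, 14, tzinfo=timezone.utc)},
    {"VolumeId": "vol-bbb", "StartTime": datetime(2018, 6, 1, tzinfo=timezone.utc)},
]
stale = volumes_without_recent_snapshot(volumes, snapshots, now)
print(f"{len(volumes) - len(stale)} / {len(volumes)} volumes have a recent snapshot")
```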
Important:
- Service
- Environment
- Owner

This account contains a large number of Amazon EBS volumes, mostly small (gp2, ~8 GiB) root volumes attached to EC2 instances running Docker-based services.
Without active management, EBS volumes and snapshots can become a significant hidden cost.
gp3 is the recommended default EBS volume type.
| Volume Type | Cost (eu-west-1) | Notes |
|---|---|---|
| gp2 | Higher | Performance tied to size |
| gp3 | ~20% cheaper per GB | Performance independent of size |
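The "performance tied to size" note for gp2 can be shown with a short calculation. The sketch uses the published gp2 baseline rule (3 IOPS per GiB, with a floor of 100 and a ceiling of 16,000) and the gp3 baseline of 3,000 IOPS regardless of size; the function and constant names are illustrative.

```python
GP3_BASELINE_IOPS = 3000  # gp3 baseline, independent of volume size


def gp2_baseline_iops(size_gib):
    """Baseline IOPS for a gp2 volume: 3 IOPS/GiB, min 100, max 16000."""
    return min(16000, max(100, 3 * size_gib))


for size in (8, 100, 1000):
    print(size, "GiB -> gp2:", gp2_baseline_iops(size), "gp3:", GP3_BASELINE_IOPS)
```

For the typical 8 GiB root volume in this account, gp2 stays at the 100 IOPS floor, which matches the "Typical IOPS" value recorded in the volume attributes table.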
Benefits
Cross-AZ Traffic
Elastic IP Sprawl
RDS Cost Amplification
| Component | Target Design |
|---|---|
| Ingress | NLB with 2–3 static EIPs, dual-stack IPv4/IPv6 (if feasible) |
| Compute | Private EC2 or ECS (optional later) |
| OpenSearch | Internal NLB + DNS, or migrate to managed service |
| Database | Aurora aligned with main compute AZs (A/B or B/C depending) |
| Cache | ElastiCache co-located with compute |
Expected outcomes:
This environment uses CircleCI for build and (in some cases) deployment. Deployments are still Docker-on-EC2 (not ECS), executed via SSH to target hosts (often via a bastion).
- .circleci/config.yml
- $CIRCLE_PROJECT_USERNAME/<image>)
Job: build
Runtime: cimg/base:2022.03 + setup_remote_docker
Output: Docker image pushed to Docker Hub
Build behavior:
- branch-name transformed with `/` → `--` (example: `feature/new-ui` → `feature--new-ui`)
- `CIRCLE_TAG` (used as-is)
- `--build-arg` values to inject runtime configuration at build-time.
Risk note (important):
Using `--build-arg` for secrets (tokens/keys) may bake secrets into image layers if passed via `ARG → ENV` inside the Dockerfile.
Prefer runtime secrets via SSM/Secrets Manager or Docker runtime env vars.
Build steps summary:
1) Docker Hub login (DOCKER_LOGIN/DOCKER_PASSWORD)
2) docker build with many build args
3) docker push
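The image tag derivation described under "Build behavior" can be sketched as a small function. The function name and the precedence (a Git tag wins over a branch name) are assumptions for illustration; the actual logic lives in the CircleCI configuration.

```python
def image_tag(branch=None, git_tag=None):
    """Derive a Docker image tag from a Git tag or a branch name.

    A Git tag is used as-is; otherwise every "/" in the branch name is
    replaced with "--", since "/" is not valid inside a Docker tag.
    """
    if git_tag:
        return git_tag
    if branch:
        return branch.replace("/", "--")
    raise ValueError("either branch or git_tag is required")


print(image_tag(branch="feature/new-ui"))
print(image_tag(git_tag="v1.2.3"))
```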
Slack:
- tradeit-builds on success/failure with metadata (branch, author, commit, job link)
Job: deploy
Runtime: CircleCI machine executor (Ubuntu)
Deployment method:
- dmz/open_tunnel orb opens a local port to the target
- 0.0.0.0:$BASTION_PORT as a proxy to the target EC2 instance
Deploy steps (current):
1) Add SSH keys (CircleCI-managed fingerprint)
2) Pre-approve bastion host key via ssh-keyscan
3) Open tunnel to target via bastion
4) On target host:
- `docker image prune -fa`
- `docker pull <image>:<tag>`
- `docker tag ... new-tradeit`
- `docker rm -f new-tradeit`
- `docker run --name new-tradeit -p 3000:3000 -dit --restart unless-stopped new-tradeit`
Slack:
- tradeit-builds
- success_tagged_deploy_1)
Workflow: build-docker-image-and-push
- feature/*, hotfix/*, bugfix/*, release/*
- staging-*, dev-*
- master, likeprod
- vX.Y.Z...
Note: Current deploy job appears primarily used for staging-style deployment via bastion.
Production deployments may still be performed via manual scripts or separate pipelines depending on service.
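The on-host deploy sequence listed above can be expressed as data, which makes it easier to review or dry-run before anything touches a host. This is only a sketch: it builds the command list and prints it, without executing anything. The function name and the example image are hypothetical, and because the source elides the first argument of `docker tag`, the sketch assumes it is the image reference that was just pulled.

```python
import shlex


def deploy_commands(image, tag, name="new-tradeit", port=3000):
    """Return the docker commands of the current deploy flow as argv lists."""
    ref = f"{image}:{tag}"
    return [
        ["docker", "image", "prune", "-fa"],
        ["docker", "pull", ref],
        # Assumption: the elided `docker tag` source is the pulled reference.
        ["docker", "tag", ref, name],
        ["docker", "rm", "-f", name],
        ["docker", "run", "--name", name, "-p", f"{port}:{port}",
         "-dit", "--restart", "unless-stopped", name],
    ]


# Dry run: print what would be executed on the target host.
for cmd in deploy_commands("example/image", "feature--new-ui"):
    print(shlex.join(cmd))
```

Note that the sequence removes the running container before starting the new one, so there is a window with no container serving traffic.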
Planned improvements:
# Docker on EC2 — Operations Documentation
---
## 1. Service Info
- **Service name:** tradeit / tradeit-backend / tradeit-inventory-server
- **Environment:** prod / stg / likeprod
- **Repository:** https://github.com/zengamingx/<repo>
- **Docker image:** zengamingx/<image>
- **Runtime:** Docker + PM2
- **Deployment type:** SSH-based deployment
- **Owner:** DevOps / Platform
- **Slack channel:** #tradeit-dev
---
## 2. Hosts & Placement
| Env | Instance Name | Instance ID | AZ | Type | Public | Notes |
|---|---|---|---|---|---|---|
| prod | Login | i-0388bbadd9fb77b30 | eu-west-1a | c7a.large | Yes | User entrypoint |
| prod | Socket | i-07c992a075c2a5aeb | eu-west-1a | c7a.large | Yes | WebSockets |
| stg | TI Backend Stg | i-0089b6d6d03867c2d | eu-west-1c | t3.large | No | Backend |
---
## 3. Networking
- **Inbound path:**
Cloudflare → EC2 (direct EIP / Public IP)
- **Host ports:**
3000, 4001–4005 (domain-based routing)
- **Container ports:**
3000
- **Docker network mode:**
bridge (inventory uses host)
- **Security groups:**
- `sg-00219c4fb3212cea4` (base)
- `sg-051880495a8dddc9a` (Cloudflare HTTPS)
⚠️ **Important**
- Any instance with a public IP is internet-reachable
- 0.0.0.0/0 must never be allowed unintentionally
- SSH restricted to trusted IPs only
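The rules above can be checked mechanically. Below is a minimal sketch that scans inbound rules for ports open to `0.0.0.0/0`. It assumes rules are provided as dictionaries with `FromPort`, `ToPort`, and `IpRanges` keys (the shape used by the EC2 security group API); the function name and the default set of ports allowed to be public (80 and 443) are illustrative assumptions.

```python
def open_to_world(permissions, allowed_ports=frozenset({80, 443})):
    """Return (from_port, to_port) ranges open to 0.0.0.0/0 outside allowed_ports."""
    findings = []
    for perm in permissions:
        ranges = perm.get("IpRanges", [])
        if not any(r.get("CidrIp") == "0.0.0.0/0" for r in ranges):
            continue
        lo, hi = perm.get("FromPort"), perm.get("ToPort")
        if lo is None or hi is None:
            # Rules for all protocols carry no port range: everything is open.
            findings.append((None, None))
        elif not all(p in allowed_ports for p in range(lo, hi + 1)):
            findings.append((lo, hi))
    return findings


# Illustrative rules, not the real security group contents.
rules = [
    {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
     "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
    {"IpProtocol": "tcp", "FromPort": 6379, "ToPort": 6379,
     "IpRanges": [{"CidrIp": "172.31.0.0/16"}]},
]
print(open_to_world(rules))
```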
---
## 4. Dependencies
| Dependency | Endpoint | Port | AZ Sensitive | Notes |
|---|---|---:|---|---|
| Backend API | 172.31.45.74 | 3000 | Yes | Hardcoded |
| Aurora MySQL | cluster endpoint | 3306 | Yes | Public endpoint, SG-restricted |
| Redis | ElastiCache | 6379 | Yes | Same AZ preferred |
| OpenSearch | 172.31.25.126 / 172.31.36.183 | 9200 | Yes | Hard-pinned IPs |
| External APIs | Stripe, Intercom | 443 | No | Internet |
---
## 5. Build
- **Dockerfile:**
`/Dockerfile`
- **Secrets used at build time:**
- NODE_AUTH_TOKEN
- DOTENV_KEY
- FONT_AWESOME_TOKEN
⚠️ **Risk**
- Secrets passed via `ARG → ENV` are baked into image layers
- Migration to runtime secrets recommended
---
## 6. Deploy Procedure (Current)
### Summary
- SSH-based deploy via `deploy.sh`
- Manual approval for prod
- No health checks
- No rollback automation
This NGINX instance acts as a public-facing edge proxy responsible for:
This service is internet-exposed and participates in the Direct-to-Instance ingress architecture (no AWS ALB in front).
Client
↓
Cloudflare (CDN, WAF, optional TLS)
↓
NGINX (this service, public IP / EIP)
↓
Internal services (Docker / EC2 private IPs)
Notes
- `server_tokens off`

| Port | Purpose |
|---|---|
| 80 | Healthcheck only |
| 443 | Main HTTPS entrypoint |
| 3333 | Internal API v2 proxy |
| Path | Target |
|---|---|
| /api/v2/ | Backend upstream (port 3000) |
| /socket.io/ | WebSockets |
| /static/ | Static service |
| /blog/ | Internal HTTPS service |
| /news/ | Internal HTTPS service |
| /oauth2/ | OAuth service |
| /imgproxy | External image proxy |
# For more information on configuration, see:
# * Official English Documentation: http://nginx.org/en/docs/
# * Official Russian Documentation: http://nginx.org/ru/docs/
user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
worker_rlimit_nofile 8192;
include /usr/share/nginx/modules/*.conf;
events {
worker_connections 8192;
multi_accept on;
use epoll;
}
http {
limit_req_zone $host zone=global_rate_limit:10m rate=100r/s;
log_format rate_limit_log '$remote_addr - $host [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$request_time" "$upstream_response_time" '
'"$pipe" "$limit_req_status"';
limit_conn_zone $binary_remote_addr zone=conn_limit_per_ip:10m;
limit_req_zone $binary_remote_addr zone=req_limit_per_ip:10m rate=85r/s;
resolver 8.8.8.8 ipv6=off;
sendfile on;
tcp_nopush on;
tcp_nodelay on;
keepalive_timeout 65s;
client_max_body_size 20M;
server_tokens off;
add_header X-Frame-Options "SAMEORIGIN" always;
add_header X-XSS-Protection "1; mode=block" always;
add_header X-Content-Type-Options "nosniff" always;
add_header Referrer-Policy "strict-origin-when-cross-origin" always;
add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;
ssl_session_cache shared:SSL:20m;
ssl_session_timeout 1d;
# TLSv1 and TLSv1.1 are deprecated (RFC 8996).
ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;
ssl_prefer_server_ciphers on;
log_format main_ext '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'"$host" reqtime=$request_time '
'upstream="$upstream_addr" '
'ratelimit=$limit_req_status '
'country=$http_cf_ipcountry';
access_log /var/log/nginx/access.log main_ext;
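# map compares $remote_addr as a string, so the CIDR entries below never match;
# CIDR matching would need a geo block instead.
map $remote_addr $whitelisted {
default 0;
127.0.0.1 1;
::1 1;
10.0.0.0/8 1;
192.168.0.0/16 1;
}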
upstream backend {
least_conn;
server ip-172-31-42-95.eu-west-1.compute.internal:3000;
server ip-172-31-44-148.eu-west-1.compute.internal:3000;
}
server {
listen 443 ssl http2 default_server;
server_name tradeit.gg www.tradeit.gg;
ssl_certificate /etc/nginx/tradeit.crt;
ssl_certificate_key /etc/nginx/tradeit.key;
limit_req zone=global_rate_limit burst=30 nodelay;
limit_req_status 429;
location /api/v2/ {
proxy_pass http://backend;
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
location / {
proxy_pass http://localhost:3000;
}
}
}
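To make the effect of `limit_req zone=global_rate_limit burst=30 nodelay` with `rate=100r/s` easier to reason about, here is a simplified leaky-bucket model. It is an approximation for illustration, not NGINX's implementation (which tracks state per key in shared memory with millisecond resolution); the class and method names are assumptions. Because the zone is keyed on `$host`, all clients of a hostname share one bucket.

```python
class LimitReqModel:
    """Simplified model of a leaky-bucket limiter with burst and nodelay."""

    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s
        self.burst = burst
        self.excess = None  # requests currently above the steady rate
        self.last = None    # time of the last accepted request

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) passes."""
        if self.excess is None:
            self.excess, self.last = 0.0, now
            return True
        elapsed = max(0.0, now - self.last)
        excess = max(0.0, self.excess - self.rate * elapsed + 1.0)
        if excess > self.burst:
            return False  # rejected; state is left unchanged
        self.excess, self.last = excess, now
        return True


limiter = LimitReqModel(rate_per_s=100, burst=30)
accepted = sum(limiter.allow(0.0) for _ in range(40))
print("accepted out of 40 simultaneous requests:", accepted)
print("request 100 ms later allowed:", limiter.allow(0.1))
```

In this model a sudden spike is absorbed up to the burst size, and rejected requests would receive the configured `limit_req_status` (429).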