tradeit.gg

Architecture · ops.tradeit.gg

Infrastructure



flowchart TD
    Users[Users / Clients]

    subgraph Cloudflare
        CF[Cloudflare DNS + CDN + WAF]
       
    end

    subgraph AWS["AWS Account"]

        subgraph PROD["Production Environment"]
            EC2P[EC2 Instance]
            DockerP[Docker Containers]
            Redis[Redis]
        end

        subgraph STAGE["Staging Environment"]
            RedisS[Redis]
            EC2S[EC2 Instance]
            DockerS[Docker Containers]
        end

        RDS[(RDS Database)]
        DD[Datadog Agent]
    end

    Users --> CF
    CF --> EC2P
    CF --> EC2S

     

    EC2P --> DockerP
    EC2S --> DockerS

    DockerP <--> Redis
    DockerP --> RDS
    DockerS <--> RedisS
    DockerS --> RDS

    DockerP --> DD
    DockerS --> DD

flowchart LR


    subgraph AWS["AWS eu-west-1"]
      subgraph VPC["Default VPC 172.31.0.0/16"]
        IGW[Internet Gateway]
        RT[Main Route Table]

        subgraph AZA["eu-west-1a"]
          SA[Subnet subnet-5af85e3d]
        end

        subgraph AZB["eu-west-1b"]
          SB[Subnet subnet-f98f23b0]
        end

        subgraph AZC["eu-west-1c"]
          SC[Subnet subnet-5f977504]
        end
      end
    end

    IGW --> RT
    RT --> SA
    RT --> SB
    RT --> SC

flowchart LR
  EC2A[EC2 - AZ A]
  EC2B[EC2 - AZ B]
  RDSC[RDS - AZ C]
  CacheC[ElastiCache - AZ C]

  EC2A -->|Cross-AZ $| RDSC
  EC2B -->|Cross-AZ $| RDSC
  EC2A -->|Cross-AZ $| CacheC
  EC2B -->|Cross-AZ $| CacheC

1. Document Info

Purpose

This document describes the current and target infrastructure architecture of Zengaming (TradeIT). It is intended for:


2. Executive Summary

Zengaming (TradeIT) runs primarily in AWS eu-west-1 (Ireland) in the Default VPC, using public subnets only. Workloads are deployed as Docker containers on EC2 via SSH and docker run, with no orchestrator such as ECS.

Key observations (validated via AWS + Datadog):

A target architecture is defined to reduce cost, improve scalability, and minimize public exposure.


3. Accounts & Organization

Recommendation: In the future, split prod / non-prod into separate accounts for blast-radius and governance.


4. Regions & Availability

| Key | Value |
|---|---|
| Primary region | eu-west-1 (Ireland) |
| Availability Zones used | eu-west-1a, eu-west-1b, eu-west-1c |
| AZ strategy | One subnet per AZ (Default VPC) |
| High availability | Depends on EC2 / target distribution |

Notes:


5. Network Topology (VPC)

VPC Configuration

| Key | Value |
|---|---|
| VPC type | Default VPC |
| CIDR | 172.31.0.0/16 |
| DNS resolution | Enabled |
| DNS hostnames | Enabled |
| Internet Gateway | Attached |

Subnets

| Availability Zone | Subnet ID |
|---|---|
| eu-west-1a | subnet-5af85e3d |
| eu-west-1b | subnet-f98f23b0 |
| eu-west-1c | subnet-5f977504 |

Internet & NAT

| Key | Value |
|---|---|
| Subnet type | Public |
| NAT Gateways | None |
| Outbound access | Direct via Internet Gateway |
| Private subnets | Not in use |

Implications:


6. Compute Layer (Docker on EC2 – Audited)

Overview

The platform runs application workloads directly on EC2 instances using Docker, without a container orchestrator (ECS/EKS).

This design makes EC2 the primary operational and cost driver, so instance placement, exposure, and ownership must be documented explicitly.


EC2 Inventory Summary (eu-west-1)

| Metric | Value |
|---|---|
| Total running instances | 595 |
| AZs in use | eu-west-1a, eu-west-1c |
| Dominant AZ | eu-west-1c |
| Publicly accessible | Majority (public IPv4) |
| Instance age range | 2017 → 2025 |

Key observation:

There is no meaningful workload distribution strategy across AZs.

Most compute (especially bots and inventory workloads) resides in eu-west-1c, creating:

- AZ blast-radius risk

- Cross-AZ data transfer costs when dependencies live elsewhere
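The concentration described above can be measured from any instance inventory export. The following is a minimal sketch, assuming each record carries an `az` field; the record shape, the 50% threshold, and the sample data are hypothetical and only illustrate the check.

```python
from collections import Counter

def az_distribution(instances):
    """Return {az: share of fleet} for a list of instance records."""
    counts = Counter(inst["az"] for inst in instances)
    total = sum(counts.values())
    return {az: n / total for az, n in counts.items()} if total else {}

def dominant_az(instances, threshold=0.5):
    """Return the AZ holding more than `threshold` of the fleet, else None."""
    shares = az_distribution(instances)
    if not shares:
        return None
    az, share = max(shares.items(), key=lambda kv: kv[1])
    return az if share > threshold else None

# Hypothetical records for illustration; not real inventory data.
sample = [
    {"name": "bot-1", "az": "eu-west-1c"},
    {"name": "bot-2", "az": "eu-west-1c"},
    {"name": "inventory", "az": "eu-west-1c"},
    {"name": "login", "az": "eu-west-1a"},
]

print(az_distribution(sample))
print(dominant_az(sample))
```

Running a check like this against the real inventory on a schedule would make drift toward a single AZ visible before it becomes a blast-radius problem.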


Availability Zone Distribution

eu-west-1a

eu-west-1c (dominant)

Risk:

Inventory, bots, and backend traffic are heavily concentrated in 1c, while some dependencies (historically OpenSearch / RDS endpoints) span other AZs → cross-AZ traffic amplification.


Instance Role Classification

| Role | Example Instances | Instance Types | Notes |
|---|---|---|---|
| Edge / Entry | Login, Socket | c7a.large | Public-facing |
| Backend API | tradeit backend | t3.large | Public IPv4 |
| Inventory | Inventory | c5.4xlarge | High CPU / IO |
| Bots | bot-* | t3.micro | Large fleet |
| Legacy | old.tradeit.gg | c5.xlarge | Long-lived |
| Test / Staging | stg / test | t2.small / t3.small | Aged |

Networking & Exposure Model

Implications


Docker Runtime Model

Container networking patterns

Risk:

--network host removes the container's network namespace isolation: the container shares the host's interfaces and port space, which couples service behavior directly to the host's network stack.


Process Management


IAM & Identity (Observed)

Security concern:

Lack of IAM roles prevents least-privilege access to AWS services and complicates auditability.


Operational Risks (Current State)


Immediate Improvement Opportunities (Non-Disruptive)

  1. Tag all EC2 instances
  2. AZ-aware placement
  3. Reduce public exposure
  4. IAM roles
  5. Prepare migration path
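For the first item, a tag audit can start as a simple check over instance records. This is a sketch only: the required tag keys below are a hypothetical policy, not one defined in this document, and the records are assumed to use the `Tags` list-of-`Key`/`Value` shape that the EC2 API returns for instances.

```python
# Hypothetical tagging policy; adjust to whatever keys the team agrees on.
REQUIRED_TAGS = ("Service", "Environment", "Owner")

def missing_tags(instance, required=REQUIRED_TAGS):
    """Return the required tag keys that are absent or empty on an instance."""
    present = {t["Key"] for t in instance.get("Tags", []) if t.get("Value")}
    return [key for key in required if key not in present]

def untagged_report(instances, required=REQUIRED_TAGS):
    """Map instance ID to its missing tag keys, skipping compliant instances."""
    report = {}
    for inst in instances:
        gaps = missing_tags(inst, required)
        if gaps:
            report[inst["InstanceId"]] = gaps
    return report

# Hypothetical records for illustration.
sample = [
    {"InstanceId": "i-aaa", "Tags": [{"Key": "Service", "Value": "bots"}]},
    {"InstanceId": "i-bbb", "Tags": [
        {"Key": "Service", "Value": "login"},
        {"Key": "Environment", "Value": "prod"},
        {"Key": "Owner", "Value": "platform"},
    ]},
    {"InstanceId": "i-ccc"},
]

print(untagged_report(sample))
```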

Strategic Direction (Target State)

| Layer | Current | Target |
|---|---|---|
| Compute | EC2 + Docker | ECS (EC2) |
| Placement | Manual | AZ-aware |
| Scaling | Manual | Service-based |
| Networking | Public IPv4 | Private + LB |
| Identity | SSH / static | IAM roles |
| Resilience | Host-bound | Task rescheduling |

This section intentionally documents reality, not aspiration.

Migration paths are defined elsewhere in the document.

7. Load Balancing & Edge

7.1 Actual Ingress Architecture (Verified)

The platform does not use AWS ALB or NLB.

All ingress traffic is terminated on NGINX running directly on EC2 instances, which act as:

Cloudflare functions strictly as an external CDN + WAF, not as an internal AWS load balancer.


7.2 End-to-End Request Path (Unambiguous)

Production & Staging (Current)

Client
  ↓
Cloudflare (DNS, CDN, WAF)
  ↓ HTTPS (443)
Elastic IP (EC2 instance)
  ↓
NGINX (EC2)
  ↓
Upstream services (Docker containers / other EC2 hosts)


7.3 Load Balancing Responsibility

| Layer | Component | Responsibility |
|---|---|---|
| Edge | Cloudflare | DNS, CDN, WAF, basic rate limiting |
| AWS L7 | ❌ ALB | NOT USED |
| AWS L4 | ❌ NLB | NOT USED |
| Application LB | NGINX on EC2 | Primary load balancer |

NGINX performs:


7.4 TLS Termination

| Layer | TLS Terminated? | Certificate Location |
|---|---|---|
| Cloudflare | Optional (Flexible / Full) | Cloudflare-managed |
| EC2 (NGINX) | YES (Authoritative) | /etc/nginx/*.crt |
| ALB | Not applicable | |

Source of truth: TLS is terminated inside AWS on NGINX, using local certificate files.


7.5 Exposed Ports

| Port | Purpose | Exposure |
|---|---|---|
| 443 | HTTPS | Public (via EIP) |
| 80 | HTTP → internal routing / health | Public |
| 3333 | Internal API proxy | Restricted |
| 3000–400x | Backend services | Internal only |

Public exposure is controlled by Security Groups, not subnet isolation.


7.6 Elastic IP Usage Model

| Aspect | Current State |
|---|---|
| Allocation model | One or more EIPs per EC2 |
| Count | High (hundreds observed historically) |
| Attachment | Directly to EC2 ENIs |
| Risk | Cost, quota exhaustion, ops complexity |

Important implications:


7.7 NGINX as the Real Load Balancer (Evidence)

From nginx.conf:

This confirms:

NGINX is the actual L7 load balancer


7.8 Diagram Correction Notes (Important)

❌ Incorrect (previous diagrams):

✅ Correct representation:

[ Internet ]
  ↓
[ Cloudflare ]
  ↓
[ Elastic IP ]
  ↓
[ EC2 + NGINX ]
  ↓
[ Backend EC2 / Containers ]

ALB must not appear in diagrams unless it actually exists.


7.9 Risks of Current Design

| Area | Risk |
|---|---|
| Scalability | Manual EC2 + EIP scaling |
| Availability | No managed LB failover |
| Security | Public EC2 exposure |
| Operations | SSH-managed infra |
| Cost | EIP sprawl + cross-AZ traffic |

7.10 Target Direction (Not Yet Implemented)

| Component | Target |
|---|---|
| Ingress | AWS NLB (static EIPs) |
| TLS | ACM |
| Compute | Private EC2 / ECS |
| NGINX | Internal only |
| Cloudflare | DNS + WAF only |

This would:


7.11 Summary (One Sentence)

Cloudflare is the edge; NGINX on EC2 is the real load balancer; AWS ALB is not used.

8. Cloudflare (Edge, DNS & Security Layer)

Purpose & Role

Cloudflare is the authoritative edge and DNS layer for all public-facing traffic for the tradeit.gg platform.

It provides:

Cloudflare is external to AWS and always sits in front of AWS infrastructure.


Cloudflare Account & Zones

| Item | Value |
|---|---|
| Account | TRADEIT.GG |
| Managed zones | tradeit.gg, steam-trade.com, connect-tradeit.com, vmarket.gg |
| Nameservers | boyd.ns.cloudflare.com, leah.ns.cloudflare.com |
| DNS mode | Full (Cloudflare authoritative DNS) |
| Plan | Enterprise (for tradeit.gg) |

Edge Traffic Model (Authoritative)

Client
  ↓ HTTPS (443)
Cloudflare Edge
  - TLS termination
  - WAF / rate limits
  - Bot protection
  - CDN
  ↓ HTTPS
AWS Origin (ALB or EC2)

Cloudflare is the only internet-facing entrypoint. AWS origins are never exposed directly unless explicitly documented.

DNS & Routing Strategy

Cloudflare DNS is the single source of truth for traffic routing.

Routing patterns in use:

Production Hostnames

| Hostname | Target | Proxy | Notes |
|---|---|---|---|
| tradeit.gg | nginx alb | Proxied | Primary production entry |
| www.tradeit.gg | tradeit.gg | DNS only | Alias |
| api.tradeit.gg | EC2 hostname | Proxied | Legacy direct-to-instance |
| inventory.tradeit.gg | EC2 IPv4 | Proxied | Service-specific endpoint |

Staging Hostnames

| Hostname | Target | Proxy | Notes |
|---|---|---|---|
| stg-lb.tradeit.gg | nginx alb | Proxied | Main staging |
| stg-banana.tradeit.gg | nginx alb | Proxied | Legacy / parallel |
| stg-tunneled.tradeit.gg | Cloudflare Tunnel | Proxied | No public origin |

Proxy Status Policy

| Record type | Proxy status |
|---|---|
| Public web / API | Always proxied |
| ALB origins | Always proxied |
| Email (MX, DKIM, SPF) | DNS only |
| Verification records | DNS only |
| SaaS integrations | Case-by-case |

Any A or CNAME record pointing to AWS infrastructure must be proxied unless explicitly documented.
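The proxy rule above lends itself to an automated audit over an export of the zone's DNS records. The sketch below assumes records shaped like Cloudflare DNS record objects (`type`, `name`, `content`, `proxied`). It does not try to detect whether a target is actually in AWS; it treats every A or CNAME record as a candidate and relies on a list of documented exceptions. The exception set and the `debug.example.test` record are hypothetical.

```python
def unproxied_origin_records(records, exceptions=frozenset()):
    """Return names of A/CNAME records that are DNS-only and not documented."""
    return [
        rec["name"]
        for rec in records
        if rec["type"] in ("A", "CNAME")
        and not rec.get("proxied", False)
        and rec["name"] not in exceptions
    ]

# Illustrative records; targets use documentation-reserved values.
sample = [
    {"type": "A", "name": "tradeit.gg", "content": "203.0.113.10", "proxied": True},
    {"type": "CNAME", "name": "www.tradeit.gg", "content": "tradeit.gg", "proxied": False},
    {"type": "A", "name": "debug.example.test", "content": "203.0.113.20", "proxied": False},
    {"type": "MX", "name": "tradeit.gg", "content": "mail.example.test", "proxied": False},
]

# Assumption: the www alias is the only documented DNS-only exception.
documented = frozenset({"www.tradeit.gg"})

print(unproxied_origin_records(sample, documented))
```

MX and other non-HTTP records are ignored by design, which matches the DNS-only policy for email and verification records.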

TLS & Certificates

TLS terminates at Cloudflare Edge.

Client ↔ Cloudflare:

Cloudflare ↔ Origin:

TLS mode:

No plaintext HTTP traffic is intentionally exposed.

Security Controls

Cloudflare enforces:

High-value domains are fully proxied by default.

Email & Non-HTTP DNS Records

Cloudflare DNS hosts records for:

These records are DNS-only by design.

8. Data Stores

Database – Aurora MySQL (RDS)

| Key | Value |
|---|---|
| Engine | Aurora MySQL |
| Deployment | Aurora cluster |
| Subnet group | Default (public) |
| Public endpoint | Enabled |
| Inbound access | Restricted via Security Groups |
| Internet-wide access | Blocked |

Notes:

Cache – Redis (ElastiCache)

| Key | Value |
|---|---|
| Engine | Redis |
| Public access | No |
| Inbound access | EC2 security groups |
| Port | 6379 |

9. Search Layer (OpenSearch on EC2)

Current implementation

Verified node placement (AWS CLI)

| Role | Name | Private IP | AZ | Instance type |
|---|---|---|---|---|
| Client / Coordinating | OpenSearch Cluster Client Coordinating Node | 172.31.25.126 | eu-west-1b | t3.medium |
| Index Node | OpenSearch Cluster Dedicated Index Node | 172.31.36.183 | eu-west-1c | c5.2xlarge |
| Staging backend host | TI Backend Stg | 172.31.45.74 | eu-west-1c | t3.large |
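The placement in the table can be turned into a quick cross-AZ check. This sketch uses the AZs listed above; the short node labels and the call paths (backend queries the coordinating node, which forwards to the index node) are assumptions about how traffic flows, not something verified in this document.

```python
# AZ placement taken from the table above; labels are shortened for readability.
NODES = {
    "ti-backend-stg": "eu-west-1c",
    "opensearch-client": "eu-west-1b",
    "opensearch-index": "eu-west-1c",
}

# Assumed call paths: caller -> callee.
CALLS = [
    ("ti-backend-stg", "opensearch-client"),
    ("opensearch-client", "opensearch-index"),
]

def cross_az_calls(nodes, calls):
    """Return the (caller, callee) pairs whose endpoints sit in different AZs."""
    return [(src, dst) for src, dst in calls if nodes[src] != nodes[dst]]

hops = cross_az_calls(NODES, CALLS)
for src, dst in hops:
    print(f"{src} ({NODES[src]}) -> {dst} ({NODES[dst]})")
print(f"{len(hops)} of {len(CALLS)} hops cross an AZ boundary")
```

Under these assumed paths, a query that starts and ends in eu-west-1c still crosses the AZ boundary twice because the coordinating node sits in eu-west-1b.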

Key finding (cost + latency):


10. Messaging & Async Processing


11. Identity, Access & Secrets


12. Security Controls

Security Groups

| Key | Value |
|---|---|
| Firewall type | Stateful |
| Default inbound | Deny all |
| Default outbound | Allow all |
| Scope | EC2, ALB, RDS, Redis, EIPs, OpenSearch nodes |

Inbound Rules Summary

| Component | Allowed Source |
|---|---|
| ALB | Cloudflare IP ranges |
| EC2 app hosts | ALB SG and/or trusted IPs |
| Redis | EC2 application SGs |
| RDS | EC2 SGs / approved IPs |
| OpenSearch | EC2 application SGs (recommended) |

Hard rules:


13. Observability (Monitoring, Logging, Tracing)

Cost note:


14. Backup, Disaster Recovery & Business Continuity

RTO / RPO to be defined.


15. Storage Layer – EC2 EBS Volumes & Snapshots


Overview

The platform relies heavily on EC2-attached EBS volumes, primarily used as root disks for Docker-based services running directly on EC2.

There is no centralized backup or lifecycle policy currently enforced.


EBS Volume Characteristics

| Attribute | Value |
|---|---|
| Volume type | gp2 |
| Typical size | 8 GiB |
| Typical IOPS | 100 |
| Encryption | Mostly disabled |
| Attachment | Single-instance |
| Delete on termination | Enabled (root volumes) |
| AZ distribution | Mostly eu-west-1c |

Observations:


Snapshot & Backup Status

| Item | Status |
|---|---|
| Automated snapshots | ❌ Not configured |
| DLM policies | ❌ None |
| Recent backups | ❌ 0 / 598 volumes |
| Snapshot origin | Mostly AMI creation |
| Snapshot age | Some dating back to 2018 |

Important:


Risks


Recommendations (Phase 1 – Low Risk)

  1. Enable EBS encryption by default
  2. Introduce DLM snapshot policies:
  3. Identify:
  4. Tag volumes with:

Recommendations (Phase 2 – Structural)

This account contains a large number of Amazon EBS volumes, mostly small (gp2, ~8 GiB) root volumes attached to EC2 instances running Docker-based services.

Without active management, EBS volumes and snapshots can become a significant hidden cost.


Current Observations


Cost Optimization Opportunities

1. Convert gp2 → gp3 (Immediate Savings)

gp3 is the recommended default EBS volume type.

| Volume Type | Cost (eu-west-1) | Notes |
|---|---|---|
| gp2 | Higher | Performance tied to size |
| gp3 | ~20% cheaper per GiB | Performance independent of size |
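The two points in the table can be made concrete with a small calculation. gp2 baseline performance is 3 IOPS per GiB with a floor of 100 and a cap of 16,000, which is why an 8 GiB root volume sits at 100 IOPS, while gp3 provides a 3,000 IOPS baseline regardless of size. The sketch below applies that to the fleet size and typical volume size reported earlier. The per-GiB prices are assumptions for illustration and should be replaced with the current AWS price list before drawing conclusions.

```python
def gp2_baseline_iops(size_gib):
    """gp2 baseline: 3 IOPS per GiB, floored at 100 and capped at 16,000."""
    return max(100, min(16_000, 3 * size_gib))

GP3_BASELINE_IOPS = 3_000  # gp3 baseline, independent of volume size

def monthly_storage_cost(volumes, size_gib, price_per_gib_month):
    """Storage-only cost; ignores snapshots and provisioned extras."""
    return volumes * size_gib * price_per_gib_month

VOLUMES = 598      # volume count from the snapshot status table
SIZE_GIB = 8       # typical root volume size from this document
GP2_PRICE = 0.110  # ASSUMED USD per GiB-month, verify against current pricing
GP3_PRICE = 0.088  # ASSUMED USD per GiB-month, verify against current pricing

gp2_cost = monthly_storage_cost(VOLUMES, SIZE_GIB, GP2_PRICE)
gp3_cost = monthly_storage_cost(VOLUMES, SIZE_GIB, GP3_PRICE)

print(f"gp2 baseline IOPS at {SIZE_GIB} GiB: {gp2_baseline_iops(SIZE_GIB)}")
print(f"gp3 baseline IOPS at any size: {GP3_BASELINE_IOPS}")
print(f"gp2: ${gp2_cost:.2f}/month, gp3: ${gp3_cost:.2f}/month")
print(f"difference: ${gp2_cost - gp3_cost:.2f}/month")
```

For volumes this small the absolute storage saving is modest; the larger effect is the jump in baseline IOPS at no extra charge.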

Benefits


17. Cost Management & Optimization

Current Cost Drivers

Cross-AZ Traffic

Elastic IP Sprawl

RDS Cost Amplification

Target Architecture (direction)

| Component | Target Design |
|---|---|
| Ingress | NLB with 2–3 static EIPs, dual-stack IPv4/IPv6 (if feasible) |
| Compute | Private EC2 or ECS (optional later) |
| OpenSearch | Internal NLB + DNS, or migrate to managed service |
| Database | Aurora aligned with main compute AZs (A/B or B/C depending) |
| Cache | ElastiCache co-located with compute |

Expected outcomes:

18. CI/CD & Release Process

This environment uses CircleCI for build and (in some cases) deployment. Deployments are still Docker-on-EC2 (not ECS), executed via SSH to target hosts (often via a bastion).

CI/CD Platform

Build Pipeline (Docker build & push)

- Job: build
- Runtime: cimg/base:2022.03 + setup_remote_docker
- Output: Docker image pushed to Docker Hub

Build behavior:

Example: feature/new-ui → feature--new-ui
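The example suggests that slashes in the branch name are replaced with `--` to form the image tag. A minimal sketch of that mapping follows, assuming slashes are the only characters rewritten; the function name is hypothetical and the actual pipeline step may differ. The validation pattern reflects Docker's tag grammar (letters, digits, `_`, `.`, `-`, at most 128 characters, not starting with `.` or `-`).

```python
import re

# Docker tag grammar: first char is a word character, then up to 127 more
# characters drawn from letters, digits, underscore, period, and hyphen.
TAG_RE = re.compile(r"^[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}$")

def branch_to_tag(branch):
    """Map a git branch name to an image tag by replacing '/' with '--'."""
    tag = branch.replace("/", "--")
    if not TAG_RE.match(tag):
        raise ValueError(f"branch {branch!r} does not yield a valid tag: {tag!r}")
    return tag

print(branch_to_tag("feature/new-ui"))
print(branch_to_tag("main"))
```

Failing loudly on an invalid tag is a deliberate choice here: a build that silently pushes under a mangled tag is harder to trace than one that stops.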

Risk note (important):

Using --build-arg for secrets (tokens/keys) may bake secrets into image layers if passed via ARG → ENV inside Dockerfile.

Prefer runtime secrets via SSM/Secrets Manager or Docker runtime env vars.

Build steps summary:

  1. Docker Hub login (DOCKER_LOGIN/DOCKER_PASSWORD)
  2. docker build with many build args
  3. docker push

Slack:

Deploy Pipeline (CircleCI → Bastion → Target EC2)

- Job: deploy
- Runtime: CircleCI machine executor (Ubuntu)

Deployment method:

Deploy steps (current):

  1. Add SSH keys (CircleCI-managed fingerprint)
  2. Pre-approve bastion host key via ssh-keyscan
  3. Open tunnel to target via bastion
  4. On target host:

Slack:

Branching / Release Policy

Workflow: build-docker-image-and-push

Note: Current deploy job appears primarily used for staging-style deployment via bastion.

Production deployments may still be performed via manual scripts or separate pipelines depending on service.

Known Gaps / Improvement Ideas

Planned improvements:


# Docker on EC2 — Operations Documentation

---

## 1. Service Info

- **Service name:** tradeit / tradeit-backend / tradeit-inventory-server
- **Environment:** prod / stg / likeprod
- **Repository:** https://github.com/zengamingx/<repo>
- **Docker image:** zengamingx/<image>
- **Runtime:** Docker + PM2
- **Deployment type:** SSH-based deployment
- **Owner:** DevOps / Platform
- **Slack channel:** #tradeit-dev

---

## 2. Hosts & Placement

| Env | Instance Name | Instance ID | AZ | Type | Public | Notes |
|---|---|---|---|---|---|---|
| prod | Login | i-0388bbadd9fb77b30 | eu-west-1a | c7a.large | Yes | User entrypoint |
| prod | Socket | i-07c992a075c2a5aeb | eu-west-1a | c7a.large | Yes | WebSockets |
| stg | TI Backend Stg | i-0089b6d6d03867c2d | eu-west-1c | t3.large | No | Backend |

---

## 3. Networking

- **Inbound path:**  
  Cloudflare → EC2 (direct EIP / Public IP)

- **Host ports:**  
  3000, 4001–4005 (domain-based routing)

- **Container ports:**  
  3000

- **Docker network mode:**  
  bridge (inventory uses host)

- **Security groups:**  
  - `sg-00219c4fb3212cea4` (base)
  - `sg-051880495a8dddc9a` (Cloudflare HTTPS)

⚠️ **Important**
- Any instance with a public IP is internet-reachable
- 0.0.0.0/0 must never be allowed unintentionally
- SSH restricted to trusted IPs only
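The rules above can be checked mechanically against a security group export. The sketch below assumes rule dictionaries in the shape the EC2 API returns for security groups (`IpPermissions` with `FromPort`, `ToPort`, `IpRanges`, `Ipv6Ranges`). The set of ports that may be world-open is an assumption, and the sample group is hypothetical.

```python
# Assumption: only these ports are intentionally open to the whole internet.
PUBLIC_PORTS = frozenset({80, 443})

def world_open_findings(security_group, allowed=PUBLIC_PORTS):
    """List rules that admit 0.0.0.0/0 or ::/0 on ports outside `allowed`."""
    findings = []
    for perm in security_group.get("IpPermissions", []):
        open_v4 = any(r.get("CidrIp") == "0.0.0.0/0" for r in perm.get("IpRanges", []))
        open_v6 = any(r.get("CidrIpv6") == "::/0" for r in perm.get("Ipv6Ranges", []))
        if not (open_v4 or open_v6):
            continue
        proto = perm.get("IpProtocol")
        if proto == "-1":  # "-1" means all protocols and ports
            findings.append("all traffic")
            continue
        lo, hi = perm["FromPort"], perm["ToPort"]
        if any(port not in allowed for port in range(lo, hi + 1)):
            findings.append(f"{proto} {lo}-{hi}")
    return findings

# Hypothetical security group for illustration.
sample_sg = {
    "GroupId": "sg-EXAMPLE",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 22, "ToPort": 22,
         "IpRanges": [{"CidrIp": "0.0.0.0/0"}]},
        {"IpProtocol": "tcp", "FromPort": 3000, "ToPort": 3000,
         "IpRanges": [{"CidrIp": "172.31.0.0/16"}]},
    ],
}

print(world_open_findings(sample_sg))
```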

---

## 4. Dependencies

| Dependency | Endpoint | Port | AZ Sensitive | Notes |
|---|---|---:|---|---|
| Backend API | 172.31.45.74 | 3000 | Yes | Hardcoded |
| Aurora MySQL | cluster endpoint | 3306 | Yes | Public endpoint, SG-restricted |
| Redis | ElastiCache | 6379 | Yes | Same AZ preferred |
| OpenSearch | 172.31.25.126 / 172.31.36.183 | 9200 | Yes | Hard-pinned IPs |
| External APIs | Stripe, Intercom | 443 | No | Internet |

---

## 5. Build

- **Dockerfile:**  
  `/Dockerfile`


- **Secrets used at build time:**  
  - NODE_AUTH_TOKEN  
  - DOTENV_KEY  
  - FONT_AWESOME_TOKEN  

⚠️ **Risk**
- Secrets passed via `ARG → ENV` are baked into image layers
- Migration to runtime secrets recommended
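A quick way to find the `ARG → ENV` pattern is to scan the Dockerfile text for build arguments that are later referenced in an `ENV` instruction. This is a rough sketch: it ignores line continuations and multi-stage scoping, and the sample Dockerfile content is hypothetical apart from the `NODE_AUTH_TOKEN` name listed above.

```python
import re

ARG_RE = re.compile(r"^\s*ARG\s+([A-Za-z_][A-Za-z0-9_]*)", re.IGNORECASE)
ENV_RE = re.compile(r"^\s*ENV\s+(.*)$", re.IGNORECASE)

def args_copied_to_env(dockerfile_text):
    """Return ARG names whose values are referenced by a later ENV line."""
    declared, copied = [], []
    for line in dockerfile_text.splitlines():
        arg = ARG_RE.match(line)
        if arg:
            declared.append(arg.group(1))
            continue
        env = ENV_RE.match(line)
        if not env:
            continue
        for name in declared:
            # Matches $NAME and ${NAME}, but not $NAME_SUFFIX.
            ref = re.compile(r"\$\{?" + re.escape(name) + r"\b")
            if ref.search(env.group(1)) and name not in copied:
                copied.append(name)
    return copied

# Hypothetical Dockerfile for illustration.
sample = """\
FROM node:20
ARG NODE_AUTH_TOKEN
ARG BUILD_ID
ENV NODE_AUTH_TOKEN=$NODE_AUTH_TOKEN
RUN npm ci
"""

print(args_copied_to_env(sample))
```

Any name this reports is a candidate for moving to a runtime secret, since its value persists in the image configuration.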

---

## 6. Deploy Procedure (Current)

### Summary
- SSH-based deploy via `deploy.sh`
- Manual approval for prod
- No health checks
- No rollback automation
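The last two gaps could be closed with a small wrapper around the deploy step. The following is a sketch of the control flow only, under the assumption that deploy, rollback, and health probing are each available as a callable; none of these function names exist in `deploy.sh` today.

```python
import time

def wait_until_healthy(probe, attempts=5, delay_s=2.0, sleep=time.sleep):
    """Call `probe` until it returns True or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            if probe():
                return True
        except Exception:
            pass  # treat probe errors as "not healthy yet"
        if attempt < attempts:
            sleep(delay_s)
    return False

def deploy_with_rollback(deploy, rollback, probe, **probe_options):
    """Run deploy, then roll back if the service never becomes healthy."""
    deploy()
    if wait_until_healthy(probe, **probe_options):
        return "deployed"
    rollback()
    return "rolled_back"

# Usage with stand-in callables; a real probe would hit a health endpoint.
events = []
result = deploy_with_rollback(
    deploy=lambda: events.append("deploy"),
    rollback=lambda: events.append("rollback"),
    probe=lambda: False,           # simulate a service that never comes up
    attempts=3,
    sleep=lambda seconds: None,    # skip real waiting in this demo
)
print(result, events)
```

Injecting `sleep` and `probe` keeps the logic testable without a live host, which matters when the only other way to exercise it is a real deployment.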

Service-Level Infrastructure Documentation (TradeIT)


1. Service Role

This NGINX instance acts as a public-facing edge proxy responsible for:

This service is internet-exposed and participates in the Direct-to-Instance ingress architecture (no AWS ALB in front).


2. Ingress Architecture Position

Client
  ↓
Cloudflare (CDN, WAF, optional TLS)
  ↓
NGINX (this service, public IP / EIP)
  ↓
Internal services (Docker / EC2 private IPs)

Notes


3. Security Model

Network

TLS

Hardening


4. Rate Limiting Strategy

Global (Per Host)

Per-IP

Whitelisted IPs


5. Backend Routing Model


6. Ports & Exposure

| Port | Purpose |
|---|---|
| 80 | Healthcheck only |
| 443 | Main HTTPS entrypoint |
| 3333 | Internal API v2 proxy |

7. Routed Services Overview

| Path | Target |
|---|---|
| /api/v2/ | Backend upstream (port 3000) |
| /socket.io/ | WebSockets |
| /static/ | Static service |
| /blog/ | Internal HTTPS service |
| /news/ | Internal HTTPS service |
| /oauth2/ | OAuth service |
| /imgproxy | External image proxy |

8. Error Handling


9. Logging & Observability


10. Operational Risks


11. Improvement Direction


12. NGINX Configuration (Source of Truth)

# For more information on configuration, see:
#   * Official English Documentation: http://nginx.org/en/docs/
#   * Official Russian Documentation: http://nginx.org/ru/docs/

user nginx;
worker_processes auto;
error_log /var/log/nginx/error.log;
pid /run/nginx.pid;
worker_rlimit_nofile 8192;

include /usr/share/nginx/modules/*.conf;

events {
    worker_connections 8192;
    multi_accept on;
    use epoll;
}

http {
    limit_req_zone $host zone=global_rate_limit:10m rate=100r/s;

    log_format rate_limit_log '$remote_addr - $host [$time_local] "$request" '
                              '$status $body_bytes_sent "$http_referer" '
                              '"$http_user_agent" "$request_time" "$upstream_response_time" '
                              '"$pipe" "$limit_req_status"';

    limit_conn_zone $binary_remote_addr zone=conn_limit_per_ip:10m;
    limit_req_zone $binary_remote_addr zone=req_limit_per_ip:10m rate=85r/s;

    resolver 8.8.8.8 ipv6=off;

    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    keepalive_timeout 65s;
    client_max_body_size 20M;
    server_tokens off;

    add_header X-Frame-Options "SAMEORIGIN" always;
    add_header X-XSS-Protection "1; mode=block" always;
    add_header X-Content-Type-Options "nosniff" always;
    add_header Referrer-Policy "strict-origin-when-cross-origin" always;
    add_header Strict-Transport-Security "max-age=63072000; includeSubDomains; preload" always;

    ssl_session_cache shared:SSL:20m;
    ssl_session_timeout 1d;
    # NOTE: TLSv1 and TLSv1.1 are deprecated (RFC 8996); shown as currently configured.
    ssl_protocols TLSv1 TLSv1.1 TLSv1.2 TLSv1.3;
    ssl_prefer_server_ciphers on;

    log_format main_ext '$remote_addr - $remote_user [$time_local] "$request" '
                        '$status $body_bytes_sent "$http_referer" '
                        '"$http_user_agent" "$http_x_forwarded_for" '
                        '"$host" reqtime=$request_time '
                        'upstream="$upstream_addr" '
                        'ratelimit=$limit_req_status '
                        'country=$http_cf_ipcountry';

    access_log /var/log/nginx/access.log main_ext;

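    # NOTE: `map` compares $remote_addr as a plain string, so the CIDR entries
    # below (10.0.0.0/8, 192.168.0.0/16) never match; CIDR matching needs `geo`.
    map $remote_addr $whitelisted {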
        default 0;
        127.0.0.1 1;
        ::1 1;
        10.0.0.0/8 1;
        192.168.0.0/16 1;
    }

    upstream backend {
        least_conn;
        server ip-172-31-42-95.eu-west-1.compute.internal:3000;
        server ip-172-31-44-148.eu-west-1.compute.internal:3000;
    }

    server {
        listen 443 ssl http2 default_server;
        server_name tradeit.gg www.tradeit.gg;

        ssl_certificate /etc/nginx/tradeit.crt;
        ssl_certificate_key /etc/nginx/tradeit.key;

        limit_req zone=global_rate_limit burst=30 nodelay;
        limit_req_status 429;

        location /api/v2/ {
            proxy_pass http://backend;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }

        location / {
            proxy_pass http://localhost:3000;
        }
    }
}
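The `limit_req` settings in the server block (`rate=100r/s`, `burst=30`, `nodelay`) can be reasoned about with a small simulation. Because the zone is keyed on `$host`, all clients of a hostname share one bucket. The class below is a simplified model of leaky-bucket accounting written for this document, not NGINX code; the exact behavior at the burst boundary follows this model and should be confirmed against NGINX itself.

```python
class LimitReqModel:
    """Simplified leaky-bucket model of limit_req with nodelay."""

    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s
        self.burst = burst
        self.excess = 0.0   # requests currently "over" the steady rate
        self.last = None    # time of the last accepted request

    def allow(self, now):
        """Return True if a request arriving at time `now` (seconds) passes."""
        if self.last is None:
            self.last = now
            return True
        # Drain the bucket for the elapsed time, then add this request.
        excess = max(0.0, self.excess - self.rate * (now - self.last) + 1.0)
        if excess > self.burst:
            return False    # rejected; the server block maps this to 429
        self.excess, self.last = excess, now
        return True

# One shared bucket per hostname, as implied by keying the zone on $host.
bucket = LimitReqModel(rate_per_s=100, burst=30)

# A spike of 100 requests arriving at the same instant.
accepted = sum(bucket.allow(0.0) for _ in range(100))
print(f"accepted {accepted} of 100 simultaneous requests")

# After 100 ms the bucket has drained enough to admit traffic again.
print("request at t=0.1s allowed:", bucket.allow(0.1))
```

The practical consequence of the shared key is that one noisy client can exhaust the budget for every other visitor of the same hostname, which is where the per-IP zones defined in the same config would be the finer-grained control.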




