Phenom Infrastructure

Terraform infrastructure as code for deploying the Phenom application stack on AWS ECS

This section contains infrastructure documentation for the Phenom application stack. Access is restricted to the infrastructure team.

Overview

Phenom Infrastructure provides Terraform infrastructure as code for deploying the complete Phenom application stack on AWS ECS. This repository contains modular Terraform configurations that create a production-ready cloud environment with security, scalability, and monitoring best practices.

Repository

GitHub Repository: Phenom-earth/phenom-infra

Architecture

The infrastructure deploys a comprehensive AWS environment including:

Core Infrastructure

  • VPC: Virtual Private Cloud with public/private/database subnets across multiple availability zones
  • ECS Fargate: Containerized application cluster with auto-scaling capabilities
  • Application Load Balancer: Traffic routing and SSL termination
  • RDS PostgreSQL: Managed database service (PostgreSQL 17.4) with automated backups
  • AWS Secrets Manager: Secure credential and configuration storage
  • AWS Cognito: User authentication and authorization with Hasura integration
  • S3 Storage: Multiple buckets for general storage and video/image uploads
  • Lambda Functions: Serverless compute for authentication hooks and file validation
  • API Gateway: REST API for secure upload workflows

Service Stack

The ECS cluster runs the following containerized services:

  1. GraphQL Service (Hasura GraphQL Engine)

    • Port: 8080
    • Provides GraphQL API and database migrations
    • Integrated with Cognito for JWT authentication
  2. Auth Service (Hasura Auth)

    • Port: 4000
    • Handles authentication and JWT token management
    • Enhanced with Cognito integration
  3. Storage Service (Hasura Storage)

    • Port: 5000
    • Manages file uploads and storage operations
    • Utilizes S3 backend
  4. Functions Service (Nhost Functions)

    • Port: 3000
    • Executes serverless functions

Video/Image Upload System (NEW)

Serverless file upload pipeline with validation and security:

  • API Gateway: REST API for pre-signed URL generation
  • Lambda: Pre-signed URL Generator: Password-protected URL generation with 1-hour expiry
  • S3 Staging Bucket: Temporary storage with 24-hour auto-cleanup
  • Lambda: File Validator: Automatic validation using magic bytes, optional virus scanning
  • S3 Final Bucket: Permanent storage for validated media organized by type
  • Client Hosting: S3-hosted upload interface

Authentication Integration (NEW)

AWS Cognito integrated with Hasura GraphQL:

  • Cognito User Pool: Email-based authentication with MFA support
  • Lambda: Token Enhancement: Adds Hasura JWT claims to Cognito tokens
  • Lambda: User Sync: Automatically syncs authenticated users to Hasura database
  • OAuth 2.0 Flow: Implicit grant with callback support

Prerequisites

Before deploying the infrastructure, ensure you have:

  • Terraform >= 1.0
  • AWS CLI configured with appropriate credentials
  • AWS Account with sufficient permissions to create resources

Quick Start

1. Configure AWS Credentials

# Option 1: AWS CLI configuration
aws configure

# Option 2: Environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"

# Option 3: AWS Profile
export AWS_PROFILE="your-profile-name"

2. Choose Environment

cd environments/<desired-env>

# Examples:
cd environments/development
# or
cd environments/production

3. Deploy Infrastructure

# Initialize Terraform
terraform init

# Review planned changes
terraform plan

# Deploy infrastructure
terraform apply

Environment Structure

environments/
├── development/
│   ├── main.tf          # Main configuration
│   ├── locals.tf        # Environment-specific variables
│   ├── versions.tf      # Terraform and provider versions
│   ├── backend.tf       # Remote state configuration
│   └── outputs.tf       # Output values
└── production/
    └── ... (same structure)
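As a rough illustration of what versions.tf and backend.tf might contain, here is a minimal sketch. The provider version constraint, the choice of an S3 backend, and the bucket, key, and region values are all assumptions made for this example; the repository's actual files may differ.

```hcl
# versions.tf (sketch): pin Terraform and the AWS provider.
terraform {
  required_version = ">= 1.0" # matches the Terraform prerequisite

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0" # assumed constraint, not taken from the repository
    }
  }
}

# backend.tf (sketch): remote state stored in S3.
# The backend type and every value below are hypothetical placeholders.
terraform {
  backend "s3" {
    bucket  = "example-phenom-terraform-state"
    key     = "development/terraform.tfstate"
    region  = "us-east-1"
    encrypt = true
  }
}

provider "aws" {
  region = "us-east-1"
}
```

Keeping one backend key per environment directory means development and production never share a state file.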

Infrastructure Modules

Networking Module (modules/networking/)

  • VPC: 10.0.0.0/16 CIDR with Internet Gateway
  • 3-Tier Subnet Architecture:
    • Public Subnets (10.0.0.0/24, 10.0.1.0/24) - For ALB
    • Private Subnets (10.0.10.0/24, 10.0.11.0/24) - For ECS tasks
    • Database Subnets (10.0.20.0/24, 10.0.21.0/24) - For RDS
  • NAT Gateways for private subnet egress (optional)
  • Security groups for ALB, ECS tasks, and RDS with least-privilege rules
  • Outputs: VPC ID, subnet IDs, security group IDs

graph TB
    IGW["Internet Gateway"]

    subgraph "Public Tier"
        ALB["Application Load Balancer<br/>Port 80/443<br/>Path-based routing<br/>Health checks /healthz"]
        NAT["NAT Gateway<br/>Private subnet egress"]
    end

    subgraph "Private Tier - ECS"
        TG1["Target Group<br/>GraphQL:8080"]
        TG2["Target Group<br/>Auth:4000"]
        TG3["Target Group<br/>Storage:5000"]
        TG4["Target Group<br/>Functions:3000"]
        ECS1["ECS Task<br/>Hasura GraphQL"]
        ECS2["ECS Task<br/>Hasura Auth"]
        ECS3["ECS Task<br/>Hasura Storage"]
        ECS4["ECS Task<br/>Nhost Functions"]
    end

    subgraph "Database Tier"
        RDS["RDS PostgreSQL<br/>db.m5.large<br/>20GB → 100GB<br/>Private subnet"]
    end

    subgraph "Storage & Secrets"
        S3["S3 Buckets<br/>General, Staging,<br/>Final, Hosting"]
        Secrets["AWS Secrets Manager<br/>DB credentials<br/>API keys<br/>Passwords"]
    end

    IGW -->|Port 80/443| ALB
    ALB -->|Route /api/graphql| TG1
    ALB -->|Route /api/auth| TG2
    ALB -->|Route /api/storage| TG3
    ALB -->|Route /api/functions| TG4
    TG1 --> ECS1
    TG2 --> ECS2
    TG3 --> ECS3
    TG4 --> ECS4
    ECS1 -->|Query/Update| RDS
    ECS2 -->|Query/Update| RDS
    ECS3 -->|Query/Update| RDS
    ECS4 -->|Query/Update| RDS
    ECS3 -->|Upload/Download| S3
    ECS1 -.->|Read| Secrets
    ECS2 -.->|Read| Secrets
    ECS3 -.->|Read| Secrets
    NAT -->|Egress| IGW

    style ALB fill:#ffd700
    style RDS fill:#e1f5ff
    style S3 fill:#e8f5e9
    style Secrets fill:#ffccbc
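
The three-tier subnet layout of the networking module could be expressed roughly as follows. This is a minimal sketch, not the module's actual code: the CIDR blocks come from the module description, while the resource names, tags, and the choice to spread each tier over the first two availability zones are assumptions.

```hcl
data "aws_availability_zones" "available" {
  state = "available"
}

locals {
  # CIDRs from the networking module description; tier keys are illustrative.
  subnets = {
    public   = ["10.0.0.0/24", "10.0.1.0/24"]
    private  = ["10.0.10.0/24", "10.0.11.0/24"]
    database = ["10.0.20.0/24", "10.0.21.0/24"]
  }

  # Flatten to one entry per subnet, keyed like "public-0", "private-1".
  subnet_map = merge([
    for tier, cidrs in local.subnets : {
      for idx, cidr in cidrs : "${tier}-${idx}" => {
        tier = tier
        cidr = cidr
        az   = data.aws_availability_zones.available.names[idx]
      }
    }
  ]...)
}

resource "aws_vpc" "main" {
  cidr_block           = "10.0.0.0/16"
  enable_dns_support   = true
  enable_dns_hostnames = true
}

resource "aws_internet_gateway" "main" {
  vpc_id = aws_vpc.main.id
}

resource "aws_subnet" "this" {
  for_each = local.subnet_map

  vpc_id            = aws_vpc.main.id
  cidr_block        = each.value.cidr
  availability_zone = each.value.az

  # Only the public tier hands out public IPs.
  map_public_ip_on_launch = each.value.tier == "public"

  tags = {
    Name = "phenom-${each.key}" # hypothetical naming scheme
    Tier = each.value.tier
  }
}
```

Route tables, NAT Gateways, and security groups are omitted to keep the sketch short.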

Application Load Balancer Module (modules/alb/)

  • ALB: Public-facing load balancer in public subnets
  • 4 Target Groups with health checks (/healthz every 30s):
    • GraphQL (port 8080)
    • Auth (port 4000)
    • Storage (port 5000)
    • Functions (port 3000)
  • HTTP listener on port 80 with path-based routing
  • Outputs: ALB DNS name, target group ARNs
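
A sketch of how the four target groups and path-based routing might be declared is shown below. Ports, the /healthz check, and the 30-second interval come from the module description; the path patterns follow the routes in the architecture diagram, and the resource names, input variables, and 404 default action are assumptions for illustration.

```hcl
variable "vpc_id" {
  type = string
}

variable "public_subnet_ids" {
  type = list(string)
}

variable "alb_security_group_id" {
  type = string
}

locals {
  # Port per service from the module description; path patterns are assumed.
  services = {
    graphql   = { port = 8080, path = "/api/graphql*" }
    auth      = { port = 4000, path = "/api/auth*" }
    storage   = { port = 5000, path = "/api/storage*" }
    functions = { port = 3000, path = "/api/functions*" }
  }
}

resource "aws_lb" "main" {
  name               = "phenom-dev-alb" # hypothetical name
  load_balancer_type = "application"
  internal           = false
  subnets            = var.public_subnet_ids
  security_groups    = [var.alb_security_group_id]
}

resource "aws_lb_target_group" "service" {
  for_each = local.services

  name        = "phenom-dev-${each.key}"
  port        = each.value.port
  protocol    = "HTTP"
  vpc_id      = var.vpc_id
  target_type = "ip" # Fargate tasks with awsvpc networking register by IP

  health_check {
    path     = "/healthz"
    interval = 30
  }
}

resource "aws_lb_listener" "http" {
  load_balancer_arn = aws_lb.main.arn
  port              = 80
  protocol          = "HTTP"

  # Requests that match no service path get a plain 404 (assumed behaviour).
  default_action {
    type = "fixed-response"

    fixed_response {
      content_type = "text/plain"
      message_body = "Not found"
      status_code  = "404"
    }
  }
}

resource "aws_lb_listener_rule" "service" {
  for_each = local.services

  listener_arn = aws_lb_listener.http.arn

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.service[each.key].arn
  }

  condition {
    path_pattern {
      values = [each.value.path]
    }
  }
}
```

Driving both the target groups and the listener rules from one map keeps ports and routes in a single place.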

ECS Module (modules/ecs/)

  • ECS Fargate Cluster with Container Insights enabled
  • 4 Task Definitions:
    • Hasura GraphQL Engine (8080)
    • Hasura Auth Service (4000)
    • Hasura Storage Service (5000)
    • Nhost Functions (3000)
  • IAM Roles: Task execution role and task role with necessary permissions
  • CloudWatch Logs: /ecs/phenom-dev log group
  • Secrets Integration: Environment variables from AWS Secrets Manager
  • Outputs: Cluster ARN, service ARNs, task definition ARNs

graph TB
    subgraph Cluster["ECS Fargate Cluster<br/>(Container Insights enabled)"]
        GraphQL["GraphQL Service<br/>Hasura Engine<br/>Port 8080<br/>2 tasks × 0.25vCPU, 0.5GB"]
        Auth["Auth Service<br/>Hasura Auth<br/>Port 4000<br/>2 tasks × 0.25vCPU, 0.5GB"]
        Storage["Storage Service<br/>Hasura Storage<br/>Port 5000<br/>2 tasks × 0.25vCPU, 0.5GB"]
        Functions["Functions Service<br/>Nhost Functions<br/>Port 3000<br/>2 tasks × 0.25vCPU, 0.5GB"]
    end

    ECR["Container Images<br/>ECR Registry"]
    Secrets["AWS Secrets Manager<br/>Environment variables<br/>Database credentials"]
    Logs["CloudWatch Logs<br/>/ecs/phenom-dev"]
    IAM["IAM Roles<br/>Execution & Task roles"]
    Alarms["CloudWatch Alarms<br/>CPU/Memory monitoring"]

    ECR -->|Pull images| Cluster
    Secrets -->|Inject config| Cluster
    Cluster -->|Stream logs| Logs
    Cluster -.->|Assume roles| IAM
    Logs -->|Trigger| Alarms

    style GraphQL fill:#e1f5ff
    style Auth fill:#f3e5f5
    style Storage fill:#e8f5e9
    style Functions fill:#fff3e0
    style Cluster fill:#f5f5f5

Note: Each service runs 2 tasks for high availability with auto-scaling capabilities.
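
To make the task sizing and Secrets Manager integration concrete, here is a minimal sketch of one task definition. The CPU and memory values, port, and log group follow the description above; the family and container names, the input variables, the environment variable chosen, and the JSON key inside the secret (database_url) are assumptions for illustration only.

```hcl
variable "image" {
  type = string # e.g. an ECR image URI
}

variable "execution_role_arn" {
  type = string
}

variable "task_role_arn" {
  type = string
}

variable "app_secret_arn" {
  type = string # ARN of the Secrets Manager secret holding app settings
}

variable "aws_region" {
  type    = string
  default = "us-east-1"
}

resource "aws_ecs_task_definition" "graphql" {
  family                   = "phenom-dev-graphql" # hypothetical name
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = "256" # 0.25 vCPU
  memory                   = "512" # 0.5 GB
  execution_role_arn       = var.execution_role_arn
  task_role_arn            = var.task_role_arn

  container_definitions = jsonencode([
    {
      name         = "graphql"
      image        = var.image
      essential    = true
      portMappings = [{ containerPort = 8080, protocol = "tcp" }]

      # Injected at start-up by the execution role.
      # Format: <secret ARN>:<json key>:<version stage>:<version id>
      secrets = [
        {
          name      = "HASURA_GRAPHQL_DATABASE_URL"
          valueFrom = "${var.app_secret_arn}:database_url::"
        }
      ]

      logConfiguration = {
        logDriver = "awslogs"
        options = {
          "awslogs-group"         = "/ecs/phenom-dev"
          "awslogs-region"        = var.aws_region
          "awslogs-stream-prefix" = "graphql"
        }
      }
    }
  ])
}
```

The execution role needs permission to read the referenced secret; the matching ECS service would set a desired count of 2 to match the note above.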

RDS Module (modules/rds/)

  • PostgreSQL 17.4 on db.m5.large instance
  • Storage: 20GB initial with auto-scaling to 100GB
  • Backup: 7-day retention, daily 03:00-04:00 UTC
  • Maintenance: Sunday 04:00-05:00 UTC
  • Snapshot Restore: From phenom-backend-db-migration-20251018-003143
  • Security: Private (not publicly accessible), encrypted at rest
  • Outputs: Endpoint, port, database name, username, ARN
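
The settings listed above map onto an aws_db_instance resource roughly as sketched here. Engine version, instance class, storage, windows, and the snapshot name are taken from the list; the identifier, the input variables, and the final-snapshot handling are assumptions and may not match the module.

```hcl
variable "db_subnet_group_name" {
  type = string
}

variable "rds_security_group_id" {
  type = string
}

resource "aws_db_instance" "main" {
  identifier     = "phenom-dev-db" # hypothetical identifier
  engine         = "postgres"
  engine_version = "17.4"
  instance_class = "db.m5.large"

  # 20 GB initial storage, auto-scaling up to 100 GB.
  allocated_storage     = 20
  max_allocated_storage = 100
  storage_encrypted     = true

  # Restoring from a snapshot: credentials come from the snapshot itself.
  snapshot_identifier = "phenom-backend-db-migration-20251018-003143"

  backup_retention_period = 7
  backup_window           = "03:00-04:00"         # UTC
  maintenance_window      = "sun:04:00-sun:05:00" # UTC

  publicly_accessible    = false
  db_subnet_group_name   = var.db_subnet_group_name
  vpc_security_group_ids = [var.rds_security_group_id]

  # Assumed: keep a final snapshot when the instance is destroyed.
  skip_final_snapshot       = false
  final_snapshot_identifier = "phenom-dev-db-final"
}
```

The backup and maintenance windows are adjacent but do not overlap, which RDS requires.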

S3 Module (modules/s3/)

  • General Storage Bucket: Replaces MinIO for backend storage
  • Features:
    • Versioning support
    • AES256 encryption
    • CORS configuration for API access
    • Public access blocked
    • Lifecycle rules for cleanup (incomplete multipart uploads after 7 days)
  • IAM User: phenom-storage-user with programmatic access
  • Outputs: Bucket name, bucket ARN, access key ID
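
A sketch of the bucket features listed above, using the AWS provider's split S3 resources. The bucket name is a hypothetical placeholder, and CORS and the IAM user are left out to keep the example short.

```hcl
resource "aws_s3_bucket" "storage" {
  bucket = "example-phenom-dev-storage" # hypothetical; bucket names are global
}

resource "aws_s3_bucket_versioning" "storage" {
  bucket = aws_s3_bucket.storage.id

  versioning_configuration {
    status = "Enabled"
  }
}

resource "aws_s3_bucket_server_side_encryption_configuration" "storage" {
  bucket = aws_s3_bucket.storage.id

  rule {
    apply_server_side_encryption_by_default {
      sse_algorithm = "AES256"
    }
  }
}

resource "aws_s3_bucket_public_access_block" "storage" {
  bucket = aws_s3_bucket.storage.id

  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

resource "aws_s3_bucket_lifecycle_configuration" "storage" {
  bucket = aws_s3_bucket.storage.id

  rule {
    id     = "abort-incomplete-multipart-uploads"
    status = "Enabled"

    filter {} # applies to every object in the bucket

    abort_incomplete_multipart_upload {
      days_after_initiation = 7
    }
  }
}
```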

Video Upload Module (modules/video-upload/) - NEW

Complete serverless file upload system with security and validation:

Components:

  1. API Gateway: REST API /upload/generate-url endpoint

    • Usage plan: 10,000 requests/day, 10 req/sec rate limit
    • CORS enabled for browser uploads
  2. Lambda: presigned-url-generator

    • Runtime: Node.js 18.x, 512 MB, 30s timeout
    • Validates password from Secrets Manager (5-min cache)
    • Validates MIME type and file size (500MB default)
    • Generates unique pre-signed URLs (1-hour expiry)
  3. Lambda: file-validator

    • Runtime: Node.js 18.x, 3008 MB, 300s timeout
    • Triggered by S3 events on staging bucket
    • Magic byte validation (prevents extension spoofing)
    • Optional ClamAV virus scanning
    • Moves valid files to final bucket, deletes invalid
  4. S3 Staging Bucket: Temporary 24-hour storage

  5. S3 Final Bucket: Organized by type (/images/, /videos/)

  6. S3 Client Hosting Bucket: Hosts upload UI

Supported File Types:

  • Videos: MP4, MPEG, QuickTime, AVI, WMV, WebM
  • Images: JPEG, PNG, GIF, WebP, SVG, TIFF, BMP

Security:

  • Password authentication via Secrets Manager
  • Time-limited pre-signed URLs
  • File type validation using magic bytes
  • Optional virus scanning
  • All buckets encrypted (AES256)
  • Rate limiting and quotas

Outputs: API endpoint, bucket names, Lambda ARNs, client website URL
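
The rate limits and the staging cleanup described above could be configured along these lines. This is a sketch under stated assumptions: the quota, rate limit, and one-day expiry come from the text, while the resource names, input variables, and burst limit are invented for the example.

```hcl
variable "rest_api_id" {
  type = string # ID of the upload REST API
}

variable "stage_name" {
  type = string # deployed API Gateway stage
}

variable "staging_bucket_id" {
  type = string # name of the staging bucket
}

resource "aws_api_gateway_usage_plan" "upload" {
  name = "phenom-dev-video-upload" # hypothetical name

  api_stages {
    api_id = var.rest_api_id
    stage  = var.stage_name
  }

  # 10,000 requests per day.
  quota_settings {
    limit  = 10000
    period = "DAY"
  }

  # 10 requests per second steady state.
  throttle_settings {
    rate_limit  = 10
    burst_limit = 20 # assumed; not stated in the module description
  }
}

resource "aws_s3_bucket_lifecycle_configuration" "staging" {
  bucket = var.staging_bucket_id

  rule {
    id     = "expire-staging-objects"
    status = "Enabled"

    filter {} # applies to every object in the bucket

    expiration {
      days = 1
    }
  }
}
```

Two caveats apply to this sketch: API Gateway usage plans meter clients by API key, and S3 lifecycle expiration works in whole days, so actual deletion can lag the 24-hour mark.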

Cognito Integration (NEW)

AWS Cognito User Pool with Hasura integration:

Configuration:

  • User Pool: phenom-dev with email-based authentication
  • Password Policy: 8+ chars, lowercase, uppercase, numbers, symbols
  • MFA: Configurable (currently OFF in dev)
  • OAuth 2.0: Implicit grant flow
  • Callback URLs: localhost:3000 for development

Lambda Triggers:

  1. hasura-cognito-trigger (Pre-Token Generation)

    • Adds Hasura JWT claims to Cognito tokens
    • Claims namespace: https://hasura.io/jwt/claims
    • Includes: user ID, default role, allowed roles
  2. hasura-cognito-sync-users (Post-Authentication)

    • Syncs authenticated users to Hasura database
    • GraphQL mutation: upserts user to users table
    • Retrieves GraphQL endpoint and admin secret from Secrets Manager
    • 5-minute secret caching for performance
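
A minimal sketch of how the user pool, its two Lambda triggers, and the implicit-grant client might be declared. The pool name, password policy, MFA setting, and trigger types follow the configuration above; the client name, OAuth scopes, the http:// scheme on the callback URL, and the input variables are assumptions.

```hcl
variable "token_trigger_lambda_arn" {
  type = string # ARN of hasura-cognito-trigger
}

variable "user_sync_lambda_arn" {
  type = string # ARN of hasura-cognito-sync-users
}

resource "aws_cognito_user_pool" "main" {
  name                     = "phenom-dev"
  username_attributes      = ["email"] # sign in with email
  auto_verified_attributes = ["email"]
  mfa_configuration        = "OFF" # currently off in dev

  password_policy {
    minimum_length    = 8
    require_lowercase = true
    require_uppercase = true
    require_numbers   = true
    require_symbols   = true
  }

  lambda_config {
    pre_token_generation = var.token_trigger_lambda_arn
    post_authentication  = var.user_sync_lambda_arn
  }
}

resource "aws_cognito_user_pool_client" "app" {
  name         = "phenom-dev-app" # hypothetical name
  user_pool_id = aws_cognito_user_pool.main.id

  allowed_oauth_flows_user_pool_client = true
  allowed_oauth_flows                  = ["implicit"]
  allowed_oauth_scopes                 = ["openid", "email", "profile"] # assumed
  supported_identity_providers         = ["COGNITO"]
  callback_urls                        = ["http://localhost:3000"] # scheme assumed
}
```

Each trigger Lambda also needs an aws_lambda_permission that allows the cognito-idp.amazonaws.com principal to invoke it; that part is omitted here.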

Post-Deployment Configuration

After successful deployment:

  1. Update Database Credentials: Modify database password in AWS Secrets Manager
  2. Configure DNS: Point your domain to the ALB DNS name (provided in Terraform outputs)
  3. Monitor Services: Verify all ECS services are running healthy in AWS Console
  4. Set Video Upload Password (if using video upload module):
    aws secretsmanager update-secret \
      --secret-id "phenom-dev-video-upload-passwords" \
      --secret-string '{"passwords":["your-secure-password"]}'
    
  5. Configure Cognito OAuth (if using Cognito):
    • Update callback URLs in Cognito console for production domains
    • Configure user pool domain for hosted UI (optional)
  6. Test Upload System: Visit the video upload client URL from Terraform outputs

Using the Video Upload System

For Users

  1. Navigate to the upload client URL (from video_client_website_url output)
  2. Enter the upload password (configured in Secrets Manager)
  3. Select file(s) to upload (videos or images)
  4. Click “Upload” - files are validated and processed automatically
  5. Check S3 final bucket for validated files (organized in /images/ or /videos/)

Upload Workflow

flowchart TD
    A["User Browser"] -->|Password + File metadata| B["Upload Client UI"]
    B -->|Request pre-signed URL| C["API Gateway<br/>/upload/generate-url"]
    C -->|Validates password,<br/>MIME type, size| D["Lambda:<br/>presigned-url-generator"]
    D -->|Returns pre-signed URL| E["User Browser"]
    E -->|S3 Direct Upload<br/>via pre-signed URL| F["S3 Staging Bucket"]
    F -->|S3 Event Notification| G["Lambda:<br/>file-validator"]
    G -->|Magic byte validation| H{File Valid?}
    H -->|Yes| I["Move to final bucket"]
    H -->|No| J["Delete file"]
    I --> K["S3 Final Bucket<br/>/images/ or /videos/"]
    K -->|Organized media| L["Ready for Use"]

    style A fill:#e1f5ff
    style K fill:#c8e6c9
    style L fill:#c8e6c9

Security Features

  • Password Authentication: Only users with valid password can generate upload URLs
  • Pre-signed URLs: Time-limited (1 hour), unique per upload request, direct to S3
  • Magic Byte Validation: Prevents extension spoofing attacks
  • File Size Limits: Configurable maximum (default 500MB)
  • Virus Scanning: Optional ClamAV integration for enhanced security
  • Auto-cleanup: Staging files deleted after 24 hours
  • Rate Limiting: API Gateway quotas prevent abuse

Terraform Outputs

The infrastructure provides these key outputs:

Core Infrastructure

  • alb_dns_name: Application Load Balancer DNS name
  • service_endpoints: Direct URLs for each deployed service (GraphQL, Auth, Storage, Functions)
  • database_endpoint: RDS PostgreSQL connection endpoint

Video Upload Module (NEW)

  • video_upload_api_endpoint: API Gateway base URL
  • video_upload_generate_url_endpoint: Full endpoint for pre-signed URL generation
  • video_staging_bucket: S3 staging bucket name
  • video_final_bucket: S3 final storage bucket name
  • video_client_hosting_bucket: S3 bucket hosting upload UI
  • video_client_website_url: Public URL for hosted upload client
  • presigned_url_lambda_arn: URL generator Lambda ARN
  • file_validator_lambda_arn: File validator Lambda ARN

Cognito Authentication (NEW)

  • cognito_user_pool_id: User pool ID
  • cognito_user_pool_arn: User pool ARN
  • cognito_app_client_id: Application client ID for OAuth flow

S3 Storage

  • s3_bucket_name: General storage bucket name
  • s3_access_key_id: IAM user access key for S3 operations

Security Best Practices

Credential Management

  • Never commit .tfstate files or .tfvars files to version control
  • Use AWS Secrets Manager for all sensitive configuration values
  • Implement least-privilege IAM permissions

Network Security

  • Private subnets for application and database tiers
  • Security groups with minimal required access
  • VPC Flow Logs for network monitoring

Operations and Monitoring

Viewing Service Logs

# Tail ECS service logs
aws logs tail /ecs/phenom-dev --follow

# Check service health status
aws ecs describe-services --cluster phenom-dev-cluster --services phenom-dev-graphql

Common Troubleshooting

  • Permission Issues: Verify AWS credentials have sufficient IAM permissions
  • Resource Conflicts: Check for existing resources created outside Terraform
  • Service Health: Review CloudWatch logs and database connectivity

Destroying Infrastructure

⚠️ Warning: This permanently deletes all resources and data

terraform destroy

Ensure you have backed up any critical data before proceeding.

AWS Services Provisioned

The infrastructure creates the following AWS resources:

Service                      Count  Purpose
VPC                          1      Network isolation
Subnets                      6      Public (2), Private (2), Database (2)
Internet Gateway             1      External connectivity
NAT Gateway                  2      Private subnet egress (optional)
Application Load Balancer    1      Traffic routing and SSL termination
Target Groups                4      Service routing (GraphQL, Auth, Storage, Functions)
ECS Cluster                  1      Container orchestration
ECS Services                 4      Containerized applications
RDS PostgreSQL Instance      1      Database (db.m5.large)
S3 Buckets                   5      Storage (general), Staging, Final, Client hosting
Lambda Functions             4      2 for video upload, 2 for Cognito
API Gateway                  1      REST API for uploads
Secrets Manager Secrets      2      App secrets, Upload passwords
Cognito User Pool            1      Authentication
CloudWatch Log Groups        5+     Logging for all services
IAM Roles & Policies         8+     Access control

Cost Optimization

Estimated Monthly Costs (Development)

  • ECS Fargate: ~$40-60 (4 services, 0.25 vCPU, 0.5GB each)
  • RDS db.m5.large: ~$140 (20GB storage)
  • Application Load Balancer: ~$20
  • NAT Gateway: ~$30 (if enabled)
  • S3 Storage: ~$0.023 per GB/month (S3 Standard in us-east-1, final bucket only)
  • Lambda: ~$0.20 per million invocations
  • API Gateway: ~$3.50 per million requests
  • Data Transfer: Variable (first 100GB/month out to the internet is free)

Total Estimated: $230-260/month for development environment

Cost Reduction Tips

  1. Disable NAT Gateways in development (use VPC endpoints instead)
  2. Use Fargate Spot for non-critical services (up to 70% discount)
  3. Enable S3 Intelligent-Tiering for infrequent access storage
  4. Set CloudWatch Log Retention to 7 days for development
  5. Use RDS Reserved Instances for production (40-60% discount)
  6. Enable staging bucket lifecycle (auto-delete after 24h - already configured)
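
Three of these tips translate directly into Terraform. The sketch below is illustrative only: the input variables, the capacity provider weights, and the choice of an S3 gateway endpoint are assumptions, not settings taken from the repository.

```hcl
variable "cluster_name" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "private_route_table_ids" {
  type = list(string)
}

variable "aws_region" {
  type    = string
  default = "us-east-1"
}

# Tip 4: keep development logs for 7 days only.
resource "aws_cloudwatch_log_group" "ecs" {
  name              = "/ecs/phenom-dev"
  retention_in_days = 7
}

# Tip 2: allow Fargate Spot, keeping one on-demand task as a baseline.
resource "aws_ecs_cluster_capacity_providers" "main" {
  cluster_name       = var.cluster_name
  capacity_providers = ["FARGATE", "FARGATE_SPOT"]

  default_capacity_provider_strategy {
    capacity_provider = "FARGATE"
    base              = 1
    weight            = 1
  }

  default_capacity_provider_strategy {
    capacity_provider = "FARGATE_SPOT"
    weight            = 3 # assumed ratio of Spot to on-demand tasks
  }
}

# Tip 1: a gateway endpoint lets private subnets reach S3 without a NAT Gateway.
resource "aws_vpc_endpoint" "s3" {
  vpc_id            = var.vpc_id
  service_name      = "com.amazonaws.${var.aws_region}.s3"
  vpc_endpoint_type = "Gateway"
  route_table_ids   = var.private_route_table_ids
}
```

The default strategy only takes effect for services that do not set an explicit launch type. Replacing NAT Gateways entirely would also need interface endpoints for services such as ECR, CloudWatch Logs, and Secrets Manager.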

Cognito Authentication Flow

flowchart TD
    A["User Login Request"] --> B["Cognito User Pool<br/>Email + Password"]
    B --> C["Pre-Token Generation Trigger"]
    C --> D["Lambda:<br/>hasura-cognito-trigger"]
    D -->|Add JWT claims namespace| E["Claims Processing"]
    E -->|x-hasura-user-id<br/>x-hasura-default-role<br/>x-hasura-allowed-roles| F["Cognito Returns<br/>JWT Token"]
    F -->|Token with Hasura claims| G["Post-Authentication Trigger"]
    G --> H["Lambda:<br/>hasura-cognito-sync-users"]
    H -->|Retrieve endpoint<br/>from Secrets Manager| I["Execute GraphQL Mutation"]
    I -->|Upsert user to<br/>Hasura database| J["User Authenticated<br/>+ Synced to Database"]

    style A fill:#fff3e0
    style J fill:#c8e6c9
    style F fill:#e1f5ff

Module-Specific Documentation

  • Video Upload: See modules/video-upload/README.md and ARCHITECTURE.md in repository
  • Cognito Integration: Lambda function source in environments/development/lambda-functions/

For complete implementation details, configuration examples, and troubleshooting, refer to the GitHub repository.