Phenom Infrastructure
This section contains infrastructure documentation for the Phenom application stack. Access is restricted to the infrastructure team.
Overview
Phenom Infrastructure provides Terraform infrastructure as code for deploying the complete Phenom application stack on AWS ECS. This repository contains modular Terraform configurations that create a production-ready cloud environment with security, scalability, and monitoring best practices.
Repository
GitHub Repository: Phenom-earth/phenom-infra
Architecture
The infrastructure deploys a comprehensive AWS environment including:
Core Infrastructure
- VPC: Virtual Private Cloud with public/private/database subnets across multiple availability zones
- ECS Fargate: Containerized application cluster with auto-scaling capabilities
- Application Load Balancer: Traffic routing and SSL termination
- RDS PostgreSQL: Managed database service (PostgreSQL 17.4) with automated backups
- AWS Secrets Manager: Secure credential and configuration storage
- AWS Cognito: User authentication and authorization with Hasura integration
- S3 Storage: Multiple buckets for general storage and video/image uploads
- Lambda Functions: Serverless compute for authentication hooks and file validation
- API Gateway: REST API for secure upload workflows
Service Stack
The ECS cluster runs the following containerized services:
GraphQL Service (Hasura GraphQL Engine)
- Port: 8080
- Provides GraphQL API and database migrations
- Integrated with Cognito for JWT authentication
Auth Service (Hasura Auth)
- Port: 4000
- Handles authentication and JWT token management
- Enhanced with Cognito integration
Storage Service (Hasura Storage)
- Port: 5000
- Manages file uploads and storage operations
- Utilizes S3 backend
Functions Service (Nhost Functions)
- Port: 3000
- Executes serverless functions
Video/Image Upload System (NEW)
Serverless file upload pipeline with validation and security:
- API Gateway: REST API for pre-signed URL generation
- Lambda: Pre-signed URL Generator: Password-protected URL generation with 1-hour expiry
- S3 Staging Bucket: Temporary storage with 24-hour auto-cleanup
- Lambda: File Validator: Automatic validation using magic bytes, optional virus scanning
- S3 Final Bucket: Permanent storage for validated media organized by type
- Client Hosting: S3-hosted upload interface
Authentication Integration (NEW)
AWS Cognito integrated with Hasura GraphQL:
- Cognito User Pool: Email-based authentication with MFA support
- Lambda: Token Enhancement: Adds Hasura JWT claims to Cognito tokens
- Lambda: User Sync: Automatically syncs authenticated users to Hasura database
- OAuth 2.0 Flow: Implicit grant with callback support
Prerequisites
Before deploying the infrastructure, ensure you have:
- Terraform >= 1.0
- AWS CLI configured with appropriate credentials
- AWS Account with sufficient permissions to create resources
Quick Start
1. Configure AWS Credentials
# Option 1: AWS CLI configuration
aws configure
# Option 2: Environment variables
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export AWS_DEFAULT_REGION="us-east-1"
# Option 3: AWS Profile
export AWS_PROFILE="your-profile-name"
2. Choose Environment
cd environments/<desired-env>
# Examples:
cd environments/development
# or
cd environments/production
3. Deploy Infrastructure
# Initialize Terraform
terraform init
# Review planned changes
terraform plan
# Deploy infrastructure
terraform apply
Environment Structure
environments/
├── development/
│ ├── main.tf # Main configuration
│ ├── locals.tf # Environment-specific variables
│ ├── versions.tf # Terraform and provider versions
│ ├── backend.tf # Remote state configuration
│ └── outputs.tf # Output values
└── production/
└── ... (same structure)
Infrastructure Modules
Networking Module (modules/networking/)
- VPC: 10.0.0.0/16 CIDR with Internet Gateway
- 3-Tier Subnet Architecture:
  - Public Subnets (10.0.0.0/24, 10.0.1.0/24) - For ALB
  - Private Subnets (10.0.10.0/24, 10.0.11.0/24) - For ECS tasks
  - Database Subnets (10.0.20.0/24, 10.0.21.0/24) - For RDS
- NAT Gateways for private subnet egress (optional)
- Security groups for ALB, ECS tasks, and RDS with least-privilege rules
- Outputs: VPC ID, subnet IDs, security group IDs
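The subnet layout above can be sanity-checked before running `terraform plan`. The following is a minimal sketch using only the Python standard library; it is not part of the repository, and the function name `check_subnet_layout` is hypothetical. It confirms that each listed subnet falls inside the VPC CIDR and that no two subnets overlap.

```python
import ipaddress

# CIDR blocks as listed in the networking module description.
VPC_CIDR = ipaddress.ip_network("10.0.0.0/16")
SUBNETS = {
    "public": ["10.0.0.0/24", "10.0.1.0/24"],
    "private": ["10.0.10.0/24", "10.0.11.0/24"],
    "database": ["10.0.20.0/24", "10.0.21.0/24"],
}


def check_subnet_layout(vpc, subnets_by_tier):
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    networks = [
        (tier, ipaddress.ip_network(cidr))
        for tier, cidrs in subnets_by_tier.items()
        for cidr in cidrs
    ]
    # Every subnet must sit inside the VPC range.
    for tier, net in networks:
        if not net.subnet_of(vpc):
            problems.append(f"{tier} subnet {net} is outside VPC {vpc}")
    # No two subnets may share addresses.
    for i, (tier_a, net_a) in enumerate(networks):
        for tier_b, net_b in networks[i + 1:]:
            if net_a.overlaps(net_b):
                problems.append(f"{net_a} ({tier_a}) overlaps {net_b} ({tier_b})")
    return problems


problems = check_subnet_layout(VPC_CIDR, SUBNETS)
print(problems or "subnet layout is consistent")
```

A check like this is mainly useful when adding a new tier or availability zone, where a mistyped CIDR would otherwise only surface as a Terraform apply error.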
Diagram: network architecture. The Internet Gateway forwards ports 80/443 to the Application Load Balancer (path-based routing, health checks on /healthz), which routes /api/graphql, /api/auth, /api/storage, and /api/functions to target groups on ports 8080, 4000, 5000, and 3000. Each target group fronts one ECS task in the private tier (Hasura GraphQL, Hasura Auth, Hasura Storage, Nhost Functions). All four tasks query RDS PostgreSQL (db.m5.large, 20GB → 100GB, private subnet) in the database tier. The Storage task uploads to and downloads from the S3 buckets (General, Staging, Final, Hosting), and the GraphQL, Auth, and Storage tasks read DB credentials, API keys, and passwords from AWS Secrets Manager. The NAT Gateway provides private subnet egress through the Internet Gateway.
Application Load Balancer Module (modules/alb/)
- ALB: Public-facing load balancer in public subnets
- 4 Target Groups with health checks (/healthz every 30s):
  - GraphQL (port 8080)
  - Auth (port 4000)
  - Storage (port 5000)
  - Functions (port 3000)
- HTTP listener on port 80 with path-based routing
- Outputs: ALB DNS name, target group ARNs
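To illustrate what path-based routing means for the four target groups, here is a small Python model of the lookup. It is a sketch only: the route prefixes are the ones shown in the architecture diagram, while the exact listener rule patterns and priorities configured on the ALB are not given here, so the matching logic and the name `resolve_target` are assumptions.

```python
# Path prefixes and ports for the four target groups.
ROUTES = [
    ("/api/graphql", "graphql", 8080),
    ("/api/auth", "auth", 4000),
    ("/api/storage", "storage", 5000),
    ("/api/functions", "functions", 3000),
]


def resolve_target(path):
    """Return (service, port) for the first matching prefix, or None."""
    for prefix, service, port in ROUTES:
        # Match the prefix itself or anything beneath it, but not
        # look-alikes such as /api/authz.
        if path == prefix or path.startswith(prefix + "/"):
            return service, port
    return None


print(resolve_target("/api/graphql/v1/graphql"))
print(resolve_target("/api/authz"))
```

Requests that match no prefix return `None` in this model; on the real load balancer they would fall through to whatever default action the listener defines.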
ECS Module (modules/ecs/)
- ECS Fargate Cluster with Container Insights enabled
- 4 Task Definitions:
  - Hasura GraphQL Engine (8080)
  - Hasura Auth Service (4000)
  - Hasura Storage Service (5000)
  - Nhost Functions (3000)
- IAM Roles: Task execution role and task role with necessary permissions
- CloudWatch Logs: /ecs/phenom-dev log group
- Secrets Integration: Environment variables from AWS Secrets Manager
- Outputs: Cluster ARN, service ARNs, task definition ARNs
Diagram: ECS cluster (Container Insights enabled) running four services, each as 2 tasks × 0.25 vCPU, 0.5GB: GraphQL Service (Hasura Engine, port 8080), Auth Service (Hasura Auth, port 4000), Storage Service (Hasura Storage, port 5000), and Functions Service (Nhost Functions, port 3000). The cluster pulls container images from the ECR registry, receives environment variables and database credentials from AWS Secrets Manager, streams logs to the /ecs/phenom-dev CloudWatch log group, and assumes the IAM execution and task roles. CloudWatch alarms monitor CPU and memory.
Note: Each service runs 2 tasks for high availability with auto-scaling capabilities.
RDS Module (modules/rds/)
- PostgreSQL 17.4 on db.m5.large instance
- Storage: 20GB initial with auto-scaling to 100GB
- Backup: 7-day retention, daily 03:00-04:00 UTC
- Maintenance: Sunday 04:00-05:00 UTC
- Snapshot Restore: From phenom-backend-db-migration-20251018-003143
- Security: Private (not publicly accessible), encrypted at rest
- Outputs: Endpoint, port, database name, username ARN
S3 Module (modules/s3/)
- General Storage Bucket: Replaces MinIO for backend storage
- Features:
  - Versioning support
  - AES256 encryption
  - CORS configuration for API access
  - Public access blocked
  - Lifecycle rules for cleanup (incomplete multipart uploads after 7 days)
- IAM User: phenom-storage-user with programmatic access
- Outputs: Bucket name, bucket ARN, access key ID
Video Upload Module (modules/video-upload/) - NEW
Complete serverless file upload system with security and validation:
Components:
API Gateway: REST API
- /upload/generate-url endpoint
- Usage plan: 10,000 requests/day, 10 req/sec rate limit
- CORS enabled for browser uploads
Lambda: presigned-url-generator
- Runtime: Node.js 18.x, 512 MB, 30s timeout
- Validates password from Secrets Manager (5-min cache)
- Validates MIME type and file size (500MB default)
- Generates unique pre-signed URLs (1-hour expiry)
Lambda: file-validator
- Runtime: Node.js 18.x, 3008 MB, 300s timeout
- Triggered by S3 events on staging bucket
- Magic byte validation (prevents extension spoofing)
- Optional ClamAV virus scanning
- Moves valid files to final bucket, deletes invalid
S3 Staging Bucket: Temporary 24-hour storage
S3 Final Bucket: Organized by type (/images/, /videos/)
S3 Client Hosting Bucket: Hosts upload UI
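The three checks that presigned-url-generator performs before issuing a URL (password, MIME type, file size) can be sketched as follows. This is an illustration in Python, not the deployed Node.js function. The request field names (`password`, `contentType`, `fileSize`), the exact MIME strings, and the reading of 500MB as binary megabytes are all assumptions; the secret layout with a `passwords` array follows the `update-secret` command in the post-deployment steps.

```python
import hmac
import json

MAX_FILE_SIZE_BYTES = 500 * 1024 * 1024  # 500MB default (binary megabytes assumed)

# Assumed MIME strings for the supported video and image formats.
ALLOWED_MIME_TYPES = {
    "video/mp4", "video/mpeg", "video/quicktime",
    "video/x-msvideo", "video/x-ms-wmv", "video/webm",
    "image/jpeg", "image/png", "image/gif", "image/webp",
    "image/svg+xml", "image/tiff", "image/bmp",
}


def validate_upload_request(request, secret_string):
    """Return (ok, reason) for a hypothetical generate-url request body."""
    allowed_passwords = json.loads(secret_string)["passwords"]
    supplied = str(request.get("password", "")).encode()
    # compare_digest avoids leaking how many leading characters matched.
    if not any(hmac.compare_digest(supplied, p.encode()) for p in allowed_passwords):
        return False, "invalid password"
    if request.get("contentType") not in ALLOWED_MIME_TYPES:
        return False, "unsupported MIME type"
    size = request.get("fileSize")
    if isinstance(size, bool) or not isinstance(size, int):
        return False, "invalid file size"
    if not 0 < size <= MAX_FILE_SIZE_BYTES:
        return False, "invalid file size"
    return True, "ok"


secret = '{"passwords":["your-secure-password"]}'
request = {"password": "your-secure-password", "contentType": "video/mp4", "fileSize": 1024}
print(validate_upload_request(request, secret))
```

Only after all three checks pass would the function go on to generate the pre-signed URL; the declared MIME type is still untrusted at this point, which is why the file-validator stage exists.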
Supported File Types:
- Videos: MP4, MPEG, QuickTime, AVI, WMV, WebM
- Images: JPEG, PNG, GIF, WebP, SVG, TIFF, BMP
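Magic byte validation compares the first bytes of the uploaded file against known format signatures instead of trusting the extension or the declared MIME type. The sketch below covers a subset of the supported formats using their well-known signatures; it is written in Python for illustration and is not the file-validator source. The function names are hypothetical. SVG is text-based and has no fixed signature, so it would need a separate check.

```python
def sniff_media_type(header):
    """Guess a media type from the first bytes of a file, or return None."""
    if header.startswith(b"\xff\xd8\xff"):
        return "image/jpeg"
    if header.startswith(b"\x89PNG\r\n\x1a\n"):
        return "image/png"
    if header.startswith((b"GIF87a", b"GIF89a")):
        return "image/gif"
    # RIFF containers carry their format tag at offset 8.
    if header[:4] == b"RIFF" and header[8:12] == b"WEBP":
        return "image/webp"
    if header[:4] == b"RIFF" and header[8:12] == b"AVI ":
        return "video/x-msvideo"
    if header[4:8] == b"ftyp":
        # ISO base media container; the brand separates QuickTime from MP4.
        # Simplification: other ftyp brands are all reported as MP4 here.
        return "video/quicktime" if header[8:12] == b"qt  " else "video/mp4"
    if header.startswith(b"\x1a\x45\xdf\xa3"):
        return "video/webm"  # EBML header, shared with Matroska
    return None


def matches_declared_type(header, declared_type):
    """True when the sniffed type agrees with what the uploader declared."""
    return sniff_media_type(header) == declared_type


png_header = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
print(sniff_media_type(png_header))
print(matches_declared_type(png_header, "image/jpeg"))
```

A file named `clip.mp4` that actually begins with an executable header would return `None` here and, in the pipeline described above, be deleted rather than moved to the final bucket.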
Security:
- Password authentication via Secrets Manager
- Time-limited pre-signed URLs
- File type validation using magic bytes
- Optional virus scanning
- All buckets encrypted (AES256)
- Rate limiting and quotas
Outputs: API endpoint, bucket names, Lambda ARNs, client website URL
Cognito Integration (NEW)
AWS Cognito User Pool with Hasura integration:
Configuration:
- User Pool: phenom-dev with email-based authentication
- Password Policy: 8+ chars, lowercase, uppercase, numbers, symbols
- MFA: Configurable (currently OFF in dev)
- OAuth 2.0: Implicit grant flow
- Callback URLs: localhost:3000 for development
Lambda Triggers:
hasura-cognito-trigger (Pre-Token Generation)
- Adds Hasura JWT claims to Cognito tokens
- Claims namespace: https://hasura.io/jwt/claims
- Includes: user ID, default role, allowed roles
hasura-cognito-sync-users (Post-Authentication)
- Syncs authenticated users to Hasura database
- GraphQL mutation: upserts user to users table
- Retrieves GraphQL endpoint and admin secret from Secrets Manager
- 5-minute secret caching for performance
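A Pre-Token Generation trigger that adds the Hasura claims namespace could look roughly like the sketch below. It is written in Python for illustration; the runtime and source of the real hasura-cognito-trigger function are not shown here, and the role names (`user`) are placeholders. The sketch assumes the V1 trigger event, whose `claimsToAddOrOverride` map accepts only string values, so the claims object is serialized to JSON and Hasura would need to be configured to read stringified claims.

```python
import json

HASURA_NAMESPACE = "https://hasura.io/jwt/claims"
DEFAULT_ROLE = "user"      # placeholder role name
ALLOWED_ROLES = ["user"]   # placeholder role list


def lambda_handler(event, context):
    """Attach Hasura claims to the token Cognito is about to issue."""
    user_id = event["request"]["userAttributes"]["sub"]
    hasura_claims = {
        "x-hasura-user-id": user_id,
        "x-hasura-default-role": DEFAULT_ROLE,
        "x-hasura-allowed-roles": ALLOWED_ROLES,
    }
    response = event.setdefault("response", {})
    response["claimsOverrideDetails"] = {
        "claimsToAddOrOverride": {
            # String values only, so the nested object is JSON-encoded.
            HASURA_NAMESPACE: json.dumps(hasura_claims),
        }
    }
    # Cognito expects the (modified) event to be returned.
    return event


sample_event = {"request": {"userAttributes": {"sub": "abc-123"}}, "response": {}}
result = lambda_handler(sample_event, None)
print(result["response"]["claimsOverrideDetails"]["claimsToAddOrOverride"][HASURA_NAMESPACE])
```

Using the Cognito `sub` attribute as the Hasura user ID is itself an assumption; any stable per-user attribute would work as long as the users table is keyed on the same value.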
Post-Deployment Configuration
After successful deployment:
- Update Database Credentials: Modify database password in AWS Secrets Manager
- Configure DNS: Point your domain to the ALB DNS name (provided in Terraform outputs)
- Monitor Services: Verify all ECS services are running healthy in AWS Console
- Set Video Upload Password (if using video upload module):
aws secretsmanager update-secret \
  --secret-id "phenom-dev-video-upload-passwords" \
  --secret-string '{"passwords":["your-secure-password"]}'
- Configure Cognito OAuth (if using Cognito):
  - Update callback URLs in Cognito console for production domains
  - Configure user pool domain for hosted UI (optional)
- Test Upload System: Visit the video upload client URL from Terraform outputs
Using the Video Upload System
For Users
- Navigate to the upload client URL (from video_client_website_url output)
- Enter the upload password (configured in Secrets Manager)
- Select file(s) to upload (videos or images)
- Click “Upload” - files are validated and processed automatically
- Check S3 final bucket for validated files (organized in /images/ or /videos/)
Upload Workflow
Diagram: upload workflow. A request to the API Gateway /upload/generate-url endpoint invokes the presigned-url-generator Lambda, which validates the password, MIME type, and size and returns a pre-signed URL to the user's browser. The browser uploads directly to the S3 staging bucket through that URL. The resulting S3 event notification triggers the file-validator Lambda, which performs magic byte validation. Valid files are moved to the S3 final bucket under /images/ or /videos/ and are ready for use; invalid files are deleted.
Security Features
- Password Authentication: Only users with valid password can generate upload URLs
- Pre-signed URLs: Time-limited (1 hour), unique per upload, direct to S3
- Magic Byte Validation: Prevents extension spoofing attacks
- File Size Limits: Configurable maximum (default 500MB)
- Virus Scanning: Optional ClamAV integration for enhanced security
- Auto-cleanup: Staging files deleted after 24 hours
- Rate Limiting: API Gateway quotas prevent abuse
Terraform Outputs
The infrastructure provides these key outputs:
Core Infrastructure
- alb_dns_name: Application Load Balancer DNS name
- service_endpoints: Direct URLs for each deployed service (GraphQL, Auth, Storage, Functions)
- database_endpoint: RDS PostgreSQL connection endpoint
Video Upload Module (NEW)
- video_upload_api_endpoint: API Gateway base URL
- video_upload_generate_url_endpoint: Full endpoint for pre-signed URL generation
- video_staging_bucket: S3 staging bucket name
- video_final_bucket: S3 final storage bucket name
- video_client_hosting_bucket: S3 bucket hosting upload UI
- video_client_website_url: Public URL for hosted upload client
- presigned_url_lambda_arn: URL generator Lambda ARN
- file_validator_lambda_arn: File validator Lambda ARN
Cognito Authentication (NEW)
- cognito_user_pool_id: User pool ID
- cognito_user_pool_arn: User pool ARN
- cognito_app_client_id: Application client ID for OAuth flow
S3 Storage
- s3_bucket_name: General storage bucket name
- s3_access_key_id: IAM user access key for S3 operations
Security Best Practices
Credential Management
- Never commit .tfstate files or .tfvars files to version control
- Use AWS Secrets Manager for all sensitive configuration values
- Implement least-privilege IAM permissions
Network Security
- Private subnets for application and database tiers
- Security groups with minimal required access
- VPC Flow Logs for network monitoring
Operations and Monitoring
Viewing Service Logs
# Tail ECS service logs
aws logs tail /ecs/phenom-dev --follow
# Check service health status
aws ecs describe-services --cluster phenom-dev-cluster --services phenom-dev-graphql
Common Troubleshooting
Permission Issues: Verify AWS credentials have sufficient IAM permissions
Resource Conflicts: Check for existing resources created outside Terraform
Service Health: Review CloudWatch logs and database connectivity
Destroying Infrastructure
⚠️ Warning: This permanently deletes all resources and data
terraform destroy
Ensure you have backed up any critical data before proceeding.
AWS Services Provisioned
The infrastructure creates the following AWS resources:
| Service | Count | Purpose |
|---|---|---|
| VPC | 1 | Network isolation |
| Subnets | 6 | Public (2), Private (2), Database (2) |
| Internet Gateway | 1 | External connectivity |
| NAT Gateway | 2 | Private subnet egress (optional) |
| Application Load Balancer | 1 | Traffic routing and SSL termination |
| Target Groups | 4 | Service routing (GraphQL, Auth, Storage, Functions) |
| ECS Cluster | 1 | Container orchestration |
| ECS Services | 4 | Containerized applications |
| RDS PostgreSQL Instance | 1 | Database (db.m5.large) |
| S3 Buckets | 5 | Storage (general), Staging, Final, Client hosting |
| Lambda Functions | 4 | 2 for video upload, 2 for Cognito |
| API Gateway | 1 | REST API for uploads |
| Secrets Manager Secrets | 2 | App secrets, Upload passwords |
| Cognito User Pool | 1 | Authentication |
| CloudWatch Log Groups | 5+ | Logging for all services |
| IAM Roles & Policies | 8+ | Access control |
Cost Optimization
Estimated Monthly Costs (Development)
- ECS Fargate: ~$40-60 (4 services, 0.25 vCPU, 0.5GB each)
- RDS db.m5.large: ~$140 (20GB storage)
- Application Load Balancer: ~$20
- NAT Gateway: ~$30 (if enabled)
- S3 Storage: ~$0.50-2 per GB/month (final bucket only)
- Lambda: ~$0.20 per million invocations
- API Gateway: ~$3.50 per million requests
- Data Transfer: Variable (first 1GB free)
Total Estimated: $230-260/month for development environment
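As a quick check on the arithmetic, the fixed line items can be summed directly. The sketch below uses only the figures listed above; the helper name `monthly_range` is hypothetical. The fixed items come to $230-250, so the upper end of the stated total presumably leaves room for the usage-based items (S3, Lambda, API Gateway, data transfer).

```python
# Fixed monthly line items in USD as (low, high), taken from the list above.
FIXED_COSTS = {
    "ECS Fargate": (40, 60),
    "RDS db.m5.large": (140, 140),
    "Application Load Balancer": (20, 20),
    "NAT Gateway": (30, 30),
}


def monthly_range(costs, exclude=()):
    """Sum the low and high estimates, skipping any excluded line items."""
    low = sum(lo for name, (lo, hi) in costs.items() if name not in exclude)
    high = sum(hi for name, (lo, hi) in costs.items() if name not in exclude)
    return low, high


print(monthly_range(FIXED_COSTS))                           # (230, 250)
print(monthly_range(FIXED_COSTS, exclude={"NAT Gateway"}))  # (200, 220)
```

The second call shows the effect of leaving the optional NAT Gateway disabled in development.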
Cost Reduction Tips
- Disable NAT Gateways in development (use VPC endpoints instead)
- Use Fargate Spot for non-critical services (up to 70% discount)
- Enable S3 Intelligent-Tiering for infrequent access storage
- Set CloudWatch Log Retention to 7 days for development
- Use RDS Reserved Instances for production (40-60% discount)
- Enable staging bucket lifecycle (auto-delete after 24h - already configured)
Cognito Authentication Flow
Diagram: Cognito authentication flow. The user signs in with email and password. The Pre-Token Generation trigger invokes the hasura-cognito-trigger Lambda, which adds the JWT claims namespace (x-hasura-user-id, x-hasura-default-role, x-hasura-allowed-roles) so that Cognito returns a JWT carrying the Hasura claims. The Post-Authentication trigger invokes the hasura-cognito-sync-users Lambda, which retrieves the endpoint from Secrets Manager and executes a GraphQL mutation that upserts the user into the Hasura database. The result is a user who is authenticated and synced to the database.
Related Documentation
Official AWS Documentation
- AWS ECS Best Practices
- Terraform AWS Provider
- AWS Lambda Best Practices
- Amazon S3 Security Best Practices
- AWS Cognito Developer Guide
Phenom Documentation
Module-Specific Documentation
- Video Upload: See modules/video-upload/README.md and ARCHITECTURE.md in repository
- Cognito Integration: Lambda function source in environments/development/lambda-functions/
For complete implementation details, configuration examples, and troubleshooting, refer to the GitHub repository.