Deployment Runbook

Day-to-day deployment operations for Reggie on AWS (Lambda + Aurora architecture).

Deploy to Staging

Automatic on push to main. To deploy manually:

npx sst deploy --stage staging
# or
make deploy-staging
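
Before a manual deploy, it can help to preview what would change. Assuming the SST v3 CLI's `diff` subcommand is available:

```shell
# Preview pending infrastructure changes (modifies nothing)
npx sst diff --stage staging

# Deploy only if the diff looks right
npx sst deploy --stage staging
```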

Deploy to Production

Via GitHub Actions workflow dispatch:

  1. Go to Actions > "Deploy (SST)" > Run workflow
  2. Select production stage
  3. Confirm

Or manually:

npx sst deploy --stage production
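
The workflow dispatch can also be triggered from the terminal with the GitHub CLI. The workflow file name and input name below are assumptions; check `.github/workflows/` for the real ones:

```shell
# Trigger the "Deploy (SST)" workflow from the CLI.
# deploy.yml and the `stage` input are assumed names.
gh workflow run deploy.yml -f stage=production

# Follow the run until it finishes
gh run watch
```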

Post-Deploy: Run Migrations

CI handles this automatically. To run manually:

# Find the migrator Lambda
aws lambda list-functions --region eu-west-2 --max-items 200 \
  --query "Functions[?starts_with(FunctionName, 'reggie-staging-ReggieDbMigratorFunction')].FunctionName | [0]" \
  --output text

# Invoke it (AWS CLI v2 needs --cli-binary-format for a raw JSON payload)
aws lambda invoke --function-name <FUNCTION_NAME> --region eu-west-2 \
  --payload '{}' --cli-binary-format raw-in-base64-out \
  --cli-read-timeout 350 /tmp/result.json
python3 -m json.tool /tmp/result.json
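
The two steps above can be combined into a single shot by capturing the resolved function name:

```shell
# Resolve the staging migrator function name, then invoke it
FN=$(aws lambda list-functions --region eu-west-2 --max-items 200 \
  --query "Functions[?starts_with(FunctionName, 'reggie-staging-ReggieDbMigratorFunction')].FunctionName | [0]" \
  --output text)

# --cli-binary-format is required by AWS CLI v2 for a raw JSON payload
aws lambda invoke --function-name "$FN" --region eu-west-2 \
  --payload '{}' --cli-binary-format raw-in-base64-out \
  --cli-read-timeout 350 /tmp/result.json
python3 -m json.tool /tmp/result.json
```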

Verify a Deployment

# Health check (includes DB connectivity)
curl https://staging-api.getreggie.io/health

# Expected: {"status":"healthy","checks":{"database":"connected"}}

# Full smoke test
bash scripts/smoke-test.sh https://staging-api.getreggie.io
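
For a scripted gate in CI, the health response can be asserted rather than eyeballed. The helper below is illustrative; it only assumes the JSON shape shown above:

```shell
# assert_healthy: exits non-zero unless the health JSON reports
# status=healthy and database=connected
assert_healthy() {
  echo "$1" | python3 -c '
import json, sys
d = json.load(sys.stdin)
ok = d.get("status") == "healthy" and d.get("checks", {}).get("database") == "connected"
sys.exit(0 if ok else 1)'
}

# In CI: assert_healthy "$(curl -fsS https://staging-api.getreggie.io/health)"
assert_healthy '{"status":"healthy","checks":{"database":"connected"}}' && echo "health OK"
```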

Rollback

SST has no built-in rollback. Options:

  1. Revert and push: Revert the commit on main and push; CI redeploys
  2. Deploy specific commit: git checkout <sha> && npx sst deploy --stage staging
  3. Database rollback: Alembic downgrade (manual, use with caution)
    # Via migrator Lambda with custom event (requires handler modification)
    # Or via sst shell for direct DB access
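
For option 3, a sketch of a downgrade invocation. This assumes the migrator handler has been extended to read a command from its event; the payload shape is hypothetical:

```shell
# Hypothetical event shape -- the stock handler ignores it until modified
aws lambda invoke --function-name <MIGRATOR_FUNCTION_NAME> --region eu-west-2 \
  --payload '{"command": "downgrade", "revision": "-1"}' \
  --cli-binary-format raw-in-base64-out /tmp/downgrade.json
python3 -m json.tool /tmp/downgrade.json
```

Verify the target revision with `alembic history` before downgrading; data-destructive downgrades cannot be undone without a restore.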
    

Fix Stale SST State

When SST state drifts from actual AWS resources:

# Sync state with reality
npx sst refresh --stage production

# Then redeploy
npx sst deploy --stage production

Common causes: manual AWS Console changes, partial deploys, resource deletion outside SST.

Set or Update Secrets

npx sst secret set ClerkSecretKey sk_live_new_value --stage staging
# Then redeploy for the change to take effect
npx sst deploy --stage staging
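
To confirm the value landed, SST v3 can list the stage's secrets (assuming the `secret list` subcommand; note it prints values, so avoid shared terminals):

```shell
# List all secrets (and values) for the stage
npx sst secret list --stage staging
```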

View Logs

# Log group names end in a generated suffix; list the exact names first
aws logs describe-log-groups --log-group-name-prefix /aws/lambda/reggie-staging \
  --region eu-west-2 --query 'logGroups[].logGroupName' --output text

# API Lambda logs (use the exact name in place of the trailing *)
aws logs tail /aws/lambda/reggie-staging-ReggieApiRouteBbovcaHandlerFunction-* \
  --follow --region eu-west-2

# Migrator Lambda logs
aws logs tail /aws/lambda/reggie-staging-ReggieDbMigratorFunction-* \
  --follow --region eu-west-2

# API Gateway access logs
aws logs tail /aws/vendedlogs/apis/reggie-staging-ReggieApi-* \
  --follow --region eu-west-2
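
When triaging rather than tailing, `aws logs tail` can also look back over a window and filter with CloudWatch Logs filter-pattern syntax:

```shell
# Only error lines from the last hour (substitute the exact log group name)
aws logs tail <LOG_GROUP_NAME> \
  --since 1h --filter-pattern ERROR --region eu-west-2
```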

Common Issues

| Issue | Diagnosis | Fix |
|---|---|---|
| Health returns 503 | Database disconnected | Check DATABASE_URL, verify password URL-encoding |
| Migration fails (password) | SST password has special chars | Ensure encodeURIComponent in sst.config.ts |
| Deploy fails (DNS conflict) | Stale Cloudflare records | Delete conflicting CNAME, redeploy |
| Deploy fails (S3 bucket) | SST state drift | Run sst refresh, then redeploy |
| Deploy fails (EIP) | Stale Elastic IPs from old NAT | Delete old EIPs/NAT gateways, refresh, redeploy |
| Deploy fails (concurrency) | Lambda reserved concurrency too high | Remove or reduce concurrency setting |
| Migrator image stale | SST container build incompatible | CI builds image directly (normal behavior) |

AWS Resources to Monitor

| Resource | Where to Check | Alert On |
|---|---|---|
| API health | curl api.getreggie.io/health | unhealthy status |
| Database | RDS Console (connections, CPU) | High CPU, connection exhaustion |
| Lambda errors | CloudWatch metrics | Error rate > 1% |
| Costs | AWS Cost Explorer | Monthly spend > $120 |
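
The Lambda error alert can be codified as a CloudWatch alarm rather than checked by hand. Names, threshold, and SNS topic below are illustrative placeholders:

```shell
# Alarm when the API Lambda records any errors in a 5-minute window
aws cloudwatch put-metric-alarm \
  --alarm-name reggie-staging-api-errors \
  --namespace AWS/Lambda --metric-name Errors \
  --dimensions Name=FunctionName,Value=<API_FUNCTION_NAME> \
  --statistic Sum --period 300 --evaluation-periods 1 \
  --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions <SNS_TOPIC_ARN> \
  --region eu-west-2
```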

Backup & Recovery

| Stage | Backup Retention | Deletion Protection | Recovery |
|---|---|---|---|
| Production | 14 days | Enabled | Point-in-time restore (any second) |
| Staging | 7 days | Enabled | Point-in-time restore |

To restore from a point in time:

aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier reggie-production-reggiedbcluster-XXX \
  --db-cluster-identifier reggie-production-restored \
  --restore-to-time "2026-03-14T12:00:00Z" \
  --region eu-west-2
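
Note that restore-db-cluster-to-point-in-time creates the cluster only; it needs at least one instance before it will accept connections. The instance class and engine below are assumptions (db.serverless applies to Aurora Serverless v2) and must match the original cluster:

```shell
# Add an instance to the restored cluster so it becomes connectable
aws rds create-db-instance \
  --db-instance-identifier reggie-production-restored-instance-1 \
  --db-cluster-identifier reggie-production-restored \
  --db-instance-class db.serverless \
  --engine aurora-postgresql \
  --region eu-west-2
```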