Deployment Runbook

Day-to-day deployment operations for Reggie on AWS (Lambda + Aurora architecture).

Deploy to Staging

Automatic on push to main. To deploy manually:

npx sst deploy --stage staging
# or
make deploy-staging
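
Before a manual deploy, it can help to preview what would change. Assuming the SST v3 CLI's `diff` subcommand is available:

```shell
# Preview pending infrastructure changes (modifies nothing)
npx sst diff --stage staging

# Deploy only if the diff looks right
npx sst deploy --stage staging
```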

Deploy to Production

Via GitHub Actions workflow dispatch:

  1. Go to Actions > "Deploy (SST)" > Run workflow
  2. Select production stage
  3. Confirm

Or manually:

npx sst deploy --stage production
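
The workflow dispatch can also be triggered from the terminal with the GitHub CLI. The workflow file name and input name below are assumptions; check `.github/workflows/` for the real ones:

```shell
# Trigger the "Deploy (SST)" workflow from the CLI.
# deploy.yml and the `stage` input are assumed names.
gh workflow run deploy.yml -f stage=production

# Follow the run until it finishes
gh run watch
```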

Post-Deploy: Run Migrations

CI handles this automatically. To run manually:

# Find the migrator Lambda
aws lambda list-functions --region eu-west-2 --max-items 200 \
  --query "Functions[?starts_with(FunctionName, 'reggie-staging-ReggieDbMigratorFunction')].FunctionName | [0]" \
  --output text

# Invoke it (AWS CLI v2 needs --cli-binary-format for a raw JSON payload)
aws lambda invoke --function-name <FUNCTION_NAME> --region eu-west-2 \
  --payload '{}' --cli-binary-format raw-in-base64-out \
  --cli-read-timeout 350 /tmp/result.json
python3 -m json.tool /tmp/result.json
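
The two steps above can be combined into a single shot by capturing the resolved function name:

```shell
# Resolve the staging migrator function name, then invoke it
FN=$(aws lambda list-functions --region eu-west-2 --max-items 200 \
  --query "Functions[?starts_with(FunctionName, 'reggie-staging-ReggieDbMigratorFunction')].FunctionName | [0]" \
  --output text)

# --cli-binary-format is required by AWS CLI v2 for a raw JSON payload
aws lambda invoke --function-name "$FN" --region eu-west-2 \
  --payload '{}' --cli-binary-format raw-in-base64-out \
  --cli-read-timeout 350 /tmp/result.json
python3 -m json.tool /tmp/result.json
```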

Verify a Deployment

# Health check (includes DB connectivity)
curl https://staging-api.getreggie.io/health

# Expected: {"status":"healthy","checks":{"database":"connected"}}

# Full smoke test
bash scripts/smoke-test.sh https://staging-api.getreggie.io
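
For a scripted gate in CI, the health response can be asserted rather than eyeballed. The helper below is illustrative; it only assumes the JSON shape shown above:

```shell
# assert_healthy: exits non-zero unless the health JSON reports
# status=healthy and database=connected
assert_healthy() {
  echo "$1" | python3 -c '
import json, sys
d = json.load(sys.stdin)
ok = d.get("status") == "healthy" and d.get("checks", {}).get("database") == "connected"
sys.exit(0 if ok else 1)'
}

# In CI: assert_healthy "$(curl -fsS https://staging-api.getreggie.io/health)"
assert_healthy '{"status":"healthy","checks":{"database":"connected"}}' && echo "health OK"
```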

Rollback

SST has no built-in rollback. Options:

  1. Revert and push: Revert the commit on main and push; CI redeploys
  2. Deploy specific commit: git checkout <sha> && npx sst deploy --stage staging
  3. Database rollback: Alembic downgrade (manual, use with caution)
    # Via migrator Lambda with custom event (requires handler modification)
    # Or via sst shell for direct DB access
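
For option 3, a sketch of a downgrade invocation. This assumes the migrator handler has been extended to read a command from its event; the payload shape is hypothetical:

```shell
# Hypothetical event shape -- the stock handler ignores it until modified
aws lambda invoke --function-name <MIGRATOR_FUNCTION_NAME> --region eu-west-2 \
  --payload '{"command": "downgrade", "revision": "-1"}' \
  --cli-binary-format raw-in-base64-out /tmp/downgrade.json
python3 -m json.tool /tmp/downgrade.json
```

Verify the target revision with `alembic history` before downgrading; data-destructive downgrades cannot be undone without a restore.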
    

Fix Stale SST State

When SST state drifts from actual AWS resources:

# Sync state with reality
npx sst refresh --stage production

# Then redeploy
npx sst deploy --stage production

Common causes: manual AWS Console changes, partial deploys, resource deletion outside SST.

Set or Update Secrets

npx sst secret set ClerkSecretKey sk_live_new_value --stage staging
# Then redeploy for the change to take effect
npx sst deploy --stage staging
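
To confirm the value landed, SST v3 can list the stage's secrets (assuming the `secret list` subcommand; note it prints values, so avoid shared terminals):

```shell
# List all secrets (and values) for the stage
npx sst secret list --stage staging
```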

View Logs

# Log group names end in a generated suffix; list the exact names first
aws logs describe-log-groups --log-group-name-prefix /aws/lambda/reggie-staging \
  --region eu-west-2 --query 'logGroups[].logGroupName' --output text

# API Lambda logs (use the exact name in place of the trailing *)
aws logs tail /aws/lambda/reggie-staging-ReggieApiRouteBbovcaHandlerFunction-* \
  --follow --region eu-west-2

# Migrator Lambda logs
aws logs tail /aws/lambda/reggie-staging-ReggieDbMigratorFunction-* \
  --follow --region eu-west-2

# API Gateway access logs
aws logs tail /aws/vendedlogs/apis/reggie-staging-ReggieApi-* \
  --follow --region eu-west-2
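
When triaging rather than tailing, `aws logs tail` can also look back over a window and filter with CloudWatch Logs filter-pattern syntax:

```shell
# Only error lines from the last hour (substitute the exact log group name)
aws logs tail <LOG_GROUP_NAME> \
  --since 1h --filter-pattern ERROR --region eu-west-2
```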

Common Issues

| Issue | Diagnosis | Fix |
|---|---|---|
| Health returns 503 | Database disconnected | Check DATABASE_URL, verify password URL-encoding |
| Migration fails (password) | SST password has special chars | Ensure encodeURIComponent in sst.config.ts |
| Deploy fails (DNS conflict) | Stale Cloudflare records | Delete conflicting CNAME, redeploy |
| Deploy fails (S3 bucket) | SST state drift | Run sst refresh, then redeploy |
| Deploy fails (EIP) | Stale Elastic IPs from old NAT | Delete old EIPs/NAT gateways, refresh, redeploy |
| Deploy fails (concurrency) | Lambda reserved concurrency too high | Remove or reduce concurrency setting |
| Migrator image stale | SST container build incompatible | CI builds image directly (normal behavior) |

AWS Resources to Monitor

| Resource | Where to Check | Alert On |
|---|---|---|
| API health | curl api.getreggie.io/health | unhealthy status |
| Database | RDS Console (connections, CPU) | High CPU, connection exhaustion |
| Lambda errors | CloudWatch metrics | Error rate > 1% |
| Costs | AWS Cost Explorer | Monthly spend > $120 |
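
The Lambda error alert can be codified as a CloudWatch alarm rather than checked by hand. Names, threshold, and SNS topic below are illustrative placeholders:

```shell
# Alarm when the API Lambda records any errors in a 5-minute window
aws cloudwatch put-metric-alarm \
  --alarm-name reggie-staging-api-errors \
  --namespace AWS/Lambda --metric-name Errors \
  --dimensions Name=FunctionName,Value=<API_FUNCTION_NAME> \
  --statistic Sum --period 300 --evaluation-periods 1 \
  --threshold 1 --comparison-operator GreaterThanOrEqualToThreshold \
  --alarm-actions <SNS_TOPIC_ARN> \
  --region eu-west-2
```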

Backup & Recovery

| Stage | Backup Retention | Deletion Protection | Recovery |
|---|---|---|---|
| Production | 14 days | Enabled | Point-in-time restore (any second) |
| Staging | 7 days | Enabled | Point-in-time restore |

To restore from a point in time:

aws rds restore-db-cluster-to-point-in-time \
  --source-db-cluster-identifier reggie-production-reggiedbcluster-XXX \
  --db-cluster-identifier reggie-production-restored \
  --restore-to-time "2026-03-14T12:00:00Z" \
  --region eu-west-2
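
Note that restore-db-cluster-to-point-in-time creates the cluster only; it needs at least one instance before it will accept connections. The instance class and engine below are assumptions (db.serverless applies to Aurora Serverless v2) and must match the original cluster:

```shell
# Add an instance to the restored cluster so it becomes connectable
aws rds create-db-instance \
  --db-instance-identifier reggie-production-restored-instance-1 \
  --db-cluster-identifier reggie-production-restored \
  --db-instance-class db.serverless \
  --engine aurora-postgresql \
  --region eu-west-2
```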