Deployment Runbook
Day-to-day deployment operations for Reggie on AWS (Lambda + Aurora architecture).
Deploy to Staging
Automatic on push to main. To deploy manually:
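A manual staging deploy presumably uses the same SST CLI call that appears under Rollback. A minimal sketch on that assumption; deploy_stage is a hypothetical helper name, and the list of valid stages is inferred from the stages named in this runbook:

```shell
# Hypothetical helper: deploy the current checkout to a named stage.
# Assumption: "staging" and "production" are the only stages in use.
deploy_stage() {
  local stage="${1:-}"
  case "$stage" in
    staging|production) ;;
    *)
      echo "usage: deploy_stage <staging|production>" >&2
      return 2
      ;;
  esac
  npx sst deploy --stage "$stage"
}
```

Calling deploy_stage staging from the repository root would then run npx sst deploy --stage staging; any other argument is rejected before the CLI is invoked.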
Deploy to Production
Via GitHub Actions workflow dispatch:
- Go to Actions > "Deploy (SST)" > Run workflow
- Select the production stage
- Confirm
Or manually:
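A manual production deploy presumably mirrors the staging command with the stage swapped. The sketch below adds a pre-flight check that the checkout is a clean main; that check is an assumption about good practice, not something this runbook requires, and both function names are hypothetical:

```shell
# Hypothetical pre-flight check: succeed only on a clean checkout of main.
ready_for_production() {
  local branch
  branch="$(git rev-parse --abbrev-ref HEAD 2>/dev/null)" || {
    echo "not inside a git repository" >&2
    return 1
  }
  if [ "$branch" != "main" ]; then
    echo "on branch '$branch', expected 'main'" >&2
    return 1
  fi
  if [ -n "$(git status --porcelain)" ]; then
    echo "working tree has uncommitted changes" >&2
    return 1
  fi
}

# Run the deploy only if the pre-flight check passes.
deploy_production() {
  ready_for_production && npx sst deploy --stage production
}
```

The check is deliberately separate from the deploy so it can be run on its own before committing to a production release.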
Post-Deploy: Run Migrations
CI handles this automatically. To run manually:
# Find the migrator Lambda
aws lambda list-functions --region eu-west-2 --max-items 200 \
--query "Functions[?starts_with(FunctionName, 'reggie-staging-ReggieDbMigratorFunction')].FunctionName | [0]" \
--output text
# Invoke it
aws lambda invoke --function-name <FUNCTION_NAME> --region eu-west-2 \
--payload '{}' --cli-read-timeout 350 /tmp/result.json
python3 -m json.tool /tmp/result.json
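The two steps above can be combined into one helper. This is a sketch under stated assumptions: other stages are assumed to follow the reggie-<stage>-... naming shown for staging, and migrator_prefix and run_migrations are hypothetical names. It also guards against two AWS CLI behaviours: text output can span several pages (one line per page), and the invoke call can exit 0 even when the function itself failed.

```shell
# Build the name prefix for a stage's migrator Lambda.
# Assumption: other stages follow the reggie-<stage>-... pattern shown for staging.
migrator_prefix() {
  printf 'reggie-%s-ReggieDbMigratorFunction' "$1"
}

# Hypothetical helper: look up the migrator for a stage, invoke it, print the result.
run_migrations() {
  local stage="$1" region="${2:-eu-west-2}" out="/tmp/result.json"
  local prefix fn err
  prefix="$(migrator_prefix "$stage")"
  # Text output may contain one line per page, so keep only the first
  # name that really starts with the prefix.
  fn="$(aws lambda list-functions --region "$region" --max-items 200 \
    --query "Functions[?starts_with(FunctionName, '${prefix}')].FunctionName" \
    --output text | tr '\t' '\n' | grep "^${prefix}" | head -n 1)" || fn=""
  if [ -z "$fn" ]; then
    echo "no migrator function found for stage '$stage'" >&2
    return 1
  fi
  # FunctionError is absent on success, which text output prints as "None".
  err="$(aws lambda invoke --function-name "$fn" --region "$region" \
    --payload '{}' --cli-read-timeout 350 \
    --query 'FunctionError' --output text "$out")" || return 1
  python3 -m json.tool "$out"
  if [ "$err" != "None" ]; then
    echo "migrator reported an error: $err" >&2
    return 1
  fi
}
```

For example, run_migrations staging. On AWS CLI v2, a raw JSON payload such as '{}' generally needs --cli-binary-format raw-in-base64-out added to the invoke call.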
Verify a Deployment
# Health check (includes DB connectivity)
curl https://staging-api.getreggie.io/health
# Expected: {"status":"healthy","checks":{"database":"connected"}}
# Full smoke test
bash scripts/smoke-test.sh https://staging-api.getreggie.io
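To turn the health check into a pass/fail step, the response body can be compared against the expected shape shown above. A minimal sketch; is_healthy and check_health are hypothetical names, and the only fields inspected are the two that appear in the expected response:

```shell
# Read a health response on stdin; exit 0 only if it matches the expected shape.
is_healthy() {
  python3 -c '
import json, sys
try:
    body = json.load(sys.stdin)
except ValueError:
    sys.exit(1)
if not isinstance(body, dict):
    sys.exit(1)
checks = body.get("checks")
db_ok = isinstance(checks, dict) and checks.get("database") == "connected"
sys.exit(0 if body.get("status") == "healthy" and db_ok else 1)
'
}

# Fetch <base-url>/health and evaluate it; a non-2xx response yields no body and fails.
check_health() {
  curl --silent --show-error --fail --max-time 10 "$1/health" | is_healthy
}
```

For example, check_health https://staging-api.getreggie.io && echo "staging healthy".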
Rollback
SST has no built-in rollback. Options:
- Revert and push: Revert the commit on main and push; CI redeploys.
- Deploy specific commit: git checkout <sha> && npx sst deploy --stage staging
- Database rollback: Alembic downgrade (manual, use with caution)
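The "revert and push" option can be sketched as follows. Assumptions: the remote is named origin and the commit being reverted is a regular, non-merge commit; rollback_by_revert is a hypothetical helper name.

```shell
# Hypothetical helper for the "revert and push" option.
# Assumptions: remote is "origin"; <sha> is a regular (non-merge) commit.
rollback_by_revert() {
  local sha="${1:-}"
  if [ -z "$sha" ]; then
    echo "usage: rollback_by_revert <sha>" >&2
    return 2
  fi
  # Start from an up-to-date main, create the revert commit, then push it
  # so that CI picks it up and redeploys.
  git checkout main &&
    git pull --ff-only origin main &&
    git revert --no-edit "$sha" &&
    git push origin main
}
```

Reverting a merge commit additionally needs a parent selection such as git revert -m 1 <sha>. For the database option, alembic downgrade -1 steps back one revision; where and how to run it against the deployed database is not covered here.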
Fix Stale SST State
When SST state drifts from actual AWS resources:
# Sync state with reality
npx sst refresh --stage production
# Then redeploy
npx sst deploy --stage production
Common causes: manual AWS Console changes, partial deploys, resource deletion outside SST.
Set or Update Secrets
npx sst secret set ClerkSecretKey sk_live_new_value --stage staging
# Then redeploy for the change to take effect
npx sst deploy --stage staging
View Logs
# API Lambda logs
aws logs tail /aws/lambda/reggie-staging-ReggieApiRouteBbovcaHandlerFunction-* \
--follow --region eu-west-2
# Migrator Lambda logs
aws logs tail /aws/lambda/reggie-staging-ReggieDbMigratorFunction-* \
--follow --region eu-west-2
# API Gateway access logs
aws logs tail /aws/vendedlogs/apis/reggie-staging-ReggieApi-* \
--follow --region eu-west-2
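aws logs tail takes a single log group name (or ARN) rather than a pattern, so the trailing -* in the commands above is best read as a placeholder for the generated suffix. One way to resolve the full name from its prefix is sketched below; resolve_log_group and tail_logs are hypothetical helper names:

```shell
# Print the first log group whose name starts with the given prefix.
resolve_log_group() {
  local prefix="$1" region="${2:-eu-west-2}"
  aws logs describe-log-groups --region "$region" \
    --log-group-name-prefix "$prefix" \
    --query 'logGroups[0].logGroupName' --output text
}

# Resolve the log group, then follow it.
tail_logs() {
  local prefix="$1" region="${2:-eu-west-2}" group
  group="$(resolve_log_group "$prefix" "$region" | head -n 1)"
  # With --output text, "no match" is printed as the literal string "None".
  if [ -z "$group" ] || [ "$group" = "None" ]; then
    echo "no log group found with prefix '$prefix'" >&2
    return 1
  fi
  aws logs tail "$group" --follow --region "$region"
}
```

For example, tail_logs /aws/lambda/reggie-staging-ReggieDbMigratorFunction follows the migrator logs without needing to know the suffix.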
Common Issues
| Issue | Diagnosis | Fix |
|---|---|---|
| Health returns 503 | Database disconnected | Check DATABASE_URL, verify password URL-encoding |
| Migration fails (password) | SST password has special chars | Ensure encodeURIComponent in sst.config.ts |
| Deploy fails (DNS conflict) | Stale Cloudflare records | Delete conflicting CNAME, redeploy |
| Deploy fails (S3 bucket) | SST state drift | Run sst refresh, then redeploy |
| Deploy fails (EIP) | Stale Elastic IPs from old NAT | Delete old EIPs/NAT gateways, refresh, redeploy |
| Deploy fails (concurrency) | Lambda reserved concurrency too high | Remove or reduce concurrency setting |
| Migrator image stale | SST container build incompatible | CI builds image directly (normal behavior) |
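The two password rows come down to percent-encoding: characters such as @, / or : inside a password are otherwise parsed as URL delimiters in DATABASE_URL. A quick way to see what an encoded password should look like, as a sketch using Python's standard library from the shell (urlencode is a hypothetical helper name):

```shell
# Percent-encode a single string for use inside a URL component.
urlencode() {
  python3 -c 'import sys, urllib.parse; print(urllib.parse.quote(sys.argv[1], safe=""))' "$1"
}
```

For example, urlencode 'p@ss/w:rd#1' prints p%40ss%2Fw%3Ard%231. Python's quote with an empty safe set escapes a few characters that JavaScript's encodeURIComponent leaves alone (! ' ( ) *), so the two outputs can differ for those characters while still decoding to the same password.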
AWS Resources to Monitor
| Resource | Where to Check | Alert On |
|---|---|---|
| API health | curl api.getreggie.io/health | unhealthy status |
| Database | RDS Console (connections, CPU) | High CPU, connection exhaustion |
| Lambda errors | CloudWatch metrics | Error rate > 1% |
| Costs | AWS Cost Explorer | Monthly spend > $120 |
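The Lambda error-rate threshold can be checked by hand from the Errors and Invocations sums that CloudWatch publishes in the AWS/Lambda namespace. A sketch, with hypothetical helper names and an assumed one-hour period:

```shell
# Sum one Lambda metric over a time window.
# usage: lambda_metric_sum <function-name> <Errors|Invocations> <start> <end>
# Timestamps are ISO 8601, e.g. 2024-01-01T00:00:00Z.
lambda_metric_sum() {
  aws cloudwatch get-metric-statistics --region eu-west-2 \
    --namespace AWS/Lambda --metric-name "$2" \
    --dimensions Name=FunctionName,Value="$1" \
    --start-time "$3" --end-time "$4" \
    --period 3600 --statistics Sum \
    --query 'sum(Datapoints[].Sum)' --output text
}

# Exit 0 if errors/invocations is strictly above the threshold (percent, default 1).
# usage: error_rate_exceeds <errors> <invocations> [threshold_percent]
error_rate_exceeds() {
  awk -v e="$1" -v n="$2" -v t="${3:-1}" \
    'BEGIN { if (n <= 0) exit 1; exit !(e * 100 > t * n) }'
}
```

For example, 3 errors in 200 invocations is 1.5%, so error_rate_exceeds 3 200 succeeds, while error_rate_exceeds 1 200 (0.5%) does not. A window with no invocations is treated as not exceeding the threshold.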
Backup & Recovery
| Stage | Backup Retention | Deletion Protection | Recovery |
|---|---|---|---|
| Production | 14 days | Enabled | Point-in-time restore (any second) |
| Staging | 7 days | Enabled | Point-in-time restore |
To restore from a point in time:
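For an Aurora cluster, point-in-time recovery is normally done with the AWS CLI's restore-db-cluster-to-point-in-time command, which creates a new cluster alongside the original. A minimal sketch; the cluster identifiers are placeholders that must be looked up first (they are not given in this runbook), and restore_cluster_pitr is a hypothetical helper name:

```shell
# Restore an Aurora cluster to a new cluster at a given UTC timestamp.
# usage: restore_cluster_pitr <source-cluster-id> <new-cluster-id> <timestamp>
# Timestamp is ISO 8601 in UTC, e.g. 2024-01-01T12:00:00Z.
restore_cluster_pitr() {
  local source="${1:-}" target="${2:-}" when="${3:-}"
  if [ -z "$source" ] || [ -z "$target" ] || [ -z "$when" ]; then
    echo "usage: restore_cluster_pitr <source-cluster-id> <new-cluster-id> <timestamp>" >&2
    return 2
  fi
  aws rds restore-db-cluster-to-point-in-time --region eu-west-2 \
    --source-db-cluster-identifier "$source" \
    --db-cluster-identifier "$target" \
    --restore-to-time "$when"
}
```

The restore does not modify the source cluster. The new cluster typically still needs a DB instance (aws rds create-db-instance), likely the same subnet group and security groups as the original (--db-subnet-group-name, --vpc-security-group-ids), and the application has to be pointed at it before it serves traffic. To restore to the most recent possible point instead, --use-latest-restorable-time replaces --restore-to-time.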