# ADR-004: Split Backend into Services + Migrate API to Lambda
Status: Accepted
Date: 2026-03-11
Supersedes: ECS Fargate backend from ADR-003
## Context
The monolithic FastAPI backend bundled ~350MB of ML dependencies (numpy, pandas, scikit-learn, LightGBM) into every deployment, even though only one route triggered ML inference. This forced the API onto ECS Fargate, which created a VPC + Cluster + ALB per stage — hitting the default AWS limit of 5 VPCs per region and costing ~$30/month per idle dev stage. Developers also needed a Cloudflare API token just to run `sst dev`.
## Decision
Split the backend into three independent services with a shared Python package:
```
services/
  api/            → Lambda (ZIP, ~50MB) via API Gateway V2
  valuation/      → Lambda (container, ~350MB) for ML inference
  scraper/        → Scheduled tasks (Lambda/ECS)
packages/
  shared-python/  → Shared models, schemas, database, config
```
Key design choices:
- API on Lambda (ZIP) — Without ML deps, fits in a ZIP package with fast cold starts (~1-2s). Uses Mangum as ASGI adapter.
- Valuation on Lambda (container) — Heavy ML deps require container image. Called by API via direct Lambda invocation (AWS SDK, ~50ms overhead).
- Valuation is a pure function — Receives registration + market context as input, returns valuation. No database access, no VPC needed.
- VPC only for prod/staging — Dev stages use local Postgres via `sst dev`. No VPC, no Cluster, no ALB.
- Cloudflare provider conditional — Only loaded for prod/staging. Developers don't need the token.
- Connection pooling via env var — `CONNECTION_POOL_SIZE=0` for Lambda (NullPool), standard pooling for local dev.
## Consequences
Positive:
- Dev stages cost ~$0 (Lambda free tier) vs ~$30/month
- No VPC limit issues — dev stages don't create VPCs
- `sst dev` works without a Cloudflare token
- ML scales independently from API
- API deploys in ~30-60s (ZIP upload) vs ~3-5min (Docker build)
- Clean code separation — each service has its own directory, requirements, and handler
Negative:
- Two Lambda functions to monitor instead of one ECS service
- Cold starts (~1-2s API, ~5-8s Valuation) vs always-on ECS
- Lambda-to-Lambda invocation adds ~50-100ms per valuation request
- Database connection pooling managed via env vars instead of standard pooling
- Alembic migrations must run as a separate CI step (not at container startup)
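The separate migration step might look like the following CI fragment; the paths, config location, and secret name are assumptions for illustration, not the actual pipeline:

```shell
# Run Alembic migrations before deploying the new Lambda versions.
# Paths and the PROD_DATABASE_URL secret name are illustrative.
pip install -r packages/shared-python/requirements.txt
export DATABASE_URL="$PROD_DATABASE_URL"   # injected from CI secrets
alembic -c packages/shared-python/alembic.ini upgrade head
```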
Risks:
- Valuation Lambda cold start on first plate lookup (mitigated by Provisioned Concurrency when needed)
- Database connection exhaustion under high Lambda concurrency (mitigated by NullPool + future RDS Proxy)