# ADR-004: Split Backend into Services + Migrate API to Lambda
Status: Accepted
Date: 2026-03-11
Supersedes: ECS Fargate backend from ADR-003
## Context
The monolithic FastAPI backend bundled ~350MB of ML dependencies (numpy, pandas, scikit-learn, LightGBM) into every deployment, even though only one route triggered ML inference. This forced the API onto ECS Fargate, which created a VPC + Cluster + ALB per stage — hitting the default AWS limit of 5 VPCs per region and costing ~$30/month per idle dev stage. Developers also needed a Cloudflare API token just to run `sst dev`.
## Decision
Split the backend into three independent services with a shared Python package:
```
services/
  api/            → Lambda (ZIP, ~50MB) via API Gateway V2
  valuation/      → Lambda (container, ~350MB) for ML inference
  scraper/        → Scheduled tasks (Lambda/ECS)
packages/
  shared-python/  → Shared models, schemas, database, config
```
Key design choices:
- API on Lambda (ZIP) — Without ML deps, fits in a ZIP package with fast cold starts (~1-2s). Uses Mangum as ASGI adapter.
- Valuation on Lambda (container) — Heavy ML deps require container image. Called by API via direct Lambda invocation (AWS SDK, ~50ms overhead).
- Valuation is a pure function — Receives registration + market context as input, returns valuation. No database access, no VPC needed.
- VPC only for prod/staging — Dev stages use local Postgres via `sst dev`. No VPC, no Cluster, no ALB.
- Cloudflare provider conditional — Only loaded for prod/staging. Developers don't need the token.
- Connection pooling via env var — `CONNECTION_POOL_SIZE=0` for Lambda (NullPool), standard pooling for local dev.
## Consequences
Positive:
- Dev stages cost ~$0 (Lambda free tier) vs ~$30/month
- No VPC limit issues — dev stages don't create VPCs
- `sst dev` works without a Cloudflare token
- ML scales independently from API
- API deploys in ~30-60s (ZIP upload) vs ~3-5min (Docker build)
- Clean code separation — each service has its own directory, requirements, and handler
Negative:
- Two Lambda functions to monitor instead of one ECS service
- Cold starts (~1-2s API, ~5-8s Valuation) vs always-on ECS
- Lambda-to-Lambda invocation adds ~50-100ms per valuation request
- Database connection pooling managed via env vars instead of standard pooling
- Alembic migrations must run as a separate CI step (not at container startup)
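The separate migration step might look like the following CI fragment; the paths, config location, and secret name are assumptions for illustration, not the actual pipeline:

```shell
# Run Alembic migrations before deploying the new Lambda versions.
# Paths and the PROD_DATABASE_URL secret name are illustrative.
pip install -r packages/shared-python/requirements.txt
export DATABASE_URL="$PROD_DATABASE_URL"   # injected from CI secrets
alembic -c packages/shared-python/alembic.ini upgrade head
```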
Risks:
- Valuation Lambda cold start on first plate lookup (mitigated by Provisioned Concurrency when needed)
- Database connection exhaustion under high Lambda concurrency (mitigated by NullPool + future RDS Proxy)