ADR-002: Use LightGBM as Primary ML Model for Plate Valuation

Status

Accepted (2026-02-02)

Context

Reggie's valuation system needed to move beyond a pure rules engine. On its own, the rules engine had a MAPE (mean absolute percentage error) of 328%: its hand-tuned multipliers did not reflect real market data, so it massively overvalued plates.

Requirements for the ML model:

- Train on ~17,000-26,000 DVLA auction sale records
- Handle 60+ engineered features (structural, pattern, word, market context)
- Work alongside the rules engine in an ensemble (not replace it)
- Fast inference for real-time API responses
- Interpretable feature importances for debugging

Decision

Use LightGBM as the primary ML model in a 95% ML / 5% rules ensemble, with XGBoost as a fallback.

Ensemble architecture:

1. LightGBM predicts price from 83 features (including rules_estimated_value)
2. Rules engine provides domain expertise for edge cases
3. Adaptive weighting adjusts based on training data availability per plate type
4. Confidence scoring with agreement boost when models align
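The blending, adaptive weighting, and agreement boost can be sketched as follows. This is a minimal illustration under stated assumptions, not the production implementation: the function names, the 0.50 weight floor, the 500-row threshold, the 20% agreement band, and the confidence values are all hypothetical. Only the 95% / 5% base split comes from this ADR.

```python
from dataclasses import dataclass


@dataclass
class Valuation:
    price: float       # blended estimate
    ml_weight: float   # weight actually given to the ML prediction
    confidence: float  # score in [0, 1]; the scale is an assumption


def ml_weight_for(n_training_rows: int, base_weight: float = 0.95,
                  min_weight: float = 0.50, full_trust_rows: int = 500) -> float:
    """Shrink the ML weight when a plate type has little training data.

    min_weight and full_trust_rows are illustrative values, not from the ADR.
    """
    coverage = min(max(n_training_rows, 0) / full_trust_rows, 1.0)
    return min_weight + (base_weight - min_weight) * coverage


def blend(ml_price: float, rules_price: float, n_training_rows: int,
          base_confidence: float = 0.6, agreement_band: float = 0.20,
          agreement_boost: float = 0.15) -> Valuation:
    """Weighted blend of the two estimates plus a simple agreement boost."""
    w = ml_weight_for(n_training_rows)
    price = w * ml_price + (1.0 - w) * rules_price

    # Relative disagreement, measured against the ML estimate.
    disagreement = abs(ml_price - rules_price) / max(abs(ml_price), 1e-9)

    confidence = base_confidence
    if disagreement <= agreement_band:
        # Both models roughly agree, so raise confidence (capped at 1.0).
        confidence = min(confidence + agreement_boost, 1.0)
    return Valuation(price=price, ml_weight=w, confidence=confidence)


if __name__ == "__main__":
    # Well-covered plate type, models agree: full ML weight, boosted confidence.
    print(blend(ml_price=1000.0, rules_price=1100.0, n_training_rows=500))
    # Unseen plate type, models disagree: rules get more say, no boost.
    print(blend(ml_price=1000.0, rules_price=3000.0, n_training_rows=0))
```

A linear ramp is only one way to express "adaptive weighting"; a step function or a per-plate-type lookup table would satisfy the same description.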

Why LightGBM over alternatives:

- Faster training than XGBoost on this dataset size
- Handles categorical features natively
- Lower memory usage
- Strong performance on tabular data, where gradient-boosted trees are a dominant approach
- Built-in feature importance for model interpretability

Consequences

Easier:

- MAPE dropped from 328% (rules only) to 42.6% (ensemble)
- Accuracy within +/-30% improved from 23% to 46.5%
- Data-driven multiplier learning replaced hand-tuned guesswork
- Feature importances guide future improvement efforts
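For reference, the two headline metrics can be computed as in the sketch below. It uses the standard MAPE definition and assumes that "accuracy within +/-30%" means the share of predictions falling within 30% of the actual sale price; the function names and sample prices are illustrative, not taken from the project.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes positive sale prices."""
    errors = [abs(p - a) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def within_band(actual, predicted, band=0.30):
    """Share of predictions within +/-band of the sale price, in percent."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) <= band * a)
    return 100.0 * hits / len(actual)


if __name__ == "__main__":
    # Made-up sale prices and estimates, for illustration only.
    actual = [1000.0, 2000.0, 500.0, 4000.0]
    predicted = [1200.0, 1000.0, 2000.0, 4100.0]
    print(f"MAPE: {mape(actual, predicted):.1f}%")
    print(f"Within +/-30%: {within_band(actual, predicted):.1f}%")
```

Because percentage error is unbounded above for overestimates, a single plate valued at four times its sale price contributes 300% on its own, which is how a rules engine that overvalues plates can reach a MAPE well above 100%.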

Harder:

- ML model requires periodic retraining as new auction data arrives
- Rules engine must stay calibrated (ML uses rules_estimated_value as a feature)
- Model file (~10MB) must be included in Docker image for production
- Price tier compression: premium/luxury plates systematically undervalued (known limitation)
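One way to monitor the price tier compression limitation is to track the mean signed percentage error per price tier, where a negative value means the tier is undervalued on average. The sketch below is a hypothetical diagnostic: the tier names, the tier boundaries, and the sample data are assumptions, since this ADR does not define them.

```python
from collections import defaultdict

# Hypothetical tier boundaries (lower inclusive, upper exclusive), in price units.
TIERS = [
    (0.0, 1_000.0, "budget"),
    (1_000.0, 10_000.0, "mid"),
    (10_000.0, float("inf"), "premium"),
]


def tier_of(price):
    """Map a positive sale price to its tier name."""
    if price <= 0:
        raise ValueError(f"price must be positive, got {price}")
    for low, high, name in TIERS:
        if low <= price < high:
            return name
    raise ValueError(f"no tier covers price {price}")


def signed_error_by_tier(actual, predicted):
    """Mean signed percentage error per tier; negative means undervalued."""
    buckets = defaultdict(list)
    for a, p in zip(actual, predicted):
        # Tier is assigned from the actual sale price, not the prediction.
        buckets[tier_of(a)].append((p - a) / a)
    return {name: 100.0 * sum(errs) / len(errs) for name, errs in buckets.items()}


if __name__ == "__main__":
    # Made-up data showing the compression pattern described above.
    actual = [500.0, 5000.0, 20000.0, 40000.0]
    predicted = [550.0, 5000.0, 14000.0, 30000.0]
    for tier, err in signed_error_by_tier(actual, predicted).items():
        print(f"{tier}: {err:+.1f}%")
```

Signed error is used here instead of MAPE because MAPE hides the direction of the miss, and the limitation described is specifically a downward bias at the top of the market.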

Current metrics (Phase 3.6): See .claude/skills/plate-valuation/SKILL.md for latest performance data.

Related: docs/features/valuation.md (system deep-dive)