Valuation System

Reggie values registrations with an ensemble that combines a machine learning model (LightGBM) with a rules engine encoding domain expertise.

Current Status (Phase 3.6 -- 2026-03-02)

Metric	Baseline	Current	Target
ML MAPE	328%	42.6%	<30%
Within +/-30%	23%	46.5%	60%
Budget MAPE	--	58.3%	<50%
Rules Median Ratio	2.18	1.004	1.0
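
The metrics in this table can be made concrete with a short sketch. It assumes the conventional definitions: MAPE is the mean of |predicted - actual| / actual, "Within +/-30%" is the share of predictions whose percentage error is at most 30%, and the median ratio is the median of predicted / actual. These definitions and the function names are assumptions for illustration, not taken from the Reggie codebase, and the prices are made up.

```python
from statistics import median


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(p - a) / a for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


def within_tolerance(actual, predicted, tolerance=0.30):
    """Percentage of predictions within +/- tolerance of the actual price."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) / a <= tolerance)
    return 100.0 * hits / len(actual)


def median_ratio(actual, predicted):
    """Median of predicted / actual; 1.0 means no systematic over- or under-valuation."""
    return median(p / a for a, p in zip(actual, predicted))


# Illustrative prices in GBP (invented, not from the training data).
actual = [1000.0, 2000.0, 4000.0, 500.0]
predicted = [1200.0, 1500.0, 4000.0, 1000.0]

print(mape(actual, predicted))              # mean percentage error
print(within_tolerance(actual, predicted))  # share within +/-30%
print(median_ratio(actual, predicted))      # bias indicator
```

A single large miss on a cheap plate (the last row here) dominates MAPE while barely moving the median ratio, which is one reason to track both.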

Architecture

User requests valuation
        |
   Ensemble Engine
        |
+-------------------------------+
|  95% ML (LightGBM)            |  <- Predicts price from 83 features
|  +                            |
|   5% Rules Engine             |  <- base x length x pattern x word multipliers
+-------------------------------+
        |
   Final valuation + confidence

How it works:

1. ML Model (LightGBM): Trained on ~26,226 market price records, primarily DVLA auction sales
2. Rules Engine: Domain expertise with data-driven multipliers
3. Ensemble: Weighted combination with confidence scoring
4. Adaptive Weighting: Adjusts the ML/rules weights based on training data availability

Training Data Available	ML Weight	Rules Weight
>= 1,000 samples	95%	5%
200-999 samples	70%	30%
< 200 samples	40%	60%
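
The weighting table above can be sketched as a small lookup plus a weighted average. This is a minimal illustration, assuming the ensemble is a plain linear blend of the two estimates; the function names are hypothetical, and how the sample count is determined for a given plate is not specified here.

```python
def ensemble_weights(n_training_samples):
    """Return (ml_weight, rules_weight) for a given amount of training data."""
    if n_training_samples >= 1000:
        return 0.95, 0.05
    if n_training_samples >= 200:
        return 0.70, 0.30
    return 0.40, 0.60


def combine(ml_price, rules_price, n_training_samples):
    """Blend the ML and rules estimates (assumed to be a linear weighted average)."""
    ml_weight, rules_weight = ensemble_weights(n_training_samples)
    return ml_weight * ml_price + rules_weight * rules_price


# Illustrative estimates in GBP (invented values).
for n in (1500, 500, 50):
    print(n, ensemble_weights(n), combine(2000.0, 3000.0, n))
```

With less data behind the ML model, the blended value moves toward the rules estimate.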

Key insight: ML uses rules_estimated_value as one of its 83 input features. Better rules lead to better ML predictions.
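
To illustrate this dependency, the sketch below computes a rules estimate as the product shown in the architecture diagram (base x length x pattern x word multipliers) and places it in the feature dictionary under rules_estimated_value. The function names, the base value, and the multiplier values are hypothetical; the real feature set has 83 entries and only a few are shown.

```python
def rules_estimate(base_value, length_multiplier, pattern_multiplier, word_multiplier):
    """Multiplicative rules estimate: base x length x pattern x word."""
    return base_value * length_multiplier * pattern_multiplier * word_multiplier


def build_features(registration, base_value, multipliers):
    """Build a partial feature dict that includes the rules estimate as an input."""
    estimate = rules_estimate(base_value, **multipliers)
    return {
        "length": len(registration.replace(" ", "")),
        "length_multiplier": multipliers["length_multiplier"],
        "rules_estimated_value": estimate,  # the ML model sees the rules output
    }


# Invented multipliers, for illustration only.
multipliers = {
    "length_multiplier": 2.0,
    "pattern_multiplier": 1.5,
    "word_multiplier": 1.2,
}
features = build_features("BOB 1", 250.0, multipliers)
print(features)
```

Because the rules output is an input to the model, a change to any multiplier alters the ML features as well, which suggests retraining after rules calibration.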

Training Data


Total records	~26,226 (market_prices table)
Price range	GBP 250 -- GBP 500,000
Primary source	DVLA auction sales (weight 1.0)
Other sources	DVLA fixed price (0.95), dealer asking (0.65)
Splits	70% train, 10% validation, 20% test (LOCKED)
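
One way to keep a 70/10/20 split fixed across retraining runs is to derive each record's split from a hash of its identifier. The sketch below assumes that "LOCKED" means the assignment must never change between runs; the hashing scheme, the source labels, and the function name are illustrative assumptions, not the project's actual mechanism.

```python
import hashlib

# Source weights from the table above; the dictionary keys are hypothetical labels.
SOURCE_WEIGHTS = {
    "dvla_auction": 1.0,
    "dvla_fixed_price": 0.95,
    "dealer_asking": 0.65,
}


def assign_split(record_id):
    """Deterministically map a record id to train/validation/test (70/10/20)."""
    digest = hashlib.sha256(str(record_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in 0..99
    if bucket < 70:
        return "train"
    if bucket < 80:
        return "validation"
    return "test"


# The same id always lands in the same split, so new records never reshuffle old ones.
print(assign_split(12345), assign_split(12345))
```

The source weights could then be supplied as per-sample weights at training time, so that dealer asking prices count for less than realised auction prices.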

Top Features (by Importance)

length_multiplier (30%) -- shorter plates worth more
plate_type_current (15%) -- current vs classic
length (12%) -- direct length impact
word_count (8%) -- name/brand detection
pattern_count (6%) -- special patterns

Key Findings (Phase 3.6)

Price tier compression is the #1 problem: premium/luxury plates are systematically undervalued (by 2x), while budget plates are over-predicted
Suffix and prefix plates have the worst MAPE by plate type
Word detection coverage has minimal impact on MAPE -- the bottleneck is price prediction range, not word lists
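
Price tier compression shows up clearly when the predicted/actual ratio is grouped by the actual price tier. The sketch below is a minimal diagnostic along those lines; the tier boundaries, tier names, and sample prices are invented for illustration and are not the thresholds used by the analysis scripts.

```python
from collections import defaultdict
from statistics import median

# Hypothetical tier boundaries in GBP: (name, inclusive lower bound, exclusive upper bound).
TIERS = [
    ("budget", 0, 1000),
    ("mid", 1000, 10000),
    ("premium", 10000, float("inf")),
]


def tier_of(price):
    """Return the tier name for an actual sale price."""
    for name, low, high in TIERS:
        if low <= price < high:
            return name
    raise ValueError(f"price out of range: {price}")


def median_ratio_by_tier(actual, predicted):
    """Median predicted / actual ratio for each tier of the actual price."""
    ratios = defaultdict(list)
    for a, p in zip(actual, predicted):
        ratios[tier_of(a)].append(p / a)
    return {tier: median(values) for tier, values in ratios.items()}


# Invented prices showing compression toward the middle of the range.
actual = [400.0, 800.0, 5000.0, 20000.0, 50000.0]
predicted = [600.0, 1000.0, 5000.0, 10000.0, 25000.0]
print(median_ratio_by_tier(actual, predicted))
```

A ratio above 1.0 in the budget tier together with a ratio near 0.5 in the premium tier is the signature of compression: a ratio of 0.5 corresponds to a 2x undervaluation.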

File Structure

backend/app/services/ml_models/
  ensemble_engine.py            # 95% ML + 5% rules combination
  lightgbm_predictor.py         # LightGBM model (primary)
  model_predictor.py            # XGBoost fallback
  feature_engineering.py        # 83 features from registrations
  trained/
    lightgbm_v1.pkl             # Trained model

backend/app/services/
  valuation.py                  # Rules engine
  valuation_lookups.py          # Word dictionaries, name frequencies
  market_comparables.py         # DVLA auction matching

API Usage

GET /api/v1/plates/BOB1

Returns valuation with ensemble breakdown, feature details, market comparables, and confidence score. See API docs at /api/v1/docs for the full response schema.
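
A minimal client sketch using only the Python standard library is shown below. The base URL (http://localhost:8000) is an assumption for a local development server, the helper names are hypothetical, and the response is returned as parsed JSON without assuming any particular field names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def valuation_url(base_url, registration):
    """Build the valuation endpoint URL for a registration."""
    return f"{base_url.rstrip('/')}/api/v1/plates/{quote(registration, safe='')}"


def fetch_valuation(base_url, registration, timeout=10):
    """GET the valuation and return the decoded JSON body."""
    with urlopen(valuation_url(base_url, registration), timeout=timeout) as response:
        return json.load(response)


# Assumed local base URL; adjust to the actual deployment.
print(valuation_url("http://localhost:8000", "BOB1"))
```

Calling fetch_valuation("http://localhost:8000", "BOB1") against a running instance would return the payload described above; the registration is percent-encoded so that unusual characters do not break the path.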

Analysis & Improvement

For the valuation improvement workflow, multiplier tuning, and retraining guide, see .claude/skills/plate-valuation/SKILL.md.

Key scripts:

Script	Purpose
scripts/analyze_mape_distribution.py	Ensemble MAPE by dimension (diagnostic)
scripts/analyze_multiplier_accuracy.py	Rules engine multiplier accuracy
scripts/detect_rule_gaps.py	Find missing patterns
scripts/train_lightgbm.py	Train/retrain ML model

Phase History

Phase	Date	Key Achievement
1	2026-01	Baseline analysis (MAPE 328%)
2	2026-02-02	Learned multipliers, phonetic detection (MAPE 76%)
3	2026-02-05	LightGBM ensemble, market_prices table (MAPE 43%)
3.4	2026-02-08	Rules calibration, ML retraining (MAPE 42.9%)
3.6	2026-03-02	Word list expansion (2,100+ words), MAPE distribution analysis (MAPE 42.6%)

Next: Address premium/luxury undervaluation (price tier compression)

Related:

ADR-002: LightGBM
.claude/skills/plate-valuation/SKILL.md (analysis commands, detailed workflow)
Architecture