Valuation System
Reggie uses an ML-enhanced ensemble system combining machine learning with rule-based domain expertise.
Current Status (Phase 3.6 -- 2026-03-02)
| Metric | Baseline | Current | Target |
|---|---|---|---|
| ML MAPE | 328% | 42.6% | <30% |
| Within +/-30% | 23% | 46.5% | 60% |
| Budget MAPE | -- | 58.3% | <50% |
| Rules Median Ratio | 2.18 | 1.004 | 1.0 |
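The metrics in this table are not defined on this page. The sketch below shows one common way to compute them, assuming MAPE is the mean absolute percentage error against the sale price, "Within +/-30%" is the share of predictions whose relative error is at most 30%, and the median ratio is the median of predicted/actual. The function names and sample prices are illustrative and are not taken from the Reggie codebase.

```python
from statistics import median


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    errors = [abs(p - a) / a for a, p in zip(actual, predicted)]
    return 100 * sum(errors) / len(errors)


def within_tolerance(actual, predicted, tol=0.30):
    """Share of predictions within +/-tol of the actual price, in percent."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) / a <= tol)
    return 100 * hits / len(actual)


def median_ratio(actual, predicted):
    """Median of predicted/actual; 1.0 means no systematic over- or undervaluation."""
    return median(p / a for a, p in zip(actual, predicted))


# Illustrative prices only (GBP), not real sales data.
actual = [1000, 2000, 500, 10000]
predicted = [1200, 1500, 600, 5000]

print(round(mape(actual, predicted), 2))
print(within_tolerance(actual, predicted))
print(round(median_ratio(actual, predicted), 3))
```

A median ratio close to 1.0 alongside a high MAPE, as in the table, suggests that errors are roughly balanced in direction but still large in size.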
Architecture
User requests valuation
|
Ensemble Engine
|
+-------------------------------+
| 95% ML (LightGBM) | <- Predicts price from 83 features
| + |
| 5% Rules Engine | <- base x length x pattern x word multipliers
+-------------------------------+
|
Final valuation + confidence
How it works:
1. ML Model (LightGBM): Trained on ~26,226 market price records, primarily DVLA auction sales
2. Rules Engine: Domain expertise with data-driven multipliers
3. Ensemble: Weighted combination with confidence scoring
4. Adaptive Weighting: Adjusts the ML/rules weights based on training data availability
| Training Data Available | ML Weight | Rules Weight |
|---|---|---|
| >= 1,000 samples | 95% | 5% |
| 200-999 samples | 70% | 30% |
| < 200 samples | 40% | 60% |
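The weighting table above can be read as a simple threshold lookup followed by a weighted average. The following is a minimal sketch, assuming the ensemble is a linear blend of the two estimates and that `n_samples` is the count of relevant training samples. The function names are hypothetical and may not match `ensemble_engine.py`.

```python
def ensemble_weights(n_samples: int) -> tuple[float, float]:
    """Return (ml_weight, rules_weight) using the thresholds from the table."""
    if n_samples >= 1000:
        return 0.95, 0.05
    if n_samples >= 200:
        return 0.70, 0.30
    return 0.40, 0.60


def ensemble_value(ml_price: float, rules_price: float, n_samples: int) -> float:
    """Blend the ML and rules estimates (assumed to be a linear combination)."""
    w_ml, w_rules = ensemble_weights(n_samples)
    return w_ml * ml_price + w_rules * rules_price


# With little training data, the rules estimate dominates the blend.
print(ensemble_weights(150))
print(ensemble_value(ml_price=1000.0, rules_price=2000.0, n_samples=150))
```

The design intent is that the rules engine carries more of the estimate where the model has seen too few examples to be trusted.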
Key insight: ML uses rules_estimated_value as one of its 83 input features. Better rules lead to better ML predictions.
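To illustrate how the rules estimate can feed the ML model, here is a rough sketch. It assumes the rules value is the plain product base x length x pattern x word shown in the architecture diagram. The `RulesMultipliers` class, `build_features` helper, and the multiplier values are hypothetical. Only the feature names `length`, `length_multiplier`, and `rules_estimated_value` come from this page.

```python
from dataclasses import dataclass


@dataclass
class RulesMultipliers:
    """Hypothetical container for the rules engine multipliers."""
    length: float = 1.0
    pattern: float = 1.0
    word: float = 1.0


def rules_estimated_value(base_price: float, m: RulesMultipliers) -> float:
    """Assumed form: base x length x pattern x word multipliers."""
    return base_price * m.length * m.pattern * m.word


def build_features(registration: str, base_price: float, m: RulesMultipliers) -> dict:
    """Build a small subset of the model inputs, including the rules estimate."""
    return {
        "length": len(registration.replace(" ", "")),
        "length_multiplier": m.length,
        "rules_estimated_value": rules_estimated_value(base_price, m),
    }


# Illustrative values only; real multipliers are learned from data.
multipliers = RulesMultipliers(length=2.0, pattern=1.5, word=1.0)
features = build_features("AB 1", base_price=500.0, m=multipliers)
print(features)
```

Because the rules estimate is itself a model input, recalibrating a multiplier changes the feature values, so the ML model would typically need retraining afterwards.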
Training Data
| Property | Value |
|---|---|
| Total records | ~26,226 market_prices |
| Price range | GBP 250 -- GBP 500,000 |
| Primary source | DVLA auction sales (weight 1.0) |
| Other sources | DVLA fixed price (0.95), dealer asking (0.65) |
| Splits | 70% train, 10% validation, 20% test (LOCKED) |
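One way to keep a 70/10/20 split stable across retraining runs is to shuffle record IDs with a fixed seed. The sketch below assumes "LOCKED" means the split assignments do not change between runs, and that the source weights are applied as per-row sample weights. The seed, the source keys, and the function names are hypothetical.

```python
import random

# Source weights from the table above; the dictionary keys are hypothetical labels.
SOURCE_WEIGHTS = {
    "dvla_auction": 1.0,
    "dvla_fixed_price": 0.95,
    "dealer_asking": 0.65,
}


def locked_split(record_ids, seed=42):
    """Deterministic 70/10/20 split: the same IDs and seed give the same split."""
    ids = sorted(record_ids)            # remove any dependence on input order
    random.Random(seed).shuffle(ids)    # seeded shuffle, reproducible across runs
    n_train = len(ids) * 70 // 100
    n_val = len(ids) * 10 // 100
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]
    return train, val, test


def sample_weights(sources):
    """Map each record's source label to its training weight."""
    return [SOURCE_WEIGHTS[s] for s in sources]


train, val, test = locked_split(range(100))
print(len(train), len(val), len(test))
print(sample_weights(["dvla_auction", "dealer_asking"]))
```

Weights of this kind could be passed to the `sample_weight` argument of the `fit` method in LightGBM's scikit-learn interface, so that dealer asking prices count for less than realised auction prices.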
Top Features (by Importance)
- length_multiplier (30%) -- shorter plates worth more
- plate_type_current (15%) -- current vs classic
- length (12%) -- direct length impact
- word_count (8%) -- name/brand detection
- pattern_count (6%) -- special patterns
Key Findings (Phase 3.6)
- Price tier compression is the #1 problem: premium/luxury plates are systematically undervalued (by roughly 2x), while budget plates are over-predicted
- Suffix and prefix plates have the worst MAPE by plate type
- Word detection coverage has minimal impact on MAPE -- the bottleneck is price prediction range, not word lists
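Price tier compression can be made visible by grouping predictions by actual sale price and comparing the median predicted/actual ratio per tier. The sketch below is illustrative only. The tier boundaries and sample prices are hypothetical, and it is not the logic of `scripts/analyze_mape_distribution.py`.

```python
from collections import defaultdict
from statistics import median

# Hypothetical tier boundaries (upper bound in GBP, exclusive).
TIERS = [(1_000, "budget"), (10_000, "mid"), (float("inf"), "premium")]


def tier_of(price: float) -> str:
    """Return the name of the first tier whose upper bound exceeds the price."""
    for upper, name in TIERS:
        if price < upper:
            return name
    return TIERS[-1][1]


def tier_report(actual, predicted):
    """Per-tier count, median predicted/actual ratio, and MAPE (percent)."""
    ratios = defaultdict(list)
    for a, p in zip(actual, predicted):
        ratios[tier_of(a)].append(p / a)
    return {
        name: {
            "n": len(r),
            "median_ratio": median(r),
            "mape": 100 * sum(abs(x - 1) for x in r) / len(r),
        }
        for name, r in ratios.items()
    }


# Illustrative data showing compression: cheap plates over-predicted,
# expensive plates under-predicted.
actual = [400, 800, 5000, 50000, 100000]
predicted = [600, 1000, 5000, 25000, 40000]
report = tier_report(actual, predicted)
for name, stats in report.items():
    print(name, stats)
```

A median ratio above 1.0 in the budget tier together with a ratio well below 1.0 in the premium tier is the signature of compression: predictions are pulled towards the middle of the price range.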
File Structure
backend/app/services/ml_models/
    ensemble_engine.py        # 95% ML + 5% rules combination
    lightgbm_predictor.py     # LightGBM model (primary)
    model_predictor.py        # XGBoost fallback
    feature_engineering.py    # 83 features from registrations
    trained/
        lightgbm_v1.pkl       # Trained model
backend/app/services/
    valuation.py              # Rules engine
    valuation_lookups.py      # Word dictionaries, name frequencies
    market_comparables.py     # DVLA auction matching
API Usage
The API returns a valuation with an ensemble breakdown, feature details, market comparables, and a confidence score. See the API docs at /api/v1/docs for the full response schema.
Analysis & Improvement
For the valuation improvement workflow, multiplier tuning, and retraining guide, see .claude/skills/plate-valuation/SKILL.md.
Key scripts:
| Script | Purpose |
|--------|---------|
| scripts/analyze_mape_distribution.py | Ensemble MAPE by dimension (diagnostic) |
| scripts/analyze_multiplier_accuracy.py | Rules engine multiplier accuracy |
| scripts/detect_rule_gaps.py | Find missing patterns |
| scripts/train_lightgbm.py | Train/retrain ML model |
Phase History
| Phase | Date | Key Achievement |
|---|---|---|
| 1 | 2026-01 | Baseline analysis (MAPE 328%) |
| 2 | 2026-02-02 | Learned multipliers, phonetic detection (MAPE 76%) |
| 3 | 2026-02-05 | LightGBM ensemble, market_prices table (MAPE 43%) |
| 3.4 | 2026-02-08 | Rules calibration, ML retraining (MAPE 42.9%) |
| 3.6 | 2026-03-02 | Word list expansion (2,100+ words), MAPE distribution analysis (MAPE 42.6%) |
Next: Address premium/luxury undervaluation (price tier compression)
Related
- ADR-002: LightGBM
- .claude/skills/plate-valuation/SKILL.md (analysis commands, detailed workflow)
- Architecture