Valuation System

Reggie uses an ML-enhanced ensemble system combining machine learning with rule-based domain expertise.

Current Status (Phase 3.6 -- 2026-03-02)

| Metric | Baseline | Current | Target |
|--------|----------|---------|--------|
| ML MAPE | 328% | 42.6% | <30% |
| Within +/-30% | 23% | 46.5% | 60% |
| Budget MAPE | -- | 58.3% | <50% |
| Rules Median Ratio | 2.18 | 1.004 | 1.0 |
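
The metrics in this table can be reproduced along the following lines. This is a minimal sketch that assumes the standard definitions (MAPE as the mean of |predicted - actual| / actual, and the median ratio as the median of predicted / actual). The function names and sample prices are illustrative and are not taken from the analysis scripts.

```python
from statistics import median


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(p - a) / a for a, p in zip(actual, predicted)) / len(actual)


def within_tolerance(actual, predicted, tol=0.30):
    """Share of predictions (in percent) within +/-tol of the actual price."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(p - a) / a <= tol)
    return 100.0 * hits / len(actual)


def median_ratio(actual, predicted):
    """Median of predicted/actual; 1.0 means no systematic over- or under-valuation."""
    return median(p / a for a, p in zip(actual, predicted))


# Made-up prices in GBP, for illustration only.
actual = [1000.0, 2000.0, 4000.0, 500.0]
predicted = [1200.0, 1500.0, 4400.0, 1000.0]

print(f"MAPE: {mape(actual, predicted):.2f}%")
print(f"Within +/-30%: {within_tolerance(actual, predicted):.1f}%")
print(f"Median ratio: {median_ratio(actual, predicted):.3f}")
```

Under these definitions, "Budget MAPE" would be the same MAPE restricted to budget-tier plates.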

Architecture

User requests valuation
        |
   Ensemble Engine
        |
+-------------------------------+
|  95% ML (LightGBM)            |  <- Predicts price from 83 features
|  +                            |
|   5% Rules Engine             |  <- base x length x pattern x word multipliers
+-------------------------------+
        |
   Final valuation + confidence
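
The rules engine line in the diagram (base x length x pattern x word multipliers) can be read as a simple product. The sketch below illustrates that shape only: the base price, the multiplier table, and the function name are all hypothetical placeholders, not the calibrated values or the code in valuation.py.

```python
# Placeholder multipliers for illustration; NOT Reggie's calibrated values.
HYPOTHETICAL_LENGTH_MULTIPLIERS = {2: 8.0, 3: 4.0, 4: 2.0, 5: 1.5, 6: 1.2, 7: 1.0}


def rules_estimate(registration, base_price=250.0,
                   pattern_multiplier=1.0, word_multiplier=1.0):
    """Multiply a base price by length, pattern, and word multipliers."""
    plate = registration.replace(" ", "")
    # Shorter plates get a larger multiplier; unknown lengths fall back to 1.0.
    length_multiplier = HYPOTHETICAL_LENGTH_MULTIPLIERS.get(len(plate), 1.0)
    return base_price * length_multiplier * pattern_multiplier * word_multiplier


# A four-character plate that matches a name, with an assumed 1.5x word multiplier.
print(rules_estimate("BOB1", word_multiplier=1.5))
```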

How it works:

  1. ML Model (LightGBM): Trained on ~26,226 market price records, primarily DVLA auction sales
  2. Rules Engine: Domain expertise with data-driven multipliers
  3. Ensemble: Weighted combination with confidence scoring
  4. Adaptive Weighting: Adjusts based on training data availability

| Training Data Available | ML Weight | Rules Weight |
|-------------------------|-----------|--------------|
| >= 1,000 samples | 95% | 5% |
| 200-999 samples | 70% | 30% |
| < 200 samples | 40% | 60% |
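
The weighting table translates directly into a threshold lookup. The sketch below assumes the ensemble is a plain weighted average of the two prices; the function names are hypothetical, and how the sample count is measured (for example, per plate type) is not specified here.

```python
def ensemble_weights(n_training_samples):
    """Return (ml_weight, rules_weight) for the given amount of training data."""
    if n_training_samples >= 1000:
        return 0.95, 0.05
    if n_training_samples >= 200:
        return 0.70, 0.30
    return 0.40, 0.60


def ensemble_valuation(ml_price, rules_price, n_training_samples):
    """Blend the ML and rules prices (assumed: a simple weighted average)."""
    ml_weight, rules_weight = ensemble_weights(n_training_samples)
    return ml_weight * ml_price + rules_weight * rules_price


# With 500 samples the ML price gets 70% of the weight.
print(ensemble_valuation(ml_price=1000.0, rules_price=2000.0, n_training_samples=500))
```

With sparse data, the rules engine dominates, so a poorly trained model cannot pull the valuation far from the domain-expertise estimate.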

Key insight: ML uses rules_estimated_value as one of its 83 input features. Better rules lead to better ML predictions.

Training Data

| Property | Value |
|----------|-------|
| Total records | ~26,226 (market_prices table) |
| Price range | GBP 250 -- GBP 500,000 |
| Primary source | DVLA auction sales (weight 1.0) |
| Other sources | DVLA fixed price (0.95), dealer asking (0.65) |
| Splits | 70% train, 10% validation, 20% test (LOCKED) |
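
One way to apply the source weights and keep the splits fixed is sketched below. Both parts are assumptions: the source keys are hypothetical names, and hashing the record id is just one technique for making a split reproducible, not necessarily how the locked split was produced.

```python
import hashlib

# Weights from the table above; the dictionary keys are hypothetical names.
SOURCE_WEIGHTS = {
    "dvla_auction": 1.0,
    "dvla_fixed_price": 0.95,
    "dealer_asking": 0.65,
}


def sample_weight(source):
    """Per-record training weight based on where the price came from."""
    return SOURCE_WEIGHTS[source]


def assign_split(record_id):
    """Map a record id to a stable split: 70% train, 10% validation, 20% test."""
    digest = hashlib.sha256(str(record_id).encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # deterministic bucket in 0..99
    if bucket < 70:
        return "train"
    if bucket < 80:
        return "validation"
    return "test"


print(assign_split(12345), sample_weight("dealer_asking"))
```

Because the split depends only on the record id, adding new records never moves an existing record between train and test. The weights could then be passed as per-row sample weights; LightGBM's scikit-learn interface accepts a `sample_weight` argument to `fit`.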

Top Features (by Importance)

  1. length_multiplier (30%) -- shorter plates worth more
  2. plate_type_current (15%) -- current vs classic
  3. length (12%) -- direct length impact
  4. word_count (8%) -- name/brand detection
  5. pattern_count (6%) -- special patterns

Key Findings (Phase 3.6)

  • Price tier compression is the #1 problem: premium/luxury plates are systematically undervalued (by roughly 2x), while budget plates are over-predicted
  • Suffix and prefix plates have the worst MAPE by plate type
  • Word detection coverage has minimal impact on MAPE -- the bottleneck is price prediction range, not word lists
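
Price tier compression shows up when the predicted/actual ratio is computed per tier: ratios above 1.0 at the bottom and below 1.0 at the top. The sketch below is a hypothetical diagnostic; the tier boundaries and sample prices are invented for illustration and are not the ones used by scripts/analyze_mape_distribution.py.

```python
from statistics import median

# Hypothetical tier boundaries in GBP; the real tier definitions are not given here.
TIERS = [("budget", 0, 1_000), ("mid", 1_000, 10_000), ("premium", 10_000, float("inf"))]


def tier_of(price):
    """Return the name of the tier whose [low, high) range contains the price."""
    for name, low, high in TIERS:
        if low <= price < high:
            return name
    raise ValueError(f"price out of range: {price}")


def median_ratio_by_tier(actual, predicted):
    """Median predicted/actual ratio per tier, grouped by the actual sale price."""
    ratios = {}
    for a, p in zip(actual, predicted):
        ratios.setdefault(tier_of(a), []).append(p / a)
    return {tier: median(values) for tier, values in ratios.items()}


# Made-up prices showing compression: cheap plates high, expensive plates low.
actual = [500.0, 800.0, 5000.0, 20000.0, 40000.0]
predicted = [900.0, 1000.0, 5200.0, 10000.0, 22000.0]
print(median_ratio_by_tier(actual, predicted))
```

A premium-tier ratio near 0.5 corresponds to the roughly 2x undervaluation described above.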

File Structure

backend/app/services/ml_models/
  ensemble_engine.py            # 95% ML + 5% rules combination
  lightgbm_predictor.py         # LightGBM model (primary)
  model_predictor.py            # XGBoost fallback
  feature_engineering.py        # 83 features from registrations
  trained/
    lightgbm_v1.pkl             # Trained model

backend/app/services/
  valuation.py                  # Rules engine
  valuation_lookups.py          # Word dictionaries, name frequencies
  market_comparables.py         # DVLA auction matching

API Usage

GET /api/v1/plates/BOB1

Returns valuation with ensemble breakdown, feature details, market comparables, and confidence score. See API docs at /api/v1/docs for the full response schema.
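
A minimal client sketch using only the Python standard library is shown below. The base URL is a hypothetical local development address and the helper names are illustrative; the response is returned as parsed JSON without assuming any particular field names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # hypothetical; replace with the real host


def valuation_url(registration, base_url=BASE_URL):
    """Build the valuation endpoint URL for a registration."""
    return f"{base_url}/api/v1/plates/{quote(registration, safe='')}"


def fetch_valuation(registration, base_url=BASE_URL, timeout=10):
    """GET the valuation and return the decoded JSON body."""
    with urlopen(valuation_url(registration, base_url), timeout=timeout) as response:
        return json.load(response)


print(valuation_url("BOB1"))
```

Calling `fetch_valuation("BOB1")` requires a running backend at the configured base URL.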

Analysis & Improvement

For the valuation improvement workflow, multiplier tuning, and retraining guide, see .claude/skills/plate-valuation/SKILL.md.

Key scripts:

| Script | Purpose |
|--------|---------|
| scripts/analyze_mape_distribution.py | Ensemble MAPE by dimension (diagnostic) |
| scripts/analyze_multiplier_accuracy.py | Rules engine multiplier accuracy |
| scripts/detect_rule_gaps.py | Find missing patterns |
| scripts/train_lightgbm.py | Train/retrain ML model |

Phase History

| Phase | Date | Key Achievement |
|-------|------|-----------------|
| 1 | 2026-01 | Baseline analysis (MAPE 328%) |
| 2 | 2026-02-02 | Learned multipliers, phonetic detection (MAPE 76%) |
| 3 | 2026-02-05 | LightGBM ensemble, market_prices table (MAPE 43%) |
| 3.4 | 2026-02-08 | Rules calibration, ML retraining (MAPE 42.9%) |
| 3.6 | 2026-03-02 | Word list expansion (2,100+ words), MAPE distribution analysis (MAPE 42.6%) |

Next: Address premium/luxury undervaluation (price tier compression)