SUPERSEDED -- This document is frozen at Phase 2. For the current valuation system documentation, see docs/features/valuation.md. For latest metrics, see
.claude/skills/plate-valuation/SKILL.md.
ML-Enhanced Valuation System (Historical -- Phase 2)¶
Overview¶
The Reggie valuation platform uses an ML-enhanced ensemble system combining machine learning (trained on 17,813 real DVLA auction sales) with rule-based domain expertise.
Current Status (2026-02-02):

- Phase 2 Complete: Learned multipliers + rule improvements
- MAPE: 76% (down from 328% baseline)
- Accuracy: 38% within ±50% (up from 23%)
- Data: 17,813 DVLA sales with proper train/validation/test splits
Architecture¶
Ensemble Approach (95% ML + 5% Rules, Adaptive)¶
ML Model (95%) + Rules Engine (5%) = Final Valuation
└─ Adaptive Weighting: Adjusts based on training data availability
How It Works:

1. ML Model (LightGBM): Learns patterns from 17,813 DVLA sales
2. Rules Engine: Domain expertise for edge cases and rare plates
3. Ensemble: Weighted combination with confidence scoring
4. Adaptive Weighting: Reduces ML weight when training data is scarce
Weighting by Training Data:

- ≥1,000 samples: 95% ML / 5% rules (high confidence)
- 200-999 samples: 70% ML / 30% rules (medium confidence)
- <200 samples: 40% ML / 60% rules (lean on rules)
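The tiered weighting above can be sketched as a simple lookup. This is an illustrative reconstruction from the table, not the production code; the function name `get_adaptive_weights` is assumed from the ensemble pseudocode later in this document.

```python
# Hypothetical sketch of the adaptive weighting rule. Tier thresholds
# come from this document; the signature is illustrative.

def get_adaptive_weights(sample_count: int) -> tuple[float, float]:
    """Return (ml_weight, rules_weight) based on training data volume."""
    if sample_count >= 1000:
        return 0.95, 0.05   # high confidence in the ML model
    if sample_count >= 200:
        return 0.70, 0.30   # medium confidence
    return 0.40, 0.60       # scarce data: lean on the rules engine
```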
Recent Improvements (Phase 2)¶
1. Data-Driven Multiplier Learning¶
Analyzed 13,250 training plates to learn optimal weights instead of guessing:
| Component | Before | After | Change |
|---|---|---|---|
| MAPE | 328% | 76% | -77% ✅ |
| Within ±50% | 23% | 38% | +15pp ✅ |
| Current base price | £500 | £1,510 | +202% |
| Length 5 mult | 2.2× | 0.84× | -62% |
| Length 6 mult | 1.6× | 0.62× | -61% |
| Pattern mults | 1.5-2.8× | 0.86-2.12× | -15% to -47% |
Key Finding: Rules were massively overvaluing plates (2.6× on average). Learned multipliers brought predictions to realistic levels.
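One way to learn such a correction factor for a single-factor group (e.g. all length-5 plates) is the median ratio of actual sale price to the rule engine's baseline prediction. This is a minimal sketch of the idea, not the implementation in `learn_single_factor_multipliers.py`:

```python
import statistics

def learn_multiplier(actual_prices, baseline_predictions):
    """Learn a correction multiplier for one factor group as the median
    ratio of actual sale price to the baseline rules prediction.
    A ratio below 1.0 means the old rules were overvaluing this group."""
    ratios = [a / p for a, p in zip(actual_prices, baseline_predictions)]
    return statistics.median(ratios)
```

Using the median rather than the mean keeps a handful of auction-fever outliers from dragging the learned multiplier upward.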
2. Phonetic Name Detection¶
Detects names using letter-number substitution (3,695 plates impacted):
Examples:

- A114ANA → ALIANA (11→II)
- A223EEM → AZEEM (22→ZZ)
- CC51NGH → SINGH (51→SI)
Accuracy Improvement: CC51NGH error 87% → 72% (-15pp)
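The substitution idea can be sketched as a digit-to-letter expansion followed by a dictionary scan. The table below is illustrative, inferred only from the examples above; the real mappings and word lists live in `dictionaries.py`:

```python
# Illustrative look-alike substitutions (assumed, not the production table).
SUBSTITUTIONS = {"1": "I", "2": "Z", "5": "S", "4": "A", "0": "O", "3": "E"}

def phonetic_expand(registration: str) -> str:
    """Replace digits with look-alike letters to surface hidden names."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in registration)

def contains_name(registration: str, names: set[str]) -> bool:
    """Check whether the expanded plate contains any known name."""
    expanded = phonetic_expand(registration)
    return any(name in expanded for name in names)
```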
3. Year-Specific Discount Adjustments¶
Premium year codes get a reduced discount instead of the flat 60%:
| Year Code | Before | After | Example Plates |
|---|---|---|---|
| 25 | 60% discount | 20% discount | MY25BMW, PO25CHE, TE25SLA |
| 51 | 60% discount | 20% discount | CC51NGH |
| 75 | 60% discount | 15% discount | TE75SLA, HO75PUR |
| 21 | 40% discount | 30% discount | PA21ULA |
Impact: MY25BMW error 50% → 33% (-17pp)
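The adjustment amounts to a lookup table with a flat-rate fallback. A minimal sketch, using the "After" values from the table above (names are illustrative):

```python
# Year-specific discounts taken from the table in this document.
YEAR_DISCOUNTS = {"25": 0.20, "51": 0.20, "75": 0.15, "21": 0.30}
DEFAULT_DISCOUNT = 0.60  # flat discount for non-premium year codes

def year_discount(year_code: str) -> float:
    """Return the discount applied for a plate's year code."""
    return YEAR_DISCOUNTS.get(year_code, DEFAULT_DISCOUNT)
```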
Model Performance¶
Training Dataset¶
- Total Records: 17,813 DVLA auction sales
- Training: 12,468 plates (70%)
- Validation: 1,782 plates (10%)
- Test: 3,563 plates (20%, LOCKED)
- Price Range: £250 - £500,000
- Date Range: 2020-2025
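The 70/10/20 split can be produced in two stages so the test set is carved off first and never touched by retraining. This is a hedged sketch using scikit-learn (already a project dependency); the real split is persisted by `backfill_dataset_splits.py`:

```python
from sklearn.model_selection import train_test_split

def make_splits(records, seed=42):
    """Split records 70/10/20 into train/validation/test."""
    # Hold out 20% as the locked test set first.
    rest, test = train_test_split(records, test_size=0.20, random_state=seed)
    # 10% of the total is 0.125 of the remaining 80%.
    train, val = train_test_split(rest, test_size=0.125, random_state=seed)
    return train, val, test
```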
Current Metrics (Phase 2)¶
LightGBM Model:

- Test MAPE: ~67% (on test set)
- Test R²: ~0.75 (captures 75% of variance)
- Test MAE: ~£950

Rules Engine (after Phase 2 improvements):

- MAPE: 76% (down from 328%)
- Within ±50%: 38%

Ensemble (95/5 weighted):

- Leverages ML's data-driven learning
- Maintains rules for edge cases
- Confidence boost when models agree
Top Features (by importance)¶
- `length_multiplier` (30%): shorter plates worth more
- `plate_type_current` (15%): current vs classic
- `length` (12%): direct length impact
- `word_count` (8%): name/brand detection
- `pattern_count` (6%): special patterns
Implementation Details¶
File Structure¶
backend/app/services/ml_models/
├── ensemble_engine.py # 95% ML + 5% rules combination
├── lightgbm_predictor.py # LightGBM model (preferred)
├── model_predictor.py # XGBoost fallback
├── feature_engineering.py # 60+ features from registrations
└── trained/
├── lightgbm_v1.pkl # Trained LightGBM model
└── xgboost_v1.pkl # XGBoost fallback
backend/app/services/
├── valuation.py # Rules engine (Phase 2 improved)
├── dictionaries.py # Word lists (1000+ terms)
└── market_comparables.py # DVLA auction matching
Feature Engineering (60+ features)¶
Structural Features:
- length, letter_count, digit_count, letter_digit_ratio
- plate_type (one-hot): dateless, current, prefix, suffix, northern_ireland
- First/last char analysis
Pattern Features (binary flags):
- palindrome, mirror, sequential, repeating, bookend
- ascending_sequence, descending_sequence, round_number
Word Features:
- word_count, max_word_multiplier, has_word
- Word categories: premium_names, status, luxury_cars, etc.
- Detection methods: direct, phonetic substitution, partial match
Market Context Features:
- market_avg_for_length, dealer_listing_count
- has_exact_match, exact_match_price
- similar_pattern_avg, similar_pattern_count
Multiplier Features (from rules):
- length_multiplier, pattern_multiplier, age_premium, year_discount
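A few of the structural and pattern features above can be computed directly from the registration string. This is an illustrative sketch; the full 60+ features live in `feature_engineering.py`:

```python
def extract_basic_features(registration: str) -> dict:
    """Compute a handful of the structural features listed above."""
    letters = sum(c.isalpha() for c in registration)
    digits = sum(c.isdigit() for c in registration)
    return {
        "length": len(registration),
        "letter_count": letters,
        "digit_count": digits,
        "letter_digit_ratio": letters / digits if digits else float(letters),
        "palindrome": registration == registration[::-1],
        "repeating": len(set(registration)) < len(registration),
    }
```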
Ensemble Logic¶
# 1. Get both predictions
ml_price = lightgbm_predictor.predict(registration, db=db)
rules_price = valuation_engine.valuate(registration)

# 2. Adaptive weighting (based on training data for this plate type)
ml_weight, rules_weight = get_adaptive_weights(plate_type)

# 3. Weighted ensemble
ensemble_price = (ml_price * ml_weight) + (rules_price * rules_weight)

# 4. Confidence calculation
base_confidence = rules_confidence
agreement_pct = 1 - abs(ml_price - rules_price) / rules_price

# Agreement boost (defaults to zero when the models disagree)
confidence_boost = 0.0
if agreement_pct > 0.8:      # within 20%
    confidence_boost = 0.15
elif agreement_pct > 0.65:   # within 35%
    confidence_boost = 0.08

# Exact DVLA match boost
if exact_match_found:
    confidence_boost += 0.20

# Cap final confidence at 0.95
final_confidence = min(base_confidence + confidence_boost, 0.95)
API Usage¶
Standard Valuation¶
Response Format:
{
"registration": "BOB1",
"estimated_value": 45000,
"min_value": 33750,
"max_value": 56250,
"confidence_score": 0.84,
"valuation_method": "ensemble_ml_rules",
"ensemble_breakdown": {
"ml_prediction": 42500,
"ml_confidence": 0.75,
"ml_weight": 0.95,
"rules_prediction": 48000,
"rules_confidence": 0.82,
"rules_weight": 0.05,
"agreement_pct": 88.5,
"confidence_boost": 0.15,
"adaptive_weighting": false,
"sample_count": 1200
},
"model_info": {
"ml_version": "lightgbm_v1",
"ml_test_mape": 67.3,
"ml_test_r2": 0.75
},
"plate_type": "dateless",
"features": {
"base_price": 5000,
"length": 4,
"multipliers": {
"length": 3.5,
"patterns": 1.0,
"words": 3.0,
"total": 10.5
},
"detected_words": [
{"word": "BOB", "multiplier": 3.0, "method": "direct"}
],
"detected_patterns": []
},
"market_comparables": {
"exact_match": {
"registration": "BOB1",
"sale_price": 45000,
"auction_date": "2024-06-15",
"confidence": 1.0
},
"similar_pattern": [
{"registration": "JIM1", "sale_price": 38000},
{"registration": "TOM1", "sale_price": 41000}
],
"dealer_listings": [
{"asking_price": 52000, "dealer": "Regtransfers"}
]
}
}
Python SDK Example¶
from app.services.ml_models import ensemble_valuation_engine
from app.database import SessionLocal
db = SessionLocal()
# Get ensemble valuation
result = ensemble_valuation_engine.valuate("BOB1", db=db)
print(f"Estimated Value: £{result['estimated_value']:,.0f}")
print(f"Confidence: {result['confidence_score']*100:.0f}%")
print(f"ML: £{result['ensemble_breakdown']['ml_prediction']:,.0f} ({result['ensemble_breakdown']['ml_weight']*100:.0f}%)")
print(f"Rules: £{result['ensemble_breakdown']['rules_prediction']:,.0f} ({result['ensemble_breakdown']['rules_weight']*100:.0f}%)")
db.close()
Model Training & Updates¶
Initial Training¶
The current model was trained with `train_lightgbm.py` (see Scripts Available below).

This creates:

- `lightgbm_v1.pkl`: trained model plus metadata
- Uses proper train/validation/test splits
- Test set LOCKED (never retrained on)
Retraining Schedule¶
Monthly: Retrain with new DVLA auction data

- Promote fresh sales to the training set
- Validate on fresh data first (drift detection)
- Train a new model version
- Compare test MAPE to the current version
- Deploy if improvement > 10%
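The promote-if-better gate at the end of that loop can be sketched as a relative-improvement check. Interpreting the 10% threshold as relative improvement in test MAPE (an assumption; the doc does not spell this out):

```python
def should_deploy(current_mape: float, candidate_mape: float,
                  min_improvement: float = 0.10) -> bool:
    """Deploy the candidate model only if it beats the current model's
    test MAPE by more than the required relative margin."""
    improvement = (current_mape - candidate_mape) / current_mape
    return improvement > min_improvement
```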
Scripts Available¶
| Script | Purpose |
|---|---|
| `analyze_multiplier_accuracy.py` | Baseline analysis (Phase 1) |
| `learn_single_factor_multipliers.py` | Learn optimal multipliers (Phase 2) |
| `detect_rule_gaps.py` | Find missing patterns (Phase 2) |
| `benchmark_learned_multipliers.py` | Validate improvements (Phase 2) |
| `train_lightgbm.py` | Train ML model |
| `backfill_dataset_splits.py` | Set up train/val/test splits |
ML-Enhanced Valuation Initiative¶
Completed Phases¶
✅ Phase 1: Baseline Analysis

- Identified massive overvaluation (MAPE 328%)
- Found pattern multipliers 5-7× too high
- Current-year plates 3× undervalued

✅ Phase 2: Single-Factor Learning + Rule Gaps

- Learned optimal multipliers from data
- MAPE improved 77% (328% → 76%)
- Implemented phonetic detection (3,695 plates)
- Implemented year-specific discounts (88 plates)
Upcoming Phases¶
📅 Phase 3: Multi-Factor Regression (Planned)

- Learn multipliers jointly to capture interactions
- Log-linear regression approach
- Expected: MAPE 76% → <50%

📅 Phase 4: Interaction Discovery (Planned)

- Find combinations worth more than the sum of their parts
- Use LASSO for feature selection

📅 Phase 5: Integration & Deployment (Planned)

- Replace hardcoded multipliers with learned ones
- A/B test in production
- Continuous improvement loop
Monitoring & Validation¶
Key Metrics Tracked¶
- MAPE (Mean Absolute Percentage Error)
    - Current: 76%
    - Target: <30%
- Accuracy Bands
    - Within ±20%: 12.6%
    - Within ±30%: 19.7%
    - Within ±50%: 38% (target: 60%)
- Median Ratio
    - Current: 0.56× (slight under-prediction)
    - Target: 0.9-1.1× (balanced)
- Confidence Distribution
    - Average confidence: ~0.75-0.85
    - High confidence (>0.85): ~40% of predictions
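These metrics are standard and can be computed as follows; the helper names here are illustrative, not taken from the codebase:

```python
import statistics

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100 * statistics.mean(abs(p - a) / a for a, p in zip(actual, predicted))

def within_band(actual, predicted, band=0.50):
    """Fraction of predictions within ±band of the actual price."""
    hits = sum(abs(p - a) / a <= band for a, p in zip(actual, predicted))
    return hits / len(actual)

def median_ratio(actual, predicted):
    """Median predicted/actual ratio; 1.0 means balanced predictions."""
    return statistics.median(p / a for a, p in zip(actual, predicted))
```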
Test Suite¶
101 tests total (all passing):

- ML model tests: 30 tests
- Rules engine tests: 55 tests
- Ensemble tests: 16 tests
Recent additions (Phase 2):
- test_rule_improvements.py - 16 comprehensive tests
- Phonetic name detection
- Year-specific discounts
- Combined improvements
- Backwards compatibility
Troubleshooting¶
Model Not Loading¶
Error: FileNotFoundError: Model not found
Solution: Check that the trained model files exist under `backend/app/services/ml_models/trained/` (`lightgbm_v1.pkl` and `xgboost_v1.pkl`). If they are missing, train the model first with `train_lightgbm.py`.
High MAPE on Specific Plates¶
Question: Why do some plates have high error?
Answer: Common causes:

1. Rare patterns: <10 training examples
2. Ultra-premium: £250k+ plates have limited data
3. Market anomalies: auction fever, celebrity ownership
4. Recent trends: emerging patterns the ML hasn't seen
Solution: Rules engine handles these cases (adaptive weighting)
Predictions Too Conservative¶
Observation: ML predicts lower than actual for some plates
Explanation: After Phase 2, median ratio is 0.56× (slight under-prediction). This is intentional to avoid the previous overvaluation problem. Phase 3 will balance this.
Dependencies¶
lightgbm==4.3.0 # Primary ML model (preferred)
xgboost==2.1.3 # Fallback ML model
scikit-learn==1.4.0 # Feature scaling, evaluation
pandas==2.2.0 # Feature engineering
numpy==1.26.3 # Array operations
Credits¶
- ML System: Claude Code (Anthropic)
- Training Data: 17,813 DVLA auction sales (2020-2025)
- Architecture: Ensemble learning (95% LightGBM + 5% Enhanced Rules)
- Feature Engineering: 60+ features with adaptive weighting
- Phase 2 Improvements: Data-driven multipliers + phonetic detection + year-specific discounts
- Status: Phase 3.4 Complete
- Last Updated: 2026-02-08
- MAPE: 42.9% (down from 328%)
- Within ±30%: 45.1%
- Rules Median Ratio: 1.004 (calibrated)
- Next: Investigate word detection over-prediction
- Related: Architecture Decisions