SUPERSEDED -- This document is frozen at Phase 2. For the current valuation system documentation, see docs/features/valuation.md. For latest metrics, see
.claude/skills/plate-valuation/SKILL.md.
ML-Enhanced Valuation System (Historical -- Phase 2)¶
Overview¶
The Reggie valuation platform uses an ML-enhanced ensemble system combining machine learning (trained on 17,813 real DVLA auction sales) with rule-based domain expertise.
Current Status (2026-02-02):

- Phase 2 Complete: Learned multipliers + rule improvements
- MAPE: 76% (down from 328% baseline)
- Accuracy: 38% within ±50% (up from 23%)
- Data: 17,813 DVLA sales with proper train/validation/test splits
Architecture¶
Ensemble Approach (95% ML + 5% Rules, Adaptive)¶
ML Model (95%) + Rules Engine (5%) = Final Valuation
└─ Adaptive Weighting: Adjusts based on training data availability
How It Works:

1. ML Model (LightGBM): Learns patterns from 17,813 DVLA sales
2. Rules Engine: Domain expertise for edge cases and rare plates
3. Ensemble: Weighted combination with confidence scoring
4. Adaptive Weighting: Reduces ML weight when training data is scarce
Weighting by Training Data:

- ≥1,000 samples: 95% ML / 5% rules (high confidence)
- 200-999 samples: 70% ML / 30% rules (medium confidence)
- <200 samples: 40% ML / 60% rules (lean on rules)
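The tiered weighting above can be sketched as a simple lookup. This is an illustrative reconstruction from the table, not the production code; the function name `get_adaptive_weights` is assumed from the ensemble pseudocode later in this document.

```python
# Hypothetical sketch of the adaptive weighting rule. Tier thresholds
# come from this document; the signature is illustrative.

def get_adaptive_weights(sample_count: int) -> tuple[float, float]:
    """Return (ml_weight, rules_weight) based on training data volume."""
    if sample_count >= 1000:
        return 0.95, 0.05   # high confidence in the ML model
    if sample_count >= 200:
        return 0.70, 0.30   # medium confidence
    return 0.40, 0.60       # scarce data: lean on the rules engine
```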
Recent Improvements (Phase 2)¶
1. Data-Driven Multiplier Learning¶
Analyzed 13,250 training plates to learn optimal weights instead of guessing:
| Component | Before | After | Change |
|---|---|---|---|
| MAPE | 328% | 76% | -77% ✅ |
| Within ±50% | 23% | 38% | +15pp ✅ |
| Current base price | £500 | £1,510 | +202% |
| Length 5 mult | 2.2× | 0.84× | -62% |
| Length 6 mult | 1.6× | 0.62× | -61% |
| Pattern mults | 1.5-2.8× | 0.86-2.12× | -15% to -47% |
Key Finding: Rules were massively overvaluing plates (2.6× on average). Learned multipliers brought predictions to realistic levels.
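One way to learn such a correction factor for a single-factor group (e.g. all length-5 plates) is the median ratio of actual sale price to the rule engine's baseline prediction. This is a minimal sketch of the idea, not the implementation in `learn_single_factor_multipliers.py`:

```python
import statistics

def learn_multiplier(actual_prices, baseline_predictions):
    """Learn a correction multiplier for one factor group as the median
    ratio of actual sale price to the baseline rules prediction.
    A ratio below 1.0 means the old rules were overvaluing this group."""
    ratios = [a / p for a, p in zip(actual_prices, baseline_predictions)]
    return statistics.median(ratios)
```

Using the median rather than the mean keeps a handful of auction-fever outliers from dragging the learned multiplier upward.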
2. Phonetic Name Detection¶
Detects names using letter-number substitution (3,695 plates impacted):
Examples:

- A114ANA → ALIANA (11→II)
- A223EEM → AZEEM (22→ZZ)
- CC51NGH → SINGH (51→SI)
Accuracy Improvement: CC51NGH error 87% → 72% (-15pp)
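The substitution idea can be sketched as a digit-to-letter expansion followed by a dictionary scan. The table below is illustrative, inferred only from the examples above; the real mappings and word lists live in `dictionaries.py`:

```python
# Illustrative look-alike substitutions (assumed, not the production table).
SUBSTITUTIONS = {"1": "I", "2": "Z", "5": "S", "4": "A", "0": "O", "3": "E"}

def phonetic_expand(registration: str) -> str:
    """Replace digits with look-alike letters to surface hidden names."""
    return "".join(SUBSTITUTIONS.get(ch, ch) for ch in registration)

def contains_name(registration: str, names: set[str]) -> bool:
    """Check whether the expanded plate contains any known name."""
    expanded = phonetic_expand(registration)
    return any(name in expanded for name in names)
```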
3. Year-Specific Discount Adjustments¶
Premium year codes get a reduced discount instead of the flat 60%:
| Year Code | Before | After | Example Plates |
|---|---|---|---|
| 25 | 60% discount | 20% discount | MY25BMW, PO25CHE, TE25SLA |
| 51 | 60% discount | 20% discount | CC51NGH |
| 75 | 60% discount | 15% discount | TE75SLA, HO75PUR |
| 21 | 40% discount | 30% discount | PA21ULA |
Impact: MY25BMW error 50% → 33% (-17pp)
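The adjustment amounts to a lookup table with a flat-rate fallback. A minimal sketch, using the "After" values from the table above (names are illustrative):

```python
# Year-specific discounts taken from the table in this document.
YEAR_DISCOUNTS = {"25": 0.20, "51": 0.20, "75": 0.15, "21": 0.30}
DEFAULT_DISCOUNT = 0.60  # flat discount for non-premium year codes

def year_discount(year_code: str) -> float:
    """Return the discount applied for a plate's year code."""
    return YEAR_DISCOUNTS.get(year_code, DEFAULT_DISCOUNT)
```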
Model Performance¶
Training Dataset¶
- Total Records: 17,813 DVLA auction sales
- Training: 12,468 plates (70%)
- Validation: 1,782 plates (10%)
- Test: 3,563 plates (20%, LOCKED)
- Price Range: £250 - £500,000
- Date Range: 2020-2025
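The 70/10/20 split can be produced in two stages so the test set is carved off first and never touched by retraining. This is a hedged sketch using scikit-learn (already a project dependency); the real split is persisted by `backfill_dataset_splits.py`:

```python
from sklearn.model_selection import train_test_split

def make_splits(records, seed=42):
    """Split records 70/10/20 into train/validation/test."""
    # Hold out 20% as the locked test set first.
    rest, test = train_test_split(records, test_size=0.20, random_state=seed)
    # 10% of the total is 0.125 of the remaining 80%.
    train, val = train_test_split(rest, test_size=0.125, random_state=seed)
    return train, val, test
```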
Current Metrics (Phase 2)¶
LightGBM Model:

- Test MAPE: ~67% (on test set)
- Test R²: ~0.75 (captures 75% of variance)
- Test MAE: ~£950

Rules Engine (after Phase 2 improvements):

- MAPE: 76% (down from 328%)
- Within ±50%: 38%

Ensemble (95/5 weighted):

- Leverages ML's data-driven learning
- Maintains rules for edge cases
- Confidence boost when models agree
Top Features (by importance)¶
- `length_multiplier` (30%): shorter plates worth more
- `plate_type_current` (15%): current vs classic
- `length` (12%): direct length impact
- `word_count` (8%): name/brand detection
- `pattern_count` (6%): special patterns
Implementation Details¶
File Structure¶
backend/app/services/ml_models/
├── ensemble_engine.py # 95% ML + 5% rules combination
├── lightgbm_predictor.py # LightGBM model (preferred)
├── model_predictor.py # XGBoost fallback
├── feature_engineering.py # 60+ features from registrations
└── trained/
├── lightgbm_v1.pkl # Trained LightGBM model
└── xgboost_v1.pkl # XGBoost fallback
backend/app/services/
├── valuation.py # Rules engine (Phase 2 improved)
├── dictionaries.py # Word lists (1000+ terms)
└── market_comparables.py # DVLA auction matching
Feature Engineering (60+ features)¶
Structural Features:
- length, letter_count, digit_count, letter_digit_ratio
- plate_type (one-hot): dateless, current, prefix, suffix, northern_ireland
- First/last char analysis
Pattern Features (binary flags):
- palindrome, mirror, sequential, repeating, bookend
- ascending_sequence, descending_sequence, round_number
Word Features:
- word_count, max_word_multiplier, has_word
- Word categories: premium_names, status, luxury_cars, etc.
- Detection methods: direct, phonetic substitution, partial match
Market Context Features:
- market_avg_for_length, dealer_listing_count
- has_exact_match, exact_match_price
- similar_pattern_avg, similar_pattern_count
Multiplier Features (from rules):
- length_multiplier, pattern_multiplier, age_premium, year_discount
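A few of the structural and pattern features above can be computed directly from the registration string. This is an illustrative sketch; the full 60+ features live in `feature_engineering.py`:

```python
def extract_basic_features(registration: str) -> dict:
    """Compute a handful of the structural features listed above."""
    letters = sum(c.isalpha() for c in registration)
    digits = sum(c.isdigit() for c in registration)
    return {
        "length": len(registration),
        "letter_count": letters,
        "digit_count": digits,
        "letter_digit_ratio": letters / digits if digits else float(letters),
        "palindrome": registration == registration[::-1],
        "repeating": len(set(registration)) < len(registration),
    }
```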
Ensemble Logic¶
# 1. Get both predictions
ml_price = lightgbm_predictor.predict(registration, db=db)
rules_price = valuation_engine.valuate(registration)

# 2. Adaptive weighting (based on training data for this plate type)
ml_weight, rules_weight = get_adaptive_weights(plate_type)

# 3. Weighted ensemble
ensemble_price = (ml_price * ml_weight) + (rules_price * rules_weight)

# 4. Confidence calculation
base_confidence = rules_confidence
agreement_pct = 1 - abs(ml_price - rules_price) / rules_price

# Agreement boost (defaults to zero when the models disagree)
confidence_boost = 0.0
if agreement_pct > 0.8:      # within 20%
    confidence_boost = 0.15
elif agreement_pct > 0.65:   # within 35%
    confidence_boost = 0.08

# Exact DVLA match boost
if exact_match_found:
    confidence_boost += 0.20

# Cap final confidence at 0.95
final_confidence = min(base_confidence + confidence_boost, 0.95)
API Usage¶
Standard Valuation¶
Response Format:
{
"registration": "BOB1",
"estimated_value": 45000,
"min_value": 33750,
"max_value": 56250,
"confidence_score": 0.84,
"valuation_method": "ensemble_ml_rules",
"ensemble_breakdown": {
"ml_prediction": 42500,
"ml_confidence": 0.75,
"ml_weight": 0.95,
"rules_prediction": 48000,
"rules_confidence": 0.82,
"rules_weight": 0.05,
"agreement_pct": 88.5,
"confidence_boost": 0.15,
"adaptive_weighting": false,
"sample_count": 1200
},
"model_info": {
"ml_version": "lightgbm_v1",
"ml_test_mape": 67.3,
"ml_test_r2": 0.75
},
"plate_type": "dateless",
"features": {
"base_price": 5000,
"length": 4,
"multipliers": {
"length": 3.5,
"patterns": 1.0,
"words": 3.0,
"total": 10.5
},
"detected_words": [
{"word": "BOB", "multiplier": 3.0, "method": "direct"}
],
"detected_patterns": []
},
"market_comparables": {
"exact_match": {
"registration": "BOB1",
"sale_price": 45000,
"auction_date": "2024-06-15",
"confidence": 1.0
},
"similar_pattern": [
{"registration": "JIM1", "sale_price": 38000},
{"registration": "TOM1", "sale_price": 41000}
],
"dealer_listings": [
{"asking_price": 52000, "dealer": "Regtransfers"}
]
}
}
Python SDK Example¶
from app.services.ml_models import ensemble_valuation_engine
from app.database import SessionLocal
db = SessionLocal()
# Get ensemble valuation
result = ensemble_valuation_engine.valuate("BOB1", db=db)
print(f"Estimated Value: £{result['estimated_value']:,.0f}")
print(f"Confidence: {result['confidence_score']*100:.0f}%")
print(f"ML: £{result['ensemble_breakdown']['ml_prediction']:,.0f} ({result['ensemble_breakdown']['ml_weight']*100:.0f}%)")
print(f"Rules: £{result['ensemble_breakdown']['rules_prediction']:,.0f} ({result['ensemble_breakdown']['rules_weight']*100:.0f}%)")
db.close()
Model Training & Updates¶
Initial Training¶
The current model was trained with `train_lightgbm.py` (see Scripts Available below).

This creates:

- `lightgbm_v1.pkl`: trained model plus metadata
- Uses proper train/validation/test splits
- Test set LOCKED (never retrained on)
Retraining Schedule¶
Monthly: Retrain with new DVLA auction data

- Promote fresh sales to the training set
- Validate on fresh data first (drift detection)
- Train a new model version
- Compare test MAPE to the current version
- Deploy if improvement > 10%
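The promote-if-better gate at the end of that loop can be sketched as a relative-improvement check. Interpreting the 10% threshold as relative improvement in test MAPE (an assumption; the doc does not spell this out):

```python
def should_deploy(current_mape: float, candidate_mape: float,
                  min_improvement: float = 0.10) -> bool:
    """Deploy the candidate model only if it beats the current model's
    test MAPE by more than the required relative margin."""
    improvement = (current_mape - candidate_mape) / current_mape
    return improvement > min_improvement
```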
Scripts Available¶
| Script | Purpose |
|---|---|
| `analyze_multiplier_accuracy.py` | Baseline analysis (Phase 1) |
| `learn_single_factor_multipliers.py` | Learn optimal multipliers (Phase 2) |
| `detect_rule_gaps.py` | Find missing patterns (Phase 2) |
| `benchmark_learned_multipliers.py` | Validate improvements (Phase 2) |
| `train_lightgbm.py` | Train ML model |
| `backfill_dataset_splits.py` | Set up train/val/test splits |
ML-Enhanced Valuation Initiative¶
Completed Phases¶
✅ Phase 1: Baseline Analysis

- Identified massive overvaluation (MAPE 328%)
- Found pattern multipliers 5-7× too high
- Current-year plates 3× undervalued

✅ Phase 2: Single-Factor Learning + Rule Gaps

- Learned optimal multipliers from data
- MAPE improved 77% (328% → 76%)
- Implemented phonetic detection (3,695 plates)
- Implemented year-specific discounts (88 plates)
Upcoming Phases¶
📅 Phase 3: Multi-Factor Regression (Planned)

- Learn multipliers jointly to capture interactions
- Log-linear regression approach
- Expected: MAPE 76% → <50%

📅 Phase 4: Interaction Discovery (Planned)

- Find combinations worth more than the sum of their parts
- Use LASSO for feature selection

📅 Phase 5: Integration & Deployment (Planned)

- Replace hardcoded multipliers with learned ones
- A/B test in production
- Continuous improvement loop
Monitoring & Validation¶
Key Metrics Tracked¶
- MAPE (Mean Absolute Percentage Error)
    - Current: 76%
    - Target: <30%
- Accuracy Bands
    - Within ±20%: 12.6%
    - Within ±30%: 19.7%
    - Within ±50%: 38% (target: 60%)
- Median Ratio
    - Current: 0.56× (slight under-prediction)
    - Target: 0.9-1.1× (balanced)
- Confidence Distribution
    - Average confidence: ~0.75-0.85
    - High confidence (>0.85): ~40% of predictions
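These metrics are standard and can be computed as follows; the helper names here are illustrative, not taken from the codebase:

```python
import statistics

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    return 100 * statistics.mean(abs(p - a) / a for a, p in zip(actual, predicted))

def within_band(actual, predicted, band=0.50):
    """Fraction of predictions within ±band of the actual price."""
    hits = sum(abs(p - a) / a <= band for a, p in zip(actual, predicted))
    return hits / len(actual)

def median_ratio(actual, predicted):
    """Median predicted/actual ratio; 1.0 means balanced predictions."""
    return statistics.median(p / a for a, p in zip(actual, predicted))
```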
Test Suite¶
101 tests total (all passing):

- ML model tests: 30 tests
- Rules engine tests: 55 tests
- Ensemble tests: 16 tests
Recent additions (Phase 2):
- test_rule_improvements.py - 16 comprehensive tests
- Phonetic name detection
- Year-specific discounts
- Combined improvements
- Backwards compatibility
Troubleshooting¶
Model Not Loading¶
Error: FileNotFoundError: Model not found
Solution: Check that the trained model files exist under `backend/app/services/ml_models/trained/` (`lightgbm_v1.pkl` and `xgboost_v1.pkl`). If they are missing, train the model first with `train_lightgbm.py`.
High MAPE on Specific Plates¶
Question: Why do some plates have high error?
Answer: Common causes:

1. Rare patterns: <10 training examples
2. Ultra-premium: £250k+ plates have limited data
3. Market anomalies: auction fever, celebrity ownership
4. Recent trends: emerging patterns the ML hasn't seen
Solution: Rules engine handles these cases (adaptive weighting)
Predictions Too Conservative¶
Observation: ML predicts lower than actual for some plates
Explanation: After Phase 2, median ratio is 0.56× (slight under-prediction). This is intentional to avoid the previous overvaluation problem. Phase 3 will balance this.
Dependencies¶
lightgbm==4.3.0 # Primary ML model (preferred)
xgboost==2.1.3 # Fallback ML model
scikit-learn==1.4.0 # Feature scaling, evaluation
pandas==2.2.0 # Feature engineering
numpy==1.26.3 # Array operations
Credits¶
- ML System: Claude Code (Anthropic)
- Training Data: 17,813 DVLA auction sales (2020-2025)
- Architecture: Ensemble learning (95% LightGBM + 5% Enhanced Rules)
- Feature Engineering: 60+ features with adaptive weighting
- Phase 2 Improvements: Data-driven multipliers + phonetic detection + year-specific discounts
- Status: Phase 3.4 Complete
- Last Updated: 2026-02-08
- MAPE: 42.9% (down from 328%)
- Within ±30%: 45.1%
- Rules Median Ratio: 1.004 (calibrated)
- Next: Investigate word detection over-prediction
- Related: Architecture Decisions