Motor Claim Triage and QC Evaluator - Deterministic Triage Scored on Real Fraud

📋 Project Overview & Problem Statement

Challenge: Motor claims arrive in a flood, and most are honest. A handler has to do two jobs at once: send each claim to the right desk, and catch the rare fraud hiding in the pile. Rush and you pay out on fraud; over-investigate and honest customers wait weeks. And when the board asks how good the fraud triage really is, there is no measured answer. There is only a vendor's black-box score that was never tested on this insurer's own claims and that often gives the same claim two different verdicts.

Solution: A transparent, deterministic yardstick sorts every claim into one of four handling categories, using the facts that actually predict fraud in the data. The fraud flags are then measured against the recorded outcome over the full book of 15,420 claims. The same claim always lands in the same category, whether it is reviewed alone or in the whole batch, because the category is arithmetic, not a guess.

Key Benefits

Same answer every time: the category is a fixed point score over fixed fields, so a claim never gets two verdicts; batch and single-claim always agree
Data-grounded, not intuition: the scoring signals were chosen because they actually separate fraud in this dataset, rather than taken from generic "late report" folklore that the data contradicts
Honest scope: the four-way routing is operational; the fraud flags are graded against the real label, with a 95% confidence interval on every figure
Full population, no rebalancing: classification is plain arithmetic and costs nothing to run, so all 15,420 claims are scored at the true 6% base rate, with no sampling caveat
Governance built in: no model, no API key, no auto-denial; a human owns every flag and every repudiation

🖥️ Application Features

🗂️ Four-Category Triage Console

Every claim is sorted into Fast track, Approve, Investigate, or Repudiate by its risk-point total, with a headline figure for the percentage of claims cleared and a filterable, paginated view of each category across the whole 15,420-claim book.

✅ Fraud QC Scorecard

Catch rate, flag accuracy, F1, false-alarm rate, and a four-box confusion matrix, all checked against the real FraudFound_P label over the full population, each carrying a tight 95% confidence interval. Plain-English first, the technical figure underneath.
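The scorecard figures can be reproduced from the four confusion-matrix counts. The sketch below is illustrative only. It assumes that "catch rate" means recall and "flag accuracy" means precision, and it uses a Wilson score interval for the 95% confidence interval; the source does not say which interval method the demo uses. The function names are hypothetical.

```python
from math import sqrt

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

def rate_with_ci(successes: int, n: int) -> dict:
    """A proportion together with its interval."""
    rate = successes / n if n else 0.0
    return {"rate": rate, "ci95": wilson_interval(successes, n)}

def scorecard(flags: list[bool], labels: list[bool]) -> dict:
    """Grade fraud flags against the recorded fraud label."""
    tp = sum(1 for f, y in zip(flags, labels) if f and y)          # fraud, flagged
    fp = sum(1 for f, y in zip(flags, labels) if f and not y)      # honest, flagged
    fn = sum(1 for f, y in zip(flags, labels) if not f and y)      # fraud, missed
    tn = sum(1 for f, y in zip(flags, labels) if not f and not y)  # honest, cleared
    f1_denom = 2 * tp + fp + fn
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "catch_rate": rate_with_ci(tp, tp + fn),        # assumed: recall
        "flag_accuracy": rate_with_ci(tp, tp + fp),     # assumed: precision
        "false_alarm_rate": rate_with_ci(fp, fp + tn),
        "f1": 2 * tp / f1_denom if f1_denom else 0.0,
    }

# Toy data, not figures from the real book.
toy_flags = [True, True, True, False, False, False, False, False]
toy_labels = [True, True, False, True, False, False, False, False]
card = scorecard(toy_flags, toy_labels)
```

On the full 15,420-claim population the intervals narrow considerably, which is consistent with the "tight" intervals described above.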

🎚️ Review Threshold

The four categories are fixed; this dial is a separate question for the fraud team: at what risk-point level should a claim be flagged for review? Move it and watch catch rate, false alarms, and workload trade off live.
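The trade-off behind the dial can be sketched as a sweep over every possible threshold. This is a minimal illustration, not the demo's own code. It assumes each claim already has a risk-point total and a boolean fraud label, and it defines workload as the share of all claims flagged. The function name and data are hypothetical.

```python
def sweep_thresholds(points, labels, thresholds=range(0, 12)):
    """For each threshold, flag claims with points >= threshold and report the trade-off.

    The published rule tops out at 11 points (2+2+2+2+1+1+1), hence range(0, 12).
    """
    total_fraud = sum(1 for y in labels if y)
    total_honest = len(labels) - total_fraud
    rows = []
    for t in thresholds:
        flagged = [p >= t for p in points]
        caught = sum(1 for f, y in zip(flagged, labels) if f and y)
        false_alarms = sum(1 for f, y in zip(flagged, labels) if f and not y)
        rows.append({
            "threshold": t,
            "catch_rate": caught / total_fraud if total_fraud else 0.0,
            "false_alarm_rate": false_alarms / total_honest if total_honest else 0.0,
            "workload": sum(flagged) / len(points) if points else 0.0,
        })
    return rows

# Toy data, not figures from the real book.
toy_points = [0, 2, 3, 4, 5, 6]
toy_labels = [False, False, False, True, False, True]
table = sweep_thresholds(toy_points, toy_labels)
```

A threshold of 4 corresponds to flagging exactly the Investigate and Repudiate categories. Raising it reduces false alarms and workload at the cost of catch rate.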

📐 Calibration and Segments

A fraud-rate-by-risk-point chart that shows the score genuinely separates fraud (it climbs monotonically), plus a fraud-rate-by-accident-area sense check against the truth.
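The calibration chart boils down to a grouped fraud rate plus a monotonicity check. The sketch below shows one way to compute both, assuming a list of risk-point totals and a parallel list of fraud labels. The names and toy data are hypothetical.

```python
from collections import defaultdict

def fraud_rate_by_points(points, labels):
    """Return {risk points: observed fraud rate}, ordered by ascending points."""
    counts = defaultdict(lambda: [0, 0])  # points -> [fraud count, claim count]
    for p, y in zip(points, labels):
        counts[p][0] += int(bool(y))
        counts[p][1] += 1
    return {p: fraud / total for p, (fraud, total) in sorted(counts.items())}

def climbs_monotonically(rates: dict) -> bool:
    """True when the fraud rate never falls as the point total rises."""
    values = list(rates.values())
    return all(a <= b for a, b in zip(values, values[1:]))

# Toy data, not figures from the real book.
toy_points = [0, 0, 0, 0, 2, 2, 2, 2, 4, 4, 4, 4]
toy_labels = [0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1]
rates = fraud_rate_by_points(toy_points, toy_labels)
```

The same grouping works for the accident-area sense check by keying on the area value instead of the point total.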

📄 Check One Claim

Type any claim ID, or draw one at random, and see exactly how the rule classifies it: the points it scored and why, its category, the clear-versus-human outcome, and whether it was really fraud. The identical rule the whole batch uses, so it always agrees.

🧮 How the Classification Works

There is no model in the classification path. A claim scores points from the signals the data proves predict fraud, and the total decides the category. The rule is published, so any claim can be checked by hand.

Signal in the claim	Points
Policy holder at fault (vs third party, near-zero fraud)	+2
All Perils policy (Collision +1, Liability +0)	+2
Recent address change at claim (under 6 months or 2 to 3 years)	+2
Accident at policy start (zero days policy-to-accident)	+2
Rural accident	+1
Vehicle price at an extreme (under 20k or over 69k)	+1
Vehicle 0 to 4 years old	+1

Total points	Category	Action
0 to 2	Fast track	auto-clear, minimal effort
3	Approve	pay after standard processing
4 to 5	Investigate	refer to the fraud unit
6 or more	Repudiate	recommend denial, a person decides
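The two tables above translate directly into a few lines of arithmetic. The following is a minimal sketch, not the demo's own code. The field names and value encodings (for example a numeric `vehicle_price` and `days_policy_to_accident`) are assumptions made for illustration; the source dataset may store these as text bands.

```python
# Points per base policy type, taken from the published table.
POLICY_POINTS = {"all_perils": 2, "collision": 1, "liability": 0}

def risk_points(claim: dict) -> int:
    """Sum the published risk points for one claim.

    Field names are hypothetical. The fraud label is deliberately never read.
    """
    points = 0
    if claim["fault"] == "policy_holder":
        points += 2
    points += POLICY_POINTS[claim["base_policy"]]
    if claim["address_change"] in {"under_6_months", "2_to_3_years"}:
        points += 2
    if claim["days_policy_to_accident"] == 0:
        points += 2
    if claim["accident_area"] == "rural":
        points += 1
    if claim["vehicle_price"] < 20_000 or claim["vehicle_price"] > 69_000:
        points += 1
    if claim["vehicle_age_years"] <= 4:
        points += 1
    return points

def category(points: int) -> str:
    """Map a point total to one of the four handling categories."""
    if points <= 2:
        return "Fast track"
    if points == 3:
        return "Approve"
    if points <= 5:
        return "Investigate"
    return "Repudiate"

# Hypothetical claim: at fault (+2) on an All Perils policy (+2), nothing else.
example = {
    "fault": "policy_holder", "base_policy": "all_perils",
    "address_change": "no_change", "days_policy_to_accident": 30,
    "accident_area": "urban", "vehicle_price": 25_000, "vehicle_age_years": 6,
}
print(risk_points(example), category(risk_points(example)))  # 4 Investigate
```

Because both functions are pure and read only fixed fields, any claim can be re-scored by hand from the tables and should give the same total.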

🔁 Same Claim, Same Category

The total is a fixed sum over fixed fields, so a claim scores the same every time and never gets two verdicts. This was the explicit fix for an earlier version that classified probabilistically and could contradict itself.
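The batch-versus-single guarantee can be checked mechanically. The sketch below classifies a batch once, then reclassifies each claim on its own in a different order and compares. `fixed_rule` is a simplified stand-in for the real point rule, and `noisy_rule` is a hypothetical probabilistic classifier included only to show the kind of contradiction the earlier version could produce.

```python
import random

def batch_matches_single(claims, classify, seed=0) -> bool:
    """True if every claim gets the same verdict alone as it did in the batch."""
    batch = {c["id"]: classify(c) for c in claims}
    shuffled = list(claims)
    random.Random(seed).shuffle(shuffled)  # order must not matter
    return all(classify(c) == batch[c["id"]] for c in shuffled)

def fixed_rule(claim) -> str:
    """Deterministic stand-in: a pure function of the claim's fields."""
    return "flag" if claim["points"] >= 4 else "clear"

_rng = random.Random(1)

def noisy_rule(claim) -> str:
    """Hypothetical probabilistic classifier: the verdict depends on a random draw."""
    return "flag" if _rng.random() < 0.5 else "clear"

claims = [{"id": i, "points": i % 12} for i in range(60)]
consistent = batch_matches_single(claims, fixed_rule)
contradicts = not batch_matches_single(claims, noisy_rule)
```

A check like this makes a reasonable regression test: it fails as soon as any hidden state or randomness enters the classification path.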

📊 Signals Chosen from the Data

Fault, policy type, recent address change, and zero-day-policy accidents are the real fraud drivers here. Intuitive signals like "prior-claims pattern" were dropped because the data shows them weak or inverse.

🎯 The Truth Is Held Back

The fraud label is never an input to the rule. It is revealed only on the scorecard, to measure how often the flags are right. That keeps the evaluation honest.

🚦 Flags Are a Review Trigger

Investigate and Repudiate are the fraud flags, scored against the label. Repudiate means strong fraud grounds; this public dataset has no coverage field, so coverage-based repudiation is noted as out of scope.

📦 The QC Pack: From Rule to Operating Procedure

Running the yardstick inside a claims team needs the operating layer around it. The companion QC Pack documents the rule and how a team uses it day to day.

📋 Classification Yardstick

The full point rule and the four cut points, with how each signal was justified by the data. What it is for: giving every handler one fixed, auditable definition of each category, so triage is identical from desk to desk.

🧾 Worked Examples

Sample claims scored point by point into each category. What it is for: onboarding a new handler and showing exactly why a claim lands where it does.

📈 How the Signals Were Chosen

The fraud-rate-by-field analysis behind the weights, including the intuitive signals that were dropped. What it is for: defending the rule to a compliance reviewer and re-deriving it on a new book.

✅ QC SOP

The daily review-queue steps and the weekly scorecard the team maintains. What it is for: running the queue and keeping the rule honest over time.


🛠️ Technical Architecture & Implementation

Frontend Stack

Single-file HTML, vanilla JavaScript, inline SVG charts, no build step

Classification & Data

Deterministic rule (fixed point score), no model, no API key, full 15,420-claim book, Python + pandas (data build)

Deployment & Infrastructure

GitHub Pages, dictionary-encoded asset (claims-full.js), no backend

System Architecture

No backend, no model, no cost: classification is arithmetic that runs entirely in the browser; nothing is sent anywhere
Real data as a compact asset: a Python script dictionary-encodes all 15,420 real Kaggle claims into data/claims-full.js (about 520 KB), loaded via <script src> so it works on GitHub Pages and offline
Deterministic by construction: the same fixed point rule classifies every claim, so the category is invariant across batch and single-claim views
Honest scoring: the fraud label travels with each claim but is never an input to the rule; it is used only to score the flags
Self-contained: classification, scoring, charts, the review-threshold dial, and single-claim lookup all run in one HTML file
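Dictionary encoding is what keeps the data asset compact: each column's distinct values are stored once, and every row holds only small integer codes. The sketch below illustrates the idea and is not the project's build_full_data.py. The payload layout and the `CLAIMS_FULL` global name are assumptions.

```python
import json

def dictionary_encode(rows: list[dict]) -> dict:
    """Replace repeated column values with integer codes into per-column lists."""
    columns = list(rows[0].keys())
    dictionaries = {col: [] for col in columns}  # code -> value
    lookup = {col: {} for col in columns}        # value -> code
    encoded = []
    for row in rows:
        codes = []
        for col in columns:
            value = row[col]
            if value not in lookup[col]:
                lookup[col][value] = len(dictionaries[col])
                dictionaries[col].append(value)
            codes.append(lookup[col][value])
        encoded.append(codes)
    return {"columns": columns, "dictionaries": dictionaries, "rows": encoded}

def dictionary_decode(payload: dict) -> list[dict]:
    """Invert dictionary_encode; the browser side would do the equivalent."""
    cols, dicts = payload["columns"], payload["dictionaries"]
    return [{c: dicts[c][code] for c, code in zip(cols, row)} for row in payload["rows"]]

def to_js_asset(payload: dict, var_name: str = "CLAIMS_FULL") -> str:
    """Serialise as a script that assigns a global, so it loads via <script src>."""
    body = json.dumps(payload, separators=(",", ":"))
    return f"window.{var_name} = {body};\n"

# Hypothetical rows; the real asset is built from the source dataset.
sample = [
    {"fault": "Policy Holder", "area": "Urban"},
    {"fault": "Third Party", "area": "Urban"},
    {"fault": "Policy Holder", "area": "Rural"},
]
payload = dictionary_encode(sample)
asset = to_js_asset(payload)
```

Loading the asset through a script tag rather than fetching JSON is what lets the page run from a local file as well as from GitHub Pages, as noted above.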

📖 Setup & How to Run

Prerequisites

A modern browser. No install, no key, no Node, no Python needed to run the demo.

Run the Demo

# Open the live demo, or run locally:
git clone https://github.com/lyven81/ai-project.git
cd ai-project/projects/motor-claim-evaluator

# Open demo.html in a browser. All 15,420 claims are classified
# instantly by the fixed yardstick. No key, no run button.
            

Rebuild the Data Asset (optional)

# Re-encode all real claims from the source dataset
pip install pandas
python build_full_data.py
# -> writes data/claims-full.js and data/meta.json
            

🚀 Deployment

# Fully static. Deployed on GitHub Pages, no server, no key.
# Live at:
# https://lyven81.github.io/ai-project/projects/motor-claim-evaluator/demo.html
            

Production Notes

The yardstick runs over the full demo book in the browser; in production it becomes a SQL CASE expression or a service that classifies the live claims feed
The rule is fixed and published, so results reproduce exactly and an auditor can re-derive any category by hand
Frame as decision support: the rule triages, a human owns the final call on any flag, and no claim is ever auto-denied
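To illustrate the SQL CASE route mentioned in the notes above, the sketch below runs the four cut points as a CASE expression against an in-memory SQLite table. The table and column names are hypothetical, and the risk-point total is assumed to be computed upstream; in a real warehouse it could be a sum of similar CASE expressions.

```python
import sqlite3

# The four published cut points as a SQL CASE expression.
CATEGORY_CASE = """
CASE
    WHEN risk_points <= 2 THEN 'Fast track'
    WHEN risk_points = 3  THEN 'Approve'
    WHEN risk_points <= 5 THEN 'Investigate'
    ELSE 'Repudiate'
END
"""

def classify_in_sql(scored_claims):
    """Classify (claim_id, risk_points) pairs with the CASE expression."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, risk_points INTEGER NOT NULL)"
        )
        conn.executemany("INSERT INTO claims VALUES (?, ?)", scored_claims)
        rows = conn.execute(
            f"SELECT claim_id, {CATEGORY_CASE} AS category FROM claims ORDER BY claim_id"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)

# Hypothetical scored claims.
result = classify_in_sql([(1, 0), (2, 3), (3, 5), (4, 7)])
```

'Repudiate' here remains a recommendation written to a review queue, in line with the rule that no claim is ever auto-denied.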

📊 Key Metrics

15,420: Real Claims Classified (Full Population)
4: Handling Categories (Deterministic)
67%: Fraud Catch Rate at the Default Threshold
0: Models, API Keys, or Backend Servers

Business Value

Consistency you can trust: the same claim always lands in the same category, with no model drift or contradictory verdicts
Trust through evidence: proves the catch rate on the insurer's own full labelled history, not a vendor's black box
Less manual review: 44% of claims fast-track at a 0.4% fraud rate, so assessors spend time only where it matters
Audit-ready: a published, point-by-point rule and a measured scorecard a compliance reviewer could re-derive
Closes the evaluation gap: demonstrates a transparent rule that is measured against ground truth, not merely produced

🔁 Potential Use Cases

Strip away the motor-insurance specifics and the reusable pattern is build-a-rule-then-measure-it: score each record on the signals that actually predict the rare outcome in the data, sort it into a fixed category, and grade the flags against a known answer key. It fits wherever two things hold:

A high-volume queue with a rare flag: most records are routine, a small fraction need scrutiny, and the cost of missing the flag is high.
An answer key for that one flag: past records eventually record whether the flag was real, so the rule's flags can be scored.

Two examples that fit, each with the catch that decides how far to trust the scorecard:

🏥 Medical Billing Review

Score each submitted claim line on the fields that predict recovery, sort into pay / review / deny, and grade the flags against which lines were later recovered or written off.

Condition to reuse the framework: you hold historical adjudication outcomes. The catch is that the answer key is partial for lines no one ever audited, so use it as decision support, not an automatic denial.

📦 Warranty Claim Triage

Score each warranty claim on the signals that separate valid claims from abuse, sort into fast-track / review / reject, and grade against which claims were ultimately honoured.

Condition to reuse the framework: you define a concrete outcome up front, for example honoured at final assessment. The rule is only as honest as that definition, so it must be stated plainly.

The rule of thumb: a transparent rule scored against a real answer key beats a confident black box. Motor fraud is close to ideal because the label is recorded for every claim, even though it is rare.
