🚗 Motor Claim Triage and QC Evaluator

Deterministic claim triage for insurers, scored on real fraud

Deterministic Yardstick · Four-Category Triage · Fraud QC Scorecard · Vanilla JS · No Backend, No Key · GitHub Pages

📋 Project Overview & Problem Statement

Challenge: Motor claims arrive in a flood, and most are honest. A handler has to do two jobs at once: send each claim to the right desk, and catch the rare fraud hiding in the pile. Rush and you pay the fraud; over-investigate and honest customers wait weeks. And when the board asks how good the fraud triage really is, there is no measured answer, only a vendor's black-box score that was never tested on this insurer's own claims, and that often gives the same claim two different verdicts.

Solution: A transparent, deterministic yardstick sorts every claim into one of four handling categories using the facts that actually predict fraud in the data. The fraud flags are then measured against the recorded outcome over the full book of 15,420 claims. The same claim always lands in the same category, whether it is reviewed alone or in the whole batch, because the category is arithmetic, not a guess.

Key Benefits

🖥️ Application Features

🗂️ Four-Category Triage Console

Every claim sorted into Fast track, Approve, Investigate, or Repudiate by its risk-point total, with a headline percentage cleared and a filterable, paginated view of each category across the whole 15,420-claim book.

✅ Fraud QC Scorecard

Catch rate, flag accuracy, F1, false-alarm rate, and a four-box confusion matrix, all checked against the real FraudFound_P label over the full population, each carrying a tight 95% confidence interval. Plain-English first, the technical figure underneath.
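The scorecard figures can be reproduced from two parallel lists, the flags and the recorded labels. The sketch below is illustrative only: it reads "catch rate" as recall, "flag accuracy" as precision, and "false-alarm rate" as false positives over all honest claims, and it uses a Wilson score interval for the 95% bounds. Those readings, the interval method, and the function names are assumptions, not confirmed details of the project.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

def scorecard(flags, labels):
    """Four-box confusion matrix plus the headline rates.

    flags  -- truthy where the rule flagged the claim (assumed encoding)
    labels -- truthy where the claim was recorded as fraud
    """
    pairs = list(zip(flags, labels))
    tp = sum(1 for f, y in pairs if f and y)
    fp = sum(1 for f, y in pairs if f and not y)
    fn = sum(1 for f, y in pairs if not f and y)
    tn = sum(1 for f, y in pairs if not f and not y)

    catch_rate = tp / (tp + fn) if tp + fn else 0.0      # recall
    flag_accuracy = tp / (tp + fp) if tp + fp else 0.0   # precision (assumed reading)
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    both = flag_accuracy + catch_rate
    f1 = 2 * flag_accuracy * catch_rate / both if both else 0.0

    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "catch_rate": (catch_rate, wilson_interval(tp, tp + fn)),
        "flag_accuracy": (flag_accuracy, wilson_interval(tp, tp + fp)),
        "false_alarm_rate": (false_alarm, wilson_interval(fp, fp + tn)),
        "f1": f1,
    }
```

Each rate is returned with its interval so the plain-English figure and the technical bounds can be shown together.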

🎚️ Review Threshold

The four categories are fixed; this dial is a separate question for the fraud team: at what risk-point level should a claim be flagged for review? Move it and watch catch rate, false alarms, and workload trade off live.
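The trade-off the dial exposes can be sketched as a sweep over candidate thresholds. This is a minimal illustration, assuming a claim is flagged when its points are at or above the threshold and that "workload" means the share of all claims sent for review; the function and key names are hypothetical.

```python
def sweep_thresholds(points, labels, thresholds):
    """For each threshold, report catch rate, false-alarm rate and workload.

    points -- risk-point total per claim
    labels -- 1 where the claim was recorded as fraud, else 0
    """
    total = len(labels)
    total_fraud = sum(labels)
    total_honest = total - total_fraud
    rows = []
    for t in thresholds:
        # Labels of the claims flagged at this threshold (assumed: points >= t).
        flagged = [y for p, y in zip(points, labels) if p >= t]
        caught = sum(flagged)
        false_alarms = len(flagged) - caught
        rows.append({
            "threshold": t,
            "catch_rate": caught / total_fraud if total_fraud else 0.0,
            "false_alarm_rate": false_alarms / total_honest if total_honest else 0.0,
            "workload": len(flagged) / total if total else 0.0,
        })
    return rows

# Toy data only, not figures from the real book.
toy_points = [0, 1, 2, 3, 4, 4, 5, 6]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1]
for row in sweep_thresholds(toy_points, toy_labels, [3, 4, 5, 6]):
    print(row)
```

Raising the threshold shrinks the review queue and the false alarms, at the cost of a lower catch rate; the sweep makes that exchange visible row by row.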

Calibration and Segments

A fraud-rate-by-risk-point chart that shows the score genuinely separates fraud (it climbs monotonically), plus a fraud-rate-by-accident-area sense check against the truth.
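The calibration check behind that chart amounts to grouping claims by point total and comparing fraud rates. A minimal sketch, with hypothetical function names, and treating "monotonic" as non-decreasing (a strict increase would be a stronger test):

```python
from collections import defaultdict

def fraud_rate_by_points(points, labels):
    """Map each risk-point total to the observed fraud rate at that total."""
    counts = defaultdict(lambda: [0, 0])  # points -> [fraud count, claim count]
    for p, y in zip(points, labels):
        counts[p][0] += y
        counts[p][1] += 1
    # Built from sorted items, so the dict iterates in ascending point order.
    return {p: fraud / n for p, (fraud, n) in sorted(counts.items())}

def climbs_monotonically(rates):
    """True when the fraud rate never falls as the point total rises."""
    values = list(rates.values())
    return all(a <= b for a, b in zip(values, values[1:]))
```

The same grouping works for the accident-area sense check by passing the area value in place of the point total, although the monotonic test only makes sense for an ordered key.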

📄 Check One Claim

Type any claim ID, or draw one at random, and see exactly how the rule classifies it: the points it scored and why, its category, the clear-versus-human outcome, and whether it was really fraud. The identical rule the whole batch uses, so it always agrees.

🧮 How the Classification Works

There is no model in the classification path. A claim scores points from the signals the data proves predict fraud, and the total decides the category. The rule is published, so any claim can be checked by hand.

| Signal in the claim | Points |
| --- | --- |
| Policy holder at fault (vs third party, near-zero fraud) | +2 |
| All Perils policy (Collision +1, Liability +0) | +2 |
| Recent address change at claim (under 6 months or 2 to 3 years) | +2 |
| Accident at policy start (zero days policy-to-accident) | +2 |
| Rural accident | +1 |
| Vehicle price at an extreme (under 20k or over 69k) | +1 |
| Vehicle 0 to 4 years old | +1 |

| Total points | Category | Action |
| --- | --- | --- |
| 0 to 2 | Fast track | auto-clear, minimal effort |
| 3 | Approve | pay after standard processing |
| 4 to 5 | Investigate | refer to the fraud unit |
| 6 or more | Repudiate | recommend denial, a person decides |
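The point rule and cut points can be written as a short function. The sketch below follows the published points and thresholds, but the dictionary keys (fault, base_policy and so on), their value spellings, and the use of numeric price and age fields are hypothetical; the real dataset may encode these differently.

```python
def score_claim(claim):
    """Return (total_points, breakdown) for one claim dict.

    All field names and value spellings here are assumed for illustration.
    """
    breakdown = []
    if claim["fault"] == "Policy Holder":
        breakdown.append(("policy holder at fault", 2))
    policy_points = {"All Perils": 2, "Collision": 1, "Liability": 0}
    pts = policy_points.get(claim["base_policy"], 0)
    if pts:
        breakdown.append((claim["base_policy"] + " policy", pts))
    if claim["address_change"] in ("under 6 months", "2 to 3 years"):
        breakdown.append(("recent address change", 2))
    if claim["days_policy_to_accident"] == 0:
        breakdown.append(("accident at policy start", 2))
    if claim["accident_area"] == "Rural":
        breakdown.append(("rural accident", 1))
    if claim["vehicle_price"] < 20_000 or claim["vehicle_price"] > 69_000:
        breakdown.append(("vehicle price at an extreme", 1))
    if 0 <= claim["vehicle_age_years"] <= 4:
        breakdown.append(("vehicle 0 to 4 years old", 1))
    return sum(p for _, p in breakdown), breakdown

def categorize(points):
    """Map a point total to its handling category using the fixed cut points."""
    if points <= 2:
        return "Fast track"
    if points == 3:
        return "Approve"
    if points <= 5:
        return "Investigate"
    return "Repudiate"

# Made-up claim: at fault (+2), All Perils (+2), vehicle aged 3 years (+1).
example_claim = {
    "fault": "Policy Holder", "base_policy": "All Perils",
    "address_change": "no change", "days_policy_to_accident": 45,
    "accident_area": "Urban", "vehicle_price": 25_000, "vehicle_age_years": 3,
}
total, why = score_claim(example_claim)
print(total, categorize(total), why)  # 5 points, Investigate
```

Returning the breakdown alongside the total is what lets a single-claim view explain the points a claim scored and why.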

Same Claim, Same Category

The total is a fixed sum over fixed fields, so a claim scores the same every time and never gets two verdicts. This was the explicit fix for an earlier version that classified probabilistically and could contradict itself.
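One way to guard that property is a small regression check: score every claim on its own, score the same claims again in a shuffled batch, and confirm the answers match. The sketch below uses a toy stand-in rule with hypothetical field names, not the project's actual rule.

```python
import random

def toy_rule(claim):
    """Hypothetical stand-in for the point rule: a fixed sum over fixed fields."""
    return (2 if claim["at_fault"] else 0) + (1 if claim["rural"] else 0)

def same_alone_and_in_batch(rule, claims, seed=0):
    """True when each claim scores the same alone as in a reordered batch."""
    alone = [rule(c) for c in claims]             # each claim reviewed on its own
    order = list(range(len(claims)))
    random.Random(seed).shuffle(order)            # batch processed in another order
    in_batch = {i: rule(claims[i]) for i in order}
    return all(alone[i] == in_batch[i] for i in range(len(claims)))

claims = [
    {"at_fault": True, "rural": False},
    {"at_fault": False, "rural": True},
    {"at_fault": True, "rural": True},
]
print(same_alone_and_in_batch(toy_rule, claims))
```

A pure function of the claim's own fields passes trivially; the check earns its keep by failing if a rule ever starts to depend on sampling, batch order, or shared state.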

📊 Signals Chosen from the Data

Fault, policy type, recent address change, and zero-day-policy accidents are the real fraud drivers here. Intuitive signals like "prior-claims pattern" were dropped because the data shows them to be weak or inversely related to fraud.

🎯 The Truth Is Held Back

The fraud label is never an input to the rule. It is revealed only on the scorecard, to measure how often the flags are right. That keeps the evaluation honest.
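In code, holding the truth back can be as simple as stripping the label before the rule runs and joining it back only for scoring. A minimal sketch, assuming records are plain dicts that carry the FraudFound_P column; the function names and the shape of the rule are hypothetical.

```python
LABEL_FIELD = "FraudFound_P"  # the recorded outcome; never passed to the rule

def split_features_and_label(record):
    """Separate the rule's inputs from the held-back fraud label."""
    features = {k: v for k, v in record.items() if k != LABEL_FIELD}
    return features, record[LABEL_FIELD]

def evaluate(records, flag_rule):
    """Run the rule on features only, then pair each flag with its label."""
    results = []
    for record in records:
        features, label = split_features_and_label(record)
        flagged = flag_rule(features)      # the rule cannot see the label
        results.append((flagged, label))   # label joined only for scoring
    return results
```

Keeping the split in one place makes it easy to audit that no label-derived field leaks into the inputs.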

🚦 Flags Are a Review Trigger

Investigate and Repudiate are the fraud flags, scored against the label. Repudiate means strong fraud grounds; this public dataset has no coverage field, so coverage-based repudiation is noted as out of scope.

📦 The QC Pack: From Rule to Operating Procedure

Running the yardstick inside a claims team needs the operating layer around it. The companion QC Pack documents the rule and how a team uses it day to day.

📋 Classification Yardstick

The full point rule and the four cut points, with how each signal was justified by the data. What it is for: giving every handler one fixed, auditable definition of each category, so triage is identical from desk to desk.

🧾 Worked Examples

Sample claims scored point by point into each category. What it is for: onboarding a new handler and showing exactly why a claim lands where it does.

📈 How the Signals Were Chosen

The fraud-rate-by-field analysis behind the weights, including the intuitive signals that were dropped. What it is for: defending the rule to a compliance reviewer and re-deriving it on a new book.

✅ QC SOP

The daily review-queue steps and the weekly scorecard the team maintains. What it is for: running the queue and keeping the rule honest over time.

🛠️ Technical Architecture & Implementation

Frontend Stack

Single-file HTML · Vanilla JavaScript · Inline SVG charts · No build step

Classification & Data

Deterministic rule (fixed point score) · No model, no API key · Full 15,420-claim book · Python + pandas (data build)

Deployment & Infrastructure

GitHub Pages · Dictionary-encoded asset (claims-full.js) · No backend
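Dictionary encoding is a common way to keep a large categorical dataset small enough to ship as a static file: each distinct value is stored once per column and rows hold only integer codes. The sketch below shows the general technique. The actual layout of claims-full.js is not documented here, so the asset structure, the function names, and the CLAIMS_FULL global are all assumptions.

```python
import json

def dictionary_encode(rows, columns):
    """Replace each value with an integer code into a per-column dictionary."""
    dictionaries = {c: [] for c in columns}   # column -> list of distinct values
    index = {c: {} for c in columns}          # column -> value -> code
    encoded = []
    for row in rows:
        codes = []
        for c in columns:
            value = row[c]
            if value not in index[c]:
                index[c][value] = len(dictionaries[c])
                dictionaries[c].append(value)
            codes.append(index[c][value])
        encoded.append(codes)
    return {"columns": list(columns), "dictionaries": dictionaries, "rows": encoded}

def dictionary_decode(asset):
    """Rebuild the original row dicts from the encoded asset."""
    cols = asset["columns"]
    return [
        {c: asset["dictionaries"][c][code] for c, code in zip(cols, codes)}
        for codes in asset["rows"]
    ]

def to_js_asset(asset, global_name="CLAIMS_FULL"):
    """Serialise as a script that assigns the data to a (hypothetical) global."""
    return f"window.{global_name} = {json.dumps(asset, separators=(',', ':'))};"
```

Loading the data through a plain script tag avoids any fetch call or server, which fits a page that has to open straight from a static host.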

System Architecture

📖 Setup & How to Run

Prerequisites

Run the Demo

    # Open the live demo, or run locally:
    git clone https://github.com/lyven81/ai-project.git
    cd ai-project/projects/motor-claim-evaluator
    # Open demo.html in a browser. All 15,420 claims are classified
    # instantly by the fixed yardstick. No key, no run button.

Rebuild the Data Asset (optional)

    # Re-encode all real claims from the source dataset
    pip install pandas
    python build_full_data.py
    # -> writes data/claims-full.js and data/meta.json

🚀 Deployment

    # Fully static. Deployed on GitHub Pages, no server, no key.
    # Live at:
    # https://lyven81.github.io/ai-project/projects/motor-claim-evaluator/demo.html

Production Notes

📊 Key Metrics

15,420: Real Claims Classified (Full Population)
4: Handling Categories (Deterministic)
67%: Fraud Catch Rate at the Default Threshold
0: Models, API Keys, or Backend Servers

Business Value

Potential Use Cases

Strip away the motor-insurance specifics and the reusable pattern is build-a-rule-then-measure-it: score each record on the signals that actually predict the rare outcome in the data, sort it into a fixed category, and grade the flags against a known answer key. It fits wherever two things hold:

Two examples that fit, each with the catch that decides how far to trust the scorecard:

🏥 Medical Billing Review

Score each submitted claim line on the fields that predict recovery, sort into pay / review / deny, and grade the flags against which lines were later recovered or written off.

Condition to reuse the framework: you hold historical adjudication outcomes. The catch is that the answer key is partial for lines no one ever audited, so use it as decision support, not an automatic denial.

📦 Warranty Claim Triage

Score each warranty claim on the signals that separate valid claims from abuse, sort into fast-track / review / reject, and grade against which claims were ultimately honoured.

Condition to reuse the framework: you define a concrete outcome up front, for example honoured at final assessment. The rule is only as honest as that definition, so it must be stated plainly.

The rule of thumb: a transparent rule scored against a real answer key beats a confident black box. Motor fraud is close to ideal because the label is recorded for every claim, even though it is rare.