Deterministic claim triage for insurers, scored on real fraud
Challenge: Motor claims arrive in a flood, and most are honest. A handler has to do two jobs at once: send each claim to the right desk, and catch the rare fraud hiding in the pile. Rush and you pay the fraud; over-investigate and honest customers wait weeks. And when the board asks how good the fraud triage really is, there is no measured answer, only a vendor's black-box score that was never tested on this insurer's own claims, and that often gives the same claim two different verdicts.
Solution: A transparent, deterministic yardstick sorts every claim into one of four handling categories, using the facts that actually predict fraud in the data. The fraud flags are then measured against the recorded outcome over the full book of 15,420 claims. The same claim always lands in the same category, whether it is reviewed alone or in the whole batch, because the category is arithmetic, not a guess.
Every claim sorted into Fast track, Approve, Investigate, or Repudiate by its risk-point total, with a headline percentage cleared and a filterable, paginated view of each category across the whole 15,420-claim book.
Catch rate, flag accuracy, F1, false-alarm rate, and a four-box confusion matrix, all checked against the real FraudFound_P label over the full population, each carrying a tight 95% confidence interval. Plain English first, the technical figure underneath.
The four categories are fixed; this dial is a separate question for the fraud team: at what risk-point level should a claim be flagged for review? Move it and watch catch rate, false alarms, and workload trade off live.
A fraud-rate-by-risk-point chart that shows the score genuinely separates fraud (it climbs monotonically), plus a fraud-rate-by-accident-area sense check against the truth.
Type any claim ID, or draw one at random, and see exactly how the rule classifies it: the points it scored and why, its category, the clear-versus-human outcome, and whether it was really fraud. It is the identical rule the whole batch uses, so the single-claim view and the batch always agree.
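The flag-threshold dial described above can be sketched in a few lines. This is a minimal illustration, not the page's actual code: the function name `sweep_thresholds`, the `(points, is_fraud)` pair format, and the toy claims are all assumptions made up for the example.

```python
from typing import Iterable, List, Tuple, Dict

def sweep_thresholds(
    claims: List[Tuple[int, bool]], thresholds: Iterable[int]
) -> List[Dict[str, float]]:
    """For each candidate threshold, flag claims with points >= threshold
    and report catch rate, false-alarm rate, and review workload."""
    n = len(claims)
    fraud = sum(1 for _, is_fraud in claims if is_fraud)
    honest = n - fraud
    rows = []
    for t in thresholds:
        tp = sum(1 for pts, is_fraud in claims if pts >= t and is_fraud)
        fp = sum(1 for pts, is_fraud in claims if pts >= t and not is_fraud)
        rows.append({
            "threshold": t,
            # share of real fraud that the flag catches
            "catch_rate": tp / fraud if fraud else 0.0,
            # share of honest claims that get flagged anyway
            "false_alarm_rate": fp / honest if honest else 0.0,
            # share of the whole book sent for review
            "workload": (tp + fp) / n if n else 0.0,
        })
    return rows

# Toy data, invented for illustration: (risk points, was it fraud?)
toy_claims = [(1, False), (2, False), (3, False), (4, True),
              (5, False), (6, True), (2, True), (0, False)]

for row in sweep_thresholds(toy_claims, thresholds=[3, 4, 6]):
    print(row)
```

Raising the threshold lowers both the false-alarm rate and the workload, at the cost of catch rate; the dial simply makes that trade-off visible.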
There is no model in the classification path. A claim scores points from the signals the data proves predict fraud, and the total decides the category. The rule is published, so any claim can be checked by hand.
| Signal in the claim | Points |
|---|---|
| Policy holder at fault (vs third party, near-zero fraud) | +2 |
| All Perils policy (Collision +1, Liability +0) | +2 |
| Recent address change at claim (under 6 months or 2 to 3 years) | +2 |
| Accident at policy start (zero days policy-to-accident) | +2 |
| Rural accident | +1 |
| Vehicle price at an extreme (under 20k or over 69k) | +1 |
| Vehicle 0 to 4 years old | +1 |

| Total points | Category | Action |
|---|---|---|
| 0 to 2 | Fast track | auto-clear, minimal effort |
| 3 | Approve | pay after standard processing |
| 4 to 5 | Investigate | refer to the fraud unit |
| 6 or more | Repudiate | recommend denial, a person decides |
The total is a fixed sum over fixed fields, so a claim scores the same every time and never gets two verdicts. This was the explicit fix for an earlier version that classified probabilistically and could contradict itself.
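The two tables above translate directly into code. The sketch below is one way to write it, under stated assumptions: the field names (`base_policy`, `policy_holder_at_fault`, and so on) are hypothetical, pre-computed flags, not the dataset's real column names, and the mapping from raw values to those flags is left out.

```python
from typing import Any, Mapping

# Points by policy type, as listed in the signal table.
POLICY_POINTS = {"All Perils": 2, "Collision": 1, "Liability": 0}

# Hypothetical boolean fields (assumed names) and the points each one adds.
FLAG_POINTS = {
    "policy_holder_at_fault": 2,
    "recent_address_change": 2,     # under 6 months or 2 to 3 years
    "zero_day_policy_accident": 2,  # accident at policy start
    "rural_accident": 1,
    "vehicle_price_extreme": 1,     # under 20k or over 69k
    "vehicle_age_0_to_4": 1,
}

def risk_points(claim: Mapping[str, Any]) -> int:
    """Fixed sum over fixed fields: same claim, same total, every time."""
    total = POLICY_POINTS[claim["base_policy"]]
    total += sum(pts for field, pts in FLAG_POINTS.items() if claim.get(field))
    return total

def category(points: int) -> str:
    """Apply the four published cut points."""
    if points <= 2:
        return "Fast track"
    if points == 3:
        return "Approve"
    if points <= 5:
        return "Investigate"
    return "Repudiate"

# Invented example claim: at-fault holder, All Perils policy, rural accident.
claim = {
    "base_policy": "All Perils",
    "policy_holder_at_fault": True,
    "rural_accident": True,
}
points = risk_points(claim)
print(points, category(points))  # 5 Investigate
```

Because neither function reads anything except the claim's own fields, scoring a claim alone or inside a batch cannot produce different answers.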
Fault, policy type, recent address change, and zero-day-policy accidents are the real fraud drivers here. Intuitive signals like "prior-claims pattern" were dropped because the data shows them weak or inverse.
The fraud label is never an input to the rule. It is revealed only on the scorecard, to measure how often the flags are right. That keeps the evaluation honest.
Investigate and Repudiate are the fraud flags, scored against the label. Repudiate means strong fraud grounds; this public dataset has no coverage field, so coverage-based repudiation is noted as out of scope.
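The scorecard arithmetic can be sketched as follows. Treating 4 or more points as a flag follows from the cut points above (Investigate and Repudiate). The source does not say how the 95% intervals are computed, so the Wilson score interval used here is an assumption, as are the function names and the toy data.

```python
import math
from typing import Dict, Iterable, Tuple

def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed method, z=1.96 for ~95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half, centre + half)

def scorecard(claims: Iterable[Tuple[int, bool]], flag_at: int = 4) -> Dict[str, object]:
    """Grade flags (points >= flag_at) against the recorded fraud label.
    The label is used only here, never when computing the points."""
    tp = fp = fn = tn = 0
    for points, is_fraud in claims:
        flagged = points >= flag_at
        if flagged and is_fraud:
            tp += 1
        elif flagged:
            fp += 1
        elif is_fraud:
            fn += 1
        else:
            tn += 1
    catch = tp / (tp + fn) if tp + fn else 0.0       # recall
    accuracy = tp / (tp + fp) if tp + fp else 0.0    # precision of the flags
    f1 = 2 * catch * accuracy / (catch + accuracy) if catch + accuracy else 0.0
    false_alarm = fp / (fp + tn) if fp + tn else 0.0
    return {
        "confusion": {"tp": tp, "fp": fp, "fn": fn, "tn": tn},
        "catch_rate": catch,
        "catch_rate_ci": wilson_interval(tp, tp + fn),
        "flag_accuracy": accuracy,
        "f1": f1,
        "false_alarm_rate": false_alarm,
    }

# Toy data, invented for illustration: (risk points, FraudFound_P as bool)
toy = [(1, False), (2, False), (3, False), (4, True),
       (5, False), (6, True), (2, True), (0, False)]
print(scorecard(toy))
```

On a handful of toy claims the interval is very wide; it only tightens when the measurement runs over a large population such as the full book.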
Running the yardstick inside a claims team needs the operating layer around it. The companion QC Pack documents the rule and how a team uses it day to day.
The full point rule and the four cut points, with how each signal was justified by the data. What it is for: giving every handler one fixed, auditable definition of each category, so triage is identical from desk to desk.
Sample claims scored point by point into each category. What it is for: onboarding a new handler and showing exactly why a claim lands where it does.
The fraud-rate-by-field analysis behind the weights, including the intuitive signals that were dropped. What it is for: defending the rule to a compliance reviewer and re-deriving it on a new book.
The daily review-queue steps and the weekly scorecard the team maintains. What it is for: running the queue and keeping the rule honest over time.
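The fraud-rate-by-field analysis mentioned in the pack can be sketched as a simple group-and-divide. This is an illustration under assumptions: the helper name `fraud_rate_by`, the field name `fault`, and the toy rows are hypothetical; only the label name `FraudFound_P` comes from the text above.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping

def fraud_rate_by(
    rows: Iterable[Mapping[str, Any]], field: str, label: str = "FraudFound_P"
) -> Dict[Any, float]:
    """Fraud rate for each distinct value of `field`."""
    counts = defaultdict(lambda: [0, 0])  # value -> [fraud count, total count]
    for row in rows:
        bucket = counts[row[field]]
        bucket[0] += int(row[label])
        bucket[1] += 1
    return {value: fraud / total for value, (fraud, total) in counts.items()}

# Toy rows, invented for illustration.
rows = [
    {"fault": "Policy Holder", "FraudFound_P": 1},
    {"fault": "Policy Holder", "FraudFound_P": 0},
    {"fault": "Policy Holder", "FraudFound_P": 0},
    {"fault": "Third Party", "FraudFound_P": 0},
    {"fault": "Third Party", "FraudFound_P": 0},
]
print(fraud_rate_by(rows, "fault"))
```

A field earns points only if its values show clearly different fraud rates; a field whose rates are flat or inverted is a candidate to drop, which is the test the dropped intuitive signals failed.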
data/claims-full.js (about 520 KB), loaded via <script src> so it works on GitHub Pages and offline
CASE or a service that classifies the live claims feed
Strip away the motor-insurance specifics and the pattern reusable here is build-a-rule-then-measure-it: score each record on the signals that actually predict the rare outcome in the data, sort it into a fixed category, and grade the flags against a known answer key. It fits wherever two things hold:
Two examples that fit, each with the catch that decides how far to trust the scorecard:
Score each submitted claim line on the fields that predict recovery, sort into pay / review / deny, and grade the flags against which lines were later recovered or written off.
Condition to reuse the framework: you hold historical adjudication outcomes. The catch is that the answer key is partial for lines no one ever audited, so use it as decision support, not an automatic denial.
Score each warranty claim on the signals that separate valid from abuse, sort into fast-track / review / reject, and grade against which claims were ultimately honoured.
Condition to reuse the framework: you define a concrete outcome up front, for example honoured at final assessment. The rule is only as honest as that definition, so it must be stated plainly.
The rule of thumb: a transparent rule scored against a real answer key beats a confident black box. Motor fraud is close to ideal because the label is recorded for every claim, even though it is rare.