🚗 Motor Claim Triage and QC Evaluator

Source Code & Deterministic Yardstick

Vanilla JS · No Model, No Key · Python + pandas · Self-Contained

🔍 About This Code Showcase

These snippets show the parts that make this both a triage tool and an honest evaluator: the fixed yardstick that decides the category, the one-pass classification of the whole book, the full-population metrics, and the Wilson interval.

There is no language model in the classification path. The category follows from a fixed points score over fixed fields, computed in the browser, so the same claim always lands in the same category. The fraud label is used only to score the flags afterwards, never to decide them.

📁 Project Structure

projects/motor-claim-evaluator/
├── demo.html              ← the whole app: triage, scorecard, review dial, check one claim
├── data/
│   ├── claims-full.js     ← all 15,420 real claims, dictionary-encoded (window.MC)
│   └── meta.json          ← population + per-category counts
├── build_full_data.py     ← encodes dataset.csv -> claims-full.js
└── dataset.csv            ← 15,420 real Kaggle claims (source)

🧮 The Fixed Yardstick (the whole classifier)

A claim scores points from signals that the data shows are associated with fraud, and the total decides the category. This is the entire decision: no model, no temperature, no randomness. The same facts always give the same points and the same category.

📄 demo.html · yardstick(claim)

function yardstick(o){
  let s = 0;

  // Each signal adds a fixed number of points.
  if(o.fault==="Policy Holder") s+=2;
  if(o.base_policy==="All Perils") s+=2;
  else if(o.base_policy==="Collision") s+=1;
  if(o.address_change_claim==="under 6 months" || o.address_change_claim==="2 to 3 years") s+=2;
  if(o.days_policy_accident==="none") s+=2;
  if(o.accident_area==="Rural") s+=1;
  if(o.vehicle_price==="less than 20000" || o.vehicle_price==="more than 69000") s+=1;
  if(o.age_of_vehicle==="new" || o.age_of_vehicle==="3 years" || o.age_of_vehicle==="4 years") s+=1;

  // Fixed cut-offs turn the total into a category.
  const cat = s<=2 ? "fast"
            : s===3 ? "approve"
            : s<=5 ? "investigate"
            : "repudiate";
  return {points:s, cat};
}
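To see why a given claim lands where it does, the same rule can be restated as a table of signals. The sketch below is illustrative and is not the code in demo.html: `SIGNALS`, `explain`, the labels, and the sample claim are hypothetical, while the point weights and cut-offs are copied from `yardstick` above.

```javascript
// Hypothetical table-driven restatement of the yardstick (not taken from demo.html).
// Each entry: a label, the points it awards, and the test that triggers it.
const SIGNALS = [
  { label: "fault is Policy Holder", pts: 2, test: o => o.fault === "Policy Holder" },
  { label: "base policy is All Perils", pts: 2, test: o => o.base_policy === "All Perils" },
  { label: "base policy is Collision", pts: 1, test: o => o.base_policy === "Collision" },
  { label: "address change under 6 months or 2 to 3 years", pts: 2,
    test: o => ["under 6 months", "2 to 3 years"].includes(o.address_change_claim) },
  { label: "days_policy_accident is none", pts: 2, test: o => o.days_policy_accident === "none" },
  { label: "accident area is Rural", pts: 1, test: o => o.accident_area === "Rural" },
  { label: "vehicle price at either extreme", pts: 1,
    test: o => ["less than 20000", "more than 69000"].includes(o.vehicle_price) },
  { label: "vehicle is new, 3 or 4 years old", pts: 1,
    test: o => ["new", "3 years", "4 years"].includes(o.age_of_vehicle) },
];

// Sum the points of every signal that fires and list the reasons.
function explain(claim) {
  const fired = SIGNALS.filter(s => s.test(claim));
  const points = fired.reduce((sum, s) => sum + s.pts, 0);
  const cat = points <= 2 ? "fast"
            : points === 3 ? "approve"
            : points <= 5 ? "investigate"
            : "repudiate";
  return { points, cat, reasons: fired.map(s => `+${s.pts} ${s.label}`) };
}

// Made-up claim for illustration; the field values are assumptions.
const sample = {
  fault: "Policy Holder", base_policy: "Collision", address_change_claim: "no change",
  days_policy_accident: "more than 30", accident_area: "Urban",
  vehicle_price: "20000 to 29000", age_of_vehicle: "7 years",
};
const result = explain(sample);
console.log(result.points, result.cat, result.reasons);
```

Because a claim has a single `base_policy` value, the two policy entries can never both fire, which matches the `else if` in the original function.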

The same rule as a governed SQL CASE, for a production claims database:

📄 the equivalent fixed query

SELECT claim_id,
       risk_points,
       CASE
         WHEN risk_points <= 2 THEN 'Fast track'
         WHEN risk_points = 3  THEN 'Approve'
         WHEN risk_points <= 5 THEN 'Investigate'
         ELSE 'Repudiate'
       END AS category
FROM ( /* risk_points = the same CASE sum as above */ );

⚙️ One-Pass Classification of the Whole Book

On load, the dictionary-encoded data is decoded and every claim is classified once. That single pass fills a small histogram of claim and fraud counts by risk points, which is all the scorecard needs, so later metrics never revisit individual claims.

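📄 demo.html · classifyAll()

// Rebuild one claim object from its row of legend indices.
function decode(i){
  const r = MC.rows[i],
        o = {id:"CLM-"+(1000+i), isFraud:MC.fraud[i]};
  MC.fields.forEach((f,j)=> o[f]=MC.legend[f][r[j]]);
  return o;
}

// N (the claim count) and catCount (per-category tallies) are not defined in this excerpt.
// The yardstick awards between 0 and 11 points, so 12 buckets cover every score.
const byPoints = Array.from({length:12}, ()=>({n:0, fraud:0}));
for(let i=0; i<N; i++){
  const y = yardstick(decode(i));
  catCount[y.cat]++;
  byPoints[y.points].n++;
  byPoints[y.points].fraud += MC.fraud[i];  // label only counted here, after the rule
}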
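The decode-then-tally pass can be traced on a three-claim toy. This is a minimal sketch under stated assumptions: the `MC` object below is an invented stand-in for `window.MC`, and `toyPoints` is a hypothetical two-signal score used only to keep the example short.

```javascript
// Toy stand-in for window.MC (values invented for illustration, not from the dataset).
const MC = {
  fields: ["fault", "accident_area"],
  legend: { fault: ["Policy Holder", "Third Party"], accident_area: ["Rural", "Urban"] },
  rows: [[0, 0], [1, 1], [0, 1]],  // each claim is a list of indices into the legend
  fraud: [1, 0, 0],                // labels travel separately from the fields
};

// Same decoding idea as above: look each index up in its field's legend.
function decode(i) {
  const o = { id: "CLM-" + (1000 + i), isFraud: MC.fraud[i] };
  MC.fields.forEach((f, j) => { o[f] = MC.legend[f][MC.rows[i][j]]; });
  return o;
}

// Hypothetical reduced score (two signals only), so the maximum is 3 points.
const toyPoints = o =>
  (o.fault === "Policy Holder" ? 2 : 0) + (o.accident_area === "Rural" ? 1 : 0);

// One bucket per possible score; the label is added only after the score is known.
const hist = Array.from({ length: 4 }, () => ({ n: 0, fraud: 0 }));
for (let i = 0; i < MC.rows.length; i++) {
  const p = toyPoints(decode(i));
  hist[p].n++;
  hist[p].fraud += MC.fraud[i];
}

console.log(decode(0));
console.log(hist);
```

The histogram is the only state the later metrics need, which is why its size depends on the number of possible scores rather than on the number of claims.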

📊 Full-Population Metrics from the Histogram

The review threshold flags a claim when its risk points are at or above the dial. Because the histogram already holds the fraud count at every points level, precision, recall, and the false-alarm rate over all 15,420 claims are a quick sum, recomputed instantly as the dial moves.

📄 demo.html · metricsAt(flag)

function metricsAt(flag){
  let TP=0, FP=0, FN=0, TN=0;
  for(let p=0; p<12; p++){
    const b=byPoints[p], flagged = p>=flag;
    if(flagged){ TP+=b.fraud; FP+=b.n-b.fraud; }
    else{ FN+=b.fraud; TN+=b.n-b.fraud; }
  }
  // "|| 1" guards against dividing by zero when a denominator is empty.
  const prec=TP/(TP+FP||1), rec=TP/(TP+FN||1), fpr=FP/(FP+TN||1);
  return {TP,FP,FN,TN, prec, rec, fpr,
          recCI:wilson(TP,TP+FN), precCI:wilson(TP,TP+FP)};
}
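A small worked example makes the dial's effect concrete. The sketch below uses an invented four-bucket histogram (`toyHist`) and a hypothetical helper `countsAt`; it follows the same flag-at-or-above logic as `metricsAt` but leaves out the intervals.

```javascript
// Toy histogram indexed by risk points (counts invented for illustration).
const toyHist = [
  { n: 50, fraud: 1 },  // 0 points
  { n: 30, fraud: 2 },  // 1 point
  { n: 15, fraud: 3 },  // 2 points
  { n: 5,  fraud: 4 },  // 3 points
];

// Confusion counts when every claim with points >= dial is flagged.
function countsAt(dial) {
  let TP = 0, FP = 0, FN = 0, TN = 0;
  toyHist.forEach((b, p) => {
    if (p >= dial) { TP += b.fraud; FP += b.n - b.fraud; }
    else           { FN += b.fraud; TN += b.n - b.fraud; }
  });
  return { TP, FP, FN, TN, prec: TP / (TP + FP || 1), rec: TP / (TP + FN || 1) };
}

// Sweep the dial from "flag everything" (0) to "flag nothing" (4).
for (let dial = 0; dial <= 4; dial++) {
  const m = countsAt(dial);
  console.log(dial, m.TP, m.FP, m.FN, m.TN, m.prec.toFixed(2), m.rec.toFixed(2));
}
```

In this toy, moving the dial from 2 to 3 raises precision from 0.35 to 0.80 while recall falls from 0.70 to 0.40, the usual trade-off a review threshold exposes.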

📐 The Confidence Interval (Wilson score)

Every headline number carries a 95% interval. Because each interval narrows as its denominator grows (fraudulent claims for recall, flagged claims for precision), scoring all 15,420 claims rather than a sample keeps these ranges as tight as the data allows.

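📄 demo.html · wilson(k, n)

// clamp is not shown in this excerpt; a minimal definition is assumed here.
const clamp = (x, lo, hi) => Math.min(hi, Math.max(lo, x));

// 95% Wilson score interval for a proportion k/n
function wilson(k, n, z){
  z = z || 1.96;
  if(n===0) return [0,0];
  const p=k/n, d=1+z*z/n;
  const c=(p+z*z/(2*n))/d;                              // adjusted centre
  const h=(z*Math.sqrt(p*(1-p)/n + z*z/(4*n*n)))/d;     // half-width
  return [clamp(c-h,0,1), clamp(c+h,0,1)];
}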
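To illustrate how the interval tightens as the denominator grows, the sketch below repeats the Wilson formula under a hypothetical name, `wilsonInterval`, so that it runs on its own. The sample sizes are chosen for illustration and are not figures from the dataset.

```javascript
// Self-contained copy of the Wilson score interval (hypothetical name).
function wilsonInterval(k, n, z = 1.96) {
  if (n === 0) return [0, 0];
  const p = k / n;
  const d = 1 + (z * z) / n;
  const c = (p + (z * z) / (2 * n)) / d;
  const h = (z * Math.sqrt((p * (1 - p)) / n + (z * z) / (4 * n * n))) / d;
  return [Math.max(0, c - h), Math.min(1, c + h)];
}

// Same observed proportion (50%) at growing denominators.
for (const n of [100, 1000, 10000]) {
  const [lo, hi] = wilsonInterval(n / 2, n);
  console.log(n, lo.toFixed(4), hi.toFixed(4), "width", (hi - lo).toFixed(4));
}
```

For 50 successes out of 100 the interval is roughly 0.404 to 0.596. The half-width shrinks approximately with the square root of the denominator, so a hundredfold increase in n narrows the range by about a factor of ten.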

🐍 The Compact Data Build

A small Python script keeps only the fields the yardstick and the display need, then dictionary-encodes every claim so all 15,420 fit in about 520 KB and load via a single script tag, no fetch, no backend.

📄 build_full_data.py · excerpt

# list each field's distinct values once; encode each claim as indices into them
legend = {f: sorted(df[f].unique().tolist()) for f in fields}
index  = {f: {v: i for i, v in enumerate(legend[f])} for f in fields}
rows   = [[index[f][r[f]] for f in fields] for _, r in df.iterrows()]

# fraud label travels separately; read only when scoring, never an input to the rule
fraud = df["FraudFound_P"].astype(int).tolist()

# -> window.MC = { fields, legend, rows, fraud }
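The encoding can be checked with a round trip: encode a few records, decode them, and confirm nothing changed. The sketch below mirrors the Python step in JavaScript so it can sit next to `decode`. `encode` and the sample records are hypothetical and are not part of build_full_data.py.

```javascript
// Hypothetical JavaScript mirror of the dictionary-encoding step (illustration only).
function encode(records, fields) {
  const legend = {}, index = {};
  for (const f of fields) {
    legend[f] = [...new Set(records.map(r => r[f]))].sort();  // distinct values, sorted
    index[f] = new Map(legend[f].map((v, i) => [v, i]));       // value -> position
  }
  // Each record becomes one small integer per field.
  const rows = records.map(r => fields.map(f => index[f].get(r[f])));
  return { fields, legend, rows };
}

// Invented records for illustration.
const records = [
  { fault: "Third Party",   accident_area: "Urban" },
  { fault: "Policy Holder", accident_area: "Urban" },
  { fault: "Third Party",   accident_area: "Rural" },
];

const enc = encode(records, ["fault", "accident_area"]);

// Decode by looking each index up in its field's legend.
const decoded = enc.rows.map(r =>
  Object.fromEntries(enc.fields.map((f, j) => [f, enc.legend[f][r[j]]])));

console.log(JSON.stringify(enc.legend));
console.log(JSON.stringify(enc.rows));
```

Each distinct string is stored once in the legend and every claim carries only small integers, which is what keeps the encoded file compact.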