📊 Borrower Risk Evaluator

Source Code & Evaluation Engine

Vanilla JS · Gemini AI · Python + pandas · Self-Contained

🔍 About This Code Showcase

These snippets show the parts that make this an evaluation project: the typed Gemini verdict, the honest mapping that hides the label from the model, the metrics engine, and the confidence interval.

The full page is a single self-contained HTML file with no backend. Real borrowers are prepared offline by a keyless Python script and shipped as a static JavaScript asset.

📁 Project Structure

projects/borrower-risk-evaluator/
├── index.html          ← the whole app: classify, score, charts, safety dial
├── demo.html           ← same file, the launch target
├── data/
│   ├── borrowers.js    ← 150 real held-out borrowers (window.BORROWERS)
│   └── meta.json       ← headline counts
├── prep_data.py        ← keyless: clean dataset.csv -> sample -> emit borrowers.js
├── dataset.csv         ← 32,577 real labelled loans (source)
├── system-prompt.txt   ← the single documented classification prompt + taxonomy
├── problem-statement.md
├── project-outline.md
└── user-guide.md

🧠 The Typed Verdict (Gemini, structured output)

The classifier asks Gemini for a strict JSON object with three fields: status, confidence, and reason. A response schema forces the shape, and temperature 0 keeps the verdict as repeatable as the API allows. The borrower's facts go in; the true label never does.

📄 index.html — classifyOne()
const SCHEMA = {
  type: "OBJECT",
  properties: {
    status: {type: "STRING", enum: ["DEFAULT", "NO DEFAULT"]},
    confidence: {type: "NUMBER"},
    reason: {type: "STRING"}
  },
  required: ["status", "confidence", "reason"]
};

async function classifyOne(f, key){
  const body = {
    contents: [{role: "user", parts: [{text: buildPrompt(f)}]}],
    generationConfig: {temperature: 0, responseMimeType: "application/json", responseSchema: SCHEMA}
  };
  const res = await fetch(url, {method: "POST", headers: {...}, body: JSON.stringify(body)});
  const out = JSON.parse(text);
  // "NO DEFAULT" contains the word DEFAULT, so test for "NO" first
  const status = out.status.toUpperCase().includes("NO DEFAULT") ? "NO DEFAULT" : "DEFAULT";
  return {status, confidence: clamp(conf, 0.5, 1), reason: out.reason};
}
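The normalization at the end of classifyOne() can be exercised without calling the API. The following is a minimal sketch, not the page's actual code: `parseVerdict` is a hypothetical name, `clamp` is written out here because the excerpt calls it without showing it, and the fallback to 0.5 for a non-numeric confidence is an assumption.

```javascript
// Hypothetical helper: the excerpt calls clamp() but does not show its body.
function clamp(x, lo, hi) {
  return Math.min(hi, Math.max(lo, x));
}

// Hypothetical name. Turns the model's JSON text into a normalized verdict.
function parseVerdict(raw) {
  const out = JSON.parse(raw);
  // "NO DEFAULT" contains the word DEFAULT, so check the longer label first.
  const status = String(out.status).toUpperCase().includes("NO DEFAULT")
    ? "NO DEFAULT"
    : "DEFAULT";
  const conf = Number(out.confidence);
  return {
    status,
    // Assumption: a non-numeric confidence falls back to 0.5 (no information).
    confidence: clamp(Number.isFinite(conf) ? conf : 0.5, 0.5, 1),
    reason: String(out.reason ?? ""),
  };
}

const v = parseVerdict('{"status":"no default","confidence":1.4,"reason":"low loan-to-income"}');
console.log(v.status, v.confidence); // NO DEFAULT 1
```

Clamping to the range 0.5 to 1 treats confidence as the model's certainty in its own label, so a value below 0.5 is read as "no better than a coin flip" rather than as a vote for the other class.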

🔒 The Honest Mapping (label used only for scoring)

The model's status and confidence become a default probability p. The real outcome, truth, is attached for scoring but is never part of the prompt. Because the model never sees the answer it is graded against, the scorecard can be trusted.

📄 index.html — toRow()
function toRow(rec, verdict){
  const f = rec.features;
  // status + confidence -> probability of default
  const p = verdict.status === "DEFAULT" ? verdict.confidence : 1 - verdict.confidence;
  return {
    id: rec.id,
    grade: f.loan_grade,
    intent: f.loan_intent,
    intRate: f.interest_rate,
    lti: f.loan_to_income_pct / 100,
    p: p,                                      // the prediction
    truth: rec.true_label === "DEFAULT" ? 1 : 0, // scored AFTER, never shown to the model
    reason: verdict.reason
  };
}
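A short worked example may make the mapping concrete. This sketch pulls the probability rule out of toRow() into a hypothetical `toProbability` helper and uses a made-up record in the {id, features, true_label} shape; the field values are illustrative, and building the prompt input from `rec.features` alone is an assumption based on the buildPrompt(f) call.

```javascript
// Same rule as toRow(): confidence is certainty in the stated status,
// so a confident "NO DEFAULT" means a low probability of default.
function toProbability(verdict) {
  return verdict.status === "DEFAULT" ? verdict.confidence : 1 - verdict.confidence;
}

// Made-up record; values are illustrative only, not taken from the dataset.
const rec = {
  id: "demo-1",
  features: { loan_grade: "C", loan_intent: "EDUCATION", interest_rate: 13.5, loan_to_income_pct: 25 },
  true_label: "DEFAULT",
};

// Assumption: only rec.features is serialized for the prompt.
const promptInput = JSON.stringify(rec.features);

const p = toProbability({ status: "NO DEFAULT", confidence: 0.75 });
const truth = rec.true_label === "DEFAULT" ? 1 : 0; // attached only after the verdict

console.log(promptInput.includes("true_label"), p, truth); // false 0.25 1
```

Here the model was 75% sure of "NO DEFAULT", which scores as a 25% default probability against a borrower who did default: a miss that the metrics count as a false negative.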

📊 The Metrics Engine

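One pass over the predictions builds the confusion matrix and derives accuracy, precision, recall, and F1 at a fixed 0.5 decision cut. The safety-dial threshold applies only to confidence: it decides which predictions count as high-confidence, and so sets the high-confidence accuracy and coverage. The function needs nothing but the scored rows, so the same harness can be carried over to a production evaluation.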

📄 index.html — compute(threshold)
function compute(t){
  let TP=0, FP=0, FN=0, TN=0, hcC=0, hcT=0;
  for(const r of ROWS){
    const pred = r.p >= 0.5 ? 1 : 0, conf = Math.max(r.p, 1-r.p);
    if(pred && r.truth) TP++;
    else if(pred) FP++;
    else if(r.truth) FN++;
    else TN++;
    if(conf >= t){ hcT++; if(pred === r.truth) hcC++; } // cleared at this confidence
  }
  const prec = TP/(TP+FP||1), rec = TP/(TP+FN||1);
  return {
    acc: (TP+TN)/ROWS.length, prec, rec,
    f1: 2*prec*rec/((prec+rec)||1),
    hcAcc: hcT ? hcC/hcT : 1,
    cov: hcT/ROWS.length,
    TP, FP, FN, TN, hcC, hcT
  };
}
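The trade-off the safety dial exposes can be seen by sweeping the threshold over a handful of rows. This is a sketch under stated assumptions: `highConfidence` is a hypothetical helper that mirrors only the high-confidence part of compute(t) and takes the rows as a parameter instead of reading the global ROWS, and the six rows are invented for illustration.

```javascript
// Illustrative rows only: p is a predicted default probability, truth is 0 or 1.
const rows = [
  { p: 0.95, truth: 1 },
  { p: 0.80, truth: 1 },
  { p: 0.60, truth: 0 },
  { p: 0.45, truth: 0 },
  { p: 0.25, truth: 1 },
  { p: 0.05, truth: 0 },
];

// Hypothetical helper mirroring the high-confidence branch of compute(t).
function highConfidence(rows, t) {
  let cleared = 0, correct = 0;
  for (const r of rows) {
    const pred = r.p >= 0.5 ? 1 : 0;          // fixed 0.5 decision cut
    const conf = Math.max(r.p, 1 - r.p);      // certainty in the predicted class
    if (conf >= t) {
      cleared++;
      if (pred === r.truth) correct++;
    }
  }
  return { hcAcc: cleared ? correct / cleared : 1, cov: cleared / rows.length };
}

// Raising the dial keeps fewer predictions and scores only those.
for (const t of [0.5, 0.7, 0.9]) {
  const { hcAcc, cov } = highConfidence(rows, t);
  console.log(`t=${t} hcAcc=${hcAcc.toFixed(2)} cov=${cov.toFixed(2)}`);
}
```

On these rows, accuracy among cleared predictions rises as the threshold goes up while coverage falls, which is the pattern the dial is meant to make visible. Real data need not be this tidy.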

📐 The Confidence Interval (Wilson score)

Every headline number carries a 95% interval, so the audience knows the margin of error and can watch it tighten as the sample grows. This is what answers "you only tested 150, is that enough?".

📄 index.html — wilson()
// 95% Wilson score interval for a proportion k/n
function wilson(k, n){
  if(!n) return [0,0];
  const z=1.96, p=k/n, d=1+z*z/n;
  const c=(p+z*z/(2*n))/d;
  const m=(z*Math.sqrt(p*(1-p)/n + z*z/(4*n*n)))/d;
  return [clamp(c-m,0,1), clamp(c+m,0,1)];
}
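To see the interval tighten, the same observed rate can be evaluated at several sample sizes. The sketch below repeats wilson() so it runs on its own; `clamp` is written out as an assumption because the excerpt does not show it, and the counts are illustrative rather than results from the app.

```javascript
// Assumed helper: the excerpt uses clamp() without showing it.
function clamp(x, lo, hi) {
  return Math.min(hi, Math.max(lo, x));
}

// 95% Wilson score interval for a proportion k/n (same formula as the excerpt).
function wilson(k, n) {
  if (!n) return [0, 0];
  const z = 1.96, p = k / n, d = 1 + (z * z) / n;
  const c = (p + (z * z) / (2 * n)) / d;
  const m = (z * Math.sqrt((p * (1 - p)) / n + (z * z) / (4 * n * n))) / d;
  return [clamp(c - m, 0, 1), clamp(c + m, 0, 1)];
}

// Illustrative counts: an 80% hit rate observed on 50, 150, and 1500 cases.
for (const [k, n] of [[40, 50], [120, 150], [1200, 1500]]) {
  const [lo, hi] = wilson(k, n);
  console.log(`n=${n}: ${lo.toFixed(3)} to ${hi.toFixed(3)} (width ${(hi - lo).toFixed(3)})`);
}
```

At n = 150 an 80% rate carries an interval of roughly 73% to 86%, which is the kind of margin the page reports next to each headline number.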

🐍 The Keyless Data Prep

A small Python script cleans the real dataset, parses the currency loan amounts, derives the explainability ratios, and emits a stratified sample of real borrowers as a JavaScript asset the page loads directly.

📄 prep_data.py — excerpt
def parse_amount(val):
    # "£35,000.00" -> 35000.0
    s = re.sub(r"[^0-9.]", "", str(val))
    return float(s) if s else None

# keep the real ~21% default rate in the sample
df["loan_to_income_pct"] = (df["loan_amount"] / df["customer_income"] * 100).round(1)
sample = stratified(df, SAMPLE_N, SEED)
# -> window.BORROWERS = [ {id, features, true_label}, ... ]
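The excerpt does not show the body of stratified(), so the idea is sketched here in JavaScript to match the page code. Everything in it is an assumption for illustration: the `stratifiedSample` name, the small seeded generator (mulberry32-style), proportional allocation by rounding, and the synthetic records. The real script is Python and may work differently.

```javascript
// Small seeded PRNG (mulberry32-style) so a given seed reproduces the sample.
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Hypothetical helper: sample n records while keeping each label's share.
function stratifiedSample(records, n, seed) {
  const rand = seededRandom(seed);
  const groups = new Map();
  for (const r of records) {
    if (!groups.has(r.true_label)) groups.set(r.true_label, []);
    groups.get(r.true_label).push(r);
  }
  const sample = [];
  for (const group of groups.values()) {
    const g = group.slice();
    // Fisher-Yates shuffle, then take this label's proportional share.
    for (let i = g.length - 1; i > 0; i--) {
      const j = Math.floor(rand() * (i + 1));
      [g[i], g[j]] = [g[j], g[i]];
    }
    // Rounding per group can leave the total one or two off n in general.
    const take = Math.round((n * g.length) / records.length);
    sample.push(...g.slice(0, take));
  }
  return sample;
}

// Synthetic population with a 21% default rate (not the real dataset).
const population = Array.from({ length: 1000 }, (_, i) => ({
  id: i,
  true_label: i < 210 ? "DEFAULT" : "NO DEFAULT",
}));

const picked = stratifiedSample(population, 100, 42);
const defaults = picked.filter((r) => r.true_label === "DEFAULT").length;
console.log(picked.length, defaults); // 100 21
```

Sampling within each label group, rather than from the whole table at once, is what keeps the default rate in a small sample equal to the rate in the source.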