🔍 About This Code Showcase
These snippets show the parts that make this an evaluation project: the typed Gemini verdict, the honest mapping that hides the label from the model, the metrics engine, and the confidence interval.
The full page is a single HTML file with no backend. Real borrowers are prepared offline by a keyless Python script (it needs no API key) and shipped as a static JavaScript asset that the page loads.
📁 Project Structure
projects/borrower-risk-evaluator/
├── index.html ← the whole app: classify, score, charts, safety dial
├── demo.html ← same file, the launch target
├── data/
│ ├── borrowers.js ← 150 real held-out borrowers (window.BORROWERS)
│ └── meta.json ← headline counts
├── prep_data.py ← keyless: clean dataset.csv -> sample -> emit borrowers.js
├── dataset.csv ← 32,577 real labelled loans (source)
├── system-prompt.txt ← the single documented classification prompt + taxonomy
├── problem-statement.md
├── project-outline.md
└── user-guide.md
🧠 The Typed Verdict (Gemini, structured output)
The classifier asks Gemini for a strict JSON object with three fields: status, confidence, and reason. A response schema forces the shape, and temperature 0 keeps the verdict as repeatable as the API allows. The borrower's facts go in; the true label never does.
const SCHEMA = {type:"OBJECT", properties:{
  status:{type:"STRING", enum:["DEFAULT","NO DEFAULT"]},
  confidence:{type:"NUMBER"},
  reason:{type:"STRING"}
}, required:["status","confidence","reason"]};
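async function classifyOne(f, key){
  // buildPrompt receives only the borrower's features, never the true label
  const body = {contents:[{role:"user", parts:[{text:buildPrompt(f)}]}],
    generationConfig:{temperature:0, responseMimeType:"application/json", responseSchema:SCHEMA}};
  // url and headers are elided in this excerpt; key is used there
  const res = await fetch(url, {method:"POST", headers:{...}, body:JSON.stringify(body)});
  if(!res.ok) throw new Error("Gemini request failed: HTTP " + res.status);
  // assumes the standard generateContent response shape
  const text = (await res.json()).candidates[0].content.parts[0].text;
  const out = JSON.parse(text);
  // "NO DEFAULT" contains the word DEFAULT, so test for "NO DEFAULT" first
  const status = out.status.toUpperCase().includes("NO DEFAULT") ? "NO DEFAULT" : "DEFAULT";
  const conf = Number(out.confidence); // the model's confidence in its own status
  return {status, confidence:clamp(conf, 0.5, 1), reason:out.reason};
}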
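The excerpt above leans on a clamp helper that is not shown, and on a confidence value read from the parsed response. A minimal sketch of how that normalisation step might look in isolation, so it can be exercised without a network call. The name normalizeVerdict and the 0.5 fallback for a missing confidence are assumptions for illustration, not the project's code:

```javascript
// Hypothetical helpers: names and fallback behaviour are assumptions, not the project's code.
function clamp(x, lo, hi) {
  return Math.min(hi, Math.max(lo, x));
}

// Turn the raw JSON text returned by the model into a normalised verdict.
function normalizeVerdict(text) {
  const out = JSON.parse(text);
  const raw = String(out.status).toUpperCase();
  // "NO DEFAULT" contains "DEFAULT", so check the longer label first.
  const status = raw.includes("NO DEFAULT") ? "NO DEFAULT" : "DEFAULT";
  // Assumption: a missing or non-numeric confidence falls back to 0.5 (a coin flip).
  const n = Number(out.confidence);
  const conf = Number.isFinite(n) ? n : 0.5;
  // Confidence is "confidence in the chosen status", so it is kept within [0.5, 1].
  return { status, confidence: clamp(conf, 0.5, 1), reason: String(out.reason ?? "") };
}

const sample = normalizeVerdict('{"status":"no default","confidence":0.3,"reason":"low loan-to-income"}');
console.log(sample.status, sample.confidence);
```

Keeping the parsing separate from the fetch call makes the label check and the clamping easy to unit-test with canned responses.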
🔒 The Honest Mapping (label used only for scoring)
The model's status and confidence become a default probability p. The real outcome truth is attached for scoring, but it is never part of the prompt. This is what makes the scorecard trustworthy.
function toRow(rec, verdict){
  const f = rec.features;
  // status + confidence -> probability of default
  const p = verdict.status === "DEFAULT" ? verdict.confidence : 1 - verdict.confidence;
  return {
    id:rec.id, grade:f.loan_grade, intent:f.loan_intent, intRate:f.interest_rate,
    lti:f.loan_to_income_pct/100,
    p:p, // the prediction
    truth:rec.true_label === "DEFAULT" ? 1 : 0, // scored AFTER, never shown to the model
    reason:verdict.reason
  };
}
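To make the mapping concrete, here is a small sketch of the status-to-probability step on its own, applied to a made-up record. The function name toProbability and the sample record are illustrative assumptions; only the arithmetic mirrors toRow:

```javascript
// Illustrative only: toProbability and the sample record are not from the project.
function toProbability(verdict) {
  // Confidence is the model's confidence in its own status,
  // so a confident "NO DEFAULT" means a low probability of default.
  return verdict.status === "DEFAULT" ? verdict.confidence : 1 - verdict.confidence;
}

const rec = {
  id: "demo-1",
  features: { loan_grade: "B", loan_to_income_pct: 25.0 },
  true_label: "DEFAULT", // held back: only rec.features would be passed to the prompt builder
};

const pDefault = toProbability({ status: "DEFAULT", confidence: 0.75 });
const pNoDefault = toProbability({ status: "NO DEFAULT", confidence: 0.75 });
const truth = rec.true_label === "DEFAULT" ? 1 : 0; // used for scoring only

console.log(pDefault, pNoDefault, truth); // 0.75 0.25 1
```

The same confidence therefore lands on opposite sides of 0.5 depending on the status, which is what lets a single threshold on p recover the model's decision later.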
📊 The Metrics Engine
One pass over the predictions builds the confusion matrix, accuracy, precision, recall, and F1, plus the high-confidence accuracy and coverage at the current safety-dial threshold t. The function reads only p and truth from each row, so the same harness can score production predictions unchanged.
function compute(t){
  let TP=0, FP=0, FN=0, TN=0, hcC=0, hcT=0;
  for(const r of ROWS){
    const pred = r.p >= 0.5 ? 1 : 0, conf = Math.max(r.p, 1-r.p);
    if(pred && r.truth) TP++; else if(pred) FP++; else if(r.truth) FN++; else TN++;
    if(conf >= t){ hcT++; if(pred === r.truth) hcC++; } // cleared at this confidence
  }
  const prec = TP/(TP+FP||1), rec = TP/(TP+FN||1);
  return {acc:(TP+TN)/ROWS.length, prec, rec, f1:2*prec*rec/((prec+rec)||1),
    hcAcc:hcT?hcC/hcT:1, cov:hcT/ROWS.length, TP, FP, FN, TN, hcC, hcT};
}
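A short sketch of how the safety dial trades coverage for accuracy. It uses the same logic as compute(t), rewritten to take the rows as an argument so it runs on its own. The name computeFor and the six rows are invented for illustration and are not real borrowers:

```javascript
// Sketch: same logic as compute(t), with the rows passed in explicitly.
function computeFor(rows, t) {
  let TP = 0, FP = 0, FN = 0, TN = 0, hcC = 0, hcT = 0;
  for (const r of rows) {
    const pred = r.p >= 0.5 ? 1 : 0;
    const conf = Math.max(r.p, 1 - r.p); // confidence in whichever class was predicted
    if (pred && r.truth) TP++;
    else if (pred) FP++;
    else if (r.truth) FN++;
    else TN++;
    if (conf >= t) { hcT++; if (pred === r.truth) hcC++; } // cleared at this confidence
  }
  const prec = TP / (TP + FP || 1), rec = TP / (TP + FN || 1);
  return {
    acc: (TP + TN) / rows.length, prec, rec,
    f1: (2 * prec * rec) / (prec + rec || 1),
    hcAcc: hcT ? hcC / hcT : 1, cov: hcT / rows.length,
    TP, FP, FN, TN, hcC, hcT,
  };
}

// Invented rows: p is the predicted default probability, truth is the real outcome.
const demoRows = [
  { p: 0.95, truth: 1 }, // confident, correct
  { p: 0.80, truth: 0 }, // fairly confident, wrong
  { p: 0.60, truth: 1 }, // hesitant, correct
  { p: 0.40, truth: 1 }, // hesitant, wrong (missed default)
  { p: 0.25, truth: 0 }, // fairly confident, correct
  { p: 0.05, truth: 0 }, // confident, correct
];

for (const t of [0.5, 0.75, 0.9]) {
  const m = computeFor(demoRows, t);
  console.log(`t=${t} cleared=${m.hcT} correct=${m.hcC} coverage=${m.cov.toFixed(2)} hcAcc=${m.hcAcc.toFixed(2)}`);
}
```

On this toy data, raising the threshold from 0.5 to 0.9 shrinks the cleared set from six rows to two, while the accuracy on the cleared rows rises from 4/6 to 2/2. The confusion matrix itself does not depend on t.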
📐 The Confidence Interval (Wilson score)
Every headline number carries a 95% interval, so the audience knows the margin of error and can watch it tighten as the sample grows. This is what answers "you only tested 150, is that enough?".
// 95% Wilson score interval for a proportion k/n
function wilson(k, n){
  if(!n) return [0,0];
  const z=1.96, p=k/n, d=1+z*z/n;
  const c=(p+z*z/(2*n))/d;
  const m=(z*Math.sqrt(p*(1-p)/n + z*z/(4*n*n)))/d;
  return [clamp(c-m,0,1), clamp(c+m,0,1)];
}
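A brief sketch of the "watch it tighten" behaviour: the same observed proportion (80%) at three sample sizes. The wilson function is copied from above; the clamp helper is an assumed implementation, since the excerpt does not show it:

```javascript
// Assumed helper: the project's own clamp is not shown in the excerpt.
function clamp(x, lo, hi) {
  return Math.min(hi, Math.max(lo, x));
}

// 95% Wilson score interval for a proportion k/n (as in the snippet above).
function wilson(k, n) {
  if (!n) return [0, 0];
  const z = 1.96, p = k / n, d = 1 + (z * z) / n;
  const c = (p + (z * z) / (2 * n)) / d;                                   // shifted centre
  const m = (z * Math.sqrt((p * (1 - p)) / n + (z * z) / (4 * n * n))) / d; // half-width
  return [clamp(c - m, 0, 1), clamp(c + m, 0, 1)];
}

// Same observed accuracy of 80%, growing sample size.
for (const n of [50, 150, 600]) {
  const k = Math.round(0.8 * n);
  const [lo, hi] = wilson(k, n);
  console.log(`n=${n} k=${k} interval=[${lo.toFixed(3)}, ${hi.toFixed(3)}] width=${(hi - lo).toFixed(3)}`);
}
```

For 120 correct out of 150 the interval comes out at roughly 0.73 to 0.86. Unlike the simple normal approximation, the Wilson interval is not centred on k/n: the centre is pulled slightly toward 0.5, and the bounds stay inside [0, 1] even for small n.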
🐍 The Keyless Data Prep
A small Python script cleans the real dataset, parses currency-formatted loan amounts, derives the explainability ratios, and emits a stratified sample of real borrowers as a JavaScript asset that the page loads directly.
import re
def parse_amount(val):  # "£35,000.00" -> 35000.0
    s = re.sub(r"[^0-9.]", "", str(val))
    return float(s) if s else None
# df, stratified, SAMPLE_N and SEED are defined elsewhere in prep_data.py
df["loan_to_income_pct"] = (df["loan_amount"] / df["customer_income"] * 100).round(1)
# stratify so the sample keeps the real ~21% default rate
sample = stratified(df, SAMPLE_N, SEED)
# -> window.BORROWERS = [ {id, features, true_label}, ... ]