Loan-default classifier that proves its own accuracy for lenders
Challenge: A credit officer at a small lender cannot carefully review every loan application, so they either rush (and approve loans that later default, which costs real money) or over-review (and waste days on safe applicants). They have no defensible way to state how accurate their triage actually is. The market offers either a black-box credit score that is never measured on the lender's own loan book or a heavyweight machine-learning platform with no per-record human-in-the-loop workflow.
Solution: Borrower Risk Evaluator classifies each borrower as default or no default with a confidence score, routes each one to auto-approve, auto-flag, or send-to-human, and then proves its accuracy live against the known repayment outcomes. The headline skill is the evaluation: a quality scorecard measured against ground truth, with a 95% confidence interval on every number.
The headline. Accuracy, precision, recall, F1, a confusion matrix, and accuracy-at-high-confidence, all measured against the real Current_loan_status label, each carrying a 95% confidence interval so the audience knows exactly how much to trust the number.
A confidence-threshold slider that re-splits the auto-pass and escalate piles live and redraws an accuracy-versus-coverage tradeoff curve. Raise the bar and the cleared pile gets more accurate but fewer cases clear.
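The tradeoff behind the slider can be sketched as a sweep over thresholds. This is a minimal illustration, not the app's actual code: the record shape `{ predicted, actual, confidence }` and the function name `coverageCurve` are assumptions made for the example.

```javascript
// Hypothetical record shape: { predicted, actual, confidence }.
// For each threshold, "cleared" records are those the model is confident
// enough to auto-route; everything else would go to a human.
function coverageCurve(records, thresholds) {
  return thresholds.map((threshold) => {
    const cleared = records.filter((r) => r.confidence >= threshold);
    const correct = cleared.filter((r) => r.predicted === r.actual).length;
    return {
      threshold,
      // Share of all records that clear the bar.
      coverage: records.length ? cleared.length / records.length : 0,
      // Accuracy measured only on the cleared pile (null if the pile is empty).
      accuracy: cleared.length ? correct / cleared.length : null,
    };
  });
}

// Made-up sample data, for illustration only.
const sample = [
  { predicted: "no default", actual: "no default", confidence: 0.95 },
  { predicted: "default", actual: "default", confidence: 0.9 },
  { predicted: "no default", actual: "default", confidence: 0.6 },
  { predicted: "default", actual: "no default", confidence: 0.55 },
];
const curve = coverageCurve(sample, [0.5, 0.8]);
console.log(curve);
```

In this toy sample, raising the threshold from 0.5 to 0.8 halves the coverage while the cleared pile becomes fully correct, which is the shape of the curve the slider redraws.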
Every borrower is routed into one of three lanes: auto-approve, auto-flag, or a human review queue. Only the uncertain cases reach the queue. A by-loan-type breakdown shows how each purpose splits.
A dedicated tab covering how accuracy is verified, why results stay consistent, and what changes to scale the demo into a system handling 10,000+ borrowers.
Each borrower is classified by Google Gemini using structured output: a typed verdict of status, confidence, and reason. The prompt sees the borrower's facts only, never the true label, so the scoring is honest.
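As a rough sketch of what a typed verdict and a label-free prompt input could look like on the client side, the snippet below validates a verdict and strips the true label before anything is sent to the model. It does not call Gemini, and it is not the app's actual code: the helper names `parseVerdict` and `borrowerFacts`, the status strings, the 0 to 1 confidence range, and any borrower field other than `Current_loan_status` are assumptions.

```javascript
// Assumed label values; the source only says "default or no default".
const ALLOWED_STATUSES = ["default", "no default"];

// Validate the model's JSON reply against the expected verdict shape:
// { status, confidence, reason }. Throws on anything malformed.
function parseVerdict(jsonText) {
  const raw = JSON.parse(jsonText);
  if (!ALLOWED_STATUSES.includes(raw.status)) {
    throw new Error("status must be one of: " + ALLOWED_STATUSES.join(", "));
  }
  if (typeof raw.confidence !== "number" || !(raw.confidence >= 0 && raw.confidence <= 1)) {
    throw new Error("confidence must be a number between 0 and 1");
  }
  if (typeof raw.reason !== "string" || raw.reason.trim() === "") {
    throw new Error("reason must be a non-empty string");
  }
  return { status: raw.status, confidence: raw.confidence, reason: raw.reason };
}

// Remove the ground-truth label so the prompt only ever sees borrower facts.
function borrowerFacts(borrower) {
  const { Current_loan_status, ...facts } = borrower;
  return facts;
}
```

Keeping the label-stripping step in one small function makes it easy to audit that the answer key never leaks into the prompt.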
The genuinely new work: every prediction is compared to the borrower's recorded outcome to produce a real, statistical scorecard. Most AI demos report a confidence number that is never checked. This one checks it.
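One way to compute such a scorecard is sketched below. The source does not say which interval method the app uses; this example assumes a Wilson score interval, which is a common choice for proportions. The record shape `{ predicted, actual }`, the function names, and the choice of "default" as the positive class are also assumptions.

```javascript
const Z95 = 1.96; // two-sided z value for a 95% interval

// Wilson score interval for a proportion successes / total.
function wilsonInterval(successes, total, z = Z95) {
  if (total === 0) return null;
  const p = successes / total;
  const z2 = z * z;
  const denom = 1 + z2 / total;
  const center = (p + z2 / (2 * total)) / denom;
  const half = (z * Math.sqrt((p * (1 - p)) / total + z2 / (4 * total * total))) / denom;
  return { low: Math.max(0, center - half), high: Math.min(1, center + half) };
}

// Compare each prediction with the recorded outcome and build the scorecard.
function scorecard(records, positive = "default") {
  let tp = 0, fp = 0, tn = 0, fn = 0;
  for (const r of records) {
    const predictedPositive = r.predicted === positive;
    const actualPositive = r.actual === positive;
    if (predictedPositive && actualPositive) tp += 1;
    else if (predictedPositive) fp += 1;
    else if (actualPositive) fn += 1;
    else tn += 1;
  }
  // A proportion plus its 95% interval; null when the denominator is zero.
  const rate = (num, den) => ({
    value: den ? num / den : null,
    ci95: wilsonInterval(num, den),
  });
  const f1Denominator = 2 * tp + fp + fn;
  return {
    confusion: { tp, fp, tn, fn },
    accuracy: rate(tp + tn, records.length),
    precision: rate(tp, tp + fp),
    recall: rate(tp, tp + fn),
    // F1 is not a simple proportion, so no Wilson interval is attached here;
    // a resampling method such as the bootstrap would be one option.
    f1: f1Denominator ? (2 * tp) / f1Denominator : null,
  };
}
```

With small samples the interval is wide, which is the point: the scorecard shows how much the number can be trusted, not just the number.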
A short, deterministic rule, not an LLM, reads the confidence and routes the borrower to auto-approve, auto-flag, or send-to-human. Not every component needs a model; the gate is auditable by design.
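A gate of this kind can be as small as the sketch below. The exact rule is not given in the source, so the mapping here is an assumption: confident "no default" verdicts are auto-approved, confident "default" verdicts are auto-flagged, and anything under the threshold goes to a human. The function name `routeBorrower` and the default threshold of 0.8 are hypothetical.

```javascript
// Deterministic routing rule: no model call, just a comparison.
// verdict is assumed to look like { status, confidence, reason }.
function routeBorrower(verdict, threshold = 0.8) {
  if (verdict.confidence < threshold) return "send-to-human";
  return verdict.status === "default" ? "auto-flag" : "auto-approve";
}

console.log(routeBorrower({ status: "no default", confidence: 0.93 }));
console.log(routeBorrower({ status: "default", confidence: 0.88 }));
console.log(routeBorrower({ status: "default", confidence: 0.61 }));
```

Because the rule is a pure function of the verdict and the threshold, the same inputs always produce the same lane, which is what makes it straightforward to audit.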
Language-model confidence is famously miscalibrated, which is exactly why measuring it matters. The accuracy-at-confidence view shows whether the model's certainty can actually be trusted.
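An accuracy-at-confidence view can be approximated by bucketing predictions by stated confidence and comparing each bucket's measured accuracy with its mean confidence. The sketch below is illustrative only: the bucket edges, the record shape `{ predicted, actual, confidence }`, and the function name `accuracyByConfidence` are assumptions, and records outside the edge range are simply skipped.

```javascript
// Group records into confidence buckets and report accuracy per bucket.
// If the model is well calibrated, accuracy and meanConfidence should be
// close within each bucket.
function accuracyByConfidence(records, edges = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]) {
  const buckets = [];
  for (let i = 0; i < edges.length - 1; i++) {
    buckets.push({ low: edges[i], high: edges[i + 1], count: 0, correct: 0, confidenceSum: 0 });
  }
  const last = buckets.length - 1;
  for (const r of records) {
    // Buckets are [low, high), except the last one, which includes its upper edge.
    const index = buckets.findIndex(
      (b, i) => r.confidence >= b.low && (r.confidence < b.high || (i === last && r.confidence <= b.high))
    );
    if (index === -1) continue; // confidence outside the configured range
    const bucket = buckets[index];
    bucket.count += 1;
    bucket.confidenceSum += r.confidence;
    if (r.predicted === r.actual) bucket.correct += 1;
  }
  return buckets.map((b) => ({
    range: [b.low, b.high],
    count: b.count,
    accuracy: b.count ? b.correct / b.count : null,
    meanConfidence: b.count ? b.confidenceSum / b.count : null,
  }));
}
```

A bucket where mean confidence is well above measured accuracy is a sign of overconfidence, and a reason to raise the auto-clear threshold.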
data/borrowers.js, loaded via <script src> so it works on GitHub Pages and offline
(aistudio.google.com/apikey)
The same design framework (classify each record, score its confidence, auto-clear the confident cases, escalate the uncertain ones, and prove accuracy against a known answer key) is not specific to lending. It transfers to any problem that meets two conditions:
Two examples that fit, each with the condition that makes the framework work:
Classify each candidate-and-role pair as a match or not, with a confidence score. Auto-shortlist the confident matches, send the borderline ones to a recruiter, and score the model against who was actually hired and how they performed.
Condition to reuse the framework: you hold historical hiring outcomes (hired, retained, rated). The catch is selection bias: you only observe outcomes for people who were actually hired, so the answer key is partial. Use it as decision support with a fairness check, never as an automatic reject.
Classify each pair of people as compatible or not, with a confidence score. Auto-suggest the confident matches, hold the uncertain ones, and score against what actually happened between matched pairs.
Condition to reuse the framework: you define a concrete proxy for success up front, for example a mutual like within seven days, or still together at six months. The label here is softer, sparser, and slower than a loan outcome, so the scorecard is honest only when the proxy is stated plainly.
The rule of thumb: the stronger and more complete the answer key, the more trustworthy the scorecard. Loan default is close to ideal because the outcome is objective and eventually known for every record. Job matching and couple matching work too, with clear notes on where their labels are biased or soft.