Borrower Risk Evaluator
Loan-default classifier that measures its own accuracy
Most AI demos report a confidence score that nobody ever checks. Borrower Risk Evaluator checks it.
Classifying records is the easy half of an AI build. The hard half, usually skipped, is evaluation: proving with evidence that the output is any good. This project keeps the classifier deliberately simple and spends its effort on the proof.
The data is a public lending set of 32,577 labelled loans, about 21% of them ending in default. The pieces:
- The brain: Google Gemini (`gemini-2.5-flash`) returns a typed verdict for each borrower through structured output: a `status`, a `confidence`, and a one-line `reason`. The borrower's facts go in; the real outcome never does.
- The gate: a short deterministic rule, not a model, routes each borrower to auto-approve, auto-flag, or send-to-human based on how sure the model is. Not every part needs an LLM.
- The proof: every prediction is scored against the real `Current_loan_status` label, producing accuracy, precision, recall, F1, a confusion matrix, and accuracy on the cases the model cleared on its own. Each number carries a 95% confidence interval, so the margin of error is visible rather than hidden.
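To make the brain-to-gate handoff concrete, here is a minimal sketch of a typed verdict and a deterministic routing rule. The field names `status`, `confidence`, and `reason` come from the description above; everything else is an assumption for illustration, including the label strings, the `route` function, the single `threshold` value of 0.85, and the idea that confident "default" verdicts are flagged while confident "no default" verdicts are approved. The project's actual rule may differ.

```python
from dataclasses import dataclass
from typing import Literal

# Hypothetical label names; the project's real strings may differ.
Status = Literal["default", "no_default"]
Route = Literal["auto_approve", "auto_flag", "send_to_human"]


@dataclass(frozen=True)
class Verdict:
    """Typed verdict with the three fields named in the description."""
    status: Status
    confidence: float  # assumed to lie in [0, 1]
    reason: str


def route(verdict: Verdict, threshold: float = 0.85) -> Route:
    """Deterministic gate: a comparison, not a model call.

    The 0.85 threshold is an illustrative assumption, not the project's value.
    """
    if not 0.0 <= verdict.confidence <= 1.0:
        raise ValueError("confidence must be in [0, 1]")
    if verdict.confidence < threshold:
        # The model is not sure enough either way: a person decides.
        return "send_to_human"
    # The model is sure: flag confident defaults, clear confident non-defaults.
    return "auto_flag" if verdict.status == "default" else "auto_approve"


if __name__ == "__main__":
    v = Verdict(status="no_default", confidence=0.62, reason="Thin credit history")
    print(route(v))
```

Keeping the gate as plain code means its behaviour can be unit-tested and audited line by line, which is harder to do for a second model call.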
A lazy model that calls every borrower "no default" already scores about 79% accuracy here while catching zero defaults. That is exactly why precision and recall, not accuracy, carry the story, and why measuring confidence matters when language-model confidence is so often miscalibrated.
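The baseline argument can be checked with a few lines of arithmetic. The sketch below scores an always-"no default" predictor on a made-up sample of 1,000 loans with 210 defaults, chosen only to match the roughly 21% default rate quoted above. The Wilson score interval is one common way to get a 95% interval for a proportion; it is an assumption here, since the description does not say which method the project uses. The function names `wilson_interval` and `score` are hypothetical.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives about 95%)."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def score(y_true: list[bool], y_pred: list[bool]) -> dict[str, float]:
    """Confusion counts and headline metrics; True means 'default'."""
    tp = sum(t and p for t, p in zip(y_true, y_pred))
    tn = sum((not t) and (not p) for t, p in zip(y_true, y_pred))
    fp = sum((not t) and p for t, p in zip(y_true, y_pred))
    fn = sum(t and (not p) for t, p in zip(y_true, y_pred))
    n = tp + tn + fp + fn
    # Convention: report 0.0 when a ratio's denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "tp": tp, "tn": tn, "fp": fp, "fn": fn,
        "accuracy": (tp + tn) / n if n else 0.0,
        "precision": precision, "recall": recall, "f1": f1,
    }


# Illustrative sample only: 210 defaults in 1,000 loans (about 21%).
y_true = [True] * 210 + [False] * 790
y_lazy = [False] * 1000  # the lazy model: nobody ever defaults

metrics = score(y_true, y_lazy)
acc_low, acc_high = wilson_interval(metrics["tp"] + metrics["tn"], len(y_true))

if __name__ == "__main__":
    print(f"accuracy {metrics['accuracy']:.2f} (95% CI {acc_low:.3f} to {acc_high:.3f})")
    print(f"recall {metrics['recall']:.2f}, precision {metrics['precision']:.2f}")
```

On this sample the lazy predictor reaches 79% accuracy with a recall of zero, which is the gap the paragraph above describes: accuracy alone cannot tell it apart from a model that actually finds defaults.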
The demo classifies 150 held-out borrowers live with your own Gemini key, and a built-in guide explains what has to change to run it on thousands. The project is framed as decision support, not auto-rejection: a human owns the final call.