📊 Borrower Risk Evaluator

Loan-default classifier that proves its own accuracy for lenders

Gemini AI · Classification + Evaluation · Vanilla JS · Self-Contained · BYO-Key · GitHub Pages

📋 Project Overview & Problem Statement

Challenge: A credit officer at a small lender cannot carefully review every loan application, so they either rush (and approve defaults that cost real money) or over-review (and waste days on safe applicants). They have no defensible way to state how accurate their triage actually is. The market offers either a black-box credit score that is never measured against the lender's own loan book, or a heavyweight machine-learning platform with no per-record, human-in-the-loop workflow.

Solution: Borrower Risk Evaluator classifies each borrower as default or no default with a confidence score, routes each one to auto-approve, auto-flag, or send-to-human, and then proves its accuracy live against the known repayment outcomes. The headline skill is the evaluation: a quality scorecard measured against ground truth, with a 95% confidence interval on every number.


🖥️ Application Features

📊 Evaluation Scorecard

The headline feature: accuracy, precision, recall, F1, a confusion matrix, and accuracy-at-high-confidence, all measured against the real Current_loan_status label. Each number carries a 95% confidence interval, so the audience knows exactly how much to trust it.
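The scorecard metrics above can be sketched in a few lines of plain JavaScript. This is a minimal illustration, not the project's code: the record shape `{ predicted, actual }`, the status strings `"default"` / `"no_default"`, treating default as the positive class, and the use of a Wilson score interval for the 95% bounds are all assumptions. F1 is not a simple proportion, so a Wilson interval does not apply to it; a bootstrap is one common option and is not shown here.

```javascript
// Minimal scorecard sketch. Assumptions (not from the project source):
// records look like { predicted: "default" | "no_default", actual: "default" | "no_default" },
// "default" is the positive class, and 95% bounds use the Wilson score interval.

const Z95 = 1.959964; // two-sided 95% standard-normal quantile

// Wilson score interval for k successes out of n trials.
function wilson(k, n, z = Z95) {
  if (n === 0) return { value: NaN, low: NaN, high: NaN };
  const p = k / n;
  const z2 = z * z;
  const denom = 1 + z2 / n;
  const centre = (p + z2 / (2 * n)) / denom;
  const half = (z / denom) * Math.sqrt((p * (1 - p)) / n + z2 / (4 * n * n));
  return { value: p, low: Math.max(0, centre - half), high: Math.min(1, centre + half) };
}

function scorecard(records, positive = "default") {
  // Build the 2x2 confusion matrix.
  let tp = 0, fp = 0, fn = 0, tn = 0;
  for (const { predicted, actual } of records) {
    const predPos = predicted === positive;
    const actPos = actual === positive;
    if (predPos && actPos) tp++;
    else if (predPos) fp++;
    else if (actPos) fn++;
    else tn++;
  }
  const n = tp + fp + fn + tn;
  return {
    confusion: { tp, fp, fn, tn },
    accuracy: wilson(tp + tn, n),
    precision: wilson(tp, tp + fp), // share of flagged borrowers who really defaulted
    recall: wilson(tp, tp + fn),    // share of real defaults that were flagged
    // F1 as a point estimate only; it is not a proportion, so no Wilson interval.
    f1: 2 * tp + fp + fn === 0 ? NaN : (2 * tp) / (2 * tp + fp + fn),
  };
}
```

With 8 correct out of 10, for example, the accuracy point estimate is 0.8 but the Wilson interval is roughly 0.49 to 0.94, which is exactly the "how much to trust the number" signal a small test set needs.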

🎚️ Safety Dial

A confidence-threshold slider that re-splits borrowers live between the automatically decided pile (auto-approve or auto-flag) and the escalated pile, and redraws an accuracy-versus-coverage tradeoff curve. Raise the bar and the cleared pile usually gets more accurate, but fewer cases clear.
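The safety dial amounts to sweeping a threshold and measuring two things at each setting: coverage (the share of borrowers decided automatically) and accuracy on that cleared share. A minimal sketch, assuming each record carries `{ predicted, actual, confidence }` with confidence in [0, 1]; the function names are hypothetical.

```javascript
// Sketch of the accuracy-versus-coverage tradeoff behind a confidence slider.
// Assumed (hypothetical) record shape: { predicted, actual, confidence }.

function tradeoffPoint(records, threshold) {
  // Cases at or above the threshold are decided automatically; the rest escalate.
  const cleared = records.filter((r) => r.confidence >= threshold);
  const correct = cleared.filter((r) => r.predicted === r.actual).length;
  return {
    threshold,
    coverage: records.length === 0 ? 0 : cleared.length / records.length,
    accuracy: cleared.length === 0 ? NaN : correct / cleared.length,
    escalated: records.length - cleared.length,
  };
}

// One point per threshold, ready to plot as the tradeoff curve.
function tradeoffCurve(records, thresholds) {
  return thresholds.map((t) => tradeoffPoint(records, t));
}
```

Calling `tradeoffCurve(records, [0.5, 0.6, 0.7, 0.8, 0.9])` yields the points to draw; accuracy at the highest threshold is one way to read accuracy-at-high-confidence.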

🗂️ Triage Console

Every borrower is routed into one of three lanes: auto-approve, auto-flag, or a human review queue. Only the uncertain cases reach the queue. A by-loan-type breakdown shows how borrowers with each loan purpose split across the lanes.
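The by-loan-type breakdown is a simple cross-tabulation of loan purpose against lane. A minimal sketch, assuming hypothetical records shaped `{ loanType, lane }` where `lane` is one of the three lane names used above; the loan-type values in any example are illustrative only.

```javascript
// Sketch of a by-loan-type breakdown for the triage console.
// Assumed (hypothetical) record shape: { loanType: string, lane: string }.
const LANES = ["auto-approve", "auto-flag", "send-to-human"];

function breakdownByLoanType(records) {
  const table = new Map(); // loanType -> { lane -> count }
  for (const { loanType, lane } of records) {
    if (!LANES.includes(lane)) throw new Error(`Unknown lane: ${lane}`);
    if (!table.has(loanType)) {
      // Start every loan type with a zero count in each lane.
      table.set(loanType, Object.fromEntries(LANES.map((l) => [l, 0])));
    }
    table.get(loanType)[lane] += 1;
  }
  return table;
}
```

Starting each row at zero for every lane keeps the table rectangular, so a loan type with no escalations still shows an explicit 0 in the human queue column.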

🧭 Portfolio to Production

A dedicated tab covering how accuracy is verified, why results stay consistent, and what changes to scale the demo into a system handling 10,000+ borrowers.

🤖 AI Integration & Intelligence

🧠 Typed Verdict (Gemini AI)

Each borrower is classified by Google Gemini using structured output: a typed verdict of status, confidence, and reason. The prompt sees the borrower's facts only, never the true label, so the scoring is honest.
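A typed verdict can be requested from Gemini by attaching a response schema to the request and removing the label field before the prompt is built. The sketch below targets the public `generateContent` REST endpoint with `responseMimeType: "application/json"` and a `responseSchema`; the status strings, prompt wording, and helper names are hypothetical, and the project may phrase its request differently. `parseVerdict` re-checks the reply before it reaches any routing logic.

```javascript
// Sketch of a typed-verdict request to the Gemini REST API (v1beta generateContent).
// Assumptions: label field named Current_loan_status; status values, prompt wording,
// and helper names are hypothetical.
const LABEL_FIELD = "Current_loan_status";

const VERDICT_SCHEMA = {
  type: "OBJECT",
  properties: {
    status: { type: "STRING", enum: ["default", "no_default"] },
    confidence: { type: "NUMBER" },
    reason: { type: "STRING" },
  },
  required: ["status", "confidence", "reason"],
};

// Drop the ground-truth label so the model only sees the borrower's facts.
function borrowerFacts(borrower) {
  const { [LABEL_FIELD]: _label, ...facts } = borrower;
  return facts;
}

function buildRequest(borrower) {
  const prompt =
    "Classify this borrower as default or no_default. " +
    "Give a confidence between 0 and 1 and a one-sentence reason.\n" +
    JSON.stringify(borrowerFacts(borrower));
  return {
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generationConfig: { responseMimeType: "application/json", responseSchema: VERDICT_SCHEMA },
  };
}

// Validate the model's JSON text before trusting it downstream.
function parseVerdict(text) {
  const v = JSON.parse(text);
  if (!["default", "no_default"].includes(v.status)) throw new Error("bad status");
  if (typeof v.confidence !== "number" || v.confidence < 0 || v.confidence > 1) {
    throw new Error("bad confidence");
  }
  if (typeof v.reason !== "string") throw new Error("bad reason");
  return v;
}

// Client-side call with the user's own key (BYO-key); not exercised in the tests.
async function classifyBorrower(borrower, apiKey) {
  const url =
    "https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent";
  const res = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify(buildRequest(borrower)),
  });
  if (!res.ok) throw new Error(`Gemini request failed: ${res.status}`);
  const data = await res.json();
  return parseVerdict(data.candidates[0].content.parts[0].text);
}
```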

✅ Measured Against Ground Truth

The genuinely new work: every prediction is compared with the borrower's recorded outcome to produce a statistical scorecard. Most AI demos report a confidence number that is never checked; this one checks it.

🚦 Deterministic Escalation Gate

A short, deterministic rule, not an LLM, reads the confidence and routes the borrower to auto-approve, auto-flag, or send-to-human. Not every component needs a model; the gate is auditable by design.
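A gate like this can be a single pure function. A minimal sketch, assuming a verdict shaped `{ status, confidence }` and a hypothetical default threshold of 0.8; the threshold would come from the confidence slider.

```javascript
// Sketch of a deterministic escalation gate. The verdict shape and the 0.8
// default threshold are hypothetical illustrations.
function routeBorrower(verdict, threshold = 0.8) {
  const c = verdict.confidence;
  // Missing, non-numeric, or out-of-range confidence is treated as uncertain.
  if (typeof c !== "number" || Number.isNaN(c) || c > 1 || c < threshold) {
    return "send-to-human";
  }
  if (verdict.status === "default") return "auto-flag";
  if (verdict.status === "no_default") return "auto-approve";
  return "send-to-human"; // unknown status: let a person decide
}
```

Because the output depends only on the verdict and the threshold, the same inputs always land in the same lane, and every fallback path leads to a human rather than to an automatic decision.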

🎯 Calibration as the Point

Self-reported language-model confidence is often poorly calibrated, which is exactly why measuring it matters. The accuracy-at-confidence view shows whether the model's certainty can actually be trusted.
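One way to build an accuracy-at-confidence view is a reliability table: bucket predictions by confidence and compare each bucket's mean confidence with its actual accuracy. A minimal sketch, assuming records shaped `{ predicted, actual, confidence }`; the bin edges are illustrative, not the project's.

```javascript
// Sketch of an accuracy-at-confidence (reliability) table.
// Assumed record shape: { predicted, actual, confidence }; bin edges are illustrative.
// Confidences below the first edge are not counted in any bin.
function calibrationTable(records, edges = [0.5, 0.6, 0.7, 0.8, 0.9, 1.0]) {
  const rows = [];
  for (let i = 0; i < edges.length - 1; i++) {
    const lo = edges[i];
    const hi = edges[i + 1];
    const isLast = i === edges.length - 2;
    // Half-open bins [lo, hi), except the last bin, which also includes hi.
    const inBin = records.filter(
      (r) => r.confidence >= lo && (r.confidence < hi || (isLast && r.confidence === hi))
    );
    const correct = inBin.filter((r) => r.predicted === r.actual).length;
    const meanConfidence =
      inBin.length === 0 ? NaN : inBin.reduce((sum, r) => sum + r.confidence, 0) / inBin.length;
    rows.push({
      bin: [lo, hi],
      count: inBin.length,
      meanConfidence,
      accuracy: inBin.length === 0 ? NaN : correct / inBin.length,
    });
  }
  return rows;
}
```

A well-calibrated model shows accuracy close to mean confidence in every row; a bin where mean confidence is, say, 0.95 but accuracy is far lower is the overconfidence this view is meant to expose.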

🛠️ Technical Architecture & Implementation

Frontend Stack

Single-file HTML · Vanilla JavaScript · Inline SVG charts · No build step

AI & Data

Google Gemini (gemini-2.5-flash) · Structured JSON output · BYO-key, client-side · Python + pandas (data prep)

Deployment & Infrastructure

GitHub Pages · Static asset (borrowers.js) · No backend

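System Architecture

data/borrowers.js (static sample of labelled borrowers) → the browser holds back the Current_loan_status label and sends each borrower's facts to Gemini with the user's own key → typed verdict (status, confidence, reason) → deterministic escalation gate → three triage lanes, plus a scorecard comparing every verdict with the held-back label.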

📖 Setup & How to Run

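Prerequisites

A Google Gemini API key (bring your own; it is entered in the page and used client-side)
A modern web browser
Python with pandas (only if you regenerate the data asset)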

Run the Demo

# Open the live demo, or run locally:
git clone https://github.com/lyven81/ai-project.git
cd ai-project/projects/borrower-risk-evaluator
# Open index.html in a browser, then:
# 1. Paste your Gemini API key
# 2. Choose how many borrowers to test (50 / 100 / 150)
# 3. Click "Run evaluation"

Regenerate the Data Asset (optional)

# Re-sample real borrowers from the source dataset (keyless)
pip install pandas
python prep_data.py
# -> writes data/borrowers.js and data/meta.json

🚀 Deployment

# Fully static. Deployed on GitHub Pages, no server.
# Live at:
# https://lyven81.github.io/ai-project/projects/borrower-risk-evaluator/demo.html


📊 Key Metrics

32,577
Real Labelled Borrowers in Source Data
Up to 150
Held-Out Borrowers Scored Live
95%
Confidence Interval on Every Metric
0
Backend Servers (Fully Client-Side)

Business Value

🔁 Potential Use Cases

The same design framework (classify each record, score its confidence, auto-clear the confident cases, escalate the uncertain ones, and prove accuracy against a known answer key) is not specific to lending. It transfers to any problem that meets two conditions:

Two examples that fit, each with the condition that makes the framework work:

🧑‍💼 Job Matching

Classify each candidate-and-role pair as a match or not, with a confidence score. Auto-shortlist the confident matches, send the borderline ones to a recruiter, and score the model against who was actually hired and how they performed.

Condition to reuse the framework: you hold historical hiring outcomes (hired, retained, rated). The catch is selection bias: you only observe outcomes for people who were actually hired, so the answer key is partial. Use it as decision support with a fairness check, never as an automatic reject.

💞 Couple Matching

Classify each pair of people as compatible or not, with a confidence score. Auto-suggest the confident matches, hold the uncertain ones, and score against what actually happened between matched pairs.

Condition to reuse the framework: you define a concrete proxy for success up front, for example a mutual like within seven days, or still together at six months. The label here is softer, sparser, and slower than a loan outcome, so the scorecard is honest only when the proxy is stated plainly.
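Stating the proxy plainly can mean writing it down as code before any scoring happens. A minimal sketch of the "mutual like within seven days" proxy, assuming a hypothetical pair shape `{ matchedAt, a: { likedAt }, b: { likedAt } }` where `likedAt` is a `Date` or `null`, and hypothetical label strings. Pairs whose window is still open get no label (`null`), which makes the slowness of this answer key explicit.

```javascript
// Sketch of a proxy label: "compatible" if both people liked each other
// within windowDays of being matched. Shapes and label names are hypothetical.
const DAY_MS = 24 * 60 * 60 * 1000;

function mutualLikeLabel(pair, now, windowDays = 7) {
  const deadline = pair.matchedAt.getTime() + windowDays * DAY_MS;
  const likedInTime = (p) => p.likedAt instanceof Date && p.likedAt.getTime() <= deadline;
  if (likedInTime(pair.a) && likedInTime(pair.b)) return "compatible";
  // Window still open and not yet satisfied: no label yet, so keep the pair out of the scorecard.
  if (now.getTime() < deadline) return null;
  return "not_compatible";
}
```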

The rule of thumb: the stronger and more complete the answer key, the more trustworthy the scorecard. Loan default is close to ideal because the outcome is objective and eventually known for every record. Job matching and couple matching work too, with clear notes on where their labels are biased or soft.