Multi-Agent Inventory Planner for Malaysian Bookstores
Challenge: Malaysian SME bookstore owners sit on tens of thousands of sales rows but rarely turn the data into stocking decisions. Pareto analysis, dead-stock detection, seasonal pre-stocking, and aging clearance each require hours of pivot-table work. Most owners run on intuition and miss obsolete syllabi (UPSR was abolished in 2021), dormant religious titles, and pre-school-season restocking windows.
Solution: Bookshelf is a 4-agent decision-support system where the owner asks any business question in plain Malaysian English and receives a focused brief grounded in their actual sales data. A deterministic pandas tool computes exact metrics; a Pydantic-typed Judge gates the research; a Content Builder turns the validated metrics into a 600-word actionable brief.
Tools like Google NotebookLM let an owner upload a CSV and ask questions. They are fast, polished, and free. Bookshelf is slower (~140 seconds per question vs sub-10 seconds) and operationally heavier. The trade is intentional: chat-on-document tools hit a wall on numerical aggregation and quality validation. Bookshelf is built for decisions worth thousands of ringgit, where exact numbers and a quality gate matter more than speed.
| Question type | Chat-on-document tool (e.g. NotebookLM) | Bookshelf multi-agent system |
|---|---|---|
| Exact aggregates ("RM impact of dropping these 8 SKUs") | Approximates from chunks; often wrong by 5–20% | pandas computes exactly |
| Ranked lists by computed metric ("top 10 by velocity × margin") | Cannot sort the whole 100k rows in one prompt | data_tool sorts deterministically before the LLM sees it |
| Quality validation before recommendations | Single LLM call, no validation | Judge agent gates the brief; loops up to 3× on fail |
| Speed for casual exploration | Sub-10s response | ~140s end-to-end |
| Multimodal output (audio overview, mind map, quizzes) | Built-in | Not in Phase 1 |
The real value of building a system isn't beating NotebookLM at chat. It's that a system can do four things NotebookLM cannot:
The Phase 1 build proves the architecture. The wedges above (Phase 4+) are where the system pays back the complexity premium.
JudgeVerdict (status / issues / confidence / feedback) with structured-output decoding, no string parsing.
/.well-known/agent-card.json).
One text input. Owner types any inventory question, broad ("what should I do?") or specific ("should I drop Naruto Vol 5?"). No menu navigation, no upload step.
pandas computes per-SKU revenue, margin, velocity, Pareto rank, last-sale date, aging class, seasonal indices, channel breakdown. Output JSON-serialised so the Judge can validate it byte-for-byte.
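The deterministic metrics step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual data tool: the function name `sku_metrics` and the column names `sku`, `qty`, `revenue`, and `cost` are assumptions, and only a few of the listed metrics are computed.

```python
import json

import pandas as pd


def sku_metrics(sales: pd.DataFrame) -> pd.DataFrame:
    """Aggregate row-level sales into per-SKU metrics.

    Column names (sku, qty, revenue, cost) are hypothetical.
    """
    per_sku = sales.groupby("sku", as_index=False).agg(
        units=("qty", "sum"),
        revenue=("revenue", "sum"),
        cost=("cost", "sum"),
    )
    # Gross margin as a percentage of revenue.
    per_sku["margin_pct"] = (
        (per_sku["revenue"] - per_sku["cost"]) / per_sku["revenue"] * 100
    ).round(1)
    # Pareto rank: 1 = highest-revenue SKU.
    per_sku = per_sku.sort_values("revenue", ascending=False, ignore_index=True)
    per_sku["pareto_rank"] = range(1, len(per_sku) + 1)
    per_sku["cum_revenue_share"] = (
        per_sku["revenue"].cumsum() / per_sku["revenue"].sum()
    ).round(4)
    return per_sku


# Tiny made-up sample, only to show the shape of the output.
sales = pd.DataFrame(
    {
        "sku": ["A", "B", "A", "C"],
        "qty": [2, 1, 3, 1],
        "revenue": [40.0, 30.0, 60.0, 10.0],
        "cost": [24.0, 21.0, 36.0, 8.0],
    }
)
metrics = sku_metrics(sales)
# Serialise to JSON so a downstream agent receives exact numbers, not prose.
payload = metrics.to_json(orient="records")
records = json.loads(payload)
```

Because every figure is computed before any LLM sees it, the same input always yields the same JSON, which is what makes a later validation step meaningful.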
Judge runs after Researcher and emits status="pass" or "fail" with structured feedback. EscalationChecker breaks the loop on pass; on fail, Researcher reruns with the feedback. Max 3 iterations.
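The gate-and-retry loop can be illustrated with a small sketch. It uses a stdlib dataclass as a stand-in for the Pydantic-typed verdict and plain callables in place of the real agents; `research_until_pass`, `stub_research`, and `stub_judge` are hypothetical names, and only the field names and the 3-iteration cap come from the text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

MAX_ITERATIONS = 3  # from the text: max 3 iterations


@dataclass
class JudgeVerdict:
    """Stand-in for the Pydantic model; field names follow the text."""

    status: str  # "pass" or "fail"
    issues: List[str] = field(default_factory=list)
    confidence: float = 0.0
    feedback: str = ""


def research_until_pass(
    research: Callable[[Optional[str]], str],
    judge: Callable[[str], JudgeVerdict],
) -> Tuple[str, JudgeVerdict, int]:
    """Rerun research with the Judge's feedback until it passes or the cap is hit."""
    feedback: Optional[str] = None
    for iteration in range(1, MAX_ITERATIONS + 1):
        findings = research(feedback)
        verdict = judge(findings)
        if verdict.status == "pass":
            break  # the role the text assigns to EscalationChecker
        feedback = verdict.feedback
    return findings, verdict, iteration


# Stub agents: the researcher only includes margins after being told to.
def stub_research(feedback: Optional[str]) -> str:
    return "findings with margins" if feedback else "findings"


def stub_judge(findings: str) -> JudgeVerdict:
    if "margins" in findings:
        return JudgeVerdict(status="pass", confidence=0.9)
    return JudgeVerdict(
        status="fail", issues=["margin missing"], feedback="include margins"
    )


findings, verdict, iterations = research_until_pass(stub_research, stub_judge)
```

With these stubs the first pass fails and the second succeeds. A judge that never passes would stop after the third attempt and return the last failing verdict.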
Content Builder classifies SKUs into push / hold / drop / restock-seasonal / discontinue / source-similar, not just "good" vs "bad". Each action maps to a different shop-floor decision.
Built-in awareness of KSSR/KSSM workbook tiers, SPM/UPSR exam syllabi (UPSR abolished 2021), Ramadan dip, January back-to-school spike, school-bulk vs in-store channel mix.
Each SKU gets an aging_class (fresh / slowing / stale / stuck) based on days since last sale. Aging-stale SKUs are preserved in the LLM context even when they're mid-revenue, so dormant titles surface.
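A minimal sketch of the aging classification, assuming day thresholds of 30, 90, and 180. The source names the four classes but does not state the cut-offs, so the numbers below are placeholders chosen only for illustration.

```python
from datetime import date, timedelta

# Hypothetical cut-offs in days; the class names come from the text,
# the thresholds do not.
AGING_THRESHOLDS = [(30, "fresh"), (90, "slowing"), (180, "stale")]


def aging_class(last_sale: date, today: date) -> str:
    """Classify a SKU by the number of days since its last sale."""
    days_dormant = (today - last_sale).days
    for limit, label in AGING_THRESHOLDS:
        if days_dormant < limit:
            return label
    return "stuck"


today = date(2025, 12, 31)
examples = {
    days: aging_class(today - timedelta(days=days), today)
    for days in (11, 45, 96, 364)
}
```

Keeping the thresholds in one table makes them easy to tune per shop, since a title that is dormant for a textbook seller may be normal for a specialist stockist.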
Five questions captured live from the running pipeline against 101,990 rows of Malaysian bookstore sales data (Jan 2024 to Dec 2025). Click Launch Demo to browse them.
Top 6 SKUs are SPM/Form-5 workbooks and the Casio fx-570 calculator: RM 693k revenue, 39.6% margin on the top SKU. Surfaced as push actions.
Caught the entire UPSR-syllabus cluster (4 SKUs, RM 16,980 stuck) as discontinue. UPSR was abolished in Malaysia in 2021.
SPM workbooks peak January (index 1.31) and December (1.18). Form 5 KSSM peaks January (1.59) and December (1.38). Pre-stock by November.
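One common way to arrive at indices like these is to divide each calendar month's average sales by the average across all months, so 1.0 means a typical month. The source does not say how its indices are defined; the sketch below assumes that ratio, and the function name and sample numbers are made up.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def seasonal_indices(
    monthly_units: Iterable[Tuple[int, int, float]],
) -> Dict[int, float]:
    """Return {month: index} from (year, month, units) rows.

    Index = average units for that calendar month / average over all months.
    This definition is an assumption, not taken from the source.
    """
    by_month = defaultdict(list)
    for _year, month, units in monthly_units:
        by_month[month].append(units)
    overall = mean(u for values in by_month.values() for u in values)
    return {m: round(mean(v) / overall, 2) for m, v in sorted(by_month.items())}


# Made-up two-year history for one category, three months only.
history = [
    (2024, 1, 150), (2024, 6, 75), (2024, 12, 75),
    (2025, 1, 150), (2025, 6, 75), (2025, 12, 75),
]
indices = seasonal_indices(history)
# Months well above 1.0 are candidates for pre-stocking a month or two ahead.
peak_months = [m for m, idx in indices.items() if idx > 1.1]
```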
3 stale SKUs surfaced: Tuhan Manusia (152 days dormant), Pakej UPSR Lengkap (140 days), and a Chinese-language title (96 days). RM 8,447 to clear.
Modul SPM Matematik: 43.7% margin, RM 418k revenue, 546 units/month. Bundle with Add Math & Sciences (all 42–43% margin).
/.well-known/agent-card.json describing its capabilities. The orchestrator's RemoteA2aAgent loads the card and proxies calls.
after_agent_callback writes each agent's output (research_findings, judge_feedback) into session.state, so downstream agents read it without re-passing it.
authenticated_httpx. Locally, auth is bypassed (localhost detection).
.env
PYTHONUTF8=1 globally (em-dash encoding fix on Windows)
Bookshelf Phase 1 is a portfolio demonstration of the multi-agent architecture, not a production tool. These are the gaps you would close to take it to a paying SME client:
Currently reads a bundled dataset.xlsx. To be useful daily, the Researcher would need a connector to the shop's live POS (Square, SAP-BO BS1, daily Excel export, or AlloyDB).
The brief can flag "this category has a sourcing gap" but cannot recommend specific products to add. Phase 3 would add a Trend Spotter agent with google_search for Malaysian distributor leads.
~140 seconds per question because the pipeline runs 4 LLM calls + the pandas read. Acceptable for daily/weekly briefs, too slow for casual exploration. NotebookLM wins here.
The owner has to open the app and ask. Phase 4 would push the Monday morning brief to WhatsApp / email automatically; that's the real workflow win.
No multi-shop tenant model, no auth on the web app, no data isolation. Cloud Run deployment for Phase 1 is planned but not in scope tonight.
The LLM sees top 30 + bottom 30 + all aging-stale SKUs (~71 of 385). A specific question about a mid-tier non-aging SKU may get a generic "this is a hold" answer. Phase 2 would add a query_sku(name) direct lookup.
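The context window selection and the proposed direct lookup can be sketched as below. The dictionary keys, the function bodies, and the choice to treat both "stale" and "stuck" as aging-stale are assumptions; only the top 30 / bottom 30 rule and the `query_sku(name)` idea come from the text.

```python
from typing import Dict, List, Optional, Sequence

Sku = Dict[str, object]


def select_context(
    skus: List[Sku],
    top_n: int = 30,
    bottom_n: int = 30,
    keep_classes: Sequence[str] = ("stale", "stuck"),  # assumed meaning of "aging-stale"
) -> List[Sku]:
    """Pick the SKUs shown to the LLM: top and bottom by revenue plus aging ones."""
    ranked = sorted(skus, key=lambda s: s["revenue"], reverse=True)
    bottom = ranked[-bottom_n:] if bottom_n > 0 else []
    aging = [s for s in ranked if s["aging_class"] in keep_classes]
    # De-duplicate while keeping order, since the three groups can overlap.
    seen, chosen = set(), []
    for sku in ranked[:top_n] + bottom + aging:
        if sku["name"] not in seen:
            seen.add(sku["name"])
            chosen.append(sku)
    return chosen


def query_sku(skus: List[Sku], name: str) -> Optional[Sku]:
    """Hypothetical Phase 2 lookup: exact, case-insensitive match on SKU name."""
    wanted = name.strip().lower()
    for sku in skus:
        if str(sku["name"]).lower() == wanted:
            return sku
    return None


# Ten made-up SKUs; SKU5 is mid-revenue but stale.
catalog = [
    {"name": f"SKU{i}", "revenue": 100 - i, "aging_class": "fresh"}
    for i in range(10)
]
catalog[5]["aging_class"] = "stale"
context_names = [s["name"] for s in select_context(catalog, top_n=2, bottom_n=2)]
mid_tier = query_sku(catalog, "sku4")
```

In this sample SKU4 never reaches the LLM context, which is the gap the direct lookup is meant to close.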