Voice advisor for busy Malaysian kedai runcit owners
Challenge: Small Malaysian kedai runcit (sundry shop) operators stand behind the counter from 7am to 10pm. Their POS records every sale, but there's no practical way to query that data during the working day — dashboards require stopping work to read a screen. Reorder decisions, loyalty program judgments, and category-level pricing all end up being made on gut feel because the data is locked behind a dashboard the owner never opens.
Solution: Sundry Shop Assistant is a mobile web advisor that answers the owner's business questions in natural Malay, backed by ten analytical tools that run over the shop's POS dataset. The owner asks — "kategori paling laku?" (which category sells best?), "member atau visitor spend lebih?" (do members or visitors spend more?), "cash atau card lebih banyak?" (more cash or card?) — and gets concrete numbers with recommendations in seconds, without having to read a dashboard.
Ten purpose-built tools query the sales dataset directly: total sales, top day, weekly breakdown, category ranking, slowest movers, member vs visitor, gender spend, payment mix, payment by customer type, and basket statistics.
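A minimal sketch of how two of the ten tools might be implemented as pandas queries. The column names (`category`, `amount`, `customer_type`) and the toy data are assumptions for illustration, not the project's actual dataset.csv schema:

```python
import pandas as pd

# Toy stand-in for dataset.csv; real column names may differ.
df = pd.DataFrame({
    "category": ["Drinks", "Snacks", "Drinks", "Rice"],
    "amount": [5.0, 3.5, 7.0, 28.0],
    "customer_type": ["member", "visitor", "member", "member"],
})

def category_ranking(df: pd.DataFrame) -> dict:
    """Rank categories by total sales, highest first."""
    totals = df.groupby("category")["amount"].sum().sort_values(ascending=False)
    return {"ranking": [{"category": c, "total": float(t)} for c, t in totals.items()]}

def member_vs_visitor(df: pd.DataFrame) -> dict:
    """Compare average spend per transaction for members vs visitors."""
    avg = df.groupby("customer_type")["amount"].mean()
    return {k: round(float(v), 2) for k, v in avg.items()}

print(category_ranking(df))
print(member_vs_visitor(df))
```

Each tool returns a plain dict so the result can be serialized straight into a tool response for the model.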
Independent toggles for input (voice or text) and output (voice or text). Switch any time — voice when hands are free, text when a customer stands within earshot. No restart needed between modes.
The agent (Adam) speaks conversational Bahasa Malaysia, not formal bahasa baku. It uses colloquial particles like "tak", "dah", "ni" and the honorific "Pak", and code-switches to English for brand names and business terms — the way Malaysian SME owners actually talk.
Every question and answer appears as a scrollable text log — doubles as verification (catches mishearings) and end-of-day review. Owner can scroll up to revisit any earlier question.
The agent never invents numbers. Every figure comes from a tool call against the actual sales dataset. If the data can't answer (e.g. asking about a period outside the dataset window), the agent says so clearly.
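One way the honest-limits rule above could look inside a tool: check the requested period against the dataset window and return a structured refusal instead of a number. The schema and window here are toy assumptions:

```python
import pandas as pd

# Toy data; the real dataset.csv window and columns are assumptions here.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-15", "2024-06-30"]),
    "amount": [10.0, 20.0, 30.0],
})

def total_sales(df: pd.DataFrame, start: str, end: str) -> dict:
    """Sum sales in [start, end], refusing instead of guessing when the
    requested period falls entirely outside the dataset window."""
    lo, hi = df["date"].min(), df["date"].max()
    start_ts, end_ts = pd.Timestamp(start), pd.Timestamp(end)
    if end_ts < lo or start_ts > hi:
        return {
            "available": False,
            "reason": f"data covers {lo.date()} to {hi.date()} only",
        }
    mask = (df["date"] >= start_ts) & (df["date"] <= end_ts)
    return {"available": True, "total": float(df.loc[mask, "amount"].sum())}
```

Returning the refusal as data (rather than raising) lets the model explain the limitation to the owner in its own words.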
The pandas-backed tool layer returns answers in under 100ms, keeping conversational flow natural. Tool responses are streamed back during the turn so the agent can speak without awkward silences.
Powered by gemini-3.1-flash-live-preview. Native audio understanding and generation in over 70 languages, including natural-sounding Bahasa Malaysia — no separate speech-to-text or text-to-speech layer.
All ten tools are declared as Gemini function declarations. During a session, the model autonomously chooses which tool to call based on the question, receives the result, and speaks the answer — all mid-conversation.
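The mid-conversation dispatch might be sketched like this. The tool names and `TOOL_HANDLERS` map are hypothetical, and the google-genai session wiring is shown only as comments because it needs a live connection:

```python
# Hypothetical handler registry; real handlers would query the dataset.
TOOL_HANDLERS = {
    "category_ranking": lambda args: {"ranking": ["Drinks", "Snacks"]},
    "total_sales": lambda args: {"total": 1234.50},
}

def dispatch(name: str, args: dict) -> dict:
    """Run the requested tool; return an error dict for unknown tools
    instead of raising, so the model can recover verbally."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return handler(args)

# Inside the live session loop (pseudocode for the async wiring):
#
# async for message in session.receive():
#     if message.tool_call:
#         responses = [
#             types.FunctionResponse(id=fc.id, name=fc.name,
#                                    response=dispatch(fc.name, fc.args))
#             for fc in message.tool_call.function_calls
#         ]
#         await session.send_tool_response(function_responses=responses)
```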
Tools follow an MCP-style contract (name, description, schema, handler). The bridge can be swapped from in-process pandas queries to a real MCP server without changing the Gemini-side declarations.
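A sketch of that contract and the bridge, under the assumption that each tool is a (name, description, schema, handler) record and the Gemini-side declaration is the same record minus the handler. Plain dicts stand in for the SDK's declaration types:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """MCP-style contract: what the model sees plus how the server runs it."""
    name: str
    description: str
    schema: dict                      # JSON Schema for the parameters
    handler: Callable[[dict], dict]   # server-side only, never sent to the model

def to_function_declaration(tool: Tool) -> dict:
    """Drop the handler; keep only what the model needs to pick the tool."""
    return {
        "name": tool.name,
        "description": tool.description,
        "parameters": tool.schema,
    }

tools = [
    Tool(
        name="payment_mix",
        description="Share of sales paid by cash vs card vs e-wallet.",
        schema={"type": "object", "properties": {}},
        handler=lambda args: {"cash": 0.6, "card": 0.3, "ewallet": 0.1},
    ),
]
declarations = [to_function_declaration(t) for t in tools]
```

Because the declarations are derived from the contract, swapping the in-process handlers for calls to a real MCP server changes only the `handler` side, not what the model sees.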
Each tool's description is crafted to be crisp and non-overlapping — a deliberate design choice because in a voice conversation, wrong tool selection creates awkward silence that's harder to recover from than in text chat.
Responses stream back as the model generates them. The owner can interrupt at any point — audio playback stops immediately, recognition takes over, and the new question is processed without losing conversation context.
The agent's system prompt explicitly enforces casual Malay register, short spoken answers (under 30 seconds), tool-grounded numbers only, and honest acknowledgment of dataset limits — no corporate filler, no hallucinated figures.
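An illustrative prompt capturing those constraints; the project's actual prompt wording is not shown in this document, so this is a hedged reconstruction:

```python
# Illustrative only: the real system prompt's wording is an assumption.
SYSTEM_PROMPT = """\
You are Adam, a business advisor for a Malaysian kedai runcit owner.
- Speak casual Bahasa Malaysia (use "tak", "dah", "ni"), not formal bahasa baku.
- Code-switch to English for brand names and business terms.
- Keep every spoken answer under 30 seconds.
- Never state a number you did not get from a tool call.
- If the dataset cannot answer a question, say so plainly; never guess.
"""
```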
A /ws endpoint bridges the browser to Gemini Live via the google-genai SDK. 16kHz PCM flows in from the mic; 24kHz PCM flows back from the model.

mcp_tools.py defines the ten pandas-backed functions over dataset.csv. Each returns a JSON-serializable dict; results are sent to the model via session.send_tool_response().

tool_bridge.py translates the MCP tool set into Gemini FunctionDeclaration objects registered in LiveConnectConfig.

Input/output modes are selected with a ?mode= query parameter; switching modes reconnects the session in ~500ms.

Deployment flags:
--timeout 3600 is required because WebSocket connections are long-lived.
--session-affinity ensures a user's audio stream stays pinned to one instance.
--min-instances 0 scales to zero when idle — no standing cost.
--max-instances 3 caps concurrent instances as a cost guardrail.

Running on a preview model (gemini-3.1-flash-live-preview) means version-pinning google-genai in requirements.txt and expecting occasional schema shifts before GA.
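Those flags match Cloud Run's `gcloud run deploy`, so the deploy command might look like the following. The target platform is inferred from the flag names, and the service name and region are placeholders:

```shell
# Assumed deployment target: Cloud Run (the flags match gcloud run deploy).
# Service name and region are placeholders.
# --timeout 3600: WebSocket connections are long-lived
# --session-affinity: pin a user's audio stream to one instance
# --min-instances 0: scale to zero when idle
# --max-instances 3: cost guardrail
gcloud run deploy sundry-shop-assistant \
  --source . \
  --region asia-southeast1 \
  --timeout 3600 \
  --session-affinity \
  --min-instances 0 \
  --max-instances 3
```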