Voice advisor for busy Malaysian kedai runcit owners
Challenge: Small Malaysian kedai runcit (sundry shop) operators stand behind the counter from 7am to 10pm. Their POS records every sale, but there's no practical way to query that data during the working day — dashboards require stopping work to read a screen. Reorder decisions, loyalty program judgments, and category-level pricing all end up being made on gut feel because the data is locked behind a dashboard the owner never opens.
Solution: Sundry Shop Assistant is a mobile web advisor that answers the owner's business questions in natural Malay, backed by ten analytical tools that run over the shop's POS dataset. The owner asks — "kategori paling laku?" (which category sells best?), "member atau visitor spend lebih?" (do members or visitors spend more?), "cash atau card lebih banyak?" (more cash or card?) — and gets concrete numbers with recommendations in seconds, without having to read a dashboard.
Ten purpose-built tools query the sales dataset directly: total sales, top day, weekly breakdown, category ranking, slowest movers, member vs visitor, gender spend, payment mix, payment by customer type, and basket statistics.
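A minimal sketch of how two of the ten tools might be implemented as pandas queries. The column names (`category`, `amount`, `customer_type`) and the toy data are assumptions for illustration, not the project's actual dataset.csv schema:

```python
import pandas as pd

# Toy stand-in for dataset.csv; real column names may differ.
df = pd.DataFrame({
    "category": ["Drinks", "Snacks", "Drinks", "Rice"],
    "amount": [5.0, 3.5, 7.0, 28.0],
    "customer_type": ["member", "visitor", "member", "member"],
})

def category_ranking(df: pd.DataFrame) -> dict:
    """Rank categories by total sales, highest first."""
    totals = df.groupby("category")["amount"].sum().sort_values(ascending=False)
    return {"ranking": [{"category": c, "total": float(t)} for c, t in totals.items()]}

def member_vs_visitor(df: pd.DataFrame) -> dict:
    """Compare average spend per transaction for members vs visitors."""
    avg = df.groupby("customer_type")["amount"].mean()
    return {k: round(float(v), 2) for k, v in avg.items()}

print(category_ranking(df))
print(member_vs_visitor(df))
```

Each tool returns a plain dict so the result can be serialized straight into a tool response for the model.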
Independent toggles for input (voice or text) and output (voice or text). Switch any time — voice when hands are free, text when a customer stands within earshot. No restart needed between modes.
The agent (Adam) speaks conversational Bahasa Malaysia, not formal bahasa baku. It uses colloquial particles like "tak", "dah", "ni" and the honorific "Pak", and code-switches to English for brand names and business terms — the way Malaysian SME owners actually talk.
Every question and answer appears as a scrollable text log — doubles as verification (catches mishearings) and end-of-day review. Owner can scroll up to revisit any earlier question.
The agent never invents numbers. Every figure comes from a tool call against the actual sales dataset. If the data can't answer (e.g. asking about a period outside the dataset window), the agent says so clearly.
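One way the honest-limits rule above could look inside a tool: check the requested period against the dataset window and return a structured refusal instead of a number. The schema and window here are toy assumptions:

```python
import pandas as pd

# Toy data; the real dataset.csv window and columns are assumptions here.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-15", "2024-06-30"]),
    "amount": [10.0, 20.0, 30.0],
})

def total_sales(df: pd.DataFrame, start: str, end: str) -> dict:
    """Sum sales in [start, end], refusing instead of guessing when the
    requested period falls entirely outside the dataset window."""
    lo, hi = df["date"].min(), df["date"].max()
    start_ts, end_ts = pd.Timestamp(start), pd.Timestamp(end)
    if end_ts < lo or start_ts > hi:
        return {
            "available": False,
            "reason": f"data covers {lo.date()} to {hi.date()} only",
        }
    mask = (df["date"] >= start_ts) & (df["date"] <= end_ts)
    return {"available": True, "total": float(df.loc[mask, "amount"].sum())}
```

Returning the refusal as data (rather than raising) lets the model explain the limitation to the owner in its own words.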
The pandas-backed tool layer returns answers in under 100ms, keeping conversational flow natural. Tool responses are streamed back during the turn so the agent can speak without awkward silences.
Powered by gemini-3.1-flash-live-preview. Native audio understanding and generation in over 70 languages, including natural-sounding Bahasa Malaysia — no separate speech-to-text or text-to-speech layer.
All ten tools are declared as Gemini function declarations. During a session, the model autonomously chooses which tool to call based on the question, receives the result, and speaks the answer — all mid-conversation.
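The mid-conversation dispatch might be sketched like this. The tool names and `TOOL_HANDLERS` map are hypothetical, and the google-genai session wiring is shown only as comments because it needs a live connection:

```python
# Hypothetical handler registry; real handlers would query the dataset.
TOOL_HANDLERS = {
    "category_ranking": lambda args: {"ranking": ["Drinks", "Snacks"]},
    "total_sales": lambda args: {"total": 1234.50},
}

def dispatch(name: str, args: dict) -> dict:
    """Run the requested tool; return an error dict for unknown tools
    instead of raising, so the model can recover verbally."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        return {"error": f"unknown tool: {name}"}
    return handler(args)

# Inside the live session loop (pseudocode for the async wiring):
#
# async for message in session.receive():
#     if message.tool_call:
#         responses = [
#             types.FunctionResponse(id=fc.id, name=fc.name,
#                                    response=dispatch(fc.name, fc.args))
#             for fc in message.tool_call.function_calls
#         ]
#         await session.send_tool_response(function_responses=responses)
```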
Tools follow an MCP-style contract (name, description, schema, handler). The bridge can be swapped from in-process pandas queries to a real MCP server without changing the Gemini-side declarations.
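A sketch of that contract and the bridge, under the assumption that each tool is a (name, description, schema, handler) record and the Gemini-side declaration is the same record minus the handler. Plain dicts stand in for the SDK's declaration types:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    """MCP-style contract: what the model sees plus how the server runs it."""
    name: str
    description: str
    schema: dict                      # JSON Schema for the parameters
    handler: Callable[[dict], dict]   # server-side only, never sent to the model

def to_function_declaration(tool: Tool) -> dict:
    """Drop the handler; keep only what the model needs to pick the tool."""
    return {
        "name": tool.name,
        "description": tool.description,
        "parameters": tool.schema,
    }

tools = [
    Tool(
        name="payment_mix",
        description="Share of sales paid by cash vs card vs e-wallet.",
        schema={"type": "object", "properties": {}},
        handler=lambda args: {"cash": 0.6, "card": 0.3, "ewallet": 0.1},
    ),
]
declarations = [to_function_declaration(t) for t in tools]
```

Because the declarations are derived from the contract, swapping the in-process handlers for calls to a real MCP server changes only the `handler` side, not what the model sees.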
Each tool's description is crafted to be crisp and non-overlapping — a deliberate design choice because in a voice conversation, wrong tool selection creates awkward silence that's harder to recover from than in text chat.
Responses stream back as the model generates them. The owner can interrupt at any point — audio playback stops immediately, recognition takes over, and the new question is processed without losing conversation context.
The agent's system prompt explicitly enforces casual Malay register, short spoken answers (under 30 seconds), tool-grounded numbers only, and honest acknowledgment of dataset limits — no corporate filler, no hallucinated figures.
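An illustrative prompt capturing those constraints; the project's actual prompt wording is not shown in this document, so this is a hedged reconstruction:

```python
# Illustrative only: the real system prompt's wording is an assumption.
SYSTEM_PROMPT = """\
You are Adam, a business advisor for a Malaysian kedai runcit owner.
- Speak casual Bahasa Malaysia (use "tak", "dah", "ni"), not formal bahasa baku.
- Code-switch to English for brand names and business terms.
- Keep every spoken answer under 30 seconds.
- Never state a number you did not get from a tool call.
- If the dataset cannot answer a question, say so plainly; never guess.
"""
```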
A /ws endpoint bridges the browser to Gemini Live via the google-genai SDK. 16kHz PCM flows in from the mic; 24kHz PCM flows back from the model.

mcp_tools.py defines the ten pandas-backed functions over dataset.csv. Each returns a JSON-serializable dict; results are sent to the model via session.send_tool_response().

tool_bridge.py translates the MCP tool set into Gemini FunctionDeclaration objects registered in LiveConnectConfig.

Input/output modes are selected with a ?mode= query parameter; switching modes reconnects the session in ~500ms.

Deployment flags:
--timeout 3600 is required because WebSocket connections are long-lived.
--session-affinity ensures a user's audio stream stays pinned to one instance.
--min-instances 0 scales to zero when idle — no standing cost.
--max-instances 3 caps concurrent instances as a cost guardrail.

Running on a preview model (gemini-3.1-flash-live-preview) means version-pinning google-genai in requirements.txt and expecting occasional schema shifts before GA.
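Those flags match Cloud Run's `gcloud run deploy`, so the deploy command might look like the following. The target platform is inferred from the flag names, and the service name and region are placeholders:

```shell
# Assumed deployment target: Cloud Run (the flags match gcloud run deploy).
# Service name and region are placeholders.
# --timeout 3600: WebSocket connections are long-lived
# --session-affinity: pin a user's audio stream to one instance
# --min-instances 0: scale to zero when idle
# --max-instances 3: cost guardrail
gcloud run deploy sundry-shop-assistant \
  --source . \
  --region asia-southeast1 \
  --timeout 3600 \
  --session-affinity \
  --min-instances 0 \
  --max-instances 3
```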