🏪 Sundry Shop Assistant

Voice advisor for busy Malaysian kedai runcit owners

Gemini Live API · MCP Tools · FastAPI · WebSocket · pandas · Cloud Run

📋 Project Overview & Problem Statement

Challenge: Small Malaysian kedai runcit (sundry shop) operators stand behind the counter from 7am to 10pm. Their POS records every sale, but there's no practical way to query that data during the working day — dashboards require stopping work to read a screen. Reorder decisions, loyalty program judgments, and category-level pricing all end up being made on gut feel because the data is locked behind a dashboard the owner never opens.

Solution: Sundry Shop Assistant is a mobile web advisor that answers the owner's business questions in natural Malay, backed by ten analytical tools that run over the shop's POS dataset. The owner asks — "kategori paling laku?" (best-selling category?), "member atau visitor spend lebih?" (do members or walk-ins spend more?), "cash atau card lebih banyak?" (more cash or card?) — and gets concrete numbers with recommendations in seconds, without having to read a dashboard.

Key Benefits

🖥️ Application Features

📊 Ten Analytical Tools

Ten purpose-built tools query the sales dataset directly: total sales, top day, weekly breakdown, category ranking, slowest movers, member vs visitor, gender spend, payment mix, payment by customer type, and basket statistics.

🎙️ Four I/O Modes

Independent toggles for input (voice or text) and output (voice or text). Switch any time — voice when hands are free, text when a customer stands within earshot. No restart needed between modes.

💬 Santai Malay Register

The agent (Adam) speaks conversational Bahasa Malaysia, not formal bahasa baku. It uses casual particles like "tak" (not), "dah" (already), and "ni" (this), addresses the owner as "Pak", and code-switches to English for brand names and business terms — the way Malaysian SME owners actually talk.

📝 Transcript & Verification

Every question and answer appears as a scrollable text log — doubles as verification (catches mishearings) and end-of-day review. Owner can scroll up to revisit any earlier question.

🎯 Dataset-Grounded Answers

The agent never invents numbers. Every figure comes from a tool call against the actual sales dataset. If the data can't answer (e.g. asking about a period outside the dataset window), the agent says so clearly.
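One way such a guardrail could look in the pandas tool layer — a minimal sketch, assuming a hypothetical one-week demo window; the column names, dates, and function name here are illustrative, not the project's actual code:

```python
import pandas as pd

# Hypothetical one-week demo window; the real dataset's dates and columns differ.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-06-01", "2024-06-03", "2024-06-07"]),
    "amount": [10.0, 25.0, 5.0],
})

def total_sales_for(df: pd.DataFrame, start: str, end: str) -> dict:
    """Answer only when the asked period overlaps the dataset window."""
    lo, hi = df["date"].min(), df["date"].max()
    start_ts, end_ts = pd.Timestamp(start), pd.Timestamp(end)
    if end_ts < lo or start_ts > hi:
        # Out-of-window question: return an honest note instead of a number.
        return {"answer": None,
                "note": f"Data only covers {lo.date()} to {hi.date()}."}
    mask = (df["date"] >= start_ts) & (df["date"] <= end_ts)
    return {"answer": float(df.loc[mask, "amount"].sum())}
```

Returning an explicit note lets the agent voice the limitation ("data saya cuma sampai minggu lepas") rather than fabricate a figure.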

⚡ Sub-Second Tool Latency

The pandas-backed tool layer returns answers in under 100ms, keeping conversational flow natural. Tool responses are streamed back during the turn so the agent can speak without awkward silences.

Ten Analytical Tools

get_total_sales
get_top_day
get_weekly_summary
get_sales_by_category
get_slowest_category
compare_member_vs_visitor
compare_gender
get_payment_mix
get_payment_by_customer_type
get_basket_stats
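As an illustration of how one of these tools could be backed by pandas — the rows and column names below are made up for the sketch; the project's real schema may differ:

```python
import pandas as pd

# Made-up demo rows; the project's real dataset schema may differ.
sales = pd.DataFrame({
    "category": ["Drinks", "Snacks", "Drinks", "Rice", "Snacks"],
    "amount":   [12.5, 4.0, 8.0, 30.0, 6.5],
})

def get_sales_by_category(df: pd.DataFrame) -> dict:
    """Rank product categories by total revenue, highest first."""
    totals = df.groupby("category")["amount"].sum().sort_values(ascending=False)
    return {"ranking": [{"category": c, "total": round(float(t), 2)}
                        for c, t in totals.items()]}

result = get_sales_by_category(sales)
# Rice leads with 30.0, then Drinks 20.5, then Snacks 10.5
```

A single in-memory groupby like this is what keeps per-call latency in the tens of milliseconds for a small demo dataset.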

🤖 AI Integration & Intelligence

🧠 Gemini Live API (Preview)

Powered by gemini-3.1-flash-live-preview. Native audio understanding and generation in over 70 languages, including natural-sounding Bahasa Malaysia — no separate speech-to-text or text-to-speech layer.

🔧 Native Function Calling

All ten tools are declared as Gemini function declarations. During a session, the model autonomously chooses which tool to call based on the question, receives the result, and speaks the answer — all mid-conversation.
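A declaration for one such tool might be shaped like this — an illustrative plain dict, with a hypothetical `top_n` parameter; the exact wrapper types in the google-genai SDK may differ, but the JSON-schema structure of `parameters` is the usual one:

```python
# Illustrative declaration for one tool; the wrapper types in the google-genai
# SDK may differ, but the JSON-schema shape of "parameters" is the usual one.
get_sales_by_category_decl = {
    "name": "get_sales_by_category",
    "description": "Rank product categories by total sales revenue, highest first.",
    "parameters": {
        "type": "object",
        "properties": {
            "top_n": {  # hypothetical parameter, for illustration only
                "type": "integer",
                "description": "How many categories to return (default: all).",
            },
        },
        "required": [],
    },
}
```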

📡 MCP-Style Tool Bridge

Tools follow an MCP-style contract (name, description, schema, handler). The bridge can be swapped from in-process pandas queries to a real MCP server without changing the Gemini-side declarations.
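A minimal sketch of that contract and its dispatch point, assuming illustrative names (`Tool`, `REGISTRY`, `dispatch`) rather than the project's actual identifiers:

```python
from dataclasses import dataclass
from typing import Any, Callable

# Minimal sketch of the MCP-style contract; all names here are illustrative.
@dataclass
class Tool:
    name: str
    description: str
    schema: dict                  # JSON schema for the tool's arguments
    handler: Callable[..., dict]  # in-process handler (here: a pandas query)

REGISTRY: dict[str, Tool] = {}

def register(tool: Tool) -> None:
    REGISTRY[tool.name] = tool

def dispatch(name: str, args: dict[str, Any]) -> dict:
    """Route a model-issued tool call to its handler. Swapping this body for
    a remote MCP client would leave the Gemini-side declarations untouched."""
    tool = REGISTRY.get(name)
    if tool is None:
        return {"error": f"unknown tool: {name}"}
    return tool.handler(**args)

register(Tool(
    name="get_total_sales",
    description="Total revenue across the whole dataset.",
    schema={"type": "object", "properties": {}},
    handler=lambda: {"total": 1234.50},  # stand-in for the real pandas query
))
```

Because the model only ever sees the declarations and the dispatch results, the handler side can move out of process without any prompt or schema changes.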

🎯 Non-Overlapping Tool Schemas

Each tool's description is crafted to be crisp and non-overlapping — a deliberate design choice because in a voice conversation, wrong tool selection creates awkward silence that's harder to recover from than in text chat.

🎚️ Streaming with Barge-In

Responses stream back as the model generates them. The owner can interrupt at any point — audio playback stops immediately, recognition takes over, and the new question is processed without losing conversation context.
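Barge-in can be modeled as task cancellation. This toy asyncio sketch (not the project's actual code) shows playback stopping mid-stream while the surrounding state — standing in for the conversation context — survives intact:

```python
import asyncio
import contextlib

async def speak(chunks, played):
    # Stand-in for streaming TTS playback: one audio chunk per loop tick.
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0)

async def demo():
    played = []
    playback = asyncio.create_task(speak(range(1000), played))
    for _ in range(3):          # a few chunks play...
        await asyncio.sleep(0)
    playback.cancel()           # ...then the owner barges in
    with contextlib.suppress(asyncio.CancelledError):
        await playback          # playback stops immediately
    return played               # state outside the task is untouched

played = asyncio.run(demo())
```

The key property is that cancelling the playback task discards only the un-spoken audio; nothing about the session is reset, so the new question lands in the same conversation.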

🗣️ Santai Persona Prompt

The agent's system prompt explicitly enforces casual Malay register, short spoken answers (under 30 seconds), tool-grounded numbers only, and honest acknowledgment of dataset limits — no corporate filler, no hallucinated figures.

🛠️ Technical Architecture & Implementation

Frontend Stack

Vanilla JavaScript · Web Audio API · AudioWorklet · WebSocket Client · HTML5 + CSS3 · Poppins (Google Fonts)

Backend Stack

FastAPI 0.128 · Uvicorn (ASGI) · google-genai 1.70 · pandas 2.3 · Python 3.11 · WebSockets

Deployment & Infrastructure

Google Cloud Run · Docker · asia-southeast1 (Singapore) · Session Affinity · Ephemeral Tokens

System Architecture

📖 Development Setup & Installation Guide

Prerequisites

Quick Start Installation

# Clone the repository
git clone https://github.com/lyven81/ai-project.git
cd ai-project/projects/sundry-shop-assistant

# Copy the environment template and add your API key
cp .env.example .env
# Edit .env and paste your GEMINI_API_KEY

# Double-click run-local.bat (Windows)
# It creates the venv, installs deps, starts the server, opens the browser

Environment Configuration

# Required
GEMINI_API_KEY=your_gemini_api_key_here

# Optional overrides
MODEL=gemini-3.1-flash-live-preview
VOICE_NAME=Puck
PORT=8000
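The backend might read these overrides along these lines — an illustrative sketch whose defaults mirror the sample values above, not necessarily the project's exact code:

```python
import os

# Illustrative: fall back to the sample .env defaults when a variable is unset.
MODEL = os.getenv("MODEL", "gemini-3.1-flash-live-preview")
VOICE_NAME = os.getenv("VOICE_NAME", "Puck")
PORT = int(os.getenv("PORT", "8000"))
```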

Manual Run (if not using the batch file)

python -m venv .venv
.venv\Scripts\activate
pip install -r backend/requirements.txt
cd backend
python main.py
# Open http://localhost:8000

🚀 Deployment on Google Cloud Run

gcloud run deploy sundry-shop-assistant \
  --source . \
  --region asia-southeast1 \
  --allow-unauthenticated \
  --set-env-vars "GEMINI_API_KEY=...,MODEL=gemini-3.1-flash-live-preview,VOICE_NAME=Puck" \
  --memory 1Gi \
  --cpu 1 \
  --timeout 3600 \
  --concurrency 10 \
  --min-instances 0 \
  --max-instances 3 \
  --session-affinity

Production Notes

📊 Key Metrics

MCP Analytical Tools: 10
I/O Modes: 4
Demo Dataset: 150 rows
Tool Call Latency: < 1s

Business Value