πŸ“ Market Research Agent

Governed Queries, Cached Maps & Agent Routing

Python FastAPI SQLite Google Maps (cached) Cloud Run

πŸ” About This Code Showcase

These curated snippets show the three pieces that make the agent cheap and safe: the governed query layer (the agent never writes SQL), the router-and-phrase agent, and the scheduled Google Maps cache refresh.

Synthetic-data generation and deployment config are omitted for clarity. The full project lives in projects/market-research-agent/.

πŸ—‚οΈ Project Structure

πŸ“ projects/market-research-agent/
data/ # the 7 CSV tables (generated) setup/ generate_data.py # builds the Klang Valley tong shui data pack build_db.py # CSVs -> market_research.db (SQLite) app/ queries.py # the 10 approved, parameterized queries router.py # free text -> one of the 10 query ids agent.py # route -> run query -> phrase answer main.py # FastAPI service (/ask, serves the UI) cache/ refresh_maps_cache.py # fills competitors/reviews/suppliers from Maps market-research-agent.html # chat UI front end Dockerfile # Cloud Run container

πŸ›‘οΈ Core: The Governed Query Layer

The heart of the design. The agent may only call one of these named tools with whitelisted parameters. Every SQL string is fixed; only the bound parameters change. The agent can never author SQL, list tables, or dump records.

πŸ“„ app/queries.py:two of the 10 approved queries
# Q3 Where + Competition: high evening traffic, few competitors (cached) def q3_opportunity_gap(business_type=BUSINESS, min_evening_traffic=50, limit=5): sql = """ SELECT d.neighborhood, d.zip_code, ft.foot_traffic_score AS evening_traffic, COUNT(c.competitor_id) AS shop_count, ROUND(ft.foot_traffic_score * 1.0 / (COUNT(c.competitor_id) + 1), 1) AS opportunity_ratio FROM demographics d JOIN foot_traffic ft ON ft.zip_code = d.zip_code AND ft.time_of_day = 'evening' LEFT JOIN competitors c ON c.zip_code = d.zip_code AND c.business_type = ? GROUP BY d.zip_code HAVING evening_traffic >= ? ORDER BY opportunity_ratio DESC LIMIT ?""" return _rows(sql, (business_type, min_evening_traffic, limit)) # Q4 Competition: saturation in one area (cached) def q4_saturation(zip_code=SS2, business_type=BUSINESS): sql = """ SELECT COUNT(*) AS shop_count, ROUND(AVG(rating), 2) AS avg_rating FROM competitors WHERE zip_code = ? AND business_type = ?""" return _rows(sql, (zip_code, business_type))

🧭 The Agent: Route, Run, Phrase

The no-ADK equivalent of an LLM agent. It selects one governed tool, runs it, and turns the rows into a short business answer. Out-of-scope questions are refused, not guessed.

πŸ“„ app/agent.py:orchestration
def answer(question: str) -> dict: qid = route(question) if qid is None: return {"id": None, "cost": "out_of_scope", "answer": ("That sits outside the 10 questions I am set up " "to answer for this pack, so I will not guess at it.")} fn, title, src, cost = Q.REGISTRY[qid] # light, whitelisted parameter extraction for area-specific queries if qid == 4: z, _ = _zip_from(question, Q.SS2); data = Q.q4_saturation(zip_code=z) elif qid == 10: z, _ = _zip_from(question, Q.KEPONG); data = Q.q10_launch_decision(zip_code=z) else: data = fn() return {"id": qid, "title": title, "source": src, "cost": cost, "data": data, "answer": phrase(qid, data)}

πŸ—ΊοΈ The Only Paid Call: Scheduled Maps Cache

Google Maps is touched only here, on a schedule, with a field mask so the API bills only the fields used. User questions read the cached tables, never the live API.

πŸ“„ cache/refresh_maps_cache.py:Places Text Search with field masking
def search_places(query, lat, lng): import requests headers = { "Content-Type": "application/json", "X-Goog-Api-Key": API_KEY, # field mask = pay only for what we use "X-Goog-FieldMask": ("places.id,places.displayName,places.location," "places.rating,places.userRatingCount,places.reviews"), } body = {"textQuery": query, "locationBias": {"circle": {"center": {"latitude": lat, "longitude": lng}, "radius": 2500.0}}, "maxResultCount": MAX_PER_AREA} r = requests.post(PLACES_SEARCH, headers=headers, json=body, timeout=30) r.raise_for_status() return r.json().get("places", [])

βš™οΈ Why This Design

Key Decisions