📁 Project Structure
projects/ecommerce-analyst/
├── demo.html Self-contained static demo (answers inlined)
├── data_prep.py Cleans the CSV, localizes regions, regenerates delivery times
├── precompute_answers.py Runs 6 quick-actions + Monday brief + examples on full data
├── problem-statement.md What we're building and why
├── project-outline.md Compiled plan (features, tech, screens, demo)
├── user-guide.md Plain-language how-to for the seller
├── system-prompt.txt LLM system prompt for live v1.5
└── data/
├── sample-150.csv Random 150-row sample (representative view)
└── answers.json Pre-computed demo responses (inlined into demo.html)
🔧 Key Code Snippets
The three core files each do one thing. Together they take a raw 34,500-row e-commerce CSV and produce a self-contained static demo page.
data_prep.py — Localize regions and delivery times
The raw dataset has generic region values (West, South, etc.) and uniform random delivery times that all collapse to the same mean per state. This script fixes both by remapping regions to all 16 Malaysian states with realistic population-weighted distribution, then regenerating delivery times with a per-state logistics mean (1 day Klang Valley, 6–7 days East Malaysia).
STATE_DELIVERY_MEAN = {
"Selangor": 1.2, "Kuala Lumpur": 1.1, "Putrajaya": 1.3,
"Negeri Sembilan": 2.2, "Melaka": 2.4, "Johor": 2.6,
"Perak": 2.8, "Penang": 2.5,
"Kedah": 3.5, "Pahang": 3.8,
"Perlis": 4.5, "Terengganu": 4.8, "Kelantan": 5.0,
"Sabah": 6.2, "Sarawak": 6.4, "Labuan": 6.8
}
if "region" in df.columns:
rng = np.random.default_rng(seed=42)
df["region"] = rng.choice(MALAYSIAN_STATES, size=len(df), p=STATE_WEIGHTS)
if "delivery_time_days" in df.columns:
rng = np.random.default_rng(seed=7)
means = df["region"].map(STATE_DELIVERY_MEAN).values
noise = rng.normal(0, 0.6, size=len(df))
df["delivery_time_days"] = np.clip(np.round(means + noise), 1, 9).astype(int)
precompute_answers.py — Monday brief auto-generation
The Monday brief is not a static string — it's computed every time from the last 7 days of data. This snippet shows how the "3 worries" entries are built: the script finds the state with the highest return rate this week, the weakest category by margin, and the discount band that's dragging margin down — each worded in plain language with specific numbers.
max_date = df["order_date"].max()
week_start = max_date - pd.Timedelta(days=6)
this_week = df[df["order_date"] >= week_start]
this_state_returns = this_week.groupby("region")["returned"].apply(
lambda s: (s == "Yes").mean() * 100
).sort_values(ascending=False)
worst_state = this_state_returns.index[0]
answers["monday_brief"] = {
"week_of": week_start.strftime("%d %b %Y") + " - " + max_date.strftime("%d %b %Y"),
"worries": [
f"**{worst_state}** return rate is {this_state_returns.iloc[0]:.1f}% this week "
f"— well above your average.",
f"**{this_cat.index[-1]}** category margin dropped to "
f"{fmt_rm(this_cat.iloc[-1])} — worst performer of the 7.",
f"Discount orders (≥15%) averaged {disc_margin:.2f} margin per order — "
f"check if the volume lift is worth it."
],
"decision": (
f"**Pause or reprice your weakest SKUs in {worst_state} this week.** "
f"Shipping plus returns is eating the margin — better to skip those orders "
f"than ship them at a loss."
)
}
demo.html — Rendering pre-computed answers
The demo page is fully self-contained. It imports Chart.js from a CDN, inlines the answers JSON directly as a JavaScript object, and renders the 6 quick-action buttons, Monday brief card, and ask-anything chat with no backend at all. Each quick-action click clears the chat area and renders the pre-computed answer with its table and Chart.js bar chart.
const QUICK_KEYS = [
'q1_profit_by_category', 'q2_state_shipping', 'q3_returns',
'q4_discount_roi', 'q5_top_cohort', 'q6_delivery'
];
function showAnswer(key) {
document.getElementById('chatMessages').innerHTML = '';
const a = ANSWERS[key];
addMessage('user', a.question);
const tableHtml = a.table ? renderTable(a.table) : '';
const chartId = 'chart-' + Math.random().toString(36).slice(2, 8);
const chartHtml = a.chart
? `<div class="chart-wrap"><canvas id="${chartId}" height="120"></canvas></div>`
: '';
addMessage('agent', `<p>${renderMd(a.summary)}</p>${chartHtml}${tableHtml}`);
if (a.chart) setTimeout(() => drawChart(chartId, a.chart), 50);
}
system-prompt.txt — AI character for live v1.5
The live version uses Gemini 2.5 Flash with this system prompt. Two constraints that matter for a Malaysian SME audience: (1) never mention specific platforms by name (keeps it generic), (2) never hedge with "it depends" when the data gives a clear answer — name a number, name a state, name a SKU.
You are the Ecommerce Analyst, an AI assistant inside the Ecommerce
Analyst app for mid-size Malaysian e-commerce sellers.
Your job is to act like a private, trusted analyst who works only
for this seller — answering questions about their sales data in
plain language, pointing out blind spots they didn't know to ask
about, and writing a short Monday morning brief every week.
Always respond in a tone that is warm, direct, Malay-flavored
English — like a trusted friend who happens to understand numbers.
Short sentences. Concrete numbers. One actionable takeaway per
answer.
Never:
- Share the seller's data with any third party
- Mention any specific e-commerce platform by name
- Hedge with "it depends" when the data gives a clear answer —
name a number, name a state, name a SKU