📋 Project Overview & Problem Statement
Challenge: Business analysts spend hours writing repetitive Pandas code to answer ad-hoc questions from stakeholders. Each new question requires custom scripts, leading to slow insights, bottlenecks, and dependency on technical teams for simple data exploration.
Solution: Business Intelligence Agent uses Google Gemini 2.0 Flash to convert any natural language question into executable Pandas/NumPy/Matplotlib code. It auto-detects dataset schema, generates production-ready analytics code, and creates visualizations automatically - enabling business users to explore data without coding expertise.
Key Benefits
- Instant Insights: Ask questions in plain English, get answers with charts in 5-10 seconds
- Zero Coding: Business users operate independently without technical bottlenecks
- Exploratory Freedom: Ad-hoc analysis without pre-built dashboards
- Auto Visualization: Bar charts, pie charts, scatter plots generated automatically
- Read-Only Safety: Analysis only - no data mutations or security risks
🛠️ Technical Architecture & Implementation
AI & Code Generation
Google Gemini 2.0 Flash
Code-as-Plan Pattern
Natural Language Processing
Schema Auto-Detection
Safe Code Execution
Data Analytics Stack
Pandas DataFrames
NumPy Calculations
Matplotlib Charts
Seaborn Styling
Statistical Analysis
Deployment Platform
Google Colab
Jupyter Notebook
Cloud Storage
Secrets Management
Code Generation Pattern
# All generated code follows this structure:
STATUS = "success"
try:
# 1. Filter data with conditions
filtered_df = df[(df['category'] == 'Electronics') &
(df['discount'] > 0)]
# 2. Perform calculations
avg_profit = filtered_df['profit_margin'].mean()
# 3. Create visualization (if requested)
plt.figure(figsize=(10, 6))
filtered_df.groupby('product')['profit'].sum().plot(kind='bar')
plt.title('Profit by Product')
plt.show()
# 4. Set answer
answer_text = f"Found {len(filtered_df):,} orders with {avg_profit:.2f}% avg profit"
except Exception as e:
STATUS = "error"
answer_text = f"Error: {str(e)}"
📖 Development Setup & Installation Guide
Quick Start with Google Colab (Recommended)
- Open Colab Notebook: Click "Launch in Google Colab" button above
- Add API Key: Click 🔑 icon in left sidebar → Add Secret → Name: GEMINI_API_KEY
- Run Setup Cells: Install dependencies (Gemini, Pandas, Matplotlib)
- Upload CSV: Upload your e-commerce or business dataset
- Ask Questions: Use filter_agent() function with natural language queries
Dataset Requirements
Works with any CSV containing e-commerce or business data. Common columns:
- Product Data: Product_Name, Category, Quantity, Unit_Price
- Financial Data: Total_Amount, Profit_Margin, Discount
- Customer Data: Customer_ID, Age, Gender
- Logistics Data: Shipping_Cost, Delivery_Time, Region
- Returns Data: Returned (Yes/No), Payment_Method
Local Installation (Alternative)
# Clone the repository
git clone https://github.com/lyven81/ai-project.git
cd ai-project/projects/business-intelligence-agent
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Add your Gemini API key to .env
# Run in Jupyter
jupyter notebook business_intelligent_agent.py
Environment Configuration
# Required API Configuration
GEMINI_API_KEY=your_gemini_api_key_here
# Optional Model Settings
MODEL_NAME=gemini-2.0-flash-exp
TEMPERATURE=0.2
🚀 Deployment Options & Usage Workflow
Google Colab Usage Example
# 1. Upload dataset CSV
from google.colab import files
uploaded = files.upload()
df_original = pd.read_csv('dataset.csv')
# 2. Auto-generate schema
SCHEMA = build_schema_block(df_original)
# Shows: column types, unique values, ranges, samples
# 3. Ask questions in natural language
result = filter_agent(
query="Which product category is making a loss because of discount?",
df=df_original,
schema=SCHEMA
)
# 4. View results
# - Shows generated Python code
# - Executes safely on DataFrame copy
# - Returns STATUS + answer_text
# - Displays visualizations if requested
Workflow Steps
- Step 1: Upload CSV with your business data
- Step 2: System auto-detects schema (columns, types, values)
- Step 3: Ask questions in plain English
- Step 4: AI generates Pandas/Matplotlib code
- Step 5: Code executes safely with error handling
- Step 6: Results displayed with visualizations
Transparency Features
- Generated Code: See exact Pandas operations before execution
- Execution Logs: Debug output for transparency
- Status Tracking: SUCCESS or ERROR with detailed messages
- Safe Operations: Read-only analysis, no data mutations