📋 Project Overview & Problem Statement
Challenge: Business analysts and data scientists spend significant time writing repetitive data analysis code, translating business questions into SQL/Python queries, and creating visualizations. This manual process is time-consuming, error-prone, and requires deep technical expertise that many stakeholders lack.
Solution: Time Series Analysis Agent leverages Google Gemini 2.0's code generation capabilities to automatically convert natural language questions into executable Python/Pandas code, perform analysis, and generate insights with visualizations - reducing analysis time from hours to seconds.
Key Benefits
- Natural Language Interface: Ask questions in plain English - no coding knowledge required
- Automated Code Generation: Gemini AI generates Python/Pandas code from your questions
- Instant Insights: Get answers with charts and tables in seconds, not hours
- Self-Correcting: Validates inputs, suggests alternatives for invalid queries
- Business Intelligence: Revenue trends, product rankings, customer segmentation, profit analysis
🛠️ Technical Architecture & Implementation
AI & Analytics Stack
Google Gemini 2.0 Flash
Python 3.8+
Pandas 2.0+
NumPy
Matplotlib
Seaborn
Code Generation Framework
Natural Language Processing
Code Extraction (Regex)
Sandboxed Execution
Error Handling
Input Validation
Deployment Options
Google Colab
Jupyter Notebook
Streamlit (Optional)
Local Python
System Architecture
Analysis Pipeline:
- Data Upload: CSV file upload with schema auto-detection
- Schema Generation: Extract column info, data types, sample values
- Prompt Engineering: Build comprehensive prompt with dataset context
- Gemini API Call: Send question + schema to Gemini 2.0 Flash
- Code Extraction: Parse Python code from AI response
- Safe Execution: Run in controlled namespace with helper functions
- Result Presentation: Display text insights, tables, and charts
Helper Functions Provided
# Date filtering
get_date_range(df, start, end) -> DataFrame
get_quarter_data(df, year, quarter) -> DataFrame
# Analytics
calculate_profit_margin(df) -> Series
get_top_n(df, column, metric, n) -> DataFrame
detect_outliers(series, threshold) -> Series
# Visualization
plt.figure(figsize=(12, 6))
plt.plot() / plt.bar() / plt.scatter()
plt.show()
📖 Development Setup & Usage Guide
Quick Start with Google Colab (Recommended)
- Open Colab Notebook: Click "Launch in Google Colab" button above
- Add API Key: Add GEMINI_API_KEY to Colab Secrets (🔑 icon in sidebar)
- Run Setup Cells: Install dependencies and import libraries
- Upload CSV: Upload your e-commerce transaction dataset
- Ask Questions: Use time_series_agent() function with your questions
Local Installation
# Clone the repository
git clone https://github.com/lyven81/ai-project.git
cd ai-project/projects/time-series-analysis-agent
# Create virtual environment
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Set up environment variables
cp .env.example .env
# Add your Gemini API key to .env
# Run in Jupyter
jupyter notebook time_series_analysis_agent.ipynb
Environment Configuration
# Required API Configuration
GEMINI_API_KEY=your_gemini_api_key_here
# Optional Model Settings
MODEL_NAME=gemini-2.0-flash-exp
TEMPERATURE=0.2
MAX_OUTPUT_TOKENS=8192
Dataset Requirements
Your CSV file should include these columns:
- Order_Date (datetime): Transaction timestamp in MM-DD-YY format
- Revenue (float): Sales amount in dollars
- Profit (float): Profit amount in dollars
- Category (string): Product category (e.g., Electronics, Clothing)
- Product_Name (string): Specific product identifier
- Region (string): Geographic region (e.g., West, East, North, South)
- Customer_Name (string): Customer identifier
- Quantity (int): Number of items sold
🎯 Advanced Features & Error Handling
Self-Correction System
The agent uses status codes to handle various scenarios:
- success: Analysis completed successfully with insights
- no_data: No records match filters → suggests alternative values
- invalid_date: Date parsing failed → specifies correct format
- invalid_filter: Unknown category/region → lists available options
- ambiguous_request: Unclear intent → asks clarifying questions
- error: Unexpected error → provides debug information
Visualization Capabilities
- Line Charts: Time series trends, multiple category comparisons
- Bar Charts: Top products, customer rankings, category comparisons
- Scatter Plots: Correlation analysis, frequency vs. value
- Custom Styling: Automatic colors, labels, legends, and formatting
Time Series Operations
# Monthly resampling (uses 'ME' for month-end)
monthly = df.set_index('Order_Date').resample('ME')['Revenue'].sum()
# Quarter filtering
q4_2024 = get_quarter_data(df, 2024, 4)
# Year-over-year comparison
yoy_growth = ((revenue_2024 - revenue_2023) / revenue_2023 * 100)
# Rolling average
df['Revenue_MA7'] = df['Revenue'].rolling(window=7).mean()
🚀 Example Workflow
Step-by-Step Usage
# 1. Upload your CSV file
uploaded = files.upload()
df = pd.read_csv('your_file.csv')
# 2. Ask a question
result = time_series_agent(
question="Show me the monthly revenue trend for Electronics in 2024",
df=df,
temperature=0.3
)
# 3. Get results automatically:
# - Generated Python/Pandas code
# - Execution logs
# - Text insights
# - Visualizations (if requested)
# - Data tables
Sample Output
✅ Analysis Result (Status: success)
In 2024, Electronics generated $2.3M in revenue across 1,245 orders,
showing 15% growth vs 2023. Top product was MacBook Air ($450K).
The line chart above shows consistent upward monthly trend with
peak in Q4 2024.
[Line chart displayed automatically]