🌍 Public Sentiment Collection Agent

AI-powered geographic sentiment analysis with credibility tracking and source diversity assessment

Python 3.9+ · Gemini 2.0 Flash · Tavily Search API · Pandas & NumPy · Multi-Agent System

📋 Project Overview & Problem Statement

Challenge: Traditional sentiment analysis tools lump all geographic regions together, producing misleading global averages that mask critical cultural and regional differences. For example, analyzing "public opinion on alcohol consumption" globally would show mixed sentiment, completely missing that Saudi Arabia has 95% negative sentiment (religious/cultural context) while Germany has 70% positive sentiment (beer culture).

Solution: Public Sentiment Collection Agent uses geographic segmentation, source diversity tracking, and credibility scoring to provide accurate, context-aware sentiment analysis. The system automatically detects data quality issues like echo chamber bias and single-domain concentration.

🚨 Why Geographic Filtering Matters

Example: "Public opinion on alcohol consumption"

WITHOUT geographic filtering: 60% negative, 30% neutral, 10% positive (MISLEADING - lumps all regions together)

WITH geographic filtering:

  • Saudi Arabia: 95% negative (religious/cultural context)
  • Germany: 70% positive (beer culture)
  • USA: 50/50 split (health concerns vs. social acceptance)

Key Benefits

🤖 AI Capabilities & 5-Agent Architecture

🌍 Geographic Social Listening Agent

Collects sentiment data with location-specific filtering using Tavily Search API. Tracks source diversity and issues quality warnings.

🧠 Comparative Sentiment Analysis Agent

Processes data separately for each location using Gemini AI. Calculates credibility scores: (Diversity × 0.6) + (Sample Size × 0.4)

📊 Comparative Visualization Designer Agent

Creates 4 professional charts: regional sentiment comparison, credibility dashboard, source diversity, and theme frequency.

💾 Data Export Agent

Exports 5 CSV files: sentiment distribution, emotion frequency, theme comparison, source attribution, and credibility metrics.
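As a sketch of what one of these exports could look like (not the project's actual code; the `export_sentiment_distribution` helper and its column names are assumptions), the sentiment-distribution CSV might be produced with pandas like so:

```python
import pandas as pd

def export_sentiment_distribution(results: dict, path: str) -> pd.DataFrame:
    """Flatten per-location sentiment percentages into one long-format CSV.

    `results` is assumed to map location -> {sentiment_label: percent}.
    """
    rows = [
        {"location": loc, "sentiment": sentiment, "percent": pct}
        for loc, dist in results.items()
        for sentiment, pct in dist.items()
    ]
    df = pd.DataFrame(rows)
    df.to_csv(path, index=False)
    return df

# Example with the regional numbers from the overview above:
df = export_sentiment_distribution(
    {"Saudi Arabia": {"negative": 95.0, "positive": 5.0},
     "Germany": {"positive": 70.0, "negative": 30.0}},
    "sentiment_distribution.csv",
)
```

Long format (one row per location/sentiment pair) keeps the CSV easy to pivot in Excel or Google Sheets, which matches the stated export target.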

📝 Enhanced Packaging Agent

Generates executive-ready markdown reports with embedded visualizations, data tables, and methodological limitations.

AI Processing Pipeline

🔍 Source Diversity & Credibility Features

Source Type Classification

The system automatically classifies each source into a type category (for example, social media vs. news or other domains) to feed the diversity metrics below.
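As an illustration only (the project's actual taxonomy and helper names are not shown here), a domain-based classifier might look like this; the category sets and the `classify_source` function are assumptions:

```python
from urllib.parse import urlparse

# Hypothetical category maps; the project's real taxonomy may differ.
SOCIAL_DOMAINS = {"reddit.com", "twitter.com", "x.com", "facebook.com"}
NEWS_DOMAINS = {"bbc.com", "reuters.com", "nytimes.com"}

def classify_source(url: str) -> str:
    """Return a coarse source-type label for a search-result URL."""
    host = urlparse(url).netloc.lower().removeprefix("www.")
    if host in SOCIAL_DOMAINS:
        return "social_media"
    if host in NEWS_DOMAINS:
        return "news"
    return "other"

classify_source("https://www.reddit.com/r/germany/comments/abc")  # -> "social_media"
```

`str.removeprefix` requires Python 3.9+, which matches the project's stated minimum.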

Automatic Bias Warnings

The system issues warnings when data quality is compromised:

  ⚠️ Over 70% of sources are social media (potential echo chamber bias)
  ⚠️ 60% of sources from a single domain: reddit.com
  ⚠️ Only 4 unique sources (low diversity)
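The checks behind warnings like these can be sketched as follows. This is a minimal illustration, not the project's code: the exact thresholds (70% social media, majority single-domain, fewer than 5 unique sources) and the `bias_warnings` helper are assumptions inferred from the example messages above.

```python
from collections import Counter
from urllib.parse import urlparse

def bias_warnings(urls, source_types, min_unique=5):
    """Emit data-quality warnings mirroring the examples above."""
    warnings = []
    n = len(urls)
    # Echo-chamber check: share of social-media sources.
    social = sum(t == "social_media" for t in source_types)
    if n and social / n > 0.70:
        warnings.append(f"⚠️ Over 70% of sources are social media "
                        f"({social}/{n}; potential echo chamber bias)")
    # Concentration check: one domain dominating the sample.
    domains = Counter(urlparse(u).netloc.removeprefix("www.") for u in urls)
    if domains:
        top, count = domains.most_common(1)[0]
        if count / n >= 0.50:
            warnings.append(f"⚠️ {count / n:.0%} of sources from single domain: {top}")
    # Diversity check: too few unique domains overall.
    if len(domains) < min_unique:
        warnings.append(f"⚠️ Only {len(domains)} unique sources (low diversity)")
    return warnings
```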

Credibility Score Calculation

Credibility Score (0-100) = (Source Diversity Score × 0.6) + (Sample Size Score × 0.4)

  🟢 70-100: High confidence (diverse sources, adequate sample)
  🟡 50-69: Medium confidence (some limitations present)
  🔴 0-49: Low confidence (significant data quality concerns)
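The weighting and tiers above translate directly into code. A minimal sketch, assuming both sub-scores already arrive on a 0-100 scale (how each sub-score is computed is not specified here):

```python
def credibility_score(diversity_score: float, sample_size_score: float) -> tuple[float, str]:
    """Combine the two 0-100 sub-scores using the 0.6/0.4 weighting above."""
    score = diversity_score * 0.6 + sample_size_score * 0.4
    if score >= 70:
        tier = "🟢 High confidence"
    elif score >= 50:
        tier = "🟡 Medium confidence"
    else:
        tier = "🔴 Low confidence"
    return round(score, 1), tier

credibility_score(80, 60)  # -> (72.0, "🟢 High confidence")
```

Note how the 0.6 weight makes source diversity the dominant factor: a large but homogeneous sample still scores low.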

📊 Output Package (10 Files Per Analysis)

1 Markdown Report

4 Visualization Charts (PNG)

5 CSV Data Exports

🛠️ Technical Architecture & Implementation

AI & Analytics Stack

Google Gemini 2.0 Flash · Tavily Search API · Python 3.9+ · Pandas 2.0+ · Matplotlib · Seaborn

Multi-Agent Framework

5 Specialized Agents · Web Search Integration · NLP Sentiment Analysis · Data Quality Scoring · Auto Visualization

Deployment Options

Google Colab · Jupyter Notebook · Local Python · Streamlit (Optional)

System Architecture

Pipeline Flow:

  1. Geographic Listening → Web search with location filters
  2. Source Analysis → Diversity tracking & bias detection
  3. Sentiment Analysis → Gemini AI with cultural context
  4. Credibility Scoring → Quality assessment (0-100)
  5. Visualization → 4 professional charts
  6. Data Export → 5 CSV files for Excel/Sheets
  7. Report Packaging → Executive markdown report

📖 Development Setup & Usage Guide

Quick Start with Google Colab (Recommended)

  1. Open Colab Notebook: Click the "Launch in Google Colab" button above
  2. Add API Keys: Add GOOGLE_API_KEY and TAVILY_API_KEY to Colab Secrets (🔑 icon)
  3. Run Setup Cells: Install dependencies and configure APIs
  4. Run Analysis: Execute run_enhanced_sentiment_pipeline() with your topic and locations
  5. Download Results: Get markdown report, 4 charts, and 5 CSV files

Example Usage

```python
# Example: Analyze firecracker ban opinions across cultures
results = run_enhanced_sentiment_pipeline(
    issue_keyword="Should firecrackers and fireworks be banned?",
    locations=["Malaysia", "Germany", "USA", "India"],
    num_sources_per_location=15,
    output_dir=".",
)

# Output:
# - comparative_report_20251024_123456.md
# - regional_comparison_20251024_123456.png
# - credibility_dashboard_20251024_123456.png
# - source_diversity_20251024_123456.png
# - theme_frequency_20251024_123456.png
# - sentiment_distribution_20251024_123456.csv
# - emotion_frequency_20251024_123456.csv
# - theme_comparison_20251024_123456.csv
# - source_attribution_20251024_123456.csv
# - credibility_metrics_20251024_123456.csv
```

Required API Keys

  • GOOGLE_API_KEY (Gemini; add to Colab Secrets)
  • TAVILY_API_KEY (Tavily Search; add to Colab Secrets)

📊 Performance Metrics & Business Impact

  • Full Analysis Time: 10-15 min
  • Files Generated: 10
  • Specialized Agents: 5
  • Credibility Score: 0-100

Business Value Demonstration

Use Cases

⚠️ Limitations & Disclaimers

Data Collection Limitations

Geographic Filtering Challenges

Recommended Use

Good for: Directional insights, trend detection, hypothesis generation

⚠️ Caution for: Policy decisions, legal proceedings, precise measurement

Not for: Statistical inference about entire populations