🔍 About This Code Showcase
This curated code snippet demonstrates how the Public Sentiment Collection Agent performs geographic sentiment analysis with credibility tracking and source-diversity assessment.
Full deployment scripts, API credentials, and proprietary details are omitted for clarity and security. This showcase highlights the core multi-agent orchestration and sentiment analysis algorithms.
📖 Core Algorithm: Geographic Social Listening
The foundation of the Public Sentiment Agent is its ability to collect sentiment data with geographic segmentation and to track source diversity for credibility assessment:
from datetime import datetime

# Note: log_agent_title_html, log_tool_call_html, log_warning_html, and
# tavily_search_geographic are project-internal helpers omitted from this showcase.

def geographic_listening_agent(
    issue_keyword: str,
    locations: list = ["global"],
    num_sources_per_location: int = 15
) -> dict:
    """
    Collect sentiment data with geographic segmentation and source diversity tracking.

    This agent addresses a critical flaw in traditional sentiment analysis:
    lumping all geographic regions together without considering cultural context.

    Example: "Public opinion on alcohol consumption"
    - WITHOUT geographic filtering: misleading global average
    - WITH geographic filtering: reveals cultural nuances (Saudi Arabia vs. Germany)

    Args:
        issue_keyword: Topic to research
        locations: List of countries/regions (e.g., ["USA", "Germany", "Saudi Arabia"])
        num_sources_per_location: Sources per location

    Returns:
        dict: Data organized by location with diversity metrics
    """
    log_agent_title_html("Geographic Social Listening Agent", "🌍")
    location_data = {}

    for location in locations:
        log_tool_call_html("tavily_search_geographic", f"location={location}")
        search_result = tavily_search_geographic(
            query=issue_keyword,
            location=location,
            max_results=num_sources_per_location
        )
        if 'error' not in search_result:
            results = search_result['results']
            diversity = analyze_source_diversity(results)
            location_data[location] = {
                'snippets': [r['content'] for r in results if r.get('content')],
                'sources': [{'title': r['title'], 'url': r['url'], 'domain': r['domain']}
                            for r in results],
                'diversity': diversity
            }
            # Surface data-quality warnings as soon as they are detected
            if diversity['warnings']:
                for warning in diversity['warnings']:
                    log_warning_html(f"{location}: {warning}")

    return {
        'issue': issue_keyword,
        'location_data': location_data,
        'collection_date': datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    }
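A minimal invocation, using the example locations from the docstring, might look like the following. The search and logging helpers are among the omitted project internals, so this sketch assumes they are available in scope:

listening_data = geographic_listening_agent(
    issue_keyword="public opinion on alcohol consumption",
    locations=["USA", "Germany", "Saudi Arabia"],
    num_sources_per_location=15
)
# e.g., listening_data['location_data']['Germany']['diversity']['diversity_score']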
🔍 Source Diversity Analysis Engine
The system automatically detects data quality issues by analyzing source diversity and concentration bias:
from collections import Counter

def analyze_source_diversity(results: list) -> dict:
    """
    Analyze diversity of information sources and detect bias patterns.

    This prevents the "echo chamber problem," where all data comes from
    a single source type (e.g., 90% social media) or a single domain.

    Returns:
        dict: Diversity metrics and quality warnings
    """
    domains = [r.get('domain', 'unknown') for r in results]
    domain_counts = Counter(domains)

    # Classify each domain into a coarse source type
    source_types = []
    for domain in domains:
        if any(x in domain for x in ['reddit.com', 'twitter.com', 'facebook.com']):
            source_types.append('social_media')
        elif any(x in domain for x in ['news', 'times', 'post', 'bbc', 'cnn']):
            source_types.append('news')
        elif any(x in domain for x in ['.gov', '.edu']):
            source_types.append('institutional')
        else:
            source_types.append('other')
    type_counts = Counter(source_types)

    unique_domains = len(domain_counts)
    total_sources = len(domains)
    # Ratio of unique domains to total sources, scaled by 150 and capped at 100
    diversity_score = min(100, (unique_domains / max(1, total_sources)) * 150)

    warnings = []
    most_common_domain, max_count = (
        domain_counts.most_common(1)[0] if domain_counts else ('', 0)
    )
    if max_count > total_sources * 0.4:
        warnings.append(
            f"⚠️ {(max_count / total_sources) * 100:.0f}% of sources from single domain: {most_common_domain}"
        )
    if type_counts.get('social_media', 0) > total_sources * 0.7:
        warnings.append("⚠️ Over 70% of sources are social media (potential echo chamber bias)")
    if unique_domains < 5:
        warnings.append(f"⚠️ Only {unique_domains} unique sources (low diversity)")

    return {
        'diversity_score': round(diversity_score, 1),
        'unique_domains': unique_domains,
        'total_sources': total_sources,
        'source_type_distribution': dict(type_counts),
        'top_domains': dict(domain_counts.most_common(5)),
        'warnings': warnings
    }
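The warning thresholds are easiest to see with an illustrative call. The mocked results below are invented for the example; only the 'domain' key matters to this function:

mock_results = ([{'domain': 'reddit.com', 'content': '...'}] * 8
                + [{'domain': 'bbc.com', 'content': '...'}] * 2)
report = analyze_source_diversity(mock_results)
# diversity_score = min(100, (2 / 10) * 150) = 30.0
# 8/10 sources from reddit.com (> 40%)  -> single-domain concentration warning
# 8/10 social media (> 70%)             -> echo chamber warning
# 2 unique domains (< 5)                -> low diversity warning
print(report['diversity_score'])  # 30.0
print(len(report['warnings']))    # 3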
🧠 Comparative Sentiment Analysis Engine
The sentiment analysis engine processes data separately for each location and calculates credibility scores:
def comparative_sentiment_agent(listening_data: dict) -> dict:
    """
    Analyze sentiment separately for each location with credibility tracking.

    Credibility Score = (Source Diversity * 0.6) + (Sample Size * 0.4)

    This ensures findings are weighted by data quality, not just sentiment counts.
    """
    log_agent_title_html("Comparative Sentiment Analysis Agent", "🧠")
    issue = listening_data['issue']
    location_data = listening_data['location_data']
    location_sentiments = {}

    for location, data in location_data.items():
        snippets = data['snippets']
        if not snippets:
            log_warning_html(f"No data for {location}, skipping...")
            continue

        analyses = analyze_sentiment_with_context(snippets, issue)
        sentiments = [a.get('sentiment', 'neutral') for a in analyses if 'error' not in a]
        emotions = [a.get('emotion', 'neutral') for a in analyses if 'error' not in a]

        sentiment_counts = Counter(sentiments)
        total = len(sentiments) if sentiments else 1  # guard against division by zero
        sentiment_dist = {
            'positive': (sentiment_counts.get('positive', 0) / total) * 100,
            'negative': (sentiment_counts.get('negative', 0) / total) * 100,
            'neutral': (sentiment_counts.get('neutral', 0) / total) * 100
        }

        # Credibility blends source diversity (60%) with sample size (40%);
        # 20 collected snippets earns the full sample-size score of 100.
        diversity_score = data['diversity']['diversity_score']
        sample_size_score = min(100, (len(snippets) / 20) * 100)
        credibility_score = diversity_score * 0.6 + sample_size_score * 0.4

        location_sentiments[location] = {
            'sentiment_distribution': sentiment_dist,
            'emotion_counts': dict(Counter(emotions)),
            'sample_size': len(sentiments),
            'credibility_score': round(credibility_score, 1),
            'diversity_metrics': data['diversity']
        }
        log_tool_result_html(
            f"{location}: Pos={sentiment_dist['positive']:.0f}% "
            f"Neg={sentiment_dist['negative']:.0f}% | Credibility: {credibility_score:.0f}/100"
        )

    return {
        'issue': issue,
        'location_sentiments': location_sentiments
    }
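To make the credibility formula concrete, here is the arithmetic for a hypothetical region with a diversity score of 80 and 15 collected snippets:

diversity_score = 80.0
sample_size_score = min(100, (15 / 20) * 100)  # 75.0 (20 snippets = full score)
credibility_score = diversity_score * 0.6 + sample_size_score * 0.4
# 80.0 * 0.6 + 75.0 * 0.4 = 48.0 + 30.0 = 78.0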
⚙️ Technical Implementation Notes
Key Algorithms & Innovations
- Geographic Segmentation: Prevents misleading averages by analyzing regions separately
- Source Diversity Tracking: Detects echo chamber bias and single-domain concentration
- Credibility Scoring: Weights findings by data quality (diversity + sample size)
- Cultural Context Detection: Identifies cultural/geographic patterns in sentiment
- Automatic Bias Warnings: Real-time alerts when data quality is compromised
Why This Approach Works
- Addresses Cultural Nuance: Shows regional differences instead of misleading global averages
- Transparent Quality Metrics: Users see exactly how trustworthy each regional analysis is
- Multi-Agent Architecture: Specialized agents for collection, analysis, visualization, and reporting (orchestration sketched after this list)
- Production-Ready Output: Generates 10 files per analysis (1 report, 4 charts, 5 CSV exports)
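The visualization and reporting agents are not part of this showcase, so the orchestration sketch below uses hypothetical placeholders (generate_charts, generate_report_and_exports) for those stages; only the two agents shown above are real:

def run_public_sentiment_pipeline(issue_keyword: str, locations: list) -> dict:
    """Illustrative orchestration of the agents shown in this showcase."""
    listening_data = geographic_listening_agent(issue_keyword, locations)
    sentiment_data = comparative_sentiment_agent(listening_data)
    # generate_charts(sentiment_data)                # hypothetical: 4 charts
    # generate_report_and_exports(sentiment_data)    # hypothetical: report + 5 CSVs
    return sentiment_data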
Real-World Example
Topic: "Public opinion on alcohol consumption"
- ❌ Without geographic filtering: 60% negative globally (misleading)
- ✅ With geographic filtering:
- Saudi Arabia: 95% negative (religious/cultural context)
- Germany: 70% positive (beer culture)
- USA: 50/50 split (health concerns vs social acceptance)
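The aggregation pitfall is easy to reproduce. Here is a minimal sketch using the illustrative figures above (Germany's 70% positive implies roughly 30% negative):

# Illustrative regional negative-sentiment shares from the example above
regional_negative = {"Saudi Arabia": 95.0, "Germany": 30.0, "USA": 50.0}

# A naive global average erases the regional spread entirely
global_avg = sum(regional_negative.values()) / len(regional_negative)
print(f"Global: {global_avg:.0f}% negative (misleading)")  # ~58% negative

# The per-region view preserves the cultural context
for region, share in regional_negative.items():
    print(f"{region}: {share:.0f}% negative")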