Dataset Overview
Most ESG datasets rely on corporate self-disclosures — companies grading their own homework. This dataset takes a fundamentally different approach. Every score is derived from adversarial sources that companies cannot control: court filings, regulatory fines, investigative journalism, and NGO reports.
The dataset contains integrity scores for all S&P 500 companies, scored across 11 ethical dimensions on a -100 to +100 scale, where -100 represents the worst possible conduct and +100 represents industry-leading ethical performance.
Fields
Each row represents one S&P 500 company. The key fields include:
- Company information: ticker symbol, company name, stock exchange, industry sector (ISIC classification)
- Overall rating: categorical assessment (Excellent, Good, Mixed, Bad, Very Bad)
- 11 dimension scores (-100 to +100):
  - planet_friendly_business: emissions, pollution, environmental stewardship
  - honest_fair_business: transparency, anti-corruption, fair practices
  - no_war_no_weapons: arms industry involvement, conflict zone exposure
  - fair_pay_worker_respect: labour rights, wages, working conditions
  - better_health_for_all: public health impact, product safety
  - safe_smart_tech: data privacy, AI ethics, technology safety
  - kind_to_animals: animal welfare, testing practices
  - respect_cultures_communities: indigenous rights, community impact
  - fair_money_economic_opportunity: financial inclusion, economic equity
  - fair_trade_ethical_sourcing: supply chain ethics, sourcing practices
  - zero_waste_sustainable_products: circular economy, waste reduction
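As a rough sketch of how the fields above could be sanity-checked after download, the snippet below verifies that each of the 11 dimension columns is present, numeric, and within the stated -100 to +100 range. It assumes the data is available as a CSV whose column names match the dimension names listed above. The post does not specify the file format or identifier column names, so the `ticker` column and the two sample rows are hypothetical.

```python
import csv
import io

# The 11 dimension names as listed in the post.
DIMENSIONS = [
    "planet_friendly_business",
    "honest_fair_business",
    "no_war_no_weapons",
    "fair_pay_worker_respect",
    "better_health_for_all",
    "safe_smart_tech",
    "kind_to_animals",
    "respect_cultures_communities",
    "fair_money_economic_opportunity",
    "fair_trade_ethical_sourcing",
    "zero_waste_sustainable_products",
]


def validate_row(row):
    """Return a list of problems found in one row (empty if the row is fine)."""
    problems = []
    for dim in DIMENSIONS:
        raw = row.get(dim)
        if raw is None or raw == "":
            problems.append(f"{dim}: missing")
            continue
        try:
            score = float(raw)
        except ValueError:
            problems.append(f"{dim}: not numeric ({raw!r})")
            continue
        if not -100 <= score <= 100:
            problems.append(f"{dim}: {score} outside -100..+100")
    return problems


# Hypothetical sample data, invented for illustration only.
# "ticker" is an assumed column name, not confirmed by the post.
header = ["ticker"] + DIMENSIONS
sample_csv = io.StringIO(
    ",".join(header) + "\n"
    + ",".join(["AAA"] + ["10"] * 11) + "\n"
    + ",".join(["BBB"] + ["150"] + ["-20"] * 10) + "\n"
)

report = {row["ticker"]: validate_row(row) for row in csv.DictReader(sample_csv)}
for ticker, problems in report.items():
    print(ticker, "OK" if not problems else problems)
```

To run this on the real file, swap `sample_csv` for an open file handle and adjust the column names to whatever the download actually uses.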
What Makes This Different from Traditional ESG Data
Traditional ESG providers (MSCI, Sustainalytics, Morningstar) rely heavily on corporate sustainability reports — documents written by the companies themselves. This creates an inherent conflict of interest where companies with better PR departments score higher, regardless of actual conduct.
This dataset is built using NLP analysis of 50,000+ source documents including:
- Court records and legal proceedings
- Regulatory enforcement actions and fines
- Investigative journalism from local and international outlets
- Reports from NGOs, watchdogs, and advocacy organisations
The result is 11 independent scores that reflect what external evidence says about a company, not what the company says about itself.
Use Cases
- Alternative ESG analysis: compare these scores against traditional ESG ratings to find discrepancies
- Ethical portfolio screening: identify S&P 500 holdings with poor conduct in specific dimensions
- Factor research: explore correlations between ethical conduct and financial performance
- Sector analysis: compare industries across all 11 dimensions
- ML/NLP research: use as labelled data for corporate ethics classification tasks
- ESG score comparison: benchmark against MSCI, Sustainalytics, or Refinitiv scores
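Two of the use cases above, portfolio screening and sector analysis, could look roughly like the following. This is a minimal sketch using only the standard library. The rows, tickers, sector labels, and the -25 screening threshold are all invented for illustration, and the `ticker` and `sector` keys are assumed names rather than confirmed column names from the dataset.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: tickers, sectors, and scores are made up for illustration.
rows = [
    {"ticker": "AAA", "sector": "Manufacturing",
     "fair_pay_worker_respect": -45, "safe_smart_tech": 10},
    {"ticker": "BBB", "sector": "Manufacturing",
     "fair_pay_worker_respect": 20, "safe_smart_tech": -5},
    {"ticker": "CCC", "sector": "Information",
     "fair_pay_worker_respect": 5, "safe_smart_tech": -60},
]


def screen(rows, dimension, threshold):
    """Tickers whose score in `dimension` is strictly below `threshold`."""
    return [r["ticker"] for r in rows if r[dimension] < threshold]


def sector_means(rows, dimension):
    """Mean score per sector for one dimension."""
    by_sector = defaultdict(list)
    for r in rows:
        by_sector[r["sector"]].append(r[dimension])
    return {sector: mean(scores) for sector, scores in by_sector.items()}


# The -25 cut-off is an arbitrary assumption; choose one that fits your mandate.
flagged = screen(rows, "fair_pay_worker_respect", threshold=-25)
averages = sector_means(rows, "safe_smart_tech")

print(flagged)
print(averages)
```

Screening on a single dimension rather than the overall rating keeps the exclusion criterion explicit, which matters when a company scores well on most dimensions but poorly on the one a portfolio is meant to avoid.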
Methodology
Scores are generated by Mashini Investments using AI-driven analysis of adversarial source documents.
Each company is evaluated against detailed KPIs within each of the 11 dimensions.
Coverage
- 500 companies: S&P 500 constituents
- 11 dimensions: 5,533 individual scores
- Score range: -100 (worst) to +100 (best)
Licence: CC BY-NC-SA 4.0.
submitted by /u/RevolutionaryGate742