7,000 News Articles Metadata: 22 NLP Metrics for Narrative Alpha & Bias Analysis

Hi everyone,

I’m sharing a metadata-only dataset of 7,000 news articles (extracted from a larger 700k-article core) designed specifically for NLP feature engineering and media intelligence. Instead of stopping at standard sentiment (positive/negative), I’ve focused on “Narrative Alpha”: structural signals that quantify how a story is being told.

Why this is useful: If you’re building news classifiers, bias detectors, or financial sentiment models, raw text plus a sentiment label often isn’t enough. This set provides deterministic linguistic metrics that a plain scrape doesn’t give you.

What’s Inside (22 Columns):

Structural Metrics: Passive Voice Ratio, Sentence/Word Counts.
Narrative Signals: Hedging Rate (uncertainty cues), Claim Density per 1k words.
Credibility & Alignment: Headline-Body Alignment Score, Primary Source Ratio (attribution).
Traditional Labels: Topic, Political Orientation, Bias Strength, Credibility Level.
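
To make the "per 1k words" style of metric concrete, here is a minimal sketch of how a hedging rate could be computed from raw text. This is an illustration only and not necessarily the pipeline behind this dataset: the cue list `HEDGE_CUES`, the tokenizer, and the choice to normalize hedging per 1,000 words (as the post does for claim density) are all assumptions.

```python
import re

# Hypothetical cue list for illustration only; the lexicon actually used
# for the dataset is not given in this post.
HEDGE_CUES = {"may", "might", "could", "reportedly", "allegedly",
              "possibly", "appears", "suggests"}

def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def rate_per_1k(count, total_words):
    """Normalize a raw count to occurrences per 1,000 words."""
    if total_words == 0:
        return 0.0
    return 1000.0 * count / total_words

def hedging_rate(text):
    """Count tokens found in HEDGE_CUES and normalize per 1,000 words."""
    words = tokenize(text)
    hits = sum(1 for w in words if w in HEDGE_CUES)
    return rate_per_1k(hits, len(words))
```

Because the function is a pure lookup plus a division, the same input always gives the same score, which is the sense in which metrics like this are deterministic.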

Technical Specs:

Format: Tabular CSV (clean, metadata only; no article text is included, for legal/copyright reasons).
Usability: 10.0/10.0 on Kaggle (fully documented columns).
License: CC BY 4.0 (Open for research/commercial use).
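
Since the file is a flat CSV, a first look needs nothing beyond the standard library. The sketch below averages one metric per topic; the column names `topic`, `passive_voice_ratio`, and `hedging_rate` and the inline sample rows are hypothetical stand-ins, so check the real header in the Kaggle file before using them.

```python
import csv
import io
from collections import defaultdict

# Stand-in rows with hypothetical column names. For the real data, pass an
# open file handle for the downloaded CSV to csv.DictReader instead.
SAMPLE = """topic,passive_voice_ratio,hedging_rate
politics,0.20,12.5
politics,0.30,7.5
finance,0.10,4.0
"""

def mean_by_group(rows, group_col, value_col):
    """Average a numeric column within each value of a label column."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        sums[row[group_col]] += float(row[value_col])
        counts[row[group_col]] += 1
    return {key: sums[key] / counts[key] for key in sums}

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
means = mean_by_group(rows, "topic", "passive_voice_ratio")
```

The same helper works for any label/metric pair, for example grouping by political orientation instead of topic.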

Link: Kaggle

AMA about the methodology or the pipeline!

submitted by /u/Queasy_System9168