7,000 News Articles Metadata: 22 NLP Metrics For Narrative Alpha & Bias Analysis

Hi everyone,

I’m sharing a metadata-only dataset of 7,000 news articles (extracted from a larger 700k-article core), designed specifically for NLP feature engineering and media intelligence. Instead of only standard sentiment (positive/negative), I’ve focused on “Narrative Alpha”: structural signals that quantify how a story is being told.

Why this is useful: If you’re building news classifiers, bias detectors, or financial sentiment models, raw text and a sentiment label often aren’t enough. This set provides deterministic linguistic metrics that you can’t get from a standard scrape.

What’s Inside (22 Columns):

  • Structural Metrics: Passive Voice Ratio, Sentence/Word Counts.
  • Narrative Signals: Hedging Rate (uncertainty cues), Claim Density per 1k words.
  • Credibility & Alignment: Headline-Body Alignment Score, Primary Source Ratio (attribution).
  • Traditional Labels: Topic, Political Orientation, Bias Strength, Credibility Level.
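To make the "per 1k words" idea concrete, here is a minimal sketch of how a rate-style metric such as a hedging rate could be computed. This is an illustration only, not the pipeline behind the dataset: the cue list `HEDGE_CUES`, the function name `rate_per_1k`, and the simple regex tokenizer are all assumptions.

```python
import re

# Hypothetical cue list for illustration only; the lexicon actually used
# for the dataset is not described in this post.
HEDGE_CUES = {"may", "might", "could", "reportedly", "allegedly",
              "possibly", "appears", "suggests"}


def rate_per_1k(text, cues):
    """Count cue-word occurrences and normalise per 1,000 words."""
    # Naive tokenizer: lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in cues)
    return 1000.0 * hits / len(words)


sample = ("Officials said the deal might close soon, "
          "and it could reportedly reshape the market.")
# 3 cue words out of 14 tokens, scaled to a per-1,000-words rate.
print(round(rate_per_1k(sample, HEDGE_CUES), 1))
```

Because the function is a pure count over a fixed lexicon, the same text always yields the same value, which is the sense in which such metrics are deterministic.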

Technical Specs:

  • Format: Tabular CSV (clean; no article text is included, to avoid legal/copyright issues).
  • Usability: 10.0/10.0 on Kaggle (fully documented columns).
  • License: CC BY 4.0 (Open for research/commercial use).
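Since the file is a flat CSV, it can be explored with the Python standard library alone. The sketch below averages one metric per label. The column names (`topic`, `hedging_rate`, `passive_voice_ratio`) are assumed and may not match the real header, and the rows are toy values made up for the example, not taken from the dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Toy rows made up for this example; they are NOT from the dataset.
# Column names are assumptions and may differ from the real CSV header.
TOY_CSV = """topic,hedging_rate,passive_voice_ratio
economy,12.5,0.18
economy,7.5,0.22
politics,20.0,0.30
"""


def mean_by_group(csv_file, group_col, value_col):
    """Average a numeric column within each value of a label column."""
    groups = defaultdict(list)
    for row in csv.DictReader(csv_file):
        groups[row[group_col]].append(float(row[value_col]))
    return {label: mean(values) for label, values in groups.items()}


result = mean_by_group(io.StringIO(TOY_CSV), "topic", "hedging_rate")
print(result)
```

To run this against the downloaded file, pass `open(path, newline="")` in place of the `io.StringIO` object and substitute the actual column names.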

Link: Kaggle

AMA about the methodology or the pipeline!

submitted by /u/Queasy_System9168
