[Self-Promotion] [Paid] I Built A 1,437-column Alternative Financial Dataset That Fuses GDELT News Intelligence, AI Sentiment, And Multi-source Price At 15-minute Resolution. Free Sample Inside.

Chart overview — 5 panels of real NVDA data

What it is

ULTRA is a flat CSV dataset that aligns three data layers on the same 15-minute timestamp:

  • GDELT (~1,256 cols): The full GCAM emotional spectrum — WordNet Affect, SentiWordNet, Harvard IV, AFINN, Loughran-McDonald financial sentiment, Moral Foundations, plus geopolitical events (GoldsteinScale, QuadClass, CAMEO codes), media mentions, entity extraction, and macro themes.
  • AI Analysis (18 cols): Contextual sentiment from Gemini — not word-counting, but actual comprehension of why sentiment is negative (export controls vs earnings miss vs CEO departure). Includes impact, novelty, actionability, narrative codes, and binary flags.
  • Price (16 cols): Multi-source OHLCV from Polygon.io + Twelve Data, VWAP, trade count, cross-source mean and spread, 15-min return.
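To make the derived price columns concrete, here is a minimal sketch of how a cross-source mean, a cross-source spread, and a 15-minute return could be computed from two vendors' closes. Only `poly_close` comes from the sample code in this post; `td_close`, `cross_mean`, `cross_spread`, and `ret_15m` are hypothetical names, and the definitions (absolute spread, return on the cross-source mean) are assumptions. The data dictionary is the authority on how ULTRA actually defines them.

```python
import pandas as pd

def add_cross_source_features(df: pd.DataFrame) -> pd.DataFrame:
    """Sketch only: assumes one close per vendor on the same 15-minute rows."""
    out = df.copy()
    # Row-wise average of the two vendors' closes.
    out["cross_mean"] = out[["poly_close", "td_close"]].mean(axis=1)
    # Absolute disagreement between vendors (assumed definition of "spread").
    out["cross_spread"] = (out["poly_close"] - out["td_close"]).abs()
    # Simple return versus the previous 15-minute row; the first row is NaN.
    out["ret_15m"] = out["cross_mean"] / out["cross_mean"].shift(1) - 1
    return out

# Tiny synthetic example (made-up prices, not real NVDA data).
bars = pd.DataFrame(
    {"poly_close": [100.0, 101.0, 102.0], "td_close": [100.2, 100.8, 102.0]},
    index=pd.date_range("2026-01-02 14:30", periods=3, freq="15min"),
)
features = add_cross_source_features(bars)
print(features)
```

A large `cross_spread` on a row is a cheap flag that one vendor's bar may be stale or wrong, which is one reason to carry both sources.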

96 timestamps per day. Currently covering the Magnificent Seven (AAPL, AMZN, GOOG, META, MSFT, NVDA, TSLA).
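The 96 figure is just 24 hours × 4 slots per hour. A quick way to check a downloaded day against that grid is sketched below. It assumes `meta_timestamp` parses with `pd.to_datetime` and is timezone-naive; the helper names `expected_grid` and `missing_slots` are hypothetical.

```python
import pandas as pd

def expected_grid(day: str) -> pd.DatetimeIndex:
    """All 96 fifteen-minute slots of one calendar day (00:00 to 23:45)."""
    return pd.date_range(start=day, periods=96, freq="15min")

def missing_slots(timestamps: pd.Series, day: str) -> pd.DatetimeIndex:
    """Slots of the expected grid that do not appear in `timestamps`."""
    seen = pd.DatetimeIndex(pd.to_datetime(timestamps))
    return expected_grid(day).difference(seen)

# Synthetic check: drop one slot on purpose and confirm it is reported.
demo = pd.Series(expected_grid("2026-01-02")).drop(index=[5])
print(missing_slots(demo, "2026-01-02"))

# With the sample file it would look like this (not run here):
# df = pd.read_csv("ULTRA_sample_NVDA.csv")
# print(missing_slots(df["meta_timestamp"], "2026-01-02"))
```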

Free sample + data dictionary

Full day of NVDA data (Jan 2, 2026) — all 1,437 columns, 96 rows. No paywall, no signup.

Sample CSV: marketsignal.solutions/data/samples/ULTRA_sample_NVDA.csv
Data Dictionary: marketsignal.solutions/data/samples/ULTRA_DataDictionary.txt

Quick load:

    import pandas as pd

    df = pd.read_csv("ULTRA_sample_NVDA.csv")
    print(f"{df.shape[1]} columns, {df.shape[0]} timestamps")

    # AI sentiment + price at market open
    # (rows without a Polygon close are dropped, which leaves market hours)
    cols = ["meta_timestamp", "ai_sentiment_score", "ai_impact_score",
            "ai_narrative_primary_code", "poly_close", "price_return_15m"]
    print(df[df["poly_close"].notna()][cols].head(10).to_string(index=False))
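With 1,437 columns, it helps to see how they split by layer before doing anything else. The sketch below counts columns by the text before the first underscore. The prefixes `meta`, `ai`, `poly`, and `price` are inferred from the column names in the quick-load snippet; whether every column follows that convention is an assumption, and `count_by_prefix` is a hypothetical helper.

```python
from collections import Counter

def count_by_prefix(columns) -> Counter:
    """Count column names by the token before the first underscore."""
    return Counter(col.split("_", 1)[0] for col in columns)

# Example using only the column names shown in this post.
sample_columns = [
    "meta_timestamp", "ai_sentiment_score", "ai_impact_score",
    "ai_narrative_primary_code", "poly_close", "price_return_15m",
]
print(count_by_prefix(sample_columns))

# On the real file (not run here):
# print(count_by_prefix(df.columns).most_common())
```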

Why I built it

GDELT is incredible — it’s the world’s largest open news database. But it’s raw, unfiltered, and has no ticker mapping. If you want to use it for quant research, you need months of pipeline engineering just to get it into a usable format.

I built the pipeline that:

  1. Ingests 3 GDELT streams every 15 minutes (GKG, Events, Mentions)
  2. Matches articles to S&P 100 tickers via org-name resolution
  3. Parses all 1,256 GCAM dimensions per ticker
  4. Runs Gemini AI on every batch for contextual analysis
  5. Fuses with multi-source verified price data
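The fusion step (bucketing ticker-matched articles into 15-minute slots and joining them to price bars) can be sketched roughly as follows. This is not the actual ULTRA pipeline: the frame layouts and the names `published_at`, `tone`, `article_count`, `mean_tone`, and `fuse_news_with_price` are all hypothetical, and a single tone score stands in for the full GCAM and AI feature set.

```python
import pandas as pd

def fuse_news_with_price(articles: pd.DataFrame, prices: pd.DataFrame) -> pd.DataFrame:
    """Aggregate articles per (ticker, 15-minute slot) and left-join onto price bars."""
    news = articles.copy()
    # Floor each publication time to the start of its 15-minute slot.
    news["timestamp"] = news["published_at"].dt.floor("15min")
    agg = news.groupby(["ticker", "timestamp"], as_index=False).agg(
        article_count=("tone", "size"),
        mean_tone=("tone", "mean"),
    )
    # Left join keeps every price bar, including slots with no news.
    fused = prices.merge(agg, how="left", on=["ticker", "timestamp"])
    fused["article_count"] = fused["article_count"].fillna(0).astype(int)
    return fused

# Synthetic inputs (made-up values for illustration).
articles = pd.DataFrame({
    "ticker": ["NVDA", "NVDA", "NVDA"],
    "published_at": pd.to_datetime(
        ["2026-01-02 14:31", "2026-01-02 14:44", "2026-01-02 14:46"]),
    "tone": [-2.0, 0.0, 1.5],
})
prices = pd.DataFrame({
    "ticker": ["NVDA"] * 3,
    "timestamp": pd.to_datetime(
        ["2026-01-02 14:30", "2026-01-02 14:45", "2026-01-02 15:00"]),
    "close": [100.0, 101.0, 102.0],
})
fused = fuse_news_with_price(articles, prices)
print(fused)
```

One design point to watch: flooring puts a 14:44 article on the 14:30 row. If a row's features are used to predict that same row's return, rounding up to the next slot (`dt.ceil("15min")`) is the safer choice to avoid look-ahead.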

The result is a single CSV you can pd.read_csv() and start researching.

What I’m NOT claiming

  • This is not “beat the market” data. It’s research-grade alternative data.
  • GDELT is open/public — I didn’t create it. I created the pipeline, the AI layer, and the fusion.
  • Coverage is currently 7 tickers (Mag 7). S&P 100 expansion is in progress.
  • The AI layer depends on Gemini — it’s contextual NLP, not proprietary.

Pricing

$99/month for the Mag 7 live feed. Details at marketsignal.solutions.

Happy to answer any questions about the data, the pipeline, or the methodology.


This dataset is for research purposes. Past patterns do not guarantee future performance.

submitted by /u/SuggestionDry6614