Open, Self-hostable Pipeline For U.S. Financial Datasets — SEC Filings (full-text), 13F Holdings, Insider And Congressional Trades, FINRA Short Data, FRED, CFTC, CBOE

Sharing an open-source pipeline I built that scrapes, stores, and serves a bundle of public U.S. financial datasets so you can run the whole thing yourself instead of stitching together rate-limited APIs.

Datasets included (with their original sources, which you can also pull from directly):

  • SEC filings 10-K/10-Q/8-K, full-text searchable — source: SEC EDGAR (https://www.sec.gov/edgar)
  • Institutional holdings (13F-HR) — source: SEC EDGAR
  • Insider transactions (Form 3/4) — source: SEC EDGAR
  • Congressional trades — source: U.S. House & Senate financial disclosures (disclosures-clerk.house.gov / efdsearch.senate.gov)
  • Short data: fails-to-deliver — source: SEC; short volume & short interest — source: FINRA (https://www.finra.org)
  • Economic indicators — source: FRED, Federal Reserve Bank of St. Louis (https://fred.stlouisfed.org)
  • Futures positioning (Commitments of Traders) — source: CFTC (https://www.cftc.gov)
  • VIX & put/call ratios — source: CBOE
  • Daily OHLCV prices + indicators — source: Yahoo Finance
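
If you just want to pull straight from one of the sources above, here is a minimal sketch for SEC EDGAR using only the Python standard library. It is not code from the Equibles repo. It assumes the public `data.sec.gov/submissions` JSON endpoint, where the CIK is zero-padded to 10 digits, and the function names (`edgar_submissions_url`, `select_forms`, `recent_filings`) are made up for illustration. The SEC asks for a descriptive User-Agent on automated requests, so the value you pass should identify you.

```python
import json
import urllib.request

# Public EDGAR endpoint; the CIK must be zero-padded to 10 digits.
SEC_SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def edgar_submissions_url(cik: int) -> str:
    """Build the submissions URL for a company's CIK (hypothetical helper)."""
    return SEC_SUBMISSIONS_URL.format(cik=cik)


def select_forms(payload: dict, forms) -> list:
    """Pick (form, filing date, accession number) rows for the given form types.

    The "recent" object holds parallel arrays, one entry per filing.
    """
    recent = payload["filings"]["recent"]
    rows = zip(recent["form"], recent["filingDate"], recent["accessionNumber"])
    return [row for row in rows if row[0] in forms]


def recent_filings(cik: int, user_agent: str, forms=("10-K", "10-Q", "8-K")) -> list:
    """Fetch a company's recent filings from EDGAR and filter by form type."""
    request = urllib.request.Request(
        edgar_submissions_url(cik),
        headers={"User-Agent": user_agent},  # identify yourself to the SEC
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        payload = json.load(response)
    return select_forms(payload, forms)
```

Usage would look like `recent_filings(320193, "Your Name you@example.com")` for Apple (CIK 320193), with the contact string replaced by your own. Anything beyond a handful of calls should be throttled to stay within the SEC's fair-access limits.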

How to get it: self-host with one command (`docker compose up`). Data lands in Postgres + ParadeDB, so you get SQL plus full-text and vector search out of the box. There’s a web UI for browsing, a plain HTTP API, and an MCP server if you want to query it from an LLM. Everything is stored locally: no account, no paid API.

Repo: https://github.com/daniel3303/Equibles (if you liked it, leave a star 🙂 )

Disclaimer: I’m the developer of this project. It’s free and open-source, I’m not selling anything, and all data comes from the public government/exchange sources listed above. Equibles is just the open pipeline to collect and query them yourself.

Feedback and feature requests welcome.

submitted by /u/DanielAPO