Sharing an open-source pipeline I built that scrapes, stores, and serves a bundle of public U.S. financial datasets so you can run the whole thing yourself instead of stitching together rate-limited APIs.
Datasets included (with their original sources — pull straight from these too):
- SEC filings 10-K/10-Q/8-K, full-text searchable — source: SEC EDGAR (https://www.sec.gov/edgar)
- Institutional holdings (13F-HR) — source: SEC EDGAR
- Insider transactions (Form 3/4) — source: SEC EDGAR
- Congressional trades — source: U.S. House & Senate financial disclosures (disclosures-clerk.house.gov / efdsearch.senate.gov)
- Short data: fails-to-deliver — source: SEC; short volume & short interest — source: FINRA (https://www.finra.org)
- Economic indicators — source: FRED, Federal Reserve Bank of St. Louis (https://fred.stlouisfed.org)
- Futures positioning (Commitments of Traders) — source: CFTC (https://www.cftc.gov)
- VIX & put/call ratios — source: CBOE
- Daily OHLCV prices + indicators — source: Yahoo Finance
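For anyone who would rather pull straight from a source, here is a minimal Python sketch for the first item, SEC EDGAR. It is not code from the Equibles repo. It assumes EDGAR's public submissions endpoint on data.sec.gov and its "recent" block of parallel arrays; the function names and the User-Agent string are placeholders of mine, and the SEC asks you to identify yourself in that header.

```python
import json
import urllib.request

# EDGAR's per-company submissions endpoint; the CIK is zero-padded to 10 digits.
SEC_SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def submissions_url(cik: int) -> str:
    """Build the submissions URL for a company's numeric CIK."""
    return SEC_SUBMISSIONS_URL.format(cik=cik)


def filter_filings(recent: dict, forms=("10-K", "10-Q", "8-K")) -> list:
    """Turn the parallel arrays in the 'recent' block into rows, keeping only `forms`."""
    rows = zip(recent["form"], recent["filingDate"], recent["accessionNumber"])
    return [
        {"form": form, "filed": filed, "accession": accession}
        for form, filed, accession in rows
        if form in forms
    ]


def fetch_recent_filings(cik: int, user_agent: str, forms=("10-K", "10-Q", "8-K")) -> list:
    """Download a company's recent filings index and filter it by form type."""
    request = urllib.request.Request(
        submissions_url(cik),
        headers={"User-Agent": user_agent},  # e.g. "Your Name you@example.com"
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    return filter_filings(data["filings"]["recent"], forms)
```

Calling `fetch_recent_filings(320193, "Your Name you@example.com")` would request Apple's filing index. Keep request rates low, since the SEC throttles heavy clients.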
How to get it: self-host with one command (`docker compose up`). Data lands in Postgres with ParadeDB, so you get SQL plus full-text and vector search out of the box. There's a web UI for browsing, a plain HTTP API, and an MCP server if you want to query it from an LLM. Everything is stored locally: no account, no paid API.
Repo: https://github.com/daniel3303/Equibles (if you like it, leave a star 🙂)
Disclaimer: I’m the developer of this project. It’s free and open-source, I’m not selling anything, and all data comes from the public government/exchange sources listed above. Equibles is just the open pipeline to collect and query them yourself.
Feedback and feature requests welcome.
submitted by /u/DanielAPO