I Built A Decision Intelligence System That Actually Traces Every Number To Real Data

submitted by /u/AddendumNext2422

Does Anybody Know Of Any Quality Datasets That Have Images Of Grocery Receipts?

Preferably from the big American vendors if possible (e.g., Target, Walmart, Costco, Safeway, Albertsons). I need these for OCR work. It’s also fine if the grocery receipts are part of a dataset that includes all kinds of receipts.

submitted by /u/z57333

Skilled Labor Shortages In US – Where To Find Data?

I’m researching skilled labor shortages in construction and related industries.

Looking for public or commercial datasets covering:

Electricians
Project Managers
Construction workforce demographics
Apprenticeship enrollment
Retirement risk
Regional wage inflation
Infrastructure project activity

Any recommendations beyond BLS, Census, ACS, and OEWS?

submitted by /u/NelsoelBesto

Request For Help From Someone Inside Russia To Download Migration Data

Hello,

I’m doing some research and need help getting recent public statistics from the EMISS portal on foreign nationals entering the Russian Federation. The portal is unfortunately not accessible from my location. The site is fedstat[dot]ru.

Specifically looking for the dataset titled approximately:
“Численность иностранных граждан, въехавших в Российскую Федерацию, по странам гражданства и целям поездок”

Filtered by Tajikistan as country of citizenship, for at least 2024–2025.

If anyone has access and can export the Excel table, I would be very grateful if you could share it! Спасибо вам большое!!

submitted by /u/guanabana21

Looking For Motorcycle Accident CCTV (fixed Or Surveillance-style) Videos

We are having a hard time finding videos for our thesis. We have searched most of the social media platforms and still haven’t found enough footage. Can anyone recommend an archive website or a similar source?

submitted by /u/Classic-Yoghurt-3762

Spent A Few Months Trying To Beat Polymarket’s BTC Up/down Candle Markets. They’re Efficient At Basically Every Static Entry — Here’s What The Data Showed.

TL;DR: short-horizon crypto direction on Polymarket is NOT where the edge is. I tested entries at candle birth, mid-candle, T-180s, T-60s, and the “deathbed” — the book charges ~fair value (incl. fees) at every static point. The deathbed-favorite edge people mention is basically closed (priced to ~$1.00).

Where edge actually showed up: execution/structure (nested-market consistency, late-window favorites with real leads), favourite-longshot bias (longshots overpriced, favorites underpriced), and LP rewards — not price prediction.

Happy to share the fair-value model I used (driftless P(up) from realized move + time + vol) in the comments. Anyone found the same?
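
For readers who want to see what a "driftless P(up) from realized move + time + vol" model can look like, here is a minimal sketch. It is not the poster's model, which was not shared in the post. It assumes log-price follows a driftless random walk with constant volatility, and the function names and the per-square-root-second volatility unit are illustrative choices.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def p_up(open_price: float, spot: float, seconds_left: float,
         vol_per_sqrt_sec: float) -> float:
    """P(close > open) under a driftless log-price random walk.

    open_price:       price at candle open
    spot:             current price
    seconds_left:     time remaining until the candle closes
    vol_per_sqrt_sec: log-return volatility per sqrt(second) (assumed constant)
    """
    if seconds_left <= 0:
        # Candle is over: outcome is known. Ties count as "not up" here (assumption).
        return 1.0 if spot > open_price else 0.0
    realized = math.log(spot / open_price)          # move so far
    remaining_sd = vol_per_sqrt_sec * math.sqrt(seconds_left)
    return normal_cdf(realized / remaining_sd)
```

Comparing this number with the quoted price (after fees) at each entry point is one way to check whether the book is charging roughly fair value.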

submitted by /u/Murky-Lifeguard9485

UBER MOVEMENT. Wanted A 2022 Uber Movement Dataset But Uber Has Completely Discontinued It.

I am currently working on a paper, so I need at least one year of Uber Movement data for any city. Any suggestions? On Kaggle I could only find October to November 2017. Can someone please help me with this?

submitted by /u/ClassroomLumpy3014

Dataset: Global Wealth Distribution By Band. Credit Suisse Global Wealth Databook And UBS Global Wealth Report, 2010 To 2023

submitted by /u/anuveya

[Self-Promotion] Active DeepTech Investors Mapped From Recent Funding Activity

DeepTech Venture Capital Firms — firm websites, investment stages, sectors, office locations, and portfolio links. Structured from recent funding activity.

https://deeptechvclist.com

submitted by /u/project_startups

WildVid-Lip — A Lip Reading Dataset

Helloo

I have been working in the branch of lip reading for a while now. Currently there are about 100000 videos with youtube ids, start time, and end time of the clip. I am constantly working to reduce the friction in the dataset — as we cannot share the actual video clips from youtube — by adding download scripts and the actual transcripts in the near future.

I have transcripts ready for about 80000 videos. The rest are yet to be made, but since the dataset is constantly expanding (150,000 ish by end of day), transcripts will lag behind until I am done with the actual videos.

Also trying to figure out how to not get rate-limited when downloading the videos from youtube using yt-dlp. If anyone knows, please enlighten me a bit 🙂.

My core aim is to make this a standard like LRS2, LRW, LRS3, etc.

I will soon add a commercial subset to the dataset, made from youtube videos which specifically allow commercial use, so if someone wants to build hardware with it and bring it to market, they can wholeheartedly do so :D.

That’s mostly it.

Have a look at the dataset if you would like to 😀

huggingface.co/datasets/Rizul2159/WildVid-LIP

There isn’t much on it right now. Just a csv file with 115k videos with their ids and timestamps, but soon there will be a lot more than that.

submitted by /u/Historical_Pin1429

I’m 18 And Hand-built The First Tunisian Darija-English Parallel Dataset Field-collected From My Grandmother, Strangers In Cafes, And 50 Categories Of Daily Life. Open Source, Provenance-tagged, 500+ Pairs.

I’m 18, from Tunisia, and I built this because nobody else had.

Tunisian Darija is what 12 million Tunisians actually speak. Not Modern Standard Arabic. Not Moroccan. A separate dialect that borrows from Arabic, French, Italian, and Amazigh, written online in Arabizi: Latin letters with numbers for Arabic sounds (3→ع, 7→ح, 9→ق, 5→خ).
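
The digit convention above can be written down as a small lookup table. This is only a sketch covering the four mappings listed in the post. It is a naive character substitution, so it would also rewrite genuine numerals, and it is not a transliterator. The names `ARABIZI_DIGITS` and `digits_to_arabic` are illustrative.

```python
# The four digit-to-letter mappings named in the post; real Arabizi uses more.
ARABIZI_DIGITS = {
    "3": "ع",
    "7": "ح",
    "9": "ق",
    "5": "خ",
}


def digits_to_arabic(text: str) -> str:
    """Replace Arabizi digits with the Arabic letters they stand for.

    Latin letters are left untouched, so the output is mixed-script.
    """
    return "".join(ARABIZI_DIGITS.get(ch, ch) for ch in text)


# "x3y7" is a made-up token, not a real Darija word.
print(digits_to_arabic("x3y7"))
```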

When I searched for a parallel corpus to build a translation model, I found nothing. TUNIZI covers sentiment analysis. TunBERT does dialect classification. But zero parallel datasets existed for Tunisian Darija-to-English translation. Not one.

So I built the first one from scratch with no funding, no university affiliation, no mentor, and no institutional support. Just me, a laptop, and the language I grew up speaking.

The first 500 pairs came from my own memory as a native speaker, covering 50 categories of real Tunisian daily life: cafe culture, Ramadan traditions, wedding customs, bac exam stress, barbershop talk, louage rides, haggling at the medina, football arguments, bureaucracy nightmares, olive harvest season, Friday afternoon naps, and more. Zero automated generation. Every pair hand-written and validated.

Then I left my desk and started collecting from real people:

My father’s childhood memories of growing up in Ain Draham, a mountain village in northwestern Tunisia: the scent of the forest, nearly getting bitten by a snake, his cousin falling off his uncle’s horse
My grandmother’s stories about her father’s farm: cows, sheep, thieves stealing the neighbors’ animals at night, and her father calmly finishing his morning prayer before stepping outside to check
An elderly man from Siliana I met at a cafe who speaks a dialect I barely recognized — words I had to ask about, rhythms I’d never heard

Every pair is provenance-tagged with its source: self, family-father, family-grandmother, community-siliana. Every collection session is logged with date, place, speaker context, and consent status.

I excluded an entire session of data because I hadn’t established consent before the conversation began. The language was rich. I threw it all away anyway. A dataset built on trust means sometimes throwing away good data.
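
A minimal sketch of how provenance tags and consent logging like this could be represented, so that pairs from a session without consent are dropped automatically. The class and field names are hypothetical and are not taken from the published dataset's schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    session_id: str
    date: str             # e.g. an ISO date string
    place: str
    speaker_context: str
    consent: bool         # was consent established before recording?


@dataclass(frozen=True)
class Pair:
    darija: str
    english: str
    source: str           # "self", "family-father", "family-grandmother", "community-siliana"
    session_id: str


def usable_pairs(pairs: list[Pair], sessions: list[Session]) -> list[Pair]:
    """Keep only pairs whose collection session has logged consent."""
    consented = {s.session_id for s in sessions if s.consent}
    return [p for p in pairs if p.session_id in consented]
```

Keeping consent on the session rather than on each pair means one missing consent flag excludes the whole session, which matches the exclusion described above.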

What this dataset has that scraped corpora don’t:

Regional dialect diversity: urban, mountain Ain Draham, rural Siliana
Generational variation: grandmother’s speech vs mine
Provenance: every pair traces to a known speaker, region, and context
Documented ethics: consent logged, exclusions documented, no anonymous mass scraping

I trained the first Tunisian Darija-to-English translation model on this dataset: a 15.6M-parameter Transformer built from scratch on an RTX 3050 (4GB VRAM). v1 BLEU: 3.89 on a held-out test set. Low, but the first benchmark ever measured for this language. A published ACL researcher who found my work on Reddit said it’s ‘basically guaranteed to be novel.’

I’m heading toward 1,000+ pairs through continued community collection and will be presenting this research at Tunisia’s AI National Summit (AINS 4.0) later this month, as the first high schooler to ever present at the event.

The dataset is CC BY-NC-SA 4.0 and public on HuggingFace. 110+ downloads so far.

If you work on low-resource NLP, Arabic dialect processing, or sociolinguistic data, it’s yours.

HuggingFace: huggingface.co/datasets/Dhiadev-tn/tunisian-darija-english
Full pipeline + model: github.com/Dhiadev-tn/darija-translator

submitted by /u/Dhiadev-tn

233 Canadian Used Car Listings Scraped From AutoTrader.ca — Prices, Specs, GPS Coords, Equipment Lists (JSON, June 2026)

Sharing a dataset of 233 used car listings I pulled from AutoTrader.ca this week. All records are from dealer listings (no private sellers, so no personal contact info).

Fields per record (PII removed from this sample):

Price (CAD, formatted + numeric + average market price for comparison)
Specs: make, model, year, trim, body type, drivetrain, transmission, color, displacement, doors, cylinders
Mileage (formatted + numeric km)
Location: city, postal code, latitude, longitude
Equipment by category: comfort, safety, entertainment, extras
History: accident-free flag, Carfax URL, rental flag
Images: URLs (1280×960)

Sample (3 records, contact fields removed):

[
  {
    "data_source": "AutoTrader.ca",
    "ad_id": "264a7bb7-5b85-4b0c-9420-b87783a41389",
    "make": "Mazda", "model": "CX-5", "year": 2024,
    "trim": "Signature AWD – BOSE Sound",
    "body_type": "SUV", "status": "Used",
    "price_cad": 39900, "price_formatted": "$ 39,900", "average_market_price": 37600,
    "mileage_km": 29454, "mileage_formatted": "29,454 km",
    "transmission": "Automatic", "drivetrain": "All Wheel Drive",
    "exterior_color": "Red", "interior_color": "Brown",
    "fuel_type": "Gasoline", "displacement": "2,500 cc", "doors": 4, "cylinders": 4,
    "city": "NORTH VANCOUVER", "zip_code": "V7P 3R8", "country": "CA",
    "latitude": 49.3165, "longitude": -123.09942,
    "seller_name": "Morrey Mazda of the Northshore", "dealer_google_rating": 4.5,
    "accident_free": true,
    "comfort_equipment": ["Automatic climate control", "Cruise control", "Heads-up display", "Heated steering wheel", "Navigation system"],
    "safety_equipment": ["Adaptive Cruise Control", "Electronic stability control", "Lane departure warning system"],
    "image_count": 34,
    "created_timestamp": "2026-04-18T07:43:14.098Z"
  },
  {
    "data_source": "AutoTrader.ca",
    "ad_id": "ec42fc58-8459-457c-a9a8-54638894a694",
    "make": "Mazda", "model": "CX-5", "year": 2024,
    "trim": "GS AWD | Heated Leather",
    "body_type": "SUV", "status": "Used",
    "price_cad": 27994, "price_formatted": "$ 27,994", "average_market_price": 30300,
    "mileage_km": 49984, "mileage_formatted": "49,984 km",
    "transmission": "Automatic", "drivetrain": "All Wheel Drive",
    "exterior_color": "Grey",
    "fuel_type": "Gasoline", "doors": 4, "cylinders": 4,
    "city": "Fredericton", "zip_code": "E3C 1N8", "country": "CA",
    "latitude": 45.94504, "longitude": -66.68895,
    "seller_name": "ReCar", "dealer_google_rating": 4.5,
    "accident_free": true,
    "comfort_equipment": ["Air conditioning", "Cruise control", "Leather steering wheel", "Power windows"],
    "safety_equipment": ["Anti-lock braking system (ABS)", "Electronic stability control", "Traction control"],
    "image_count": 18,
    "created_timestamp": "2026-04-24T19:47:48.215Z"
  },
  {
    "data_source": "AutoTrader.ca",
    "ad_id": "bd822421-6d67-47ac-a079-69b129aea48f",
    "make": "Mazda", "model": "CX-5", "year": 2024,
    "trim": "GS",
    "body_type": "SUV", "status": "Used",
    "price_cad": 31757, "price_formatted": "$ 31,757", "average_market_price": 30000,
    "mileage_km": 66855, "mileage_formatted": "66,855 km",
    "transmission": "Automatic", "drivetrain": "All Wheel Drive",
    "exterior_color": "White",
    "fuel_type": "Gasoline", "doors": 4, "cylinders": 4, "seats": 5,
    "city": "Mississauga", "zip_code": "L5L1X3", "country": "CA",
    "latitude": 43.53093, "longitude": -79.67701,
    "seller_name": "Erin Mills Mazda", "dealer_google_rating": 4.2,
    "accident_free": true,
    "carfax_url": "https://vhr.carfax.ca/?id=2GpEicFIk9VsxXw/rcTLBLxhbymmt8Oz",
    "image_count": 19,
    "created_timestamp": "2026-04-02T09:26:07.098Z"
  }
]

Collected via AutoTrader.ca’s public search pages. Happy to share more records or answer questions about the fields.
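
As a quick illustration of what the numeric price fields allow, here is a minimal sketch that compares each asking price with the listing's `average_market_price`. The three records are cut down to the fields used, with values copied from the sample in the post. The helper name `price_gaps` is illustrative.

```python
import json

# Three records reduced to the fields needed here; values are from the post's sample.
RAW = """
[
  {"city": "NORTH VANCOUVER", "price_cad": 39900, "average_market_price": 37600},
  {"city": "Fredericton",     "price_cad": 27994, "average_market_price": 30300},
  {"city": "Mississauga",     "price_cad": 31757, "average_market_price": 30000}
]
"""


def price_gaps(raw_json: str) -> list[tuple[str, int, float]]:
    """Return (city, gap in CAD, gap in percent of average market price)."""
    rows = []
    for rec in json.loads(raw_json):
        gap = rec["price_cad"] - rec["average_market_price"]
        pct = 100.0 * gap / rec["average_market_price"]
        rows.append((rec["city"], gap, pct))
    return rows


for city, gap, pct in price_gaps(RAW):
    print(f"{city}: {gap:+d} CAD ({pct:+.1f}% vs. average market price)")
```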

submitted by /u/kmiloaguilar

Polymarket 5-minute Crypto Up/down Markets — Full Order Books At 1 Hz, ~26.8M Rows, 7 Coins (CC0)

Sharing a dataset I recorded because nothing like it seems to exist publicly: the order book
of Polymarket’s 5-minute crypto up/down markets, sampled once per second.

~89,000 markets across 7 coins (BTC, ETH, SOL, XRP, DOGE, HYPE, BNB)
~26.8M per-second rows (~300 per market), Mar–May 2026, UTC
Two Parquet tables per coin, joined on `condition_id`: `markets` (one row per 5-min market) and `ticks` (one row per second)
Per tick: best bid/ask, resting sizes, and bid-side 5¢ depth for both the Up and Down outcome
~725MB total, 99.8%+ coverage, no duplicates
Licence: CC0 (public domain)

Caveats up front: fixed window (collection ended 18 May 2026), outcome is inferred from
the final tick rather than read on-chain, ask-side depth isn’t recorded, and there are ~1.5h
of collector outages over the span (shared across all coins, so collector hiccups rather
than market-data loss). Full data dictionary and coverage audit are in the write-up.
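
To make the "outcome is inferred from the final tick" caveat concrete, here is one plausible rule, sketched under assumptions: the side with the higher best bid on the last recorded tick is taken as the winner. The dataset's actual rule and column names are documented in the write-up, not here. The keys `ts`, `up_best_bid`, and `down_best_bid` are hypothetical.

```python
from typing import Optional


def infer_outcome(ticks: list[dict]) -> Optional[str]:
    """Label one market from its final tick (hypothetical rule and column names).

    Each tick is a dict with keys "ts", "up_best_bid", "down_best_bid".
    Returns "Up", "Down", or None when the last tick is ambiguous.
    """
    if not ticks:
        return None
    last = max(ticks, key=lambda t: t["ts"])       # final recorded second
    up, down = last["up_best_bid"], last["down_best_bid"]
    if up is None or down is None or up == down:
        return None                                # do not guess
    return "Up" if up > down else "Down"
```

A label produced this way can disagree with on-chain resolution when the last tick is missing or the book is thin, so a subset is worth validating against resolved markets.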

Hugging Face: https://huggingface.co/datasets/kachoio/polymarket-5-minute-crypto-up-down-markets
Kaggle: https://www.kaggle.com/datasets/kachoio/polymarket-5-minute-crypto-updown-markets
Write-up (schema, provenance, limitations): https://kacho.io/polymarket-5min-crypto-dataset

submitted by /u/File-Environmental

We Mapped ~500k Rooftop PV Installations Across France With Deep Learning — Model, Weights, And Dataset Now Fully Open

**Self-promotion**

Hi r/remotesensing,

I’m sharing DeepPVMapper, an open-source tool we developed to detect and characterize rooftop PV systems from very high-resolution aerial imagery (IGN orthophotos, 20cm).

What’s available:

Model weights on HuggingFace: huggingface.co/gabrielkasmi/bdappv-models
Interactive demo (no GPU, ~1 min/km²): huggingface.co/spaces/gabrielkasmi/deeppvmapper
Training dataset (45k+ images, segmentation masks): huggingface.co/datasets/gabrielkasmi/bdappv
Full detections for France (~500k systems, GeoJSON): https://zenodo.org/records/19188878
Code: github.com/gabrielkasmi/deeppvmapper

What it does:
Detects rooftop PV panels and estimates surface area, installed capacity, tilt and azimuth. Deployed at national scale across France — evaluation against official registries (RTE, RNI) revealed 10% missing capacity nationally.

The repo has been refactored and is open to contributions. Happy to discuss methodology, limitations, or potential extensions.

Project page: gabrielkasmi.github.io/deeppvmapper

submitted by /u/SuperbUpstairs9825

[self-promotion] [PAID] Built A Deterministic Job Postings Data Pipeline: Looking For Feedback

Disclosure: I built this project and this is my own API/product. It has free and paid access tiers. I’m sharing it here because I think the data engineering approach may be useful, and I’m looking for technical feedback.

I built Trace Jobs Core, a job postings data API built around a simple idea: Do not guess.

A lot of job data pipelines end up doing some combination of:

scraping HTML pages
parsing unstable frontend output
using models to extract fields
guessing missing/ambiguous values
deduplicating after the fact

I took a different approach.

The pipeline ingests job postings from public machine-readable sources, translates them into a Schema.org JobPosting format, applies only deterministic normalization where the source provides clear structure, and preserves original values when fields are ambiguous.

Current system:

9,800+ structured feeds
~13k new postings/day
daily refresh
Schema.org JobPosting records
SHA-256 based deduplication
RFC 8785 canonicalization
original upstream values preserved when normalization is uncertain
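
A rough sketch of hash-based deduplication over canonicalized JSON, to show the general idea. It is not Trace Jobs Core's implementation. `json.dumps` with sorted keys and compact separators only approximates RFC 8785: the RFC also prescribes ECMAScript-style number serialization and key ordering by UTF-16 code units, which this sketch does not guarantee. The record contents are placeholders.

```python
import hashlib
import json


def canonical_bytes(record: dict) -> bytes:
    """Approximate canonical form: sorted keys, no insignificant whitespace, UTF-8.

    Not a full RFC 8785 implementation (see the caveats in the text above).
    """
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def fingerprint(record: dict) -> str:
    """SHA-256 hex digest of the canonical form."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def dedupe(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each fingerprint, preserving order."""
    seen: set[str] = set()
    unique = []
    for rec in records:
        digest = fingerprint(rec)
        if digest not in seen:
            seen.add(digest)
            unique.append(rec)
    return unique
```

Because the hash is taken over a canonical form, two feeds that emit the same posting with different key order or whitespace produce the same fingerprint without any fuzzy matching.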

The goal is not to create a “smart” interpretation layer. The goal is to provide stable, predictable data and leave interpretation to the downstream user.

A future enrichment layer could exist separately, but it would remain separate from the source-faithful data layer.

Examples (HTML + JSON responses refreshed daily):
https://kaleh.net/trace/examples.html

Documentation:
https://kaleh.net/trace/docs.html

Project overview:
https://kaleh.net/trace/

I would especially appreciate feedback on:

dataset design
normalization strategies
preserving source fidelity
handling schema differences between providers
what fields/data would make this more useful

Thanks!

submitted by /u/0o3705

Looking To Build And Monetize My First Data Set. All Help Is Appreciated!

So I have access to a vast network of farms and farm workers and have been looking into collecting videos to sell as data sets to AI labs etc. I’ve done research and noticed that it’s hard to find quality data sets specifically in agriculture. A lot of the video data is either from a vehicle moving at a higher speed (which also lacks hand-to-object interaction) or is simply a bird’s-eye view. I realized I have an opportunity and have started working on it and sending basic outreach to dataset licensing and a few agtech startups. Does anyone have experience in this sort of field?

For video gathering I’ve already found and set up a set of glasses that are able to get the job done. I’ve tested them and have sample videos ready. Any advice or tips would be greatly appreciated!

submitted by /u/lter8

Free Dataset: 3250 Graded LLM Runs On Whether Models Trust In-context Docs Over The Actual Code

I ran a benchmark for a tool I built and figured the dataset might be useful to others. It took ~$100 of API credits to produce.

The test is simple: I give the agent a document describing a piece of code it can’t directly see, then record whether it double-checks the doc against the real code or just takes the doc’s word for it. The doc is sometimes accurate and sometimes out of date, so the data captures how each model handles documentation it can and can’t trust. The writeup covers what I found; the dataset lets you check it or look for your own patterns.

Dataset
Outcome

Star the repo if it’s useful. Cheers.

submitted by /u/AverageGradientBoost

Announcement: New Release Of The JDBC/Swing-based Database Tool Has Been Published

submitted by /u/Apprehensive-Fix-996

Free Hosted MCP Server For Open German City Data — 21 Tools, No Key, Open Source

submitted by /u/Fabulous-Rub-7301

Every US ETF’s Full Holdings And Operational Census Is Public, Machine-readable SEC Data (N-PORT + N-CEN) And Underused

Sharing a data source that’s surprisingly underused for fund analysis: the SEC’s N-PORT and N-CEN filings on EDGAR.

– N-PORT (quarterly, structured XML): every fund’s complete position list with weights, share counts, CUSIP/ISIN, country of domicile, ASC 820 fair-value level, monthly returns, and monthly creation/redemption flows.
– N-CEN (annual, structured XML): tracking difference vs benchmark (gross AND net of fees), securities-lending activity, in-kind creation/redemption percentages, per-broker commissions, and the full service-provider roster.

What you can pull out without any paid vendor:
– Index-fund tracking split into replication vs cost. VOO 2025 was -0.4 bps vs the S&P 500 gross of fees, -16.9 bps net.
– True per-CUSIP overlap between funds. SPY vs VOO is 476 shared holdings, ~97% by weight.
– Issuer-domicile reality checks. SPY is ~97% US, ~3% Ireland/Switzerland/Bermuda/Netherlands.
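
A minimal sketch of two of these calculations. The post does not say how overlap by weight was computed, so this uses one common definition: the sum over shared CUSIPs of the smaller of the two portfolio weights. The holdings below are made-up toy values with placeholder identifiers, not real N-PORT data. The tracking figures are the VOO numbers quoted above.

```python
def overlap(a: dict[str, float], b: dict[str, float]) -> tuple[int, float]:
    """Return (number of shared CUSIPs, overlap by weight).

    a, b map CUSIP -> portfolio weight (fractions summing to about 1).
    Overlap by weight is defined here as sum of min(weight_a, weight_b).
    """
    shared = a.keys() & b.keys()
    return len(shared), sum(min(a[c], b[c]) for c in shared)


def cost_drag_bps(gross_bps: float, net_bps: float) -> float:
    """Part of the tracking difference attributable to fees and costs."""
    return round(net_bps - gross_bps, 1)


# Toy portfolios with placeholder identifiers (not real CUSIPs or weights).
fund_a = {"CUSIP_A": 0.5, "CUSIP_B": 0.3, "CUSIP_C": 0.2}
fund_b = {"CUSIP_A": 0.4, "CUSIP_B": 0.4, "CUSIP_D": 0.2}

print(overlap(fund_a, fund_b))        # shared count and overlap weight
print(cost_drag_bps(-0.4, -16.9))     # VOO 2025 figures from the post
```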

Gotchas: positions are keyed on CUSIP (not ticker), so you need a CUSIP-to-ticker map to join to anything else; unit investment trusts (like SPY) file lighter N-CEN sections than open-end funds (like VOO), so some fields are legitimately empty; and the public lag is ~60 days after quarter-end.

The StockFit API does the XML parsing and CUSIP resolution if you don’t want to build it yourself.

Not financial advice, just pointing at the filings.

submitted by /u/Either_Door_5500

Released A Free 45M Doc European Multilingual Corpus — German, French, Spanish, Dutch + 37 More (CC0, HuggingFace) [P]

Built this as part of a multilingual pretraining research project. Figured I’d share it here.

European HPLT v1 — quality-filtered from HPLT v3 web crawl data:

45M documents across 41 European languages (Germanic, Romance, Slavic, Celtic, Baltic, Finno-Ugric + more)

~50.9B estimated tokens, ~190 GB raw JSONL

Every doc has a WDS quality score of 10 or higher — exact SHA-256 deduplication applied

Per-document metadata: language, URL, quality score, register/genre tag, char/word count

CC0 1.0 license — fully open, inherited from HPLT v3
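
For anyone applying the same kind of filtering to their own crawl, here is a minimal sketch of a quality threshold followed by exact SHA-256 deduplication over JSONL lines. It is not the pipeline used to build this corpus. The field names `text` and `quality_score` are assumptions and may not match the released schema.

```python
import hashlib
import json
from typing import Iterable, Iterator


def filter_and_dedupe(lines: Iterable[str], min_score: float = 10.0) -> Iterator[dict]:
    """Yield documents that pass the score threshold, dropping exact text duplicates.

    Assumes each JSONL line has "text" and "quality_score" fields (hypothetical names).
    """
    seen: set[str] = set()
    for line in lines:
        doc = json.loads(line)
        if doc["quality_score"] < min_score:
            continue
        digest = hashlib.sha256(doc["text"].encode("utf-8")).hexdigest()
        if digest in seen:
            continue                      # exact duplicate of an earlier document
        seen.add(digest)
        yield doc
```

Exact hashing only removes byte-identical texts. Near-duplicates such as boilerplate variants need a separate fuzzy step.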

Covers lower-resource languages (Maltese, Faroese, Scottish Gaelic, Occitan, Luxembourgish, Irish, Asturian) that are underrepresented in OSCAR and CulturaX.

HuggingFace: huggingface.co/datasets/ashtok897/european-hplt-v1

submitted by /u/ashtok897

BacenR: R Package For Brazilian Economic Data And Financial Institutions

The goal of bacenR is to provide R functions to download and work with data from the Brazilian Central Bank (Bacen).

The datasets available through bacenR include:

Check it out: https://github.com/rtheodoro/bacenR

#bacen #financialdata #finance #rstats #datacollect #braziliandata

submitted by /u/troyandabedtalkshow

What Alternative Data Sources Do You Use?

submitted by /u/CrazyCowboySC

Data Collection For Personal Project

To the people gathering data for your RAG: how do you actually collect your own personal data (location history, payments, and messages) and put it into a database?

I’m building a project where I can ask questions about events in my own history. Most of the data originates on my phone, so the main problem is how to send it from the device to the database.

Any suggestions about the project, or pointers to sources, would be helpful.
Thanks in Advance!

submitted by /u/yogi_006

I Am Looking For Historical Mandi Price Data For Wheat Across Maharashtra, India, For A Minimum Period Of 10 Years.

I am looking for historical mandi price data for wheat across Maharashtra, India, for a minimum period of 10 years.

submitted by /u/jaijnendra

748 Mechanistic Interpretability Papers From ArXiv + Semantic Scholar; Quality-scored JSONL, Free

Sharing a dataset I built.

Disclaimer: this is my project. Free to download and use.

https://huggingface.co/datasets/fineset-io/mechanistic-interpretability-papers

Stats:

– 748 records, 2022–present

– Sources: arXiv + Semantic Scholar, cross-referenced by arxiv_id and DOI

– quality_score: 0–1, citation-normalized

Fields: id, title, abstract, authors, categories, published_date, citation_count, quality_score, has_code, code_url, venue

Built with FineSet (fineset.io).

The waitlist is open if you want daily-refreshed datasets on your own topic.

submitted by /u/fineset-io

Do You Trust The Data, Or Your Gut, When Outcomes Are Uncertain?

I’ve been following visa backlog updates and community-driven tracking tools recently, trying to make sense of timelines and what they might mean for my own immigration process.

It’s interesting how the same numbers can create different reactions: some people feel reassured, others feel anxious, and many of us keep checking for patterns that may or may not actually exist.

It made me think about how we don’t just interpret data for accuracy; we also use it for emotional grounding when outcomes feel uncertain.

As someone from a market research background, I naturally try to find patterns in data. But this experience is teaching me that not everything we track has a clear signal, even when it looks very data driven.

Maybe sometimes data is not just about prediction; it also helps people sit with uncertainty.

I’m curious how others deal with uncertainty when the “data” is incomplete and constantly changing.

submitted by /u/Anxious-North3299

[self-promotion] [PAID] I Built A Macro Stress Monitor For African And LatAm Economies — Structured JSON From Central Bank APIs, World Bank, IMF, And Pink Sheet

Data covers 18 economies across two regions. Each run returns:

– FX momentum (30d/90d, z-scored vs own history)

– Inflation level and trend

– Commodity terms-of-trade impact (price × export share per commodity, e.g. copper +42% × 32% export share = +13.5pp impact for Peru)

– Real interest rate

– Reserve drawdown

– Structural vulnerability (debt, fiscal, banking, governance, REER)
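
Two of these signals reduce to simple arithmetic, sketched below under assumptions. The monitor's actual formulas are in its README, not reproduced here, and the function names are illustrative. With the rounded inputs quoted above (42% and 32%), the terms-of-trade product comes out near 13.4 pp; the +13.5 pp in the example presumably reflects unrounded inputs.

```python
from statistics import mean, stdev


def z_score(latest: float, history: list[float]) -> float:
    """How unusual the latest value is relative to the series' own history.

    `history` needs at least two points; whether it should include `latest`
    is a design choice (it is excluded in this sketch).
    """
    sd = stdev(history)
    return (latest - mean(history)) / sd if sd > 0 else 0.0


def tot_impact_pp(price_change_pct: float, export_share_pct: float) -> float:
    """Commodity terms-of-trade impact in percentage points: price change x export share."""
    return price_change_pct * export_share_pct / 100.0


print(tot_impact_pp(42.0, 32.0))       # copper example with the rounded inputs
print(z_score(4.0, [1.0, 2.0, 3.0]))   # toy FX-momentum history
```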

Every signal shows the exact value, threshold, source, and reason string. No black box. Latest addition: companySignals — when a commodity tailwind or shock fires, returns the listed companies with exposure to that commodity in that country (e.g. copper tailwind in Chile → Antofagasta, BHP, Anglo American, Lundin, Teck).

Available on Apify ($1.50/run) and RapidAPI. Full methodology and schema documented in the README.

https://apify.com/malmon/african-economic-stress-monitor

https://apify.com/malmon/latam-economic-stress-monitor

submitted by /u/g_kalle

Looking For Geomechanical Datasets From CCS/deep Injection Sites For ML Research

Need field-scale data such as:

– In-situ stress (Sv, SHmax, Shmin)

– Pore pressure

– Fault parameters

– Rock mechanical properties

– Injection pressure/rate history

Interested in sites like Sleipner, In Salah, Weyburn, Otway, Decatur, etc.

Already checked CO2 DataShare and NETL EDX, but geomechanical data is limited.

Papers with tabulated field values or any datasets/repositories would be greatly appreciated.

submitted by /u/atralwanderer_1