Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Seeking Recommendations For Low-Cost Mobility Data Providers For People Density Analysis In Stores And City Areas

Hi everyone,

I’m working on a project to understand people density, both within stores and across different areas of the city, to analyze foot traffic patterns. I know that location data providers like SafeGraph, Cuebiq, and Factori offer these types of mobility datasets, but I’m concerned about the potential cost, which I’ve heard can be quite high.

I’m hoping to find some alternative providers or potentially lower-cost options that could still give me the insights I need without breaking the bank. My ideal dataset would allow me to:

See density and movement patterns around specific POIs (like retail stores or malls)

Understand general population density fluctuations across city areas
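To make the first goal concrete, here is a minimal sketch of counting distinct devices near a POI per hour. It assumes the provider delivers raw pings as (device ID, latitude, longitude, hour) tuples; that layout, the 100 m radius, and the names `haversine_m` and `hourly_density` are illustrative assumptions, not any provider's actual schema.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in metres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def hourly_density(pings, poi, radius_m=100):
    """Count distinct devices seen within radius_m of the POI, per hour of day."""
    seen = {}
    for device_id, lat, lon, hour in pings:
        if haversine_m(lat, lon, poi[0], poi[1]) <= radius_m:
            seen.setdefault(hour, set()).add(device_id)
    return {hour: len(devices) for hour, devices in sorted(seen.items())}


# Made-up sample pings (hypothetical layout): (device_id, lat, lon, hour_of_day)
pings = [
    ("a", 40.7580, -73.9855, 9),
    ("b", 40.7581, -73.9856, 9),
    ("a", 40.7580, -73.9855, 10),
    ("c", 40.7700, -73.9700, 9),  # well over a kilometre away, outside the radius
]
poi = (40.7580, -73.9855)
density = hourly_density(pings, poi)
print(density)
```

Counting distinct devices rather than raw pings keeps one chatty phone from inflating the density figure.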

If you have experience working with affordable mobility data providers (like Veraset, Quadrant, etc.), I’d love to hear about your recommendations, especially if you’ve found options that provide flexibility in pricing or smaller, more budget-friendly packages. Or are there generally no options available for small pet projects?

Thanks in advance for any tips!

submitted by /u/mynameisnotjason123

Help With ML Project For Damage Detection

Hey guys,

I am currently working on a project that detects damage/dents on rented construction machinery (excavators, cement mixers, etc.). After a machine is returned to the rental company, a machine learning model is used to detect damage and ‘penalise the renters’ accordingly. We expect to have images of the machines from before the rental, so there is a benchmark to compare against.

What would you all suggest for this? Which models should I train/fine-tune? What data should I collect? Any other suggestions?

If you all have any follow-up questions, please go ahead and ask.

submitted by /u/shroffykrish

Request For A Dataset For Rasch Analysis

Hello, Reddit community!

I am currently working on a project involving the analysis of student performance using the Rasch model. I’m looking for a dataset that includes individual student responses to exam questions, specifically with data indicating whether each response was correct or incorrect.

If anyone knows of any publicly available datasets that fit this description, or if you have recommendations on where I might find such data, I would greatly appreciate your help!
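For reference, the data shape this needs is a persons-by-items matrix of 0/1 responses. Below is a minimal sketch of the standard dichotomous Rasch model on such a matrix, where the probability of a correct answer is a logistic function of ability minus difficulty. The response matrix and the parameter values are made up for illustration, and the function names are hypothetical.

```python
import math


def rasch_prob(theta, b):
    """P(correct) for a person with ability theta on an item with difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def log_likelihood(responses, abilities, difficulties):
    """Log-likelihood of a 0/1 response matrix (rows = students, columns = items)."""
    total = 0.0
    for person, row in enumerate(responses):
        for item, x in enumerate(row):
            p = rasch_prob(abilities[person], difficulties[item])
            total += math.log(p) if x == 1 else math.log(1.0 - p)
    return total


# Made-up example: 3 students, 3 exam questions, 1 = correct, 0 = incorrect
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
abilities = [0.5, -0.5, 1.5]      # illustrative values, not estimates
difficulties = [-1.0, 0.0, 1.0]   # illustrative values, not estimates

ll = log_likelihood(responses, abilities, difficulties)
print(round(ll, 3))
```

A real analysis would estimate the abilities and difficulties by maximising this likelihood rather than fixing them by hand.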

Thank you in advance for your assistance!

submitted by /u/Agreeable-Ad-5882

Datasets S&P 500 To Measure Innovation

Hey guys!

Our empirical research study focuses on top management characteristics (e.g. age, gender) in relation to the measurement of innovation strategies (e.g. patents, R&D investments).

We are currently struggling to find free databases that provide access to the S&P 500 data that take these characteristics into account.

Apart from WRDS (access to e.g. CRSP Quarterly Update not available), do you know of any other good databases that we could look at?

Many thanks and best regards! 🙂

submitted by /u/Urquharts

[PAID] Magazines Dataset, Economist, Vanity Fair, The Atlantic And More

Magazines dataset of all the past issues of the following magazines:

Economist (1997 to current issue)

The Atlantic (1857 to current issue)

Vanity Fair (1913 to current issue)

MIT Technology Review (1997 to current issue)

TIME (1923 to current issue)

There are a few more magazines in the pipeline (New Yorker, NY Times Mag and a few more), which will be added.

Format: Data is available in JSON and EPUB formats; PDFs can be generated on demand.

NOTE: Vanity Fair shut down in 1936 and relaunched in 1983, so data between these dates isn’t available for it.

If you’ve any queries or want to buy, please DM me.

submitted by /u/waqarHocain

Selling Preprocessed And Cleaned Job Description Dataset (Latest LinkedIn And Indeed STEM Postings From US). The Dataset Contains Both Uncleaned And Preprocessed Data For AI Training. Please Let Me Know If Anyone Would Like It, I’m Trying To Raise Some Money For My Startup. Thanks!!!

Hey!

I have around 700K lines of job descriptions processed for AI and ML training. The processing involves extracting just the requirements and responsibilities, splitting them into individual lines, correcting all grammatical mistakes, extracting keywords into software skills and experience, classifying the job description, and adding an H1B filter to it.

The dataset is from LinkedIn and Indeed; I scrape and process around 15K every day. I also have uncleaned, purely scraped data that comes to 60K every day. They are all STEM jobs in the US.

I have attached an example of both datasets with this. You can find them here.

I’m trying to raise around $2000 for my startup and this would help me a lot. However, there’s no pressure; I’m not trying to solicit, just trying to sell a good dataset.

Let me know if anyone has any questions, and please no hate.

Thanks!

submitted by /u/assassinator444

Thanks For The Support! New API To Bypass Cloudflare Turnstile Is Live

A few months ago, I launched my cheap scraping API, and I’m happy to share that 79 users are already using it! 🙌

I’ve received lots of requests asking for an API to bypass Cloudflare Turnstile, and I’m excited to announce that it’s now live! 🎉

Plus, the new API supports custom headers, giving you more flexibility for your scraping needs.

Thanks a ton for all the support!

Let me know if you have any feedback or further requests!

submitted by /u/Affectionate-Olive80

[Research] Mushroom Description Dataset

Hi

As my final year uni project, I am building an app that will attempt to classify wild mushrooms, and I would like to build a ‘page’ with an image of the mushroom and some basic info like genus and edibility. Does anyone know of any such dataset?

For context, I have an AI model which is trained on Mushroom Observer’s Machine Learning dataset. I tried to use their Name/Descriptions CSV but it is clunky and does not contain images.

Thanks for any help

submitted by /u/Gostinker

Need A Data Set That Uses Social Media

Hi, I am currently working on a project which focuses on the influence that social media has on cryptocurrency price fluctuations. Does anyone know where I might be able to find a dataset to help me with this, or a way in which I can collect data from social media myself? Thanks
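Once both series exist, one simple starting point is a lagged correlation between daily mention counts and the following day's price return. This is only a sketch under assumptions: the daily granularity, the one-day lag, and the sample numbers are all made up, and correlation alone does not establish influence.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def lagged_correlation(mentions, prices, lag=1):
    """Correlate mentions on day t with the daily return realised on day t + lag."""
    # returns[k] is the relative price change from day k to day k + 1
    returns = [(prices[i] - prices[i - 1]) / prices[i - 1]
               for i in range(1, len(prices))]
    pairs = [(mentions[t], returns[t + lag - 1])
             for t in range(len(mentions))
             if 0 <= t + lag - 1 < len(returns)]
    xs, ys = zip(*pairs)
    return pearson(xs, ys)


# Made-up daily data: social media mention counts and closing prices
mentions = [10, 20, 15, 40, 30, 50]
prices = [100.0, 102.0, 105.0, 104.0, 110.0, 112.0]

r = lagged_correlation(mentions, prices, lag=1)
print(round(r, 3))
```

Trying several lags, including negative ones, would show whether mentions tend to lead or follow price moves.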

submitted by /u/GeorgeW427

Grocery Price API V2 In The Works – Which Stores Should We Add Next?

Hey r/datasets!

A few months back, I launched a Grocery Price API, and I just wanted to start by saying a big thank you to everyone who subscribed and supported it early on. 🙏

The response has been amazing!

Based on feedback, I’m now diving into V2 to add more stores and make the API even more comprehensive.

I’d love your input:

What are the top grocery stores you’d like to see included?

Whether it’s big national chains or popular local spots, drop your suggestions below!

Thanks again, and I’m excited to keep building this with the community’s needs in mind!

submitted by /u/Affectionate-Olive80

Light Pollution Dataset For Data Visualization

I would like to obtain a usable dataset on light pollution: tracking the increase in brightness in United States cities. I have not been able to locate a suitable dataset. Lots of maps and visualizations, but not a dataset I can work with myself in Python and R. Any recommendations and leads are appreciated. Thanks!

submitted by /u/SupremoSpider

Need Ideas For Data Science School Project

My friend and I are looking for a fun dataset to use for our end of year project. The goal is to make a random forest and then use that to make predictions about unseen instances.

We aren’t entirely sure where to look for data sets or what we want to do, so all recommendations are welcome! Thanks in advance!

submitted by /u/DeltaShadow4

Need Help Opening A Massive .dbo (45GB) — Any Advice?

Hey everyone! I’ve got this gigantic file, ePCR.dbo.MedicalRecord, sitting at a whopping 45.4 GB, and I’m stumped on how to open it. 😅 I tried using DBeaver, but I keep hitting an OutOfMemoryError, even after bumping up the memory settings. It seems like it’s way too big for DBeaver to handle.

Does anyone have any experience with these kinds of files or know any tricks for working with huge .dbo files? Ideally, I’d like to export the data to a CSV so I can actually dig into it, but I’m open to any advice or tool suggestions. I’m not even 100% sure what program originally created this file, so I’m working with limited info here.
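If the data can be reached through a database connection at all, the usual way around an out-of-memory error is to stream rows to CSV in batches instead of loading everything at once. The sketch below assumes a Python DB-API connection; an in-memory SQLite table stands in for the real data, and the table, columns, and the function name `export_query_to_csv` are hypothetical.

```python
import csv
import io
import sqlite3


def export_query_to_csv(conn, query, out_file, batch_size=10_000):
    """Stream query results to CSV in fixed-size batches; returns rows written."""
    cur = conn.cursor()
    cur.execute(query)
    writer = csv.writer(out_file)
    # Header row from the cursor's column metadata
    writer.writerow([col[0] for col in cur.description])
    rows_written = 0
    while True:
        batch = cur.fetchmany(batch_size)  # only one batch is held in memory
        if not batch:
            break
        writer.writerows(batch)
        rows_written += len(batch)
    return rows_written


# Stand-in data: a tiny in-memory SQLite table (assumption, not the real file)
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MedicalRecord (id INTEGER, note TEXT)")
conn.executemany(
    "INSERT INTO MedicalRecord VALUES (?, ?)",
    [(i, f"note {i}") for i in range(25)],
)

buffer = io.StringIO()  # swap for open("out.csv", "w", newline="") on real data
count = export_query_to_csv(
    conn, "SELECT id, note FROM MedicalRecord ORDER BY id", buffer, batch_size=10
)
print(count)
```

Memory use stays bounded by `batch_size` rows regardless of how large the table is.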

Image: File Properties

Any help would be awesome — thanks in advance! 🙏

submitted by /u/alb53

PhysioNet Account Registration And Other Sources Of EHR Dataset

Hi, I’m developing a project in which I need electronic health record (EHR) data to identify high-risk patients for early intervention.

I learned about MIMIC-IV and am trying to create an account on PhysioNet, but I’m unable to proceed since I never received any emails about account creation/activation, even in my spam folder. Has anyone had the same issue? Any ways to resolve this?

Also, are there any other sources of EHR datasets around? Preferably ones that include lab data and patient clinical notes to assist with early prediction, among other things.

Any help and suggestions are much appreciated.

submitted by /u/Radiant_Blue_Eyes