Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

I’ve Collected Comprehensive Private Company Data

Hello everyone,

I’ve been compiling a vast array of data on private companies worldwide and thought it might help to post it here. Whether you’re conducting market research, engaging in competitive analysis, or just have a penchant for data, there’s a lot to explore.

Here’s what you’ll find:

Company Profiles: Names, locations, and industry details of private companies.
Location: Insights on companies from various global locations.
Size Insights: Data on everything from budding startups to established enterprises.
Industry: Breakdown by type of company.

This resource has been crucial for understanding market dynamics and identifying trends. I’m keen to see how the community might use this information in their projects or analyses.

If you find this data potentially useful, feel free to DM me. I’d be happy to discuss how you can access it and perhaps tailor it to your specific needs.

submitted by /u/Shepreneur

Need Help To Find Melanoma Subtypes Dataset

Hi everyone,

I’m searching for datasets specifically focused on melanoma subtypes, like:

Nodular melanoma
Superficial spreading melanoma
Lentigo maligna melanoma
Acral lentiginous melanoma

Most of the publicly available datasets I’ve found seem to focus on melanoma vs. benign classification or on broader skin cancer types, but I haven’t come across anything that categorizes melanoma into its different subtypes.

If anyone can point me to such a dataset or guide me toward one, it would be very helpful.

Thanks in advance.

submitted by /u/namesreddito

Anyone Have The Following Dataset? The R6A – Yahoo! Front Page Today Module User Click Log Dataset, Version 1.0 (1.1 GB) https://webscope.sandbox.yahoo.com/

Please help! I want to run some experiments with LinUCB, since the original paper seems to have used this dataset or an older version of it (I’m not sure which). It also seems that an edu email is needed to apply for access. Does anyone have access to it? Would you kindly share it through Google Drive or another file-sharing service? Thanks in advance!
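For readers unfamiliar with LinUCB, here is a minimal sketch of the disjoint variant as it is commonly described, written in plain Python. The class and function names (`LinUCBArm`, `choose_arm`) and the feature vectors are illustrative assumptions, not the paper's code, and the sketch does not read the Yahoo! click log or implement any offline evaluation on it.

```python
import math


class LinUCBArm:
    """One arm of disjoint LinUCB (hypothetical helper class).

    Keeps the inverse of A = I + sum(x x^T) and the vector b = sum(reward * x).
    """

    def __init__(self, dim):
        # A starts as the identity matrix, so its inverse is the identity too.
        self.a_inv = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    def _a_inv_times(self, vec):
        return [sum(row[j] * vec[j] for j in range(len(vec))) for row in self.a_inv]

    def ucb(self, x, alpha):
        """Estimated reward plus an exploration bonus for context x."""
        theta = self._a_inv_times(self.b)  # ridge-regression estimate
        a_inv_x = self._a_inv_times(x)
        mean = sum(t * xi for t, xi in zip(theta, x))
        width = math.sqrt(sum(xi * v for xi, v in zip(x, a_inv_x)))
        return mean + alpha * width

    def update(self, x, reward):
        """Sherman-Morrison rank-one update of A^{-1} after A += x x^T."""
        a_inv_x = self._a_inv_times(x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, a_inv_x))
        n = len(x)
        for i in range(n):
            for j in range(n):
                self.a_inv[i][j] -= a_inv_x[i] * a_inv_x[j] / denom
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


def choose_arm(arms, x, alpha=1.0):
    """Pick the arm with the highest upper confidence bound (ties go to the first)."""
    scores = [arm.ucb(x, alpha) for arm in arms]
    return max(range(len(arms)), key=lambda i: scores[i])
```

In a click-log experiment, each arm would correspond to an article, `x` to the user features for one logged event, and `reward` to 1 for a click and 0 otherwise; the `alpha` value controls how much exploration the bonus term adds and would need tuning.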

submitted by /u/sylph520

Seeking Recommendations For Low-Cost Mobility Data Providers For People Density Analysis In Stores And City Areas

Hi everyone,

I’m working on a project to understand people density, both within stores and across different areas of the city, to analyze foot traffic patterns. I know that location data providers like SafeGraph, Cuebiq, and Factori offer these types of mobility datasets, but I’m concerned about the potential cost, which I’ve heard can be quite high.

I’m hoping to find some alternative providers or potentially lower-cost options that could still give me the insights I need without breaking the bank. My ideal dataset would allow me to:

See density and movement patterns around specific POIs (like retail stores or malls)
Understand general population density fluctuations across city areas

If you have experience working with affordable mobility data providers (like Veraset, Quadrant, etc.), I’d love to hear your recommendations, especially if you’ve found options that offer flexible pricing or smaller, more budget-friendly packages. Or are there generally no options available for small pet projects?

Thanks in advance for any tips!

submitted by /u/mynameisnotjason123

Help With ML Project For Damage Detection

Hey guys,

I am currently working on a project to detect damage and dents on rented construction machinery (excavators, cement mixers, etc.). A machine learning model is run after a machine is returned to the rental company to detect damage and ‘penalise the renters’ accordingly. We expect to have images of each machine from before the rental, so there is a benchmark to compare against.

What would you all suggest for this? Which models should I train or fine-tune? What data should I collect? Any other suggestions?

If you have any follow-up questions, please go ahead and ask.

submitted by /u/shroffykrish

Request For A Dataset For Rasch Analysis

Hello, Reddit community!

I am currently working on a project involving the analysis of student performance using the Rasch model. I’m looking for a dataset that includes individual student responses to exam questions, specifically with data indicating whether each response was correct or incorrect.
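To make the data requirement concrete, here is a small sketch of the standard Rasch (one-parameter logistic) model and of the kind of binary response matrix it expects. The response values, ability values, and function names are made up for illustration; no fitting routine is shown.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct response under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_likelihood(responses, abilities, difficulties):
    """Log-likelihood of a 0/1 response matrix (rows: students, columns: items)."""
    total = 0.0
    for student, row in enumerate(responses):
        for item, correct in enumerate(row):
            p = rasch_probability(abilities[student], difficulties[item])
            total += math.log(p) if correct == 1 else math.log(1.0 - p)
    return total


# Hypothetical data: 3 students answering 3 exam questions (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
abilities = [0.5, -0.5, 1.5]     # assumed values, one per student
difficulties = [-1.0, 0.0, 1.0]  # assumed values, one per item

print(log_likelihood(responses, abilities, difficulties))
```

Any dataset that can be reshaped into such a student-by-item matrix of correct/incorrect flags would fit the request.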

If anyone knows of any publicly available datasets that fit this description, or if you have recommendations on where I might find such data, I would greatly appreciate your help!

Thank you in advance for your assistance!

submitted by /u/Agreeable-Ad-5882

Datasets S&P 500 To Measure Innovation

Hey guys!

Our empirical research study focuses on top management characteristics (e.g. age, gender) in relation to the measurement of innovation strategies (e.g. patents, R&D investments).

We are currently struggling to find free databases with S&P 500 data that cover these characteristics.

Apart from WRDS (access to e.g. CRSP Quarterly Update not available), do you know of any other good databases that we could look at?

Many thanks and best regards! 🙂

submitted by /u/Urquharts

[PAID] Magazines Dataset, Economist, Vanity Fair, The Atlantic And More

A dataset of all past issues of the following magazines:

Economist (1997 to current issue)
The Atlantic (1857 to current issue)
Vanity Fair (1913 to current issue)
MIT Technology Review (1997 to current issue)
TIME (1923 to current issue)

A few more magazines are in the pipeline (The New Yorker, NY Times Mag and a few more) and will be added.

Format: Data is available in JSON and EPUB formats; PDFs can be generated on demand.

NOTE: Vanity Fair shut down in 1936 and relaunched in 1983, so data between these dates isn’t available for it.

If you have any queries or want to buy, please DM me.

submitted by /u/waqarHocain

Selling Preprocessed And Cleaned Job Description Dataset (Latest LinkedIn And Indeed STEM Postings From US). The Dataset Contains Both Uncleaned And Preprocessed Data For AI Training. Please Let Me Know If Anyone Would Like It, I’m Trying To Raise Some Money For My Startup. Thanks!!!

Hey!

I have around 700K lines of job descriptions processed for AI and ML training. The processing involves extracting just the requirements and responsibilities, splitting them into individual lines, correcting grammatical mistakes, extracting keywords for software skills and experience, classifying the job description, and adding an H1B filter to it.

The dataset is from LinkedIn and Indeed; I scrape and process around 15K every day. I also have uncleaned, purely scraped data that comes to 60K every day. They are all STEM jobs in the US.

I have attached an example of both datasets with this. You can find them here.

I’m trying to raise around $2000 for my startup, and this would help me a lot. However, there’s no pressure; I’m not trying to solicit, just trying to sell a good dataset.

Let me know if anyone has any questions, and please no hate.

Thanks!

submitted by /u/assassinator444

Thanks For The Support! New API To Bypass Cloudflare Turnstile Is Live

A few months ago, I launched my cheap scraping API, and I’m happy to share that 79 users are already using it! 🙌

I’ve received lots of requests asking for an API to bypass Cloudflare Turnstile, and I’m excited to announce that it’s now live! 🎉

Plus, the new API supports custom headers, giving you more flexibility for your scraping needs.

Thanks a ton for all the support!

Let me know if you have any feedback or further requests!

submitted by /u/Affectionate-Olive80

[Research] Mushroom Description Dataset

Hi

As my final year uni project, I am building an app that will attempt to classify wild mushrooms, and I would like to build a ‘page’ with an image of the mushroom and some basic info like genus and edibility. Does anyone know of any such dataset?

For context, I have an AI model trained on Mushroom Observer’s Machine Learning dataset. I tried to use their Name/Descriptions CSV, but it is clunky and does not contain images.

Thanks for any help

submitted by /u/Gostinker

Need A Data Set That Uses Social Media

Hi, I am currently working on a project that focuses on the influence social media has on cryptocurrency price fluctuations. Does anyone know where I might find a dataset to help with this, or of a way I can collect data from social media myself? Thanks

submitted by /u/GeorgeW427

Grocery Price API V2 In The Works – Which Stores Should We Add Next?

Hey r/datasets!

A few months back, I launched a Grocery Price API, and I just wanted to start by saying a big thank you to everyone who subscribed and supported it early on. 🙏

The response has been amazing!

Based on feedback, I’m now diving into V2 to add more stores and make the API even more comprehensive.

I’d love your input:

What are the top grocery stores you’d like to see included?

Whether it’s big national chains or popular local spots, drop your suggestions below!

Thanks again, and I’m excited to keep building this with the community’s needs in mind!

submitted by /u/Affectionate-Olive80