Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Having Trouble Launching Survey Via Facebook Ads.

Hi all,

I am working on my thesis for my MBA and I am completing the survey portion of the paper via Facebook ads. Does anyone here have experience successfully launching a survey via Facebook ads and getting conversions?

If so, any insight or resources that would help me do this successfully would be greatly appreciated. Thanks.

submitted by /u/DrivenCleats

Can Anyone Provide Me With A Dataset That Is Dental Or Endodontics Related?

I’m building my data analytics portfolio and am particularly interested in dental or endodontic-related data. Does anyone have recommendations for publicly available datasets or shareable anonymized data from dental or endodontic practices? I’m looking specifically for datasets that could be used for analysis, visualization, and insights relevant to clinical outcomes, patient demographics, treatments performed, revenue, insurance claims, or similar topics.

Thanks in advance for your help!

submitted by /u/Plane_Fail9033

[PAID] Huge WhoIs Dataset Available From http://bestwhois.org/domain_name_data/domain_names_whois/ (Private Access Only)

Hi. I have access to a lot of WHOIS-related data covering the last 6 months. New data is uploaded every day.

Fields are:

  • id
  • domainName
  • registrarName
  • contactEmail
  • nameServers
  • createdDate
  • expiresDate
  • registrant_email
  • registrant_organization
  • registrant_street1
  • registrant_city
  • registrant_state
  • registrant_postalCode
  • registrant_country
  • registrant_telephone
  • administrativeContact_email
  • administrativeContact_name
  • administrativeContact_organization
  • administrativeContact_street1
  • administrativeContact_city
  • administrativeContact_state
  • administrativeContact_postalCode
  • administrativeContact_country
  • administrativeContact_telephone
  • technicalContact_name
  • technicalContact_organization
  • technicalContact_email
  • technicalContact_street1
  • technicalContact_street2
  • technicalContact_city
  • technicalContact_state
  • technicalContact_postalCode
  • technicalContact_country
  • technicalContact_telephone

DM if interested.

submitted by /u/Persian_Cat_0702

Common Crawl Claims To Be Free And Available To Everyone — But That’s Not Really True

Common Crawl advertises itself as “freely available to anyone,” but the reality is much less accessible than that.

Yes, the data is technically free. But to actually use it, you have to deal with:

  • Massive WARC files that require serious compute just to parse
  • Storage and bandwidth costs that can easily hit enterprise-level pricing
  • Complex indexing and filtering tools, many of which assume you’re running on cloud infrastructure

Unless you’re backed by a company, university, or loaded with cloud credits, you’re priced out. It’s not practical for individuals or small teams.

This kind of marketing gives a false impression of openness. Free data that’s functionally inaccessible to most people isn’t truly free.

Has anyone here actually managed to work with Common Crawl as an independent dev or researcher? Curious what workflows or tools (if any) make it doable without breaking the bank.
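
For what it's worth, one low-cost workflow is to look up a page in the Common Crawl URL index and then fetch only that record with an HTTP range request, rather than downloading whole WARC files. A minimal sketch using only the standard library follows. The crawl ID `CC-MAIN-2024-10` is a placeholder assumption (substitute any crawl listed on the index server), and `parse_hits`, `lookup`, `range_header`, and `fetch_record` are hypothetical helper names, not part of any Common Crawl tooling.

```python
import gzip
import json
import urllib.parse
import urllib.request

# Assumed endpoints; the crawl ID in INDEX is a placeholder to replace.
INDEX = "https://index.commoncrawl.org/CC-MAIN-2024-10-index"
DATA = "https://data.commoncrawl.org/"


def parse_hits(text):
    """The index answers with one JSON object per line; parse them all."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def lookup(url, limit=5):
    """Ask the index where captures of `url` live (file name, offset, length)."""
    qs = urllib.parse.urlencode({"url": url, "output": "json", "limit": limit})
    with urllib.request.urlopen(f"{INDEX}?{qs}", timeout=60) as resp:
        return parse_hits(resp.read().decode("utf-8"))


def range_header(offset, length):
    """Build an inclusive HTTP Range value for one record."""
    start = int(offset)
    return f"bytes={start}-{start + int(length) - 1}"


def fetch_record(hit):
    """Download and decompress a single WARC record, not the whole file."""
    req = urllib.request.Request(
        DATA + hit["filename"],
        headers={"Range": range_header(hit["offset"], hit["length"])},
    )
    with urllib.request.urlopen(req, timeout=60) as resp:
        return gzip.decompress(resp.read()).decode("utf-8", errors="replace")
```

Something like `fetch_record(lookup("example.com")[0])` would then transfer a few kilobytes instead of a multi-gigabyte file. This only helps for targeted lookups; corpus-wide processing still runs into the storage and compute costs described above.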

submitted by /u/uslashreader

Worldwide Presidents And Their Non-presidential Occupations/fields Of Study

Hi,
A while ago, I had a very specific question – what former profession is a president (or any publicly elected head of country) most likely to have? I thought it could be fun and a good way to learn some basics of data processing. But where do I even start?
My initial idea was to scrape the relevant information from Wikipedia or Wikidata, but I can’t find a good way to do it. Any advice? Is there a pre-existing dataset that could work for this?
I have experience coding in Python but have never done anything similar, so any resources would help.
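
One possible starting point is Wikidata's public SPARQL endpoint, which avoids scraping altogether. The sketch below is a rough illustration, not a vetted query: it assumes `P39` (position held), `P106` (occupation), and `Q48352` (head of state) are the right identifiers, which should be checked on wikidata.org, and `fetch_bindings` and `tally_occupations` are hypothetical helper names. The query may also need narrowing if the endpoint times out.

```python
import json
import urllib.parse
import urllib.request
from collections import Counter

ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed identifiers: P39 = position held, P106 = occupation,
# P279 = subclass of, Q48352 = head of state. Verify before relying on them.
QUERY = """
SELECT DISTINCT ?person ?personLabel ?occupationLabel WHERE {
  ?person wdt:P39 ?position .
  ?position wdt:P279* wd:Q48352 .
  ?person wdt:P106 ?occupation .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5000
"""


def fetch_bindings(query=QUERY):
    """Run the query and return the raw result rows (needs network access)."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query, "format": "json"})
    req = urllib.request.Request(url, headers={"User-Agent": "occupation-tally-sketch/0.1"})
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.load(resp)["results"]["bindings"]


def tally_occupations(bindings, exclude=("politician",)):
    """Count occupation labels, skipping ones that would trivially dominate."""
    counts = Counter()
    for row in bindings:
        label = row["occupationLabel"]["value"]
        if label not in exclude:
            counts[label] += 1
    return counts
```

Calling `tally_occupations(fetch_bindings()).most_common(10)` would then list the most frequent recorded occupations. Wikidata coverage is uneven, so the counts are best treated as a rough signal.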

submitted by /u/nee_chee

[PAID] Multiple Websites Datasets I Have Scraped Over The Last Few Months.

Hi. I have scraped around 500K products from GlobalSources. Also have datasets for these websites:

  • CastleDoct
  • ConferenceIndex
  • CourtsDelaware
  • CpaDirectory
  • Dubizzle
  • SearchPeopleFree
  • FastPeopleSearch
  • Go4WorldBusiness
  • HealthGrades
  • Itel
  • Patpat
  • PropertyFinder
  • UsaTopDentists
  • WebDistricts
  • SpeedyDrive
  • DirectMacro
  • Npino
  • Tradefest
  • WholeSaleCentral
  • MadeInChina
  • Beforward
  • UsNews
  • SmartSd
  • Osec
  • HardDiskDirect
  • Mem4Less

Will provide any data you want at a low price. DM for details. Thanks.

submitted by /u/Persian_Cat_0702

Need Urgent Help Merging MIMIC-IV CSV Files For ML Project

Hi everyone,

We’re working on a machine learning project using the MIMIC-IV dataset, but we’re struggling to merge the CSV files into a single dataset. The issue is that the zip file is 9GB, and we don’t have enough processing power to efficiently join the tables.

Since MIMIC-IV follows a relational structure, we’re unsure about the best way to merge tables like patients, admissions, diagnoses, procedures, etc. while keeping relationships intact.

Has anyone successfully processed MIMIC-IV under similar constraints? Would SQLite, Dask, or any cloud-based solution be a good alternative? Any sample queries, scripts, or lightweight processing strategies would be a huge help.
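
On the SQLite option, a minimal sketch using only Python's standard library: stream each CSV into an on-disk SQLite database in batches, index the join keys, and let SQLite perform the join instead of holding whole tables in RAM. The file names and the columns `subject_id`, `hadm_id`, `gender`, and `icd_code` are assumptions based on the public MIMIC-IV schema, and `open_text`, `load_csv`, `build_db`, and `JOIN_SQL` are hypothetical names used for illustration.

```python
import csv
import gzip
import sqlite3
from pathlib import Path


def open_text(path):
    """Open a plain or gzip-compressed CSV as text."""
    path = str(path)
    if path.endswith(".gz"):
        return gzip.open(path, "rt", newline="")
    return open(path, newline="")


def load_csv(conn, csv_path, table, batch_size=50_000):
    """Stream one CSV into a SQLite table in batches to keep memory use flat."""
    with open_text(csv_path) as f:
        reader = csv.reader(f)
        header = next(reader)
        cols = ", ".join(f'"{c}"' for c in header)
        marks = ", ".join("?" for _ in header)
        conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
        insert = f'INSERT INTO "{table}" VALUES ({marks})'
        batch = []
        for row in reader:
            batch.append(row)
            if len(batch) >= batch_size:
                conn.executemany(insert, batch)
                batch.clear()
        if batch:
            conn.executemany(insert, batch)
    conn.commit()


def build_db(db_path, csv_dir, tables=("patients", "admissions", "diagnoses_icd")):
    """Load the chosen tables and index the keys used for joining."""
    conn = sqlite3.connect(db_path)
    for table in tables:
        gz = Path(csv_dir) / f"{table}.csv.gz"
        source = gz if gz.exists() else Path(csv_dir) / f"{table}.csv"
        load_csv(conn, source, table)
    conn.execute("CREATE INDEX IF NOT EXISTS ix_adm_subject ON admissions(subject_id)")
    conn.execute("CREATE INDEX IF NOT EXISTS ix_dx_hadm ON diagnoses_icd(hadm_id)")
    conn.commit()
    return conn


# Assumed column names; the join keeps patient -> admission -> diagnosis links.
JOIN_SQL = """
SELECT p.subject_id, p.gender, a.hadm_id, d.icd_code
FROM patients AS p
JOIN admissions AS a ON a.subject_id = p.subject_id
JOIN diagnoses_icd AS d ON d.hadm_id = a.hadm_id
"""
```

After `conn = build_db("mimic.db", "path/to/csvs")` (both paths are placeholders), iterating over `conn.execute(JOIN_SQL)` yields joined rows one at a time, so the result can be written out or aggregated without loading it all at once.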

We need this urgently, so any quick guidance would be amazing. Thanks in advance!

submitted by /u/bindumalavika24

Looking For A Pan-UK Dataset With Demographic Information

I am looking for a dataset for the United Kingdom, which contains information about ethnicity, BMI or weight/height, smoking habits (categorical or numerical), alcohol consumption (categorical or numerical), current medical conditions and family history of medical conditions. Data does not have to be clean, but I am not seeking data tables composed of summary statistics. Please help!

PS: Not looking to scrape at this point!

submitted by /u/Mayeeah

US Housing Sale Price Dataset (2025)

Hi, I’m looking for a good dataset of current/updated US property sale prices to build a home valuation calculator as a project. Looking for one that encompasses all of the US. Does anyone know of a free (or inexpensive) dataset that can be acquired? Ideally, it should have features such as ‘bedrooms’, ‘bathrooms’, ‘zip code’, ‘area’, etc.
Thanks!

submitted by /u/ynewman8

Finding Festival Lineup Data For An Assignment

Hey everyone! I’m working on a school project where I’m looking at how music festival lineups have changed over time. I want to analyze things like:

  • How different genres have been booked over the years
  • Gender diversity in festival lineups
  • If festivals book trending artists vs. just big names

I’m trying to find past lineup data from festivals like Coachella, ACL, Lollapalooza, and others. Does anyone know where I can find full historical lineups in a spreadsheet or database format? Even a good website that lists them year by year would help a lot.

If anyone has worked on something similar or knows a good resource, I’d really appreciate it! Thanks in advance. (PS: I’m still a noob when it comes to learning Excel, so any help is much appreciated.)

submitted by /u/Mother_Dragonfruit_9

Looking For A Multi-File Dataset For Business Analysis + Predictive Modeling + XAI (SHAP/LIME)

Hey everyone,

I’m currently working on a business analysis project and I’m on the lookout for a real-world dataset that meets the following criteria:

  • Contains at least 3 separate files (e.g., orders, customers, products – or anything similar that requires joining/merging).
  • Involves a business-related problem (e.g., sales forecasting, churn prediction, customer segmentation, etc.).
  • Suitable for predictive modeling (classification or regression).
  • Offers scope for applying Explainable/Responsible AI techniques like SHAP or LIME to interpret model predictions.

The goal is to build a pipeline that includes data cleaning, exploratory analysis, predictive modeling, and model explainability — ideally tied to a meaningful business decision.

If you know of any public datasets (Kaggle, GitHub, open data portals, etc.) that fit this description, I’d really appreciate your help!

Thanks in advance!

submitted by /u/Consistent-Judge101

Looking For Marathon/Race Bib Number Detection Dataset

Hey r/datasets

I’m working on a deep learning project for my class to develop an automated bib number detection system for marathon and running events. Currently struggling to find a comprehensive dataset that captures the complexity of real-world race photography.

Anyone have datasets they’d be willing to share or know of research groups working on similar projects? Happy to collaborate and credit contributors!

Crossposting for visibility. Appreciate any leads! 🏃‍♂️📸

submitted by /u/galdorgo