Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Any Kind Of Datasets For My Assignment

Greetings to everyone,
I’m looking for a meaningful dataset for my assignment, containing at least 50 rows of observations and 10 columns of categorization. I’ve searched many sites (data.gov, archive.ics, Harvard, world data, etc.), but either the number of rows or the number of columns is too low. Also, I can’t use Kaggle. It’s important for it to be meaningful because I’ll draw an inference from that dataset and support it with articles. Do you have any suggestions? Thank you in advance.

submitted by /u/efrasgar

Help Finding Niche County Data For Website Update

I’m working on revamping my company’s website, and we’re aiming to create a detailed profile of our county. Unfortunately, the usual suspects like the Bureau of Labor Statistics and Bureau of Economic Analysis haven’t been super helpful for the specific data I need.
Here’s what I’m looking for to paint a picture of our county’s industrial and lifestyle landscape:

Industrial Parks: Number of industrial parks and the types of industries typically housed in them.
Gross Regional Product (GRP): Recent figures and breakdown by industry sector.
Industry-Based Stats: Growth trends in specific industries, key employers in the area.

Productivity Rating: Any available data on worker productivity within the county.
Commuting Stats: Average commute times, preferred modes of transportation.
Lifestyle Stats: Cost of living index, housing market trends, educational attainment levels (if possible).
Do any of you have suggestions for resources with reliable, up-to-date county-level statistics on these topics? Perhaps some hidden gems I’m just not aware of. Local government websites are not very helpful either.

submitted by /u/Agreeable-Ad574

Zip+4-to-County Cross-reference Data Source

Does anyone have a source for cross-referencing 9-digit zip codes to their county? I’ve scoured the net pretty hard looking for one and even found a site that seemed to sell it cheap, but they’re apparently out of business. Sources appear to come from CDC (link is dead) and HUD (doesn’t have what it seems to promise . . . 5 digit zip codes only) and I can’t make them work.

This is for state government. We’re looking to place Zip+4 into the proper county for hundreds of thousands of records, so we need a data table as a source. Onesie lookups and geocoding don’t look like viable options.

This is burning thousands and thousands of taxpayer dollars. If anyone can provide a lead I’d very much appreciate it.
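For anyone who does get hold of such a table, the bulk lookup described above could be sketched roughly as follows. This is only an illustration under assumptions: the crosswalk layout (columns `zip9` and `county_fips`), the sample rows, and the function names are all hypothetical and do not come from any real CDC or HUD file.

```python
import csv
import io
import re

# Hypothetical crosswalk: one row per ZIP+4 with a county FIPS code.
# The column names and the sample rows are made up for illustration.
CROSSWALK_CSV = """zip9,county_fips
123450001,99001
123450002,99003
"""

def normalize_zip9(raw):
    """Return the 9 digits of a ZIP+4 string, or None if it is not 9 digits."""
    digits = re.sub(r"\D", "", raw)
    return digits if len(digits) == 9 else None

def load_crosswalk(text):
    """Build a dict mapping 9-digit ZIP strings to county FIPS codes."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["zip9"]: row["county_fips"] for row in reader}

def assign_counties(records, crosswalk):
    """Pair each raw ZIP+4 with its county FIPS, or None when unmatched."""
    results = []
    for raw in records:
        key = normalize_zip9(raw)
        results.append((raw, crosswalk.get(key) if key else None))
    return results

crosswalk = load_crosswalk(CROSSWALK_CSV)
matched = assign_counties(["12345-0001", "12345 0002", "12345"], crosswalk)
```

Records that come back with `None` (malformed codes or codes missing from the table) would still need some other treatment, so the sketch only covers the table-driven part of the job.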

submitted by /u/Sagrilarus

Dataset For Realistic Bank Transactions

I’m currently working on a clustering project that focuses on analysing the spending habits of bank customers to group them into clusters. To do this effectively, I need access to realistic bank transaction data for various different customers, which I will use to test my model. I’ve experimented with GPT-4, but found it inadequate for replicating user behaviours and characteristics. Does anyone have recommendations on where I could find such a dataset, or suggestions on how to generate one?
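On the "how to generate one" side, a rule-based simulator is one possible starting point. The sketch below is only that: the three customer profiles, their parameters, the category list, and the log-normal amount model are all invented assumptions, not properties of real bank data. Its one advantage for testing a clustering model is that each customer's true profile is known.

```python
import math
import random

# Hypothetical customer profiles. All numbers are invented for illustration.
PROFILES = {
    "frugal": {"tx_per_month": 15, "median_amount": 12.0},
    "average": {"tx_per_month": 40, "median_amount": 30.0},
    "big_spender": {"tx_per_month": 60, "median_amount": 120.0},
}
CATEGORIES = ["groceries", "transport", "dining", "utilities", "shopping"]

def generate_transactions(n_customers, seed=0):
    """Return one month of synthetic transactions as a list of dicts."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    rows = []
    for customer_id in range(n_customers):
        profile_name = rng.choice(sorted(PROFILES))
        profile = PROFILES[profile_name]
        # Vary the monthly count around the profile mean, never below 1.
        n_tx = max(1, int(rng.gauss(profile["tx_per_month"], 5)))
        for _ in range(n_tx):
            # Log-normal amounts: many small purchases, a few large ones.
            mu = math.log(profile["median_amount"])
            amount = round(rng.lognormvariate(mu, 0.6), 2)
            rows.append({
                "customer_id": customer_id,
                "profile": profile_name,  # ground-truth label for evaluation
                "day": rng.randint(1, 28),
                "category": rng.choice(CATEGORIES),
                "amount": amount,
            })
    return rows

transactions = generate_transactions(5, seed=1)
```

Because the `profile` column is a known label, cluster assignments can be compared against it, though good agreement on simulated data says little about behaviour on real customers.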

submitted by /u/ConTheD0N

Are There Data On Kowloon Walled City?

Hey,
I’m currently researching the fascinating history of the Kowloon Walled City, and I’m hoping to find valuable insights or data related to this unique urban phenomenon. For those unfamiliar, the Kowloon Walled City was a densely populated, anarchic enclave in Hong Kong that existed until its demolition in 1993. It was a labyrinth of interconnected buildings, narrow alleyways, and makeshift infrastructure, housing an estimated 3.2 million people per square mile—an astonishing density that defied conventional urban planning.
more info here: https://en.wikipedia.org/wiki/Kowloon_Walled_City

Do you know whether there are public datasets about the whole area, such as buildings, population, street network and so on?

Structured datasets would be best, but unstructured data (for instance images or PDFs that can be easily parsed and contain valuable information) are also of interest.

Thanks for your time

submitted by /u/riegel_d

Physical Sciences Keywords/phrases Dataset Request

I’m looking for a dataset of keywords/phrases in the physical sciences (can be a subset of a wider dataset across the sciences), with a range of levels of specificity/granularity that includes terminology that doesn’t exist outside of the relevant fields, as well as words+phrases used across the sciences.

I’m aware of the PhySH (https://physh.org/) ontology but it’s designed around entities/concepts rather than words+phrases, so its value is limited by the specific terms they’ve used to label those concepts. I’m looking for something more in line with the vocabularies of keywords/phrases used in semantic tagging of articles in places like Web of Science and Scopus.

submitted by /u/dhatch75

Math Equations (Websites, Books, Or Datasets)

I am trying to make a dataset of math equations (arithmetic, algebra, and trigonometry) for a study project, so I need to scrape some websites or PDF files on my own. I just need equations, but the websites and books that came to my mind will be hell to scrape (or maybe I am just new to this and missing something).

If you know of any websites, books, or datasets, it would help me a lot.

Thanks in advance

submitted by /u/AmateurPhilosopher6

REQUEST: 275M USA Business Email Dataset

Hey r/datasets,

I represent a small business that is looking to replicate the 275,000,000 records found in Apollo.io, ZoomInfo, etc. We are just looking for USA biz emails (not consumer).

This is essentially LinkedIn data + emails.

We can go without phone numbers perhaps.

We have some surprisingly low offers already, but please DM me with any leads on a dataset like this.

Thanks in advance!

(Would also accept offers on 2 column dataset: Name / Email)

submitted by /u/Anon_PR_pro

Building A Niche Data Community Of Likeminded People!

Hello everyone,

TL;DR – I’m starting a community for professionals in the data industry or those aiming for big tech data jobs. If you’re interested, please comment below, and I’ll add you to this niche community I’m building.

A bit about me – I’m a Senior Analytics Engineer with extensive experience at major tech companies like Google, Amazon, and Uber. I’ve spent a lot of time mentoring, conducting interviews, and successfully navigating data job interviews.

I want to create a focused community of motivated individuals who are passionate about learning, growing, and advancing their careers in data. Please note that this is not an open-to-all group. I’ve been part of many such “communities” that lost their appeal due to lack of moderation. I’m looking for people who are genuinely interested in learning and growing together, maybe even starting a data-related business.

Imagine a community where we:
* Share insights about big tech companies
* Exchange actual interview questions for various data roles
* Conduct mock interviews to help each other improve
* Access my personal collection of resources and tools that simplify life
* Share job postings and referral opportunities
* Collaborate on creating micro-SaaS projects

If this sounds exciting to you, let me know in the comments or reach out to me.

PS: Would you prefer this community on Slack or Discord?

Cheers!

submitted by /u/IllustratorOk7613

Seeking Feedback: Grocery Pricing Dataset API

Hello, DataMunchers!

I just launched my Grocery Pricing API on RapidAPI, and I’m super stoked to share it with you all! It’s a real-time treasure trove of pricing info for all your grocery needs.

I’m all ears for your thoughts! Any cool features you think would make this API even better? Shoot me your ideas—I’m here to make this tool awesome for us all.

Check it out on RapidAPI and let’s chat about making our data game stronger!

Thanks a ton for your input !

submitted by /u/Affectionate-Olive80

[REQUEST] Saudi Market Data, Live Or Historic.

Hi, I searched online a lot for historic and live (even if it’s only updated daily) Saudi market data but couldn’t seem to find it. I don’t know if such data is open or not, but it feels like market data should be readily available since it’s something public.

So could anyone help me find it, or point me to an open source (or even a paid one)? Just not TickerChart: it is laggy, faulty, unclean and expensive, and I couldn’t easily export data to CSV.

submitted by /u/Pxy_

Searching For A Data Set: School Data Task On The Dietary Habits And Nutritional Knowledge Of High School Students In Relation To Academic Performance

For school I have a task where, using secondary and primary data, I have to investigate my topic of “How do the dietary habits and nutritional knowledge of high school students correlate with overall health and academic performance?” The idea is that, using previous Australian data, I can build some kind of questionnaire to collect primary data, but finding this data is difficult and I was wondering if anyone could point me in the right direction or help me out with a dataset.

submitted by /u/Jeddyson

Independence Of Observations In Datasets

Hi everyone,

I was performing some binary logistic regressions today, but had a bit of a disaster. My analysis involves looking at a country’s International Criminal Court membership as the dependent variable (coded 0 or 1) and other independent factors such as level of democracy etc.

I thought it was going well. However, when it came to my assumptions testing, I realised something was slightly wrong: my Breusch-Pagan test (for residuals) and my GVIF test (for multi-collinearity) had terrible scores.

Then something occurred to me: the dataset I had been using had a row per country per year. I am presuming that this violates the independence of observations as multiple rows have the same country in them?

Does this mean I have to re-do all my analysis with just one row per country instead? This would mean I would have to change my scope to looking at stats for the country in the year they joined rather than looking across all the years.
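The one-row-per-country option could be sketched as below. The rows, the tuple layout, and the rule for countries that never joined (keep their most recent year) are all assumptions made for illustration; keeping every year and accounting for the repeated countries in the model is a different route that this sketch does not cover.

```python
from collections import defaultdict

# Made-up country-year rows: (country, year, icc_member, democracy_score).
PANEL = [
    ("A", 2000, 0, 6.0),
    ("A", 2001, 1, 6.5),
    ("A", 2002, 1, 7.0),
    ("B", 2000, 0, 3.0),
    ("B", 2001, 0, 3.5),
]

def collapse_to_country_rows(panel):
    """Reduce a country-year panel to a single row per country."""
    by_country = defaultdict(list)
    for row in panel:
        by_country[row[0]].append(row)

    collapsed = []
    for country in sorted(by_country):
        rows = sorted(by_country[country], key=lambda r: r[1])
        joined = [r for r in rows if r[2] == 1]
        # Joiners: keep the row from the first year membership appears.
        # Never-joiners: keep the most recent observed row (an assumption).
        collapsed.append(joined[0] if joined else rows[-1])
    return collapsed

country_rows = collapse_to_country_rows(PANEL)
```

With the sample rows, country A keeps its 2001 row and country B keeps its 2001 row, so each country contributes exactly one observation.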

I would appreciate any help or advice you could give, as I am slightly stressed and confused!

Many thanks,

Tom

submitted by /u/grovseyy

Worldwide Violence Perception Dataset For The Period 1970-2021

I’m looking for a dataset that measures perceptions of violence or crime globally for the period 1970-2021. The Global Peace Index (GPI) would be ideal, but it only covers the years 2008-2023.

I’m aware that it’s almost impossible to find such a dataset, so I’d take suggestions that measure violence, crime, conflict or any similar proxy for violence perception. However, I can’t deviate much from the period 1970-2021.

submitted by /u/Puzzleheaded_Steak54

How To Obtain Data For Journalist Discovery

Hey everyone,

I’m currently working on developing a platform to assist startups in pitching journalists for media coverage, and I could really use some advice on obtaining the necessary journalist data to make it happen.

As part of our efforts to build a comprehensive Journalist Discovery Module, we’re looking to gather essential data to facilitate the identification and connection with relevant journalists. Here’s a list of the data we need:

* Email Addresses of Journalists
* Recent Articles Written by Journalists (with publication details and dates)
* Social Media Profiles of Journalists (e.g., Twitter, LinkedIn)
* Topics Covered by Journalists

If you’ve got any ideas how we can access this data, I’d be eternally grateful for your guidance!

submitted by /u/Imaginary-Bench-3175

Looking For A Self-hostable Platform For Sharing Datasets

Objective:

I’m looking to create a website intended to gather together and release datasets for a specific theme (impact investing).

These would be a mixture of unamended open access datasets and a few with my edits. CSV and JSON mostly.

It would be cool to also be able to add blog posts with live data object embeds. And maybe (this is a “stretch feature” idea) include a sandbox for querying a read-only database. But the essential elements would be sharing datasets in a way that’s better than Github (no objection to that but I want to give potential visitors a specific site to access).

I tried setting up CKAN today on a VPS and found it a lot of work to get running. I think something a little simpler from an admin perspective would make more sense.

It’s a not-for-profit personal project so I’d like to keep costs reasonable.

Any suggestions for platforms, hosting, or both much appreciated!

submitted by /u/danielrosehill

Need Written Material On People’s Perception Of Artificial Intelligence (AI) And Their Job Prospects

If anyone can connect me with any written prose (up to and including reddit threads) from everyday working-age people on the adoption of artificial intelligence by corporations and organizations and what they feel it portends for their job prospects now and in the future, I’d sure be thankful. I’m doing a primary research study on such, but I’d like to have unprompted thoughts with which to compare my dataset.

My gratitude abounds.

submitted by /u/molineskytown

Crime Rates In The US- Latest Data Needed

Hi everyone, I’m looking for a reliable open source where I can find the latest available crime rates, crime index, or rankings for all the cities in the USA. Can anybody help me out with this? I have tried looking on the FBI’s site, but all I could find there is data by state or by region population size.

submitted by /u/bandhu_

Looking For An Old Drugbank.ca Dataset

Dear community,

Back in 2019 or 2020, I downloaded the full dataset from Drugbank.ca and have been using it for personal purposes ever since. Unfortunately, I recently lost all my data (both on my NAS and in my backup), and I’m unable to re-download the dataset as access is now restricted. I’m not affiliated with any academic institution and sadly, I can’t afford the payment.

Does anyone happen to have an old version of their full database?

I would be *extremely* grateful for your help.

submitted by /u/VohaulsWetDream

Earth Science Dataset Binary Classification

I’m a statistician looking for a dataset in earth science for a binary classification task, i.e., the response variable should be binary. My goal is to test a newly developed version of the invariant causal prediction algorithm, which tries to find the immediate causal drivers of some response variable. Do you have any suggestions for interesting datasets with roughly 3 to 10 covariates (continuous or categorical) and a binary response? Any help would be much appreciated!

submitted by /u/ParticularJacket6330