Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Is Anyone Aware Of Any Country-wide, Detailed And Multi-topic Attitude And Behavior Polls?

As the title states, I’m looking for some country-wide datasets which cover topics like people’s views and behaviors concerning technology, the environment, and beyond, in a detailed way. What I’m looking for goes a little more in-depth than most national/international polls — for example, the European Social Survey will also cover niche topics, but will usually only ask a question or two about them.

The UK Household Longitudinal Study is an excellent example, but I’m wondering if these kinds of datasets exist for other countries, or even across countries. The Gallup World Poll also seems to cover these topics in a multi-country context, but is behind a paywall.

Any recommendations would be greatly appreciated!

submitted by /u/oliveheron

Words That Do Not Convey The Subject Of A Sentence

Hi all! I’m building an application that automatically quizzes you on textual datasets! So far things are working brilliantly, but I’m running into an issue. I wish to remove words that are “uninteresting” for quizzing. My problem is that I don’t know how to describe them, so I don’t know what to look up. I’ll show an example instead.

“The mitochondria is the powerhouse of the cell”

If I had a simple fill-in-the-blanks question, I would want to avoid blanking “the”, “is”, and “of”, as that would make for a very boring quiz question. I’m not a linguist, but from my rudimentary knowledge, I don’t know of any linguistic term that applies to these words, as they aren’t, in the general case, just prepositions, for example.

Best case, someone already knows a dataset of words that I can use, but I would really appreciate any help for even what to look up on this topic.
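For what it is worth, words like these are commonly called "stop words" in natural language processing (linguists often say "function words"), which may be a useful search term. A minimal sketch of filtering them out before choosing a blank, assuming a tiny hand-picked list; `STOP_WORDS`, `blank_candidates`, and `make_question` are hypothetical names for illustration, and a real application would likely use a much longer published list.

```python
import re

# Assumption: a tiny hand-picked stop-word list for illustration only;
# published stop-word lists are much longer.
STOP_WORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to"}


def blank_candidates(sentence):
    """Return the words worth blanking, in order, skipping stop words."""
    words = re.findall(r"[A-Za-z']+", sentence)
    return [w for w in words if w.lower() not in STOP_WORDS]


def make_question(sentence, target):
    """Replace the first whole-word occurrence of target with a blank."""
    pattern = r"\b" + re.escape(target) + r"\b"
    return re.sub(pattern, "_____", sentence, count=1)


sentence = "The mitochondria is the powerhouse of the cell"
candidates = blank_candidates(sentence)
question = make_question(sentence, candidates[1])
print(candidates)
print(question)
```

A fixed list is the simplest option; filtering by part-of-speech tags would be an alternative if the list proves too coarse.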

I hope this is appropriate to ask here, else, forgive me and I’ll happily take recommendations for where else to ask!

Many thanks

submitted by /u/langers8

Billion Social Media Posts Datasets / Sample – Discussion

Hey fellow datasets enthusiasts!

I’ve developed a robust public data collection engine that’s been quietly amassing an impressive dataset, and I’m curious about its potential applications and demand.

The Dataset

Scale: Over 2 billion data points, with 10 million added per day (4 billion per year at our current rate)

Sources: Diverse and challenging public social media sources (X, Reddit, BlueSky, YouTube, Mastodon, Lemmy, TradingView, bitcointalk, jeuxvideo.com, etc.) (6000+ sources)

Collection: Near real-time capture

Rich: Structured, and annotated with translation, emotions, sentiment, top_keywords, topics.

We are an emerging, small startup, and of course I’m not trying to do self promotion, so won’t write the link/name (PM me for that).

I was thinking of opening datasets on Hugging Face. I could do several, in various forms, and I wanted to know what this community would be most interested in.

Possibilities are:

– A full slice of 1 day of data, with all annotations/attributes

– A sampled set of 1 source (for example X dataset, Reddit dataset) up to like 10 million items

– etc.

What would be interesting to you all? We want to make a genuine gift to the Open Source community, especially since Twitter/X shut down its free API & locked out 99.99% of OSINT/researchers.

submitted by /u/askolein

[self-promotion] Introducing My Newegg & Glovo Scrapers On Apify

Heyo!

I’m a Computer Science MSc student with a recent interest in web scraping and data automation. Over the past few years, I’ve honed my skills in backend development and web scraping, and I’m excited to share two Apify Actors I’ve developed to help you build comprehensive datasets effortlessly.

🔍 What I Built:

Newegg Scraper: Newegg Scraper on Apify

Features: Extracts detailed product information, pricing, customer reviews, and category listings from Newegg.

Use Cases: Ideal for creating datasets for market analysis, price tracking, and competitive research in the electronics and e-commerce sectors.

Glovo Scraper: Glovo Scraper on Apify

Features: Gathers comprehensive restaurant data, including names, addresses, delivery fees, promotions, and menu items from Glovo.

Use Cases: Perfect for building datasets related to food delivery services, local restaurant analysis, and market trend tracking.

Why These Scrapers?

Building high-quality datasets can be time-consuming and technically challenging. These scrapers are designed to simplify the data collection process, providing you with structured and ready-to-use data for your projects. Whether you’re conducting research, developing machine learning models, or performing business intelligence, these tools can save you valuable time.

Seeking Your Feedback:

I’m eager to hear your thoughts! If you have any suggestions for improvements, additional features you’d like to see, or feedback on your experience using these scrapers, please let me know. Your insights are invaluable in making these tools even better for the community.

Thank you for your time, and happy data hoarding! 🗄️✨

submitted by /u/Rorisjack

Data Provenance: What Solutions Are You Using, If Any?

Hello everyone,

I’m curious about how people in this community are handling data provenance. For those unfamiliar, data provenance is about tracking the origins and transformations of data throughout its lifecycle.

Are you currently using any tools or methods to track the provenance of your datasets? If yes, what solutions are you using? Are they custom-built or off-the-shelf? If not, do you see a need for such tools in your work? What features would you consider essential in a data provenance solution?
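To make the idea concrete, here is a minimal sketch of a custom-built provenance log, assuming content hashes are an acceptable way to identify dataset versions. The function names, the field names, and the file name `hypothetical_export.csv` are all invented for illustration and do not describe any particular tool.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(data: bytes) -> str:
    """Content hash that identifies one exact version of a dataset."""
    return hashlib.sha256(data).hexdigest()


def record_step(log, step, source, data_in, data_out):
    """Append one entry linking an input version to an output version."""
    log.append({
        "step": step,
        "source": source,
        "input_sha256": fingerprint(data_in),
        "output_sha256": fingerprint(data_out),
        "recorded_at": datetime.now(timezone.utc).isoformat(),
    })
    return log


# Made-up sample data and two example transformations.
raw = b"id,value\n1, 10\n2, 20\n"
cleaned = raw.replace(b", ", b",")          # strip stray spaces
renamed = cleaned.replace(b"value", b"amount")  # rename a column

log = []
record_step(log, "strip_spaces", "hypothetical_export.csv", raw, cleaned)
record_step(log, "rename_column", "strip_spaces", cleaned, renamed)
print(json.dumps(log, indent=2))
```

Because each entry's input hash matches the previous entry's output hash, the chain of transformations can be checked end to end.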

submitted by /u/crtahlin

Retail Electricity Prices In PJM And ISO-NE Operation Regions

I am trying to decompose retail electricity prices into their components (transmission costs, fuel costs, etc.) and discuss the determinants of retail energy prices in these two markets. My overarching goal is to explain the reason(s) behind the different energy costs faced by retail customers across the US. These two regions have the most similar markets among those with organized capacity markets (although correct me if I am wrong). These regions have consistently high pricing, but what explains this discrepancy compared to the rest of the country? Locational Marginal Prices would also work.

Any advice is greatly appreciated. Thanks in advance!

submitted by /u/capricious_scales

Final Year Project In Data Analytics

Hi all,

I am currently a Malaysian student in my final year, with my FYP pending. I am studying computer science, specialising in Data Analytics. I’ll need to do the standard data pre-processing, visualising, model building, etc. However, it is mandatory to include one of the SDGs in my overall project.

I just need some advice on which potential topics I could go into, as I keep overthinking every topic and am struggling to settle on one. And if anyone could help me find some good datasets to go with the topic, that would be much appreciated.

Thanks to anyone who takes time to read this!

submitted by /u/Shadow_Wing210

Looking For Owner-occupied Housing By ZIP Code (USA)

I’ve been searching for a reliable data set showing owner-occupied housing numbers by ZIP code in the US. I’ve found several data sets from HUD and the Census Bureau, but so far I’ve not found these numbers, at least broken down by ZIP code. Has anyone else found a reliable source for such data? Thanks in advance.

submitted by /u/tdmitch

Need Dataset For The Final Project ..

I need to make an AI/ML final project for my course. The deadline is in 2 weeks, and I have decided to go with personalised learning pathways. I therefore need a dataset for it so that I can build the project. Some feedback on whether this is a good project would also be welcome. If it is not, please suggest some ideas or share resources for another idea. But yes, please share the dataset.

submitted by /u/Pristine_Rough_6371

Looking For A Labeled Water Quality Anomaly Dataset

Hi good people,

I’m currently working on a project focused on anomaly detection in water quality and am on the lookout for a labeled dataset that includes instances of abnormal water quality conditions.

If anyone has come across or worked with such datasets, I’d greatly appreciate it if you could share a link or point me in the right direction.

Any help is much appreciated!

submitted by /u/evonshahriar

Has Anyone Used The Health And Retirement Study 2016

I was doing a project using the Health and Retirement Study (HRS), but it turns out the years I wanted to use would not work, while 2016 would. The data downloads as a .dat file, which I have not been able to open. Has anyone used it in the past, before it was converted to a .dat file? I need to make this change in 24 hours and have spent the last few months trying to clean the other data I thought I needed. Now I need to make this switch.
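A .dat extension is generic, but survey data is often distributed as fixed-width plain text together with a separate layout or codebook file listing column positions. A minimal sketch, assuming that is the case here: the variable names and column positions in `LAYOUT` are hypothetical and would have to come from the study's own layout files, and the sample records are made up.

```python
# Hypothetical layout: (name, start, end) with 0-based, end-exclusive offsets.
# The real positions must be taken from the study's layout/codebook files.
LAYOUT = [("hhid", 0, 6), ("pn", 6, 9), ("age", 9, 12)]


def parse_line(line, layout=LAYOUT):
    """Slice one fixed-width record into a dict of stripped strings."""
    return {name: line[start:end].strip() for name, start, end in layout}


def parse_records(lines, layout=LAYOUT):
    """Parse every non-empty line; accepts a list or an open file handle."""
    return [parse_line(line.rstrip("\n"), layout) for line in lines if line.strip()]


# Made-up sample records standing in for the contents of a .dat file.
sample = ["000001010 67\n", "000002020 71\n"]
records = parse_records(sample)
print(records)
```

With a real file, the same function could be given an open text file handle instead of the sample list.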

submitted by /u/Rajah_1994

Looking For Datasets On The May 6, 2010 Flash Crash

Hi everyone!

I’m a student working on a research project about the 2010 Flash Crash. My focus is on understanding how algorithmic trading and market infrastructure contributed to the event.

I’m searching for historical datasets that capture intraday trading activity on May 6, 2010, particularly for key indices (Dow Jones Industrial Average, S&P 500, and Nasdaq Composite Index) and other heavily impacted individual equities. Ideally, I’m looking for tick-level or minute-by-minute data, but I’m open to aggregated datasets as well.

Any pointers on how I can obtain this data are also appreciated!

Thanks in advance!

submitted by /u/FunTax2689

Looking For Data Sets, Sites And Sources

Hello everyone,

I am currently working on a module as part of my artificial intelligence course at university, and my task is to develop a module which finds correlations between chronic diseases and ECG and blood test recordings.
I am currently struggling to find the right data sets and recordings on PhysioNet and on Kaggle.
Can you direct me to more websites that contain databases, or even to specific data sets?

Thanks.

submitted by /u/The_Eliyahu

Need Help Retrieving Parcel Data Set

I’m trying to download the parcel data set from the following public website:

https://gishub-beltramicounty.hub.arcgis.com/datasets/BeltramiCounty::tax-parcels/about

But it seems to keep failing and is unable to create the download. I’ve tried this on multiple computers and over several different internet connections and haven’t been able to get this to work.

Does anyone know what I’m missing here? Or do I just need to email the county and ask for the file directly?
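If the generated download keeps failing, one possible workaround is to request the records in pages from the layer's query endpoint, assuming the dataset is backed by an ArcGIS feature service that exposes one and supports pagination. A minimal sketch that only builds the request URLs: `LAYER_URL` is a hypothetical placeholder (the real service URL is not given in the post), the record total is made up, and `page_url` and `page_urls` are illustrative names.

```python
from urllib.parse import urlencode

# Hypothetical placeholder: the real layer URL is not given in the post.
LAYER_URL = "https://example.com/arcgis/rest/services/TaxParcels/FeatureServer/0"


def page_url(layer_url, offset, page_size=1000):
    """Build one paged query URL for a feature layer."""
    params = {
        "where": "1=1",            # match every record
        "outFields": "*",          # return all attribute columns
        "f": "geojson",            # response format
        "resultOffset": offset,    # index of the first record in this page
        "resultRecordCount": page_size,
    }
    return f"{layer_url}/query?{urlencode(params)}"


def page_urls(layer_url, total, page_size=1000):
    """Build the URLs needed to cover `total` records, one per page."""
    return [page_url(layer_url, off, page_size) for off in range(0, total, page_size)]


# Made-up total; the real count would have to come from the service itself.
urls = page_urls(LAYER_URL, total=2500)
print(len(urls))
print(urls[0])
```

Each URL could then be fetched in turn and the returned features concatenated. Emailing the county remains the simpler route if the service does not allow this.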

Thank you!

submitted by /u/cptncivil

Looking For Dataset For My Project Due Next Week

Hello everyone, this is my first time posting here and I’m really, really in need of heartbeat, gyroscope, and thermometer data.

My project is about detecting phobias, specifically agoraphobia, using ML and AI, yet I couldn’t find any dataset for it or any kind of data related to stress, and it’s too late for me to back out and change the topic.

I’m begging you, if you can help me, please don’t hesitate. I am desperate and I don’t know what to do.

submitted by /u/Revolutionary_Bat94

Weight And Height Of People In One Country Over Time

People used to be small. Now they are taller and have a higher BMI. But I wonder what the increase in just weight (mass) over time looks like. There’s data for BMI on Our World in Data and Gapminder, but not raw average mass data of the type: men in France, 1900 60kg, 1920 65kg, 1940 70kg, etc.
That is, the separated-out heights and weights that make up BMI.

Do you know a dataset like this?
This Wikipedia page links to individual government sites, but searching for German data if you are not German is really hard: https://en.wikipedia.org/wiki/Human_body_weight

BMI, but not height and weight separated: https://w3.unece.org/PXWeb2015/pxweb/en/STAT/STAT__30-GE__06-Health/006_en_GEHEWeight_r.px/table/tableViewLayout1/

https://www.gapminder.org/fw/world-health-chart/
Height and weight, but not over historical time: https://www.kaggle.com/datasets/burnoutminer/heights-and-weights-dataset
3 recent years, but not a long view: https://data.gov.ie/dataset/his53-average-weight
Does the US Army have data on the people it takes in each year? That would do it.
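Where only average BMI and average height are published for a given year, average mass can be roughly backed out from the definition BMI = mass / height². A minimal sketch with made-up input values (not real measurements); `mass_from_bmi` is an illustrative name. Combining two averages this way only approximates the true average mass, since the mean of a product is not the product of the means.

```python
def mass_from_bmi(bmi, height_m):
    """Invert BMI = mass / height**2 to get mass in kilograms."""
    return bmi * height_m ** 2


# Made-up illustrative inputs: (year, average BMI, average height in metres).
series = [(1900, 22.0, 1.65), (1950, 24.0, 1.70), (2000, 25.0, 1.75)]

estimates = {year: round(mass_from_bmi(bmi, h), 1) for year, bmi, h in series}
print(estimates)
```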

submitted by /u/cavedave