Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

How To Build A Realistic Health Related Dataset

Hi, guys. I need to create a realistic health dataset to showcase how a data analytics platform can help draw useful insights, such as identifying seasonal trends, local hotspots, supply chain issues, etc.

The data needs to be recorded daily/weekly and have dimensions such as facility name, age group and gender, and indicators such as suspected and confirmed cases, vaccine stock, people immunized and missed immunizations.

I tried GPT but it cannot handle this task well. Does anyone know how to do this? Thanks!
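One possible approach is to simulate the data from a few explicit rules rather than asking a chatbot to invent rows. The sketch below is only an illustration under stated assumptions: the facility names, age bands, case rates, delivery sizes and column names are all made up, the seasonal trend is a plain yearly sine wave, hotspots are a fixed per-facility multiplier, and supply chain issues are modelled as randomly skipped vaccine deliveries.

```python
import csv
import math
import random
from datetime import date, timedelta

# Everything below is a hypothetical placeholder, not real health data.
FACILITIES = ["Facility A", "Facility B", "Facility C"]
AGE_GROUPS = ["0-4", "5-17", "18-49", "50+"]
GENDERS = ["F", "M"]
FIELDS = ["week_start", "facility", "age_group", "gender", "suspected_cases",
          "confirmed_cases", "vaccine_stock", "people_immunized",
          "missed_immunizations"]


def generate_rows(start, weeks, seed=0):
    """Yield one row per week, facility, age group and gender."""
    rng = random.Random(seed)  # a fixed seed makes the output reproducible
    # A per-facility multiplier creates persistent local hotspots.
    hotspot = {f: rng.uniform(0.5, 2.0) for f in FACILITIES}
    stock = {f: 500 for f in FACILITIES}
    for w in range(weeks):
        week_start = start + timedelta(weeks=w)
        day_of_year = week_start.timetuple().tm_yday
        # Yearly sine wave, highest about a quarter of the way into the year.
        season = 1.0 + 0.8 * math.sin(2 * math.pi * day_of_year / 365.0)
        for f in FACILITIES:
            # A delivery arrives most weeks; skipped weeks mimic supply issues.
            if rng.random() < 0.8:
                stock[f] += 300
            for age in AGE_GROUPS:
                for g in GENDERS:
                    expected = 10 * season * hotspot[f]
                    suspected = max(0, round(rng.gauss(expected, math.sqrt(expected))))
                    # Each suspected case is confirmed with probability 0.4.
                    confirmed = sum(rng.random() < 0.4 for _ in range(suspected))
                    due = rng.randint(20, 60)  # people due for immunization
                    wanted = round(due * rng.uniform(0.6, 1.0))
                    immunized = min(wanted, stock[f])  # limited by facility stock
                    stock[f] -= immunized
                    yield {
                        "week_start": week_start.isoformat(),
                        "facility": f,
                        "age_group": age,
                        "gender": g,
                        "suspected_cases": suspected,
                        "confirmed_cases": confirmed,
                        # Facility-level stock left after this row's doses.
                        "vaccine_stock": stock[f],
                        "people_immunized": immunized,
                        "missed_immunizations": due - immunized,
                    }


def write_csv(path, rows):
    """Write the generated rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_csv("health.csv", generate_rows(date(2024, 1, 1), weeks=52))` would produce a year of weekly rows; the file name is arbitrary. Because every pattern comes from a named rule, the demo can later show the analytics platform "finding" the season, the hotspot and the stock-outs that were deliberately planted.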

submitted by /u/Technical-Blood9031

Looking For A Gym Exercise Dataset From A Peer Reviewed Journal

Hey guys, basically I’m working on a system that would use machine learning to recommend workout plans (exercise selection etc) based on the muscles that users want to prioritise, whether their goal is strength or hypertrophy etc, and I need some datasets that I could potentially use to train my model.

My professor said to look into EMG studies and whatnot, but I was wondering if anyone could help me out and potentially link to some datasets I could use. He said to try to use datasets that have been used in peer-reviewed journals, and to avoid places like Kaggle if I can. I really want to be able to use a dataset like this one https://www.kaggle.com/datasets/niharika41298/gym-exercise-data/data

How would I go about finding a similar dataset that comes from a peer-reviewed journal?

Any help would really be appreciated, thanks. If this isn’t the right place to ask, then any pointers on where to ask would be appreciated too, thanks

submitted by /u/ILikeFish69

Looking For A Paraquat Applicator/Farmers Database

Hey 👋🏻,

I’m currently working on a project and I’m trying to get my hands on a database that tracks farmers or applicators who have used Paraquat. I’m particularly interested in any datasets that could provide info on usage patterns, application history, or anything related to this herbicide.

I’ve done some basic searches but haven’t had much luck finding something concrete. Does anyone here know where I might be able to find such a dataset? Whether it’s publicly available, or even something I’d need to purchase or request through an organization, any lead would be super helpful.

Thanks in advance for any tips or suggestions! 👨‍🌾

submitted by /u/alb53

Looking For Data Set To Detect Anxiety Or Panic Attacks Or Phobia Or Stress

I’m working on a project about detecting physiological symptoms of anxiety in general, using physiological sensors (gyroscope, thermometer, heartbeat) and machine learning.

I need a dataset to feed into the system so it can tell whether a person is stressed or not, and I don’t have much time left before submitting the project to actually train the system.

Thank you all in advance

submitted by /u/Revolutionary_Bat94

MIT Technology Review Data In JSON Format [1997-2024]

MIT Technology Review magazine data from January 1997 to October 2024. I started scraping from 1890, but it looks like articles from before 1997 aren’t posted, so I’ve excluded those years from the dataset (I do have metadata about those issues, though, which includes the cover image, title and link to the PDF file for each issue).

Format:

{
  title: "Issue Title",
  date: "2024 January",
  hero: "cover image url",
  pdfLink: "link to pdf file",
  posts: [{
    title: "Post Title",
    date: "Article publishing date",
    topic: "Policy",
    headerImg: "image url for article hero img",
    authors: [{
      name: "Author name",
      link: "Link to author profile",
    }],
    body: "<p>Article content goes here</p>",
  }]
}

All files are stored in folders named by year.
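As a rough illustration of how such a layout could be read, here is a minimal Python sketch. It assumes, without confirmation from the post, that each issue is a separate `.json` file holding valid JSON with the keys shown above, and that the year folders have purely numeric names. The function names `iter_issues` and `posts_per_topic` are made up for this example.

```python
import json
from pathlib import Path


def iter_issues(root):
    """Yield (year, issue) for every JSON file found in the year-named folders."""
    for year_dir in sorted(Path(root).iterdir()):
        # Assumption: year folders are named with digits only, e.g. "1997".
        if not (year_dir.is_dir() and year_dir.name.isdigit()):
            continue
        # Assumption: one issue per file, with a .json extension.
        for path in sorted(year_dir.glob("*.json")):
            with path.open(encoding="utf-8") as fh:
                yield int(year_dir.name), json.load(fh)


def posts_per_topic(root):
    """Count articles by their `topic` field across all issues."""
    counts = {}
    for _year, issue in iter_issues(root):
        for post in issue.get("posts", []):
            topic = post.get("topic", "unknown")
            counts[topic] = counts.get(topic, 0) + 1
    return counts
```

The same loop could feed an epub generator or an LLM preprocessing step by yielding each post’s `body` instead of counting topics.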

Usage: I actually scraped this data for myself to generate epub and PDF files with less clutter and better readability on mobile/Kindle devices. I’m currently scraping all the popular magazines like The Economist, The New Yorker, The Atlantic, Vanity Fair etc. without a solid use case other than generating epubs/PDFs. You can generate epubs/HTML or combine it with other data to use in some LLM projects.

Download link: Google Drive

submitted by /u/waqarHocain

Seeking Dataset Of Breast Cancer Evolution During Treatment

We are trying to develop a model that could help predict the resistance of breast cancer. For that, we need clinical, digital pathology, genomic and transcriptomic profiles of pre-treatment biopsies of breast tumours, along with the pathology end point. Even MRI or mammogram images showing the evolution of the breast tumour as treatment is given would help. Can someone help me with this? I tried looking on cancerimagearchive, but I was not able to find any dataset that shows the progression of the tumour as the treatment progresses.

submitted by /u/Desperate_Parking_29

Does Anyone Have A Copy Of The IAM Online Handwriting Database?

Here is the dataset link:

https://fki.tic.heia-fr.ch/databases/iam-on-line-handwriting-database

It seems their verification system for getting access to the database may be outdated, as it no longer sends verification emails for new accounts. I was wondering if anyone has a copy of the full dataset and is willing to send it, or has an account that still has access to the database?

Thanks

submitted by /u/AdEmbarrassed1605

Need Help With Luminate Television Viewership Data

https://variety.com/h/most-watched-streaming-originals-movies-tv-shows/

I need some assistance. This page is updated every week, and the weekly report page no longer includes the previous minutes watched, so some of the data is no longer available online, including on Wayback and Archive.

This is important because of how Luminate begins its weekly period, which differs from Nielsen and Netflix. I think it is a terrible idea. I feel like a third to half of the time, a show begins with only a day or two falling in Luminate’s time period. Those one to two days usually have the highest single-day views: not enough to show up in the top 10, but far too significant to leave out. This is why the previous minutes watched figure is important, since it includes views even when a show doesn’t make the top 10.

I am missing (previous min watched) data from

May 10-16, May 17-23, June 14 – June 20

July 12 – July 18, July 19 – July 25, July 26 – August 1, August 2 – 8

August 16 – 22, August 23 – 29, Aug. 30-Sept. 5

I have sent an email to the Variety writer who usually covers the weekly ratings, but I am not certain she is going to respond. I would love some help from the internet.

submitted by /u/wu_kong_1

Scraping Techpowerup.com CPU Database For School Project – Advice

Hi all,
this semester in school I decided to take an Information Retrieval course, where the semester project includes making our own web scraper on a given topic. I decided to use Techpowerup.com as I am into PC components. I made a scraper in Go, but I have run into very aggressive limits on the site and would like advice on how to get past them. Currently, I have implemented these precautions:

Random user agent from a list of 5 for each request (even the retries)
Exponential increase of the wait time after each 429
Random jitter of 0-10 sec in addition to the exponential timeout
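For reference, the three precautions listed above can be sketched as follows. This is written in Python purely for illustration (the scraper in the post is in Go, and its code is not shown), so the base delay, the cap and the placeholder user agent strings are all assumptions rather than the poster’s actual values.

```python
import random

# Placeholder strings; a real list would hold five full browser user agents.
USER_AGENTS = [f"example-agent/{i}" for i in range(5)]


def backoff_delay(attempt, base=1.0, cap=300.0, max_jitter=10.0, rng=random):
    """Seconds to wait before retry number `attempt` (0-based) after a 429.

    The wait doubles with every retry up to `cap` (an assumed upper bound),
    then a random jitter of 0..max_jitter seconds is added on top.
    """
    exponential = min(cap, base * (2 ** attempt))
    return exponential + rng.uniform(0.0, max_jitter)


def pick_user_agent(rng=random):
    """Choose a user agent at random for each request, retries included."""
    return rng.choice(USER_AGENTS)
```

With these defaults the first retry waits between 1 and 11 seconds and the fifth between 16 and 26 seconds. Passing a seeded `random.Random` as `rng` makes the delays reproducible when debugging.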

Currently, it seems like I am able to get 26 results and no more.

If needed, I am able to post the whole code, but I don’t want to spam the post if it’s not needed.
Any suggestions, please? I am able to switch sites, but I would like to stay on the topic of PC components (it can be another component though), as this has already been assigned to me by the teacher.
Sorry if the post is not up to the standards of this subreddit; this is my first post here.
Thanks all for suggestions!

submitted by /u/Clean-Culture7563

Seeking Dataset Of Public Spitting And Littering Images For AI Model Training On Cleanliness

I’m working on an AI project focused on improving public cleanliness by identifying key behaviors such as spitting and littering. I’m in search of a dataset containing images of spitting in public places, as well as littering incidents, with accompanying descriptions of the scenes. These datasets will help in training the AI model to detect and address these issues more effectively.

If you have any relevant resources or datasets or know where I can find them, I’d greatly appreciate your support!

Thanks in advance for your help!

submitted by /u/candy_one8

Looking For Soil Physical And Chemical Property Dataset Sources

Hello guys, please help a thesis girlie :> I have a concept for my thesis project: Real Time Soil Quality Assessment for Coffee Farms using ResNet50. I’m having trouble finding datasets for this concept, and I need some sources. Does anyone here have access to, or know of any sources for, soil physical and chemical property datasets? Any help is appreciated, thank you!!!

submitted by /u/smg_nabi

Looking For Medical Malpractice Data

Does anyone know of a way to get data on incidents of medical malpractice or medical board disciplinary actions? I am aware of this tool: https://www.npdb.hrsa.gov/faqs/puf1.jsp

However, this is aggregated at the state level. I know some states allow you to look this information up if you know a doctor’s name (Oregon: https://www.oregon.gov/omb/investigations/pages/malpractice-claim-information.aspx), but I am struggling to find a source that gives this information for all doctors in a state.

I’m interested in any states or sources that might make this type of data possible to obtain. Thanks!

submitted by /u/jyddyj20

Self Hosted Dataset Registry/browser

Hi all,

I’ve been looking for a solution to set up a dataset browser, e.g. something like https://huggingface.co/datasets, so that our teams can browse existing datasets (their metadata at least).

Due to constraints, we would need something that we can self-host without sharing any of our information on any platform on the open web, preferably an out-of-the-box app or a framework where we could quickly create a “browser”; something that we could use freely…

Any suggestions?

Many thanks in advance!

submitted by /u/met4xa