Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

How To Legally Find A Dataset Of Doctors’ Treatment Histories?

I am currently working on a project which would require having the history of conditions treated by a doctor (easy to quantify) or their qualifications / research contributions (hard to quantify, pain to work with). I looked into things like OpenMRS and EMRBots but am pretty sure that they are simulated.

Where could I find a giant repository of these types of real but anonymised “health records” without committing a crime?

submitted by /u/Ok-Program-3656

Help Download The ASVSpoof2017 Dataset

So I live in Southeast Asia (I am assuming this is the root of the problem), and downloading from the Edinburgh DataShare website is nearly impossible. In my case, I once reached 850 MB out of 1000 MB on the evaluation dataset. Unfortunately, my internet died for a second, and when I resumed the download it restarted from the beginning. Yes, 1000 MB is small, but the download speed is in kbps. I tried downloading it for an entire day and it still failed.

Here’s the link: https://datashare.ed.ac.uk/handle/10283/3055

So I want to know whether someone has a mirror link for it or a way I can download it faster. That’s all from me. Thanks.

Oh, and also do tell me if you think that I need to go through a formal procedure for it. I did ask the Information Services of the University of Edinburgh about this but have yet to get a reply. Once again, thank you.
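One way to avoid restarting from zero after a dropped connection is to ask the server only for the bytes that are still missing. The sketch below uses the standard HTTP `Range` header with Python's `urllib`. It assumes the server honours range requests (it answers with status 206 when it does), which has not been checked for Edinburgh DataShare, and `url` and `dest` are placeholders for the actual file link and local path.

```python
import os
import urllib.request


def range_header(existing_bytes: int) -> dict:
    """Build the Range header asking for everything after the bytes already on disk."""
    return {"Range": f"bytes={existing_bytes}-"} if existing_bytes > 0 else {}


def resume_download(url: str, dest: str, chunk_size: int = 64 * 1024) -> None:
    """Download url to dest, continuing a partial file when the server allows it.

    url and dest are placeholders; pass the real file link and a local path.
    """
    existing = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url, headers=range_header(existing))
    with urllib.request.urlopen(request, timeout=60) as response:
        # 206: the server sent only the missing part, so append to the file.
        # 200: the server ignored the Range header, so start the file over.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

Calling `resume_download` again after each failure continues from wherever the partial file ends. On the command line, `wget -c` and `curl -C -` do the same thing.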

submitted by /u/Puzzleheaded-Path306

259k – LLM Unity3d API Dataset For Fine-Tuning

Hey everyone, I just uploaded a 259k dataset for Unity. It’s not a coding dataset but rather a dataset to teach the model about Unity’s API properties. It took me 3 days to create with 6 instances of llama3 8B exl2. I have trained a model on the dataset and it works very well. It does cause the model to hallucinate, so you might have to play with the fine-tuning hyperparameters and possibly align the model afterwards. Enjoy!

submitted by /u/Delicious-Farmer-234

I Would Like To Know About Commercials/advertisements On Cable News Prime Time Hours (7p-12a).

One thing that has struck me over the last several months is the sheer volume and type of commercials aired between segments of cable news programs. I’d like to know the odds of 1) landing on a commercial/ad, and as a bonus, 2) the odds of that commercial/ad being one of healthcare relevance (prescription meds, supplements, insurance/Medicare/Medicaid, etc., things targeted at seniors, for the most part).
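If a log of what aired were available, both odds reduce to ratios of airtime. Here is a minimal sketch, assuming a hypothetical log where each segment is a `(duration_seconds, is_ad, is_healthcare)` tuple; the sample numbers are made up purely to show the arithmetic.

```python
def ad_odds(segments):
    """Return (P(ad), P(healthcare | ad)) weighted by airtime.

    segments: iterable of (duration_seconds, is_ad, is_healthcare) tuples.
    """
    segments = list(segments)
    total = sum(duration for duration, _, _ in segments)
    ad = sum(duration for duration, is_ad, _ in segments if is_ad)
    health = sum(
        duration for duration, is_ad, is_health in segments if is_ad and is_health
    )
    p_ad = ad / total if total else 0.0
    p_health_given_ad = health / ad if ad else 0.0
    return p_ad, p_health_given_ad


# Made-up 20-minute sample: two program blocks and two ads.
sample = [
    (600, False, False),  # program
    (60, True, True),     # healthcare ad
    (30, True, False),    # other ad
    (510, False, False),  # program
]
p_ad, p_health_given_ad = ad_odds(sample)
```

Weighting by duration answers "what are the odds at a random moment"; counting ads instead of seconds would answer "what share of ads are healthcare ads", which is a different question.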

submitted by /u/johnnybiggles

Ensuring Accurate School Account Setup: Resolving Missing Unique Reference Numbers And Preventing Future Errors

Can someone answer this for me? I’m currently learning how best to resolve issues in data setups: “You have been working through account set up on the platform for a number of schools. After completing the set up for the majority of schools you realise you have forgotten to manually enter each school’s unique reference number. This creates the risk that some school users may have been assigned to the wrong school. What steps would you take to resolve this issue, and ensure it didn’t happen again?”

submitted by /u/OrderOnly8503

Language Lists – Blacklisted Words, Male & Female First Names, Common Surnames, & More

List of Vulgarity – each word / term is separated by a newline.

List of First Names – CSV file with fields name, gender, probability where gender is represented with either M or F with respective probability for gender accuracy.

List of Surnames – CSV file with the following fields:

name – surname / last name
rank – national rank based on commonality
count – number of people with the last name
prop100k – proportion per 100,000 population for name
cum_prop100k – same as above except cumulative proportion
pctwhite – percent White
pctblack – percent Black or African American
pctapi – percent Asian, Native Hawaiian, and Pacific Islander
pctaian – percent American Indian and Alaska Native
pct2prace – percent mix of two or more races
pcthispanic – percent Hispanic or Latino
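A rough sketch of how the word list and the first-names file could be loaded with Python's standard library. It assumes the CSV has a header row matching the field names above, and the sample rows are made up for illustration rather than taken from the actual files.

```python
import csv
import io

# Made-up sample data in the described layouts (not from the real files).
WORDLIST_TEXT = "darn\nheck\n"
FIRST_NAMES_CSV = """name,gender,probability
Alex,M,0.71
Maria,F,0.99
"""


def load_wordlist(text):
    """One term per line -> set of lower-cased terms, blank lines ignored."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def load_first_names(handle):
    """Map lower-cased first name -> (gender, probability)."""
    return {
        row["name"].lower(): (row["gender"], float(row["probability"]))
        for row in csv.DictReader(handle)
    }


blocked = load_wordlist(WORDLIST_TEXT)
names = load_first_names(io.StringIO(FIRST_NAMES_CSV))
```

With real files, `io.StringIO(...)` would be replaced by `open(path, newline="", encoding="utf-8")`. The surnames file can be read the same way with `csv.DictReader`, converting the numeric columns as needed.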

submitted by /u/JTrexler

Healthcare Mergers And Acquisitions

I’m trying to understand mergers and acquisitions in healthcare, along with ownership data; so far the leads I’ve found include NYS DoH and CMS databases. I also found a ‘Hospitalogy’ toolkit, but it seems to be more about data visualization and it’s only accessible by paying $250.

Does anyone know any open-source data that tracks this kind of info?

submitted by /u/wasacarpenter

Where Is The Spotify Sequential Skip Prediction Dataset?

Hi everyone,

I’m on the hunt for the Spotify Sequential Skip Prediction Challenge dataset. This dataset was part of a competition organized by Spotify, WSDM, and CrowdAI and focused on predicting whether users would skip or listen to the tracks they’re streamed. Unfortunately, it seems the dataset is no longer available on the official link.

Here’s a bit of background about the challenge and dataset:

Organizer: Spotify, WSDM, CrowdAI
Dataset Size: Public part – ~130 million listening sessions; Challenge leaderboard – ~30 million listening sessions
Features: User interactions, track metadata, acoustic features, etc.
Task: Predict if users will skip tracks based on their session history
Challenge Details: Challenge Overview

The dataset is crucial for my work on developing a recommender system for my startup.

If anyone has access to this dataset or knows where I can obtain it, I would greatly appreciate your help. This dataset would be incredibly beneficial for my research and development in the field of music recommender systems.

For more details on the challenge and dataset, here’s an overview page.

Thank you in advance!

submitted by /u/Elpiramidonus

Directory / Dataset Of Landscape Webcams?

I am looking for datasets / directories of webcams, mainly focused on landscape, cities, etc., not private (streaming/gaming) cams. Ideally this dataset would contain both the page where the image is embedded as well as the image url itself. Does anyone know where I can find this?

submitted by /u/j0nes2k

Help Finding Certain Data By Address

No idea where to post this, sorry!

I have a list of about 5,000 addresses. For each one, I want to know the census tract, the voting districts, the region (as defined by my city), and maybe more later on.

How can I set something up where I can match my list of addresses with a list of all addresses in my state (Ohio), cross-reference all of that other data, and have all of that information spit out for each address for me?

Really, any way to make this process faster would be appreciated. I’ve found some files online from various government agencies but I’m not sure if they are all relevant or useful. What kind of file types am I looking for? I have some maps overlaid in Google Earth so I can look up addresses and find the information that way, but I’m not doing that one by one for 5,000 addresses. I chatted with my IT guy but he’s part-time and didn’t have any standout ideas at the time.

Thank you!
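For what it's worth, this kind of lookup usually comes down to two steps: turn each address into coordinates (geocoding), then test which boundary polygon each point falls inside. Boundary files are commonly distributed as shapefiles or GeoJSON. The sketch below shows only the second step in plain Python, using a ray-casting test; the square "tracts" and coordinates are made up, and real work would more likely use a GIS library.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test; polygon is a list of (lon, lat) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the point's latitude can be crossed
        # by a horizontal ray going east from the point.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def lookup(lon, lat, regions):
    """Return the name of the first region containing the point, or None."""
    for name, polygon in regions.items():
        if point_in_polygon(lon, lat, polygon):
            return name
    return None


# Made-up square regions standing in for census tracts or voting districts.
regions = {
    "tract_A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "tract_B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
```

Running `lookup` once per geocoded address against each boundary layer (tracts, voting districts, city regions) yields one row of results per address.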

submitted by /u/countesscranberry

Does Anyone Know Where To Find Lactate Test Datasets?

Hi everyone,

I’m looking for some datasets about lactate tests. They should ideally include the following information:

Lactate levels (preferably continuous or regular measurements)
Heart rate
Respiratory rate
Other relevant physiological parameters (blood pressure, temperature, etc.)
Contextual data (e.g., type of physical activity, duration, intensity)

I feel like I went through the whole web, but I just can’t find anything useful.

Does anyone have experience with this topic and can provide tips on where I might find such datasets? Or perhaps someone has access to relevant data and would be willing to share?

I would greatly appreciate any help and guidance. Thank you in advance for your support!

Regards, algyier

submitted by /u/Electrical_Present73

Does Anybody Know A Place To Download The Rome Graphs Dataset?

Hey, I wanted to ask if someone knows a place the Rome graphs dataset can be downloaded from.
I tried to search for it but I only found a German document which cites it.
Using the document I found the paper behind the Rome graphs, “An Experimental Comparison of Four Graph Drawing Algorithms” ( https://doi.org/10.1016/S0925-7721(96)00005-3 ).
In the paper they mention that the graph dataset can be downloaded from their FTP server ftp://infokit.dis.uniroma1.it/public/ but the domain does not resolve anymore.
So I wondered if anybody knows a place it can still be downloaded from.

submitted by /u/finanzbruch

Looking To Share Or Sell A Large Collection Of Stock Prices Stored In MySQL

I have gathered a large set of data that includes the prices of 10,286 different stocks, updated every minute since November 17, 2021. This data is organized and stored using MySQL.

I’m looking for advice on where I might be able to share or sell this data, especially to people who use such information for studying the stock market, building trading software, or conducting research.

Does anyone know of any places or communities where I could do this? Also, if you are interested in talking more about this data and possibly using it together, please let me know!

I’m excited to hear your ideas and talk more about this!

submitted by /u/ScienceNerd2023

Cycling Dataset For Different Positions On The Bike

Hey guys, I’m working on a project about cycling and need a dataset to help me out. It should have just two columns: the speed and the average power output of the cyclist maintaining that speed. I want data for different postures on the bike, like the drops or the hoods. Any help would be greatly appreciated! Thanks!

submitted by /u/Anass_Lpro

Where Can I ‘sell’ A Potent Dataset?

Hi guys! I have quite a potent dataset that can be used to further research in the renewable energy sector. The data is from a facility where I’m a stakeholder, so I really don’t wanna put it up for free.

Any leads as to what would be a good website where I can put it up to be used for a small fee?

(Uni student here, so I need the extra income this may generate lol)

submitted by /u/Illustrious_Grass199

Need Help Scraping Text From Benefits Websites For AI Project (Python, BeautifulSoup, Selenium)

Hi everyone,

I’m currently taking a course on Python, and I’ve been learning web scraping with BeautifulSoup and Selenium. My situation is a bit unique and time-sensitive, so I’m reaching out to this amazing community for some assistance.

My wife and son are both disabled, and navigating through benefits websites to find the best solutions and information has become quite overwhelming. My goal is to scrape the text from a few key benefits websites and input this data into an AI system to help manage and sift through the information more effectively.

Despite my efforts, I’m still struggling to get the code right. I’m really keen to learn and understand how to do this properly, but given my circumstances, I could really use a bit of a jump start with some working code examples.

If anyone could provide a working script or point me in the right direction, especially using Python with BeautifulSoup or Selenium, I would be incredibly grateful. Here are a couple of specific websites I need to scrape:

https://www.service-public.fr/ (the main body of content is under the ‘Practical sheets by theme’ drop-down if you translate the page to English)
https://www.aide-sociale.fr/

If it’s easier to share a working code snippet for just one website, that’s perfectly fine too.

Thank you so much for taking the time to read this and for any help you can offer. I really appreciate it!
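As a starting point, here is a minimal sketch of the text-extraction step on its own. It deliberately swaps in Python's built-in `html.parser` so it runs without extra installs; with BeautifulSoup, `soup.get_text()` covers similar ground. The HTML string is a made-up stand-in, not content from either site, and fetching the pages is left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self._skip_depth:
            self.parts.append(text)


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML document, one fragment per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Made-up sample page used only to demonstrate the extractor.
SAMPLE_HTML = (
    "<html><head><style>p{color:red}</style></head><body>"
    "<h1>Aides</h1><p>Allocation <b>adulte</b></p>"
    "<script>var x=1;</script></body></html>"
)
text = html_to_text(SAMPLE_HTML)
```

Pages that build their content with JavaScript (drop-down menus often do) may return little useful HTML to a plain request, which is the case where Selenium is typically needed to render the page first.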

submitted by /u/myway_thehardway

I’m Looking For A Koi Fish Dataset Of At Least 10,000 Images

I have a thesis on koi fish counting and classification. The document was accepted; however, I’m having trouble finding the number of images our professor requires for the implementation part. Let’s say around 10,000 images of koi fish would suffice.

I appreciate any help that I can get, since my current dataset only has around 1,200 images, already classified and annotated, which I sourced from Kaggle and Roboflow. Thanks. (I’ll be using YOLOv9 for the model to be trained.)

P.S. Don’t mind the link.

submitted by /u/shhty

For Anyone Wanting US Weather Observation Station Data

You can find a list of observation station IDs accessible through the US NWS API at https://demos.synopticdata.com/meta-lists/#networks

Idk if it’s just me, and maybe it is, but I had a bit of a hard time trying to find a master list of observation stations and their IDs accessible through the NWS API. I think the link above has most of them.

I only accidentally came across the one from Synoptic.

Not surprisingly I came across a lot of paid services and products but they all get their data from taxpayer funded sources anyway.

If anyone has other sources of free weather APIs or lists of observation stations accessible through the NWS API, feel free to comment below. I know MADIS is another source but haven’t checked it out yet.
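Once you have a station ID, pulling its most recent observation looks roughly like the sketch below. It assumes the NWS API's `/stations/{stationId}/observations/latest` endpoint and sends a `User-Agent` header identifying the caller; `KCMH` and the contact string are only example values.

```python
import json
import urllib.request

NWS_BASE = "https://api.weather.gov"


def latest_observation_url(station_id: str) -> str:
    """Build the URL for a station's most recent observation."""
    return f"{NWS_BASE}/stations/{station_id.upper()}/observations/latest"


def fetch_latest_observation(station_id: str, user_agent: str) -> dict:
    """Fetch and decode the latest observation for one station.

    user_agent should identify your app and a contact, for example
    "my-weather-script (me@example.com)"; that value is a placeholder.
    """
    request = urllib.request.Request(
        latest_observation_url(station_id),
        headers={"User-Agent": user_agent, "Accept": "application/geo+json"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)


url = latest_observation_url("kcmh")  # example station ID
```

Looping `fetch_latest_observation` over the IDs from the list above, with a short pause between calls, would give a simple snapshot across stations.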

submitted by /u/Live-Machine4746