Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for datasets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Best Way To Learn About Data Analytics

Hi, I’m graduating this year. I have a good grip on SQL, Python, and all the computer science fundamentals, and I’ve made two projects in Power BI using ready-to-use datasets. I wanted to get into data engineering, but I’ve heard from many people that data engineering is not a beginner’s role and that I need to start as a data analyst. Is that correct? If so, which certification is best for learning data analytics: Google, IBM, or Microsoft? I know the best way to learn is by building projects, but I think job interviews ask about tools and techniques in depth, which is why I’d prefer a certification or course. Regards

submitted by /u/Parking-Sun-8979

Dataset Of US Weather Across 15 US Cities, First Three Months Of 2024 And 2023. Max Temp And Precipitation Counts. Would Anyone Have A Best Rec?

Howdy folks,

I’m looking for a dataset covering about 15 US cities, with maximum temperature and precipitation measurements for the first three months of 2023 and 2024. I know I can use https://www.ncei.noaa.gov/, but it’s a pain in the rear end to go city by city, extract each one, year over year, and then synthesize and transform 15 or 30 more sets together.

Would anyone know if this currently exists somewhere, possibly in CSV format?
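If no ready-made CSV turns up, the city-by-city pulling can probably be scripted. This is a minimal sketch, assuming NCEI's Access Data Service endpoint (`/access/services/data/v1`) with the `daily-summaries` (GHCN-Daily) dataset and the parameter names `stations`, `startDate`, `endDate`, `dataTypes`, `units` and `format`, as I recall them from NCEI's documentation; check them before relying on this. The station IDs are example picks to verify in NCEI's station search, the CSV column names `STATION`, `DATE`, `TMAX`, `PRCP` are assumed, and "precipitation count" is interpreted here as the number of days with PRCP above zero.

```python
import csv
import io
from urllib.parse import urlencode

# Assumed NCEI Access Data Service endpoint; verify against NCEI's docs.
BASE_URL = "https://www.ncei.noaa.gov/access/services/data/v1"

# Example GHCN-Daily station IDs (assumed; pick one station per city yourself).
STATIONS = {
    "New York (Central Park)": "USW00094728",
    "Chicago (O'Hare)": "USW00094846",
}


def build_url(station_ids, year):
    """One request covering Jan 1 to Mar 31 of `year` for all stations."""
    params = {
        "dataset": "daily-summaries",
        "stations": ",".join(station_ids),
        "startDate": f"{year}-01-01",
        "endDate": f"{year}-03-31",
        "dataTypes": "TMAX,PRCP",
        "units": "standard",  # assumed to mean degrees F and inches
        "format": "csv",
    }
    # Keep commas literal so the station and data-type lists stay readable.
    return f"{BASE_URL}?{urlencode(params, safe=',')}"


def summarize(csv_text):
    """Per (station, year): highest TMAX and number of days with PRCP > 0.

    Assumes columns named STATION, DATE (YYYY-MM-DD), TMAX and PRCP;
    blank cells (missing observations) are skipped.
    """
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["STATION"], row["DATE"][:4])
        entry = result.setdefault(key, {"max_temp": None, "precip_days": 0})
        tmax = (row.get("TMAX") or "").strip()
        if tmax:
            value = float(tmax)
            if entry["max_temp"] is None or value > entry["max_temp"]:
                entry["max_temp"] = value
        prcp = (row.get("PRCP") or "").strip()
        if prcp and float(prcp) > 0:
            entry["precip_days"] += 1
    return result
```

Fetching would then be something like `urllib.request.urlopen(build_url(list(STATIONS.values()), 2023)).read().decode()` for each year, with both years' summaries merged into one table.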

submitted by /u/WhatsTheAnswerDude

Dataset With # Of Employees For US Healthcare Facilities?

For my research, I’m looking for a database that has the number of employees at each healthcare facility in the US. I’ve been using the CMS healthcare facilities dataset through HRSA, but unfortunately it doesn’t seem to have data for all facilities. Any suggestions for other databases that may be helpful?
I’m also looking for data on the number of inpatient and outpatient visits for each healthcare facility in the US, and would appreciate suggestions for that as well.

submitted by /u/Dapper_Willow731

LinkedIn Dataset – Exploring Career Paths, Educational Backgrounds (How To Obtain?)

Hello All,

As the title suggests, I am looking for a way to get data on specific career paths, and on the background and years of experience individuals had when they reached them.

Data I will need:

- All individuals in the US who held positions at the target firms (list below) in the last 10 years
- All companies (past and present)
- All positions held, with length of time in each
- Educational background and dates

The targets are individuals who currently hold, or previously held, Associate, Engagement Manager, Associate Partner, or more senior positions at the MBB firms:

- McKinsey
- Boston Consulting Group
- Bain & Co

Purpose: Decide where to get my MBA (online) in order to maximize my chance of entering these firms within a given timeframe.

Intended analysis methods:
- Determine the percentage of individuals who attended Ivy League schools vs. top-25 schools vs. other schools, and the percentage with MBAs.
- Determine the breakdown by industry background.
- Determine the distribution of years of experience under two conditions: entering at that level, and rising to that level from within.

I will also need to do the same for tech (the M7 companies: Nvidia, Tesla, Microsoft, Google, Apple, Meta, Amazon). I would also like to cross-check how many people from consulting ended up in tech.
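However the profiles end up being collected, the intended analysis is mostly counting once each person is reduced to a clean record. A minimal sketch, assuming a hypothetical `Profile` record; the school-tier labels, field names, and the "start year at earliest employer" inputs for the consulting-to-tech check are placeholders, not anything LinkedIn provides directly.

```python
from collections import Counter
from dataclasses import dataclass
from statistics import median


@dataclass
class Profile:
    """One person's cleaned career record (hypothetical schema)."""
    person_id: str
    school_tier: str         # "ivy", "top25" or "other", by your own mapping
    has_mba: bool
    prior_industry: str      # industry before joining the target firm
    years_to_level: float    # years of experience on reaching the target level
    entered_laterally: bool  # True = hired at that level, False = promoted from within


def share_by(profiles, key):
    """Percentage of profiles for each value of key(profile)."""
    counts = Counter(key(p) for p in profiles)
    total = len(profiles)
    return {value: 100.0 * n / total for value, n in counts.items()}


def experience_by_path(profiles):
    """Count, min, median and max years to reach the level, split by entry path."""
    groups = {"lateral": [], "promoted": []}
    for p in profiles:
        groups["lateral" if p.entered_laterally else "promoted"].append(p.years_to_level)
    return {
        path: {"n": len(v), "min": min(v), "median": median(v), "max": max(v)}
        for path, v in groups.items()
        if v
    }


def consulting_to_tech(consulting_start, tech_start):
    """IDs whose first tech role started after their first consulting role.

    Both arguments map person_id -> start year at the earliest such employer.
    """
    return {
        pid for pid, year in consulting_start.items()
        if pid in tech_start and tech_start[pid] > year
    }
```

For example, `share_by(profiles, lambda p: p.school_tier)` gives the Ivy / top-25 / other split, and `share_by(profiles, lambda p: p.has_mba)` the MBA share. Comparing start years, rather than only intersecting the two cohorts, keeps the cross-check to people who moved from consulting into tech and not the other way round.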

From what I can tell, there are a few ways I can do this:

- Write code that accesses the LinkedIn API and figure out its limitations.
- Purchase software that will scrape for me through my account.
- Pay another company to scrape the data for me.
- Pay for an existing dataset.
- Find a free, publicly available dataset.

Any help would be greatly appreciated.

submitted by /u/typeIIcivilization

Access To Crunchbase Data For Master’s Thesis

I know it’s been asked several times already, but does anyone still have access to a Crunchbase Pro subscription by any chance? I’d like to download a dataset about ClimateTech startups (including funding, investors, number of employees, etc.) at the EU or global level for my master’s thesis, but I can’t do it without the subscription. I already applied for academic research access but haven’t received any response, and I don’t hold out much hope of receiving it. Or perhaps you know someone who researches startups or works in this industry and might have access to this data? I would greatly appreciate any help!

submitted by /u/lurekude

Dataset Search Help: Nuclear Power Policy And Help On Assignment

I am looking for a dataset to aid my research for a school project. I have checked Data.gov and Kaggle, but nothing seems to fit what I’m looking for. My guiding question for this statistics project is: How does a state’s political affiliation affect its nuclear power restrictions?

I figured I’d analyze how lax or stringent more conservative states are on nuclear policy compared with liberal states. Any help is appreciated. Thanks.
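Once a dataset is found, one way to test the guiding question is a permutation test on the difference in mean restriction levels between the two groups of states. This is a minimal sketch, assuming you build a numeric restriction score per state yourself (for example, a count of restrictions such as moratoria on new reactors) and split states into two groups by some measure of affiliation (for example, recent presidential vote); both the scoring and the grouping are hypothetical choices, and a t-test or a regression would be reasonable alternatives.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    group_a / group_b: restriction scores for each state in the two groups
    (a scoring scheme you define yourself). Returns (observed difference
    mean(a) - mean(b), approximate p-value).
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign states to groups at random, as if affiliation had no effect
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # +1 in numerator and denominator so the estimate is never exactly zero
    return observed, (extreme + 1) / (n_resamples + 1)
```

A small p-value suggests the gap between conservative and liberal states is unlikely under random grouping; with only 50 states, the result is about association, not cause.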

submitted by /u/AirborneRatScabies

Dietitian Appointments Dataset, Including Whether They Showed Up Or No-Showed/Last-Minute Cancelation.

I am currently an MSDS student, and I am trying to come up with an idea for my capstone project.

My wife is a Registered Dietitian, and her biggest annoyance is the number of no-shows and last-minute cancelations she gets. I want to try to build a model that predicts the likelihood of a patient keeping their appointment, presumably based on features like payment method (out-of-pocket vs. insurance), age, reason for appointment, etc.

Where can I get such data?

If I can’t find data for Dietitians specifically, I would look into other medical practitioners, so those leads would be appreciated as well.
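For the modeling side, a logistic regression on encoded appointment features is a common starting point for a show/no-show probability. This is a minimal standard-library sketch fit by batch gradient descent; the record keys (`payment`, `age`, `reason`) and category values (`out_of_pocket`, `follow_up`) are hypothetical stand-ins for whatever the scheduling system exports. In practice a library such as scikit-learn and a held-out evaluation set would be the usual route.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def encode(appt):
    """Turn one appointment record into numeric features.

    The keys and category values are hypothetical placeholders.
    """
    return [
        1.0 if appt["payment"] == "out_of_pocket" else 0.0,
        appt["age"] / 100.0,  # rough scaling so gradient descent behaves
        1.0 if appt["reason"] == "follow_up" else 0.0,
    ]


def predict_proba(w, b, x):
    """Estimated probability that the patient keeps the appointment."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights by batch gradient descent on log loss.

    X: list of feature vectors; y: 1 if the patient showed up, 0 for a
    no-show or last-minute cancelation. Returns (weights, bias).
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # d(log loss)/dz for this sample
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b
```

The learned weights are also readable: a positive weight on the out-of-pocket feature would mean those patients are estimated as more likely to show up, which is the kind of finding that could inform reminder or deposit policies.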

submitted by /u/Sheng25

My Scraping Of PGT High Roller Poker Tournament Data From 2015-2024

PGT Poker High Roller Tournament Data 2015-2024

This is my very first time trying to create a database. I decided to use information I’ve been recording for a while now on the high-stakes poker tournament scene. One of the closely guarded secrets in poker is how much profit a player has made over their career. They’ll gladly post that player X has nine million in earnings, but it’s very possible that the player is actually in the negative once you subtract buy-ins.

I figured out years ago that news reporting sites will gladly give you all the information on who played and busted a poker tournament in their chip counts report. If you click on chip counts and Day 1, you can get the full list of players. For example: https://www.pgt.com/live-reporting/pokergo-cup-2024/event-2-10100-nolimit-holdem So what I did was compare the list of players who played an event with the list of players who also received a payout, and then subtract the buy-in from their payout.
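The matching step described here fits in a few lines. This is a minimal sketch, assuming each event has been transcribed into a dict with hypothetical keys `buy_in`, `entrants` (the Day 1 chip-count names) and `payouts`; like the method in the post, it charges each entrant exactly one buy-in and does not model rebuys or rake, the two limitations the author discusses.

```python
from collections import defaultdict


def career_net(events):
    """Net result per player across events: payouts minus buy-ins.

    Each event is a dict with hypothetical keys:
      "buy_in":   buy-in amount for the event
      "entrants": names from the Day 1 chip-count list
      "payouts":  {name: amount} for players who cashed
    One buy-in is charged per listed entrant (rebuys and rake ignored).
    """
    net = defaultdict(float)
    for event in events:
        entrants = set(event["entrants"])
        for name in entrants:
            net[name] -= event["buy_in"]
        for name, amount in event["payouts"].items():
            if name not in entrants:
                # Cashed but missing from the chip-count list: still paid to enter
                net[name] -= event["buy_in"]
            net[name] += amount
    return dict(net)
```

Players who never cash simply accumulate negative totals, which is exactly the information the headline "earnings" figures leave out.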

There were really two limitations with this. First, we don’t know how many times a player rebought into an event. In most of these tournaments, a player who busts out can rebuy into the event. In the example above, 61 players were reported as having played the event, but the table up top shows 89 entries, so 28 of those entries were rebuys. This makes sense: other tournaments, such as Pot-Limit Omaha events, have higher rebuy counts, and games like Short Deck Hold’em are known for even more rebuying. The problem is we don’t know exactly who rebought. The second issue is the rake. The rake in a poker tournament is a fee the casino takes out of the buy-in to help pay for costs. PGT is unique in that if you show up on time when the event starts, you get to play rake-free. The problem there is we don’t know who showed up on time or late. Rake can have a huge effect on a person’s win rate, and sadly we can’t calculate it accurately.
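Because rebuys and rake are unknown per player, a player's net for an event can only be pinned to a range. This is a minimal sketch of that bound, assuming the listed buy-in includes the rake and that on-time players pay the buy-in minus the rake; that reading of PGT's rake-free entry, and any rake amount you plug in, are assumptions to check against the actual event structure. The number of rebuys a single player made is capped here by whatever `max_rebuys` you choose, at most the event's total rebuy count; `rebuy_count(89, 61)` reproduces the 28 from the example.

```python
def rebuy_count(total_entries, unique_players):
    """Rebuys implied by an event summary: entries minus distinct players."""
    return total_entries - unique_players


def net_range(payout, buy_in, rake, max_rebuys):
    """(worst, best) net result for one player in one event.

    Assumes `buy_in` includes `rake` and that on-time entry waives the rake.
    Best case: a single on-time entry.
    Worst case: 1 + max_rebuys entries, each paying the full buy-in.
    """
    best = payout - (buy_in - rake)
    worst = payout - (1 + max_rebuys) * buy_in
    return worst, best
```

The width of that range is the honest error bar on any career profit figure built from chip-count lists.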

So if you have any interest in poker, have a look. One thing I did was give each player an ID number instead of using their real name. I didn’t want to publicly shame anyone by putting out how much they lost.

submitted by /u/thriftbin

Request For An MVQG (Multi Visual Question Generation) Dataset

Is there a way to access datasets for multi-visual question generation (MVQG)?

The requirements for the data are:

Inputs: a sequence of images that together carry a shared context.

Outputs: one or a few questions generated about the context of the images.

I am tagging the research paper related to the problem statement here.

The authors of the paper use the VIST (Visual Storytelling) dataset. It is public, but its Drive link does not give everyone access to the images. I tried emailing the authors, but they haven’t responded in the last 7 days.

https://visionandlanguage.net/VIST/

submitted by /u/voidmain137

Dataset For Ontology Alignment Tasks

I am currently working on my master’s thesis on the topic of ontology alignment.

I have developed a tool for aligning ontologies. However, I lack datasets that contain ground truth, for example a reference.rdf that contains the possible matchings between classes.

Are there any available online? I have tried the OAEI, but are there any available elsewhere?
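Whatever source the reference alignments come from, scoring a tool against them usually means precision, recall and F1 over matched entity pairs. This is a minimal sketch, assuming the references use the Alignment API RDF/XML format that OAEI distributes (`Cell` elements holding `entity1`, `entity2` with `rdf:resource`, and `relation`); elements are matched by local name so either namespace variant is accepted, and the `measure` confidence values are ignored.

```python
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def local_name(tag):
    """Strip an XML namespace: '{ns}Cell' -> 'Cell'."""
    return tag.rsplit("}", 1)[-1]


def read_alignment(root):
    """Return {(entity1, entity2, relation)} from a parsed alignment document.

    Assumes Alignment API style Cells; confidence (measure) is ignored.
    """
    pairs = set()
    for cell in root.iter():
        if local_name(cell.tag) != "Cell":
            continue
        fields = {local_name(child.tag): child for child in cell}
        e1 = fields["entity1"].get(RDF_RESOURCE)
        e2 = fields["entity2"].get(RDF_RESOURCE)
        relation = "="
        if "relation" in fields:
            relation = (fields["relation"].text or "=").strip()
        pairs.add((e1, e2, relation))
    return pairs


def evaluate(system, reference):
    """Precision, recall and F1 of a system alignment against a reference."""
    true_pos = len(system & reference)
    precision = true_pos / len(system) if system else 0.0
    recall = true_pos / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)
```

A file on disk would be loaded with `read_alignment(ET.parse("reference.rdf").getroot())`, and the tool's own output converted to the same set of triples before calling `evaluate`.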

Thank you in advance!

submitted by /u/Costas_8

Looking For A List Of Good Datasets About Wildlife

I have been assigned to search for training data related to wildlife. The company would like to create an extra service for their outdoor cameras. Where do you usually search for datasets when you don’t have a specific need, just a category? I am also generally curious how you gather new datasets, because I find it pretty hard.

submitted by /u/matteohorvath

Data With Voting Outcomes And Answers To Informal Polling Questions

Hello everyone, I came across this project on Kaggle and I find it really fascinating.

https://www.kaggle.com/competitions/can-we-predict-voting-outcomes/overview

However, I’d like to try tackling this challenge with a different dataset. I’ve searched everywhere but can’t seem to find similar data (aside from this one).

Thank you a thousand times over. It would be a lifesaver.

submitted by /u/AntoineMagnin