Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for datasets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

LinkedIn Dataset – Exploring Career Paths, Educational Backgrounds (How To Obtain?)

Hello All,

As the title suggests, I am looking for a way to get data on specific career paths, and what background/years of experience individuals had to get them there.

Data I will need:

- All individuals in the US who held positions at the target firms (listed below) in the last 10 years
- All companies they worked for (past and present)
- All positions held, plus length of time in each
- Educational background and dates

The target is individuals who currently hold, or previously held, Associate, Engagement Manager, Associate Partner, or higher positions at the MBB firms:

- McKinsey
- Boston Consulting Group
- Bain & Co

Purpose: Decide where to get my MBA (online) in order to maximize my chance of entering these firms within a given timeframe.

Intended Analysis Methods: Determine the % of individuals who attended Ivy League vs. top-25 vs. other schools, and the % of individuals with MBAs. Determine the breakdown by industry background. Determine the distribution of years of experience under two conditions: entering at that level, and rising to that level from within.
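Once the profiles are collected, the percentages described above reduce to group-and-count operations. Below is a minimal Python sketch. It assumes each profile has already been reduced to a dict with the hypothetical fields `school`, `has_mba`, `entry_path` (`"external"` or `"internal"`), and `years_experience`. The field names, the tier rules, and the toy rows are illustrative assumptions, not part of the original post.

```python
from collections import Counter
from statistics import median

# Assumption: tier rules are placeholders. IVY_LEAGUE is the eight member schools;
# the "top 25" set has to come from whichever ranking you decide to trust.
IVY_LEAGUE = {"Brown", "Columbia", "Cornell", "Dartmouth", "Harvard", "Penn", "Princeton", "Yale"}


def school_tier(school, top25):
    """Bucket a school into Ivy League, Top 25, or Other."""
    if school in IVY_LEAGUE:
        return "Ivy League"
    if school in top25:
        return "Top 25"
    return "Other"


def tier_breakdown(profiles, top25):
    """Percentage of profiles in each school tier."""
    counts = Counter(school_tier(p["school"], top25) for p in profiles)
    total = sum(counts.values())
    return {tier: round(100 * n / total, 1) for tier, n in counts.items()}


def mba_share(profiles):
    """Percentage of profiles that hold an MBA."""
    return round(100 * sum(p["has_mba"] for p in profiles) / len(profiles), 1)


def experience_by_path(profiles):
    """Median years of prior experience, split by entry path (external hire vs. internal promotion)."""
    groups = {}
    for p in profiles:
        groups.setdefault(p["entry_path"], []).append(p["years_experience"])
    return {path: median(years) for path, years in groups.items()}


# Toy, made-up profiles that only show the expected input shape.
profiles = [
    {"school": "Harvard", "has_mba": True, "entry_path": "external", "years_experience": 5},
    {"school": "Duke", "has_mba": True, "entry_path": "internal", "years_experience": 3},
    {"school": "State U", "has_mba": False, "entry_path": "external", "years_experience": 7},
    {"school": "Yale", "has_mba": False, "entry_path": "internal", "years_experience": 2},
]
top25 = {"Duke"}
print(tier_breakdown(profiles, top25))
print(mba_share(profiles))
print(experience_by_path(profiles))
```

The same grouping pattern extends to the industry-background breakdown: swap the school tier for a normalized industry label.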

I will also need to do the same thing for Tech (the M7 companies: Nvidia, Tesla, Microsoft, Google, Apple, Meta, Amazon). I would also like to cross-check how many people from consulting ended up in Tech.
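The consulting-to-tech cross-check can be framed as a per-person comparison of start years. Here is a rough sketch. It assumes each work history is a list of hypothetical `(company, start_year)` pairs, and that company names have already been normalized to match the sets below.

```python
CONSULTING = {"McKinsey", "Boston Consulting Group", "Bain & Co"}
TECH = {"Nvidia", "Tesla", "Microsoft", "Google", "Apple", "Meta", "Amazon"}


def consulting_to_tech(histories):
    """IDs of people with a tech role that started after their first consulting role."""
    movers = set()
    for person_id, positions in histories.items():
        consulting_starts = [year for company, year in positions if company in CONSULTING]
        tech_starts = [year for company, year in positions if company in TECH]
        if consulting_starts and tech_starts and max(tech_starts) > min(consulting_starts):
            movers.add(person_id)
    return movers


# Hypothetical histories keyed by an anonymous person ID.
histories = {
    "p1": [("McKinsey", 2015), ("Google", 2019)],
    "p2": [("Apple", 2012), ("Bain & Co", 2016)],  # tech first, so not a mover
    "p3": [("Boston Consulting Group", 2014)],
}
movers = consulting_to_tech(histories)
ex_consultants = sum(1 for pos in histories.values() if any(c in CONSULTING for c, _ in pos))
print(f"{len(movers)} of {ex_consultants} ex-consultants moved to tech")
```

With real scraped data, name normalization (e.g. "BCG" vs. "Boston Consulting Group", "Alphabet" vs. "Google") would likely matter far more than this logic.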

From what I can tell, there are a few ways I can do this:

- Write code that accesses the LinkedIn API and figure out its limitations.
- Purchase software that will scrape for me through my account.
- Pay another company to scrape the data for me.
- Pay for an existing dataset.
- Find a free, publicly available dataset.

Any help would be greatly appreciated.

submitted by /u/typeIIcivilization

Access To Crunchbase Data For Master’s Thesis

I know this has already been asked several times, but does anyone still have access to a Crunchbase Pro subscription by any chance? I’d like to download a dataset about ClimateTech startups (including funding, investors, number of employees, etc.) at the EU or global level for my master’s thesis, but I can’t do it without the subscription. I already applied for academic research access but haven’t received any response, and I don’t hold out much hope of getting one. Or perhaps you know someone who conducts research on startups or works in this industry and might have access to this data? I would greatly appreciate any help with this!

submitted by /u/lurekude

Dataset Search Help: Nuclear Power Policy And Help On Assignment

I am looking for a dataset to aid my research for a school project. I have checked Data.gov and Kaggle, but nothing seems to fit what I’m looking for. My guiding question for this statistics project is: How does a state’s political affiliation affect its nuclear power restrictions?

I figured I’d analyze how lax or stringent more conservative states are on nuclear policy compared with more liberal states. Any help is appreciated. Thanks.
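One simple way to test the conservative-vs-liberal comparison is a permutation test on a per-state stringency score. This is only a sketch. It assumes you have already turned each state's nuclear restrictions into a numeric score. The scores below are invented placeholders, so the output says nothing about real states.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns the observed difference (mean A - mean B) and an estimated p-value:
    the share of random relabelings whose difference is at least as extreme.
    """
    rng = random.Random(seed)
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # The +1 terms count the observed labeling itself, so p is never exactly 0.
    return observed, (extreme + 1) / (n_permutations + 1)


# Made-up stringency scores (1 = lax, 5 = strict), one per state, for illustration only.
conservative = [2, 3, 1, 2, 3]
liberal = [4, 5, 4, 3, 5]
diff, p_value = permutation_test(conservative, liberal)
print(f"difference in means = {diff:.2f}, p = {p_value:.3f}")
```

A permutation test avoids assuming normality, which is convenient with only 50 states split into two groups. A t-test or a Mann-Whitney U test would be common alternatives.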

submitted by /u/AirborneRatScabies

Dietitian Appointments Dataset, Including Whether They Showed Up Or No-showed/last Minute Cancelation.

I am currently an MSDS student. I am trying to come up with an idea for my capstone project.

My wife is a Registered Dietitian, and her biggest annoyance is the number of no-shows and last-minute cancelations she gets. I want to build a model that predicts the likelihood of a patient keeping their appointment, presumably based on features like payment method (out-of-pocket vs. insurance), age, reason for appointment, etc.
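A natural baseline for this is logistic regression on the appointment features. The sketch below uses only the Python standard library and plain gradient descent. The feature encoding (`self_pay`, scaled age, and `initial_consult` as a stand-in for appointment reason) and the six rows are hypothetical, chosen only to show the mechanics.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(X, y, lr=0.5, epochs=2000):
    """Fit logistic-regression weights with plain batch gradient descent."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row drives the gradient of the log-loss.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def no_show_probability(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical features: [self_pay (1 = out-of-pocket), age / 100, initial_consult (1 = first visit)]
# Label: 1 = no-show or last-minute cancelation. Made-up rows, for illustration only.
X = [[1, 0.25, 1], [1, 0.30, 0], [1, 0.45, 1], [0, 0.50, 0], [0, 0.60, 1], [0, 0.35, 0]]
y = [1, 1, 0, 0, 0, 0]
w, b = train_logistic(X, y)
print(no_show_probability(w, b, [1, 0.27, 1]), no_show_probability(w, b, [0, 0.55, 1]))
```

With real clinic data, a library implementation (e.g. scikit-learn's `LogisticRegression`) evaluated on a held-out test set would be the usual next step.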

Where can I get such data?

If I can’t find data for dietitians specifically, I would look into other medical practitioners, so those leads would be appreciated as well.

submitted by /u/Sheng25

My Scraping Of PGT High Roller Poker Tournament Data From 2015-2024

PGT Poker High Roller Tournament Data 2015-2024

This is my very first time trying to create a database. I decided to use information that I’ve been recording for a while now on the high-stakes poker tournament scene. One of the guarded secrets in poker is how much a player is in profit over their career. Sites will gladly post that player X has nine million in earnings, but it’s very possible that the player is actually in the negative once you subtract buy-ins.

Years ago I figured out that news reporting sites will gladly give you all the information on who played and busted in a poker tournament in their chip counts report. If you click on chip counts and Day 1, you can get the full list of players, e.g. https://www.pgt.com/live-reporting/pokergo-cup-2024/event-2-10100-nolimit-holdem

So what I did was compare the list of players who played an event with the list of players who also received a payout, and then subtract the buy-in from their payout.
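The compare-and-subtract method described above can be sketched in a few lines of Python. This assumes each event has been scraped into a dict with the hypothetical keys `buyin`, `players` (everyone listed in the Day 1 chip counts), and `payouts` (cashing players only). Like the original method, it ignores rebuys and rake.

```python
from collections import defaultdict


def net_results(events):
    """Sum (payout - buy-in) per player across events.

    Players listed in an event who did not cash simply lose their buy-in.
    A cashing player missing from the chip-count list is not counted.
    """
    totals = defaultdict(int)
    for event in events:
        for player in event["players"]:
            totals[player] += event["payouts"].get(player, 0) - event["buyin"]
    return dict(totals)


# Hypothetical events with anonymized players and made-up payouts.
events = [
    {"buyin": 10100, "players": ["A", "B", "C"], "payouts": {"A": 30000}},
    {"buyin": 25000, "players": ["A", "B"], "payouts": {"B": 60000}},
]
print(net_results(events))
```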

There were really two limitations with this. The first is that we don’t know how many times a player rebought into an event. In most of these tournaments, a player who busts out can rebuy into the event. For the event linked above, 61 players were reported to have played, but the info in the table up top shows 89 entries, so 28 entries were rebuys. This makes sense: other tournaments, like Pot Limit Omaha events, have higher rebuy counts, and games like Short Deck Hold’em are known for even more rebuying. The problem is that we don’t know exactly who rebought. The second issue is the rake. The rake in a poker tournament is a fee the casino takes out of the buy-in to help pay for costs. PGT is unique in that if you show up on time when the event starts, you get to play rake-free. The problem there is that we don’t know who showed up on time and who showed up late. Rake can have a huge effect on a person’s win rate, and sadly we can’t calculate it accurately.
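The two unknowns can at least be bounded. The rebuy count follows directly from entries minus unique players (89 - 61 = 28 for the example event). For a single player, the sketch below returns a best and worst case. It rests on three assumptions of ours, not stated in the post: the listed buy-in includes the rake, an on-time entry has the rake waived, and you pick some upper limit on that player's rebuys. The rake amount in the usage is made up.

```python
def rebuy_count(total_entries, unique_players):
    """Entries beyond one per unique player must have been rebuys."""
    return total_entries - unique_players


def net_result_range(payout, buyin, rake, max_rebuys):
    """(worst, best) net result for one player when rebuys and rake are unknown.

    Assumes the listed buy-in already includes the rake and that registering on
    time waives it. Best case: a single on-time entry. Worst case: a late entry
    plus max_rebuys rebuys, every one at the full listed buy-in.
    """
    best = payout - (buyin - rake)
    worst = payout - (1 + max_rebuys) * buyin
    return worst, best


print(rebuy_count(89, 61))  # entries vs. players from the example event
# Hypothetical player: cashed for 30,000 in a 10,100 event with a made-up 100 rake,
# and may have rebought up to twice.
print(net_result_range(30000, 10100, 100, 2))
```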

So if you have any interest in poker, have a look. One thing I did was give each player a #ID instead of using their real name. I didn’t want to publicly shame anyone by putting out how much they lost.
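Replacing names with #IDs can be done with a simple lookup table. Here is a minimal sketch, assuming IDs are handed out in the order players first appear in the data. The naming scheme is hypothetical.

```python
def assign_ids(names):
    """Map each distinct player name to a stable '#N' label, in first-seen order."""
    ids = {}
    for name in names:
        if name not in ids:
            ids[name] = f"#{len(ids) + 1}"
    return ids


print(assign_ids(["Alice", "Bob", "Alice", "Cara"]))
```

If the order of first appearance could hint at who someone is, shuffling the name list before assigning IDs would make the mapping harder to reverse.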

submitted by /u/thriftbin

Request For An MVQG (Multi Visual Question Generation) Dataset

Is there a way to access datasets for multi-visual question generation?

The data requirements are:

Inputs will be a sequence of images that together share a context.

Outputs will be one or a few questions generated about the context of the images.

I am tagging the research paper related to the problem statement here.

The authors of the paper use the VIST (Visual Storytelling) dataset. It is public, but its Drive link does not give everyone access to the images. I tried emailing the authors, but they haven’t responded in the last 7 days.

https://visionandlanguage.net/VIST/

submitted by /u/voidmain137

Dataset For Ontology Alignment Tasks

I am currently working on my master’s thesis on ontology alignment.

I have developed a tool for aligning ontologies. However, I lack datasets that contain ground truth, for example a reference.rdf file that lists the correct matchings between classes.
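Once a reference alignment is available, a tool's output is usually scored with precision, recall, and F1 over the set of correspondences. Below is a simplified sketch. It assumes both the reference (e.g. parsed from a reference.rdf) and the tool's output have already been reduced to `(source_entity, target_entity)` pairs. RDF parsing, relation types, and confidence values are left out, and the entity names are hypothetical.

```python
def evaluate_alignment(predicted, reference):
    """Precision, recall, and F1 of predicted correspondences vs. a reference alignment."""
    predicted, reference = set(predicted), set(reference)
    correct = len(predicted & reference)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical class correspondences between two ontologies "o1" and "o2".
reference = {("o1#Person", "o2#Human"), ("o1#Paper", "o2#Article"), ("o1#Author", "o2#Writer")}
predicted = {("o1#Person", "o2#Human"), ("o1#Paper", "o2#Article"), ("o1#Review", "o2#Article")}
print(evaluate_alignment(predicted, reference))
```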

Are there any available online? I have tried the OAEI datasets, but are there any available elsewhere?

Thank you in advance!

submitted by /u/Costas_8

Looking For A List Of Good Datasets About Wildlife

I have been assigned to search for training data related to wildlife. The company would like to create an extra service for its outdoor cameras. Where do you usually search for datasets when you don’t have a specific need, just a category? I am also generally curious how you gather new datasets, because I find it pretty hard.

submitted by /u/matteohorvath

Data With Voting Outcomes And Answers To Informal Polling Questions

Hello everyone, I came across this project on Kaggle and I find it really fascinating.

https://www.kaggle.com/competitions/can-we-predict-voting-outcomes/overview

However, I’d like to try tackling this challenge with a different dataset. I’ve searched everywhere but can’t seem to find similar data (aside from this one).

Thank you a thousand times over. It would be a lifesaver.

submitted by /u/AntoineMagnin

Does This Schema Look Normalized To 3NF? Need Help With Class Project.

MATCH(MatchID, Season, MatchDate, MatchStartTime, Field#, Park, HomeTeamID, AwayTeamID, RefereeName, MatchScore)

FIELD(Field#, Park, FieldName)

PLAYER(PlayerID, JerseyNumber, PlayerFirstName, PlayerLastName, PlayerGender, PlayerAge, TeamID, Position, CaptainStatus)

PLAYERSTATS(MatchID, PlayerID, MatchDate, MatchStartTime, HomeTeamName, AwayTeamName, JerseyNumber, PlayerName, Goals, Assists, PossessionPercent, PassCount, PassingChain#)

TEAM(TeamID, TeamName, CoachID, AssistantCoachID, SponsorID)

COACH(CoachID, CoachFirstName, CoachLastName, TeamID, CoachAge, CoachGender, CoachRole)

SPONSOR(SponsorID, SponsorName, SponsorEmail, SponsorAddress)
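One way to check 3NF mechanically is to write down the functional dependencies, then flag any whose left side is not a superkey while the right side contains non-key attributes. The sketch below does this for PLAYERSTATS, assuming {MatchID, PlayerID} is its only candidate key. The dependencies listed are our reading of the attribute names, not something stated in the post.

```python
def nf3_violations(key, fds):
    """List FDs that break 3NF, assuming `key` is the relation's only candidate key.

    An FD X -> Y violates 3NF when X is not a superkey and Y contains a
    non-prime attribute (one outside the key). Partial dependencies, where X is
    a proper subset of the key, also break 2NF.
    """
    key = frozenset(key)
    violations = []
    for lhs, rhs in fds:
        nonprime = set(rhs) - key
        if not key <= set(lhs) and nonprime:
            violations.append((set(lhs), nonprime))
    return violations


# Assumed functional dependencies for PLAYERSTATS (our interpretation).
key = {"MatchID", "PlayerID"}
fds = [
    ({"MatchID"}, {"MatchDate", "MatchStartTime", "HomeTeamName", "AwayTeamName"}),
    ({"PlayerID"}, {"JerseyNumber", "PlayerName"}),
    ({"MatchID", "PlayerID"}, {"Goals", "Assists", "PossessionPercent", "PassCount", "PassingChain#"}),
]
violations = nf3_violations(key, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))
```

Under those assumptions, the match details depend on MatchID alone, and JerseyNumber and PlayerName depend on PlayerID alone. PLAYERSTATS would then fail 2NF, and therefore 3NF as well.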

submitted by /u/volkxx

Video Dataset For Abnormal Event Detection In Bank ATM?

I am doing a project on abnormal event detection in ATM counters. For training purposes, I need videos of people behaving ‘normally’ in ATM counters and of people showing abnormal behavior.

With ATM counter I mean a (small) room with one or more ATM machines built in a wall.

Normal event: A person walks into the room, puts his card into the ATM, enters his PIN, retrieves his card, takes his money, maybe a receipt, and then leaves.

Abnormal events: Someone hitting the ATM, attacking and robbing a customer at an ATM, fiddling with the machine in unexpected ways, taking photographs inside, etc.

Thank you so much!

submitted by /u/ThreshLaSquale

For Those Looking For A Mock Data Generator! [self-promotion]

If you need a mock data generator, my team and I have you covered!

Our product’s core features are:

- Output support for JSON, YAML, PSQL, SQL, and XML (with more formats and application support coming soon)
- Code generation for various languages with language-specific settings (Rust, TypeScript, Go, Dart, C, C++, C#, Java, Swift, Protobuf (proto3); more on the roadmap)
- Nested object generation, with array and null controls
- Seeded generation for reproducible results
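Seeded generation works by drawing every random value from a single generator initialized with a fixed seed, so the same seed reproduces the same records. Below is a generic Python sketch of that idea, with nested objects, arrays, and a null rate. It is not DataConstruct's implementation or API, and all field names are made up.

```python
import json
import random


def mock_user(rng, null_rate=0.2, max_tags=3):
    """Build one nested mock record; the optional email is null with probability null_rate."""
    return {
        "id": rng.randint(1, 10_000),
        "name": rng.choice(["Ada", "Linus", "Grace", "Ken"]),
        "email": None if rng.random() < null_rate else f"user{rng.randint(1, 999)}@example.com",
        "address": {"city": rng.choice(["Oslo", "Lima", "Pune"]), "zip": f"{rng.randint(0, 99999):05d}"},
        "tags": [rng.choice(["a", "b", "c"]) for _ in range(rng.randint(0, max_tags))],
    }


def generate(n, seed):
    """Same seed gives identical output, which is what makes runs reproducible."""
    rng = random.Random(seed)
    return [mock_user(rng) for _ in range(n)]


print(json.dumps(generate(2, seed=42), indent=2))
```

Using a dedicated `random.Random` instance rather than the module-level functions keeps the output independent of any other code that touches the global generator.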

Let me know what you think, or if you want us to add more features.

Try it out here https://www.dataconstruct.io/organizations/playground/schemas

No sign up required!

submitted by /u/originalchuan