Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Can You Suggest An (AI) Tool That Can Read A Spreadsheet And Produce A Word/PDF Document That Summarizes The Data As Formatted Text, Tables, And Figures?

I’m trying to figure out how to automate the production of a monthly data report, with clean visuals and written summaries, based on the Excel spreadsheets that are provided. I’m not sure if ChatGPT is best for this, or another AI tool, or some combination of Python code and something else. Any advice would be appreciated!
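One possible starting point, offered as a rough sketch rather than a recommendation: plain Python can already turn tabular data into a short written summary. The example below assumes the spreadsheet has been exported to CSV, and the column names (`region`, `sales`) and all numbers are made up for illustration. Charts and Word/PDF output would need additional libraries and are not shown.

```python
import csv
import io
from statistics import mean

# Hypothetical data standing in for one month's spreadsheet, exported as CSV.
SAMPLE_CSV = """region,sales
North,1200
South,950
East,1430
West,1010
"""


def summarize(csv_text: str) -> str:
    """Build a plain-text summary from CSV text with 'region' and 'sales' columns."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    sales = {row["region"]: float(row["sales"]) for row in rows}
    total = sum(sales.values())
    best = max(sales, key=sales.get)  # region with the highest sales
    lines = [
        f"Regions reported: {len(sales)}",
        f"Total sales: {total:.0f}",
        f"Average per region: {mean(sales.values()):.1f}",
        f"Top region: {best} ({sales[best]:.0f})",
    ]
    return "\n".join(lines)


report = summarize(SAMPLE_CSV)
print(report)
```

The same `summarize` step could be run once per monthly file, with the resulting text passed on to whatever tool produces the final document.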

submitted by /u/Arfusman

A Tool To Create Datasets From Research Papers Using Augmented LLMs - Would This Be Helpful?

I’ve developed a program that uses multiple language models that talk to each other to create databases from scientific papers. I’m looking to use it to build custom datasets for medicinal neural networks. I’m considering deploying it as a website to see if it could be useful for others, but I’m looking for input on how to make it more robust and accessible for broader use.

For those with experience in dataset creation, AI applications in medicine, or similar fields, what features or improvements would make this tool more valuable or realistic for researchers and practitioners? Any insights would be greatly appreciated!

submitted by /u/chiralneuron

Pitchbook Access Request Help Please

Hello everyone. I’m an undergrad student currently writing a thesis on VC-funded firms. I found that Pitchbook may have a lot of the information (financials) that I need for my paper, but it’s really pricey. I wanted to see if there is anyone in the community who can share access with me or pull the data for free 😅 This would really help me kickstart my research. Help this broke student graduate!

submitted by /u/Apprehensive_Stick55

Is News API Usage Legal And Reliable?

I need a source of information for a data science project (academic research). Specifically, I need to retrieve a historical record of news about a certain topic, so I am thinking of using a news API instead of web scraping, because these APIs seem to return the kind of data I am searching for.

I’ve come across some of them, such as newsdata.io, newsapi.org and newsapi.ai, but I am wondering whether their usage is legal and reliable. I mean, are they legal themselves? And if so, am I inherently allowed to use them for my personal (academic) purposes?

The Terms & Conditions say this:

“We don’t have the right to authorise any user to use the data for their personal and professional purposes. However, the users can use the data for their personal or professional purposes”

I mean, should I have any concerns about this? It’s not like Twitter’s or Reddit’s API, where the data belongs to them and they deliberately give it to you. (In fact, I’m asking this because I planned to extract data from these platforms, but I’ve just realized it’s not possible at all, so I am wondering if there’s another alternative I can use to meet my requirement.)

Well… in essence, my questions are: Are these platforms/tools (APIs) legitimate and meant for data science? Or, in other words: is it common practice to use these kinds of “news APIs” for data science?

I didn’t even know about them. Have you ever tried them before? Should I do web scraping instead, or can you see another alternative you would advise me to use?

I’d appreciate your help.

submitted by /u/Sondre_BJ

Source Of Historical Company Profiles By Date?

It’s easy to find APIs that return the current “company profile” that includes fields such as:

{
  "symbol": "AAPL",
  "price": 231.41,
  "beta": 1.239,
  "companyName": "Apple Inc.",
  "cik": "0000320193",
  "exchangeShortName": "NASDAQ",
  "industry": "Consumer Electronics",
  "website": "https://www.apple.com",

But I’m looking to compare the current profiles to historical profiles. Is there a source to get the historical profiles info by date?

submitted by /u/vicegripper

European Cities Population Data Set.

Hello, I’m building an ML model that uses a city’s infrastructure as features to predict its population.
With the OSM library I was able to easily extract the infrastructure data; however, I am not able to find a data set with enough European cities. So far, all the data sets I’ve encountered only contain data for 50-80 European cities, and the rest are Asian cities.

I’ve tried to use population density and city area to create the population data set myself, but the numbers I got were terribly wrong.
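For reference, the estimate described here is population ≈ density × area, and it only holds when both inputs use the same area unit and describe the same boundary. A minimal sketch with made-up numbers (the function name and all figures are purely illustrative, not real city data) shows how a unit mismatch alone can throw the result off by a factor of 100:

```python
def estimate_population(density_per_km2: float, area_km2: float) -> float:
    """Estimate population as density (people per km^2) times area (km^2)."""
    return density_per_km2 * area_km2


HECTARES_PER_KM2 = 100

# Made-up example values, not real data for any city.
density_per_km2 = 4000.0   # people per square kilometre
area_hectares = 25_000.0   # area reported in hectares

# Wrong: a per-km^2 density multiplied by an area still in hectares.
wrong = estimate_population(density_per_km2, area_hectares)

# Right: convert hectares to km^2 before multiplying.
area_km2 = area_hectares / HECTARES_PER_KM2
right = estimate_population(density_per_km2, area_km2)

print(f"mismatched units: {wrong:,.0f}")
print(f"consistent units: {right:,.0f}")
```

A similar gap can appear when the density refers to the urban core while the area refers to the wider administrative boundary, so it may be worth checking both definitions in the source data.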

If someone has any idea of how to get this data I would love the help.

submitted by /u/Top_Hyena1923

Mortgage Loan Application Data Sample For A Scorecard

I’m planning on making an application scorecard for home loans as my bachelor thesis for university.

One of my (and my academic supervisor’s) main concerns is having a reliable dataset, or rather a dataset from a reliable source. One of the big questions that I’m potentially going to be challenged on in such a thesis is the dataset’s reliability, so it can’t be from somewhere like Kaggle, but somewhere like Experian/Equifax, for example, would be okay. I work at a bank and deal with such models, but unfortunately I can’t use any company data (even if it gets anonymized). So far I’ve seen some promising stuff on the FFIEC’s website, but I would like some additional sources so I can make a more educated decision.

Roughly I would need the data to contain these fields:

Age

Job

Income

Education

Marriage Status

Information about previous defaults (something like a Y/N flag for whether the applicant has defaulted on a loan in the last 5 years, for example)

Type of property that would be purchased with the loan

Some other fields that I could potentially exclude in further analysis

submitted by /u/JesusBreakdancing

Seeking VO2max Test Data For Research Training

Hello everyone!

I’m a researcher-in-training working on exercise physiology, and I’m currently looking for datasets on VO2max or incremental exercise tests that include VO2 and, ideally, blood lactate measures. My goal is to practice determining ventilatory and lactate thresholds to refine my analytical skills in these areas.

If you have access to any anonymized data or know of open-source datasets, I’d be very grateful for any pointers! I’ve checked platforms like OSF and PhysioNet but haven’t found exactly what I need, so any help would be highly appreciated.

Thank you in advance!

submitted by /u/jeanpauleti

[Urgent] Seeking HIPAA-Compliant PHI Database With Identifiable Health Data

Hi everyone! I’m urgently looking to source a HIPAA-compliant database that includes identifiable PHI (Protected Health Information), such as names and specific diagnosis histories, for a research project with rigorous data protection standards.

I need a reputable third-party vendor experienced in securely handling identifiable health data, with all necessary patient consent and compliance protocols in place. Does anyone know of reliable sources or vendors for acquiring such data legally and ethically? Any insights or recommendations are greatly appreciated—thanks!

submitted by /u/alb53

Hi :) I’m Looking For Data On The Number Of Daily E-scooter Rides In A City (Any City Possible) Over One Year.

Hello,

I am currently researching the correlation between weather patterns and the usage of shared mobility services, specifically focusing on e-scooter rides. I am looking for a dataset containing daily e-scooter ride counts in a city (any city) covering at least one year.

Details of the request:

Data Scope: Daily ride counts over a one-year period

Primary Interest: E-scooter usage data, though data on bike-sharing or shared car services would also be very helpful for comparison.
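To illustrate the kind of analysis such a dataset would feed, here is a minimal sketch of a Pearson correlation between a daily weather variable and daily ride counts. The temperatures and ride counts are placeholder values invented for the example, and the choice of temperature as the weather variable is only an assumption.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Placeholder values, not real measurements: one entry per day.
daily_temp_c = [5.0, 10.0, 15.0, 20.0, 25.0]
daily_rides = [120, 180, 260, 330, 400]

r = pearson(daily_temp_c, daily_rides)
print(f"Pearson r = {r:.3f}")
```

With a full year of daily counts, the same function could be applied per weather variable (temperature, rainfall, wind) to compare how strongly each one tracks ride volume.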

Any help or direction to relevant data sources would be greatly appreciated.

Thank you very much in advance for your assistance!

submitted by /u/Ok_Water_9376

Looking For A Dataset On Companies That “Speak Out”

I’m not sure of the terminology, but I’m attempting to do research involving event-based studies of when companies speak out. And I’ve been banging my head against the wall on this for weeks! 😂

This could be on social issues or political issues, or when they comment on a humanitarian crisis or an international conflict, etc. But I’m having trouble finding any data sets or proxies that would measure or rank the number of times they “speak out”, other than perhaps things like trading volume of the underlying stock or trending on social media.

Are there any datasets you could suggest or point me towards that could serve as a proxy for companies that stand up and speak out on societal issues?

Thank you kindly for any thoughts!

submitted by /u/Unhappy-Bus-7334

Looking For Harry Potter Dataset With Spell Cast Data By Character

Hi guys, just wondering if there are any datasets that include information on each character in Harry Potter, specifically data on:

each spell cast by every character

the number of times each spell was used

the target person of each spell (if any)

who they killed with each spell (if any)

If a dataset like this exists, or if anyone has suggestions on where I might find similar information, I would really appreciate it. Thanks

submitted by /u/wasbornyesterday1

[Request] Seeking Dataset For Dynamic Pickup And Delivery Problem (DPDP)

Hi all,

I’m working on a project involving the Dynamic Pickup and Delivery Problem (DPDP) and am searching for any datasets that support dynamic scenarios. Specifically, I’m looking for the ICAPS 2021 dataset for The Dynamic Pickup and Delivery Problem. If anyone has access to this dataset or something similar, I would really appreciate it if you could share it or point me in the right direction to find it.

Thanks a lot for your help!

submitted by /u/husseinelhawary

Past 30 Years Badminton Statistics - Historical Tournament Scores

Hi everyone! I believe this is the best community to reach out to. I am currently working on my thesis on using algorithms to create a new badminton ranking system. For this, I need historical match results for the last 30 years (from 1995 to 2024)!

Does anyone know where I can get this dataset? Do I really need to scrape all the data one by one from TournamentSoftware? Fan websites like Badmintonranks, BadmintonStatistics and badmintoncn also do not have any option to export the data 🙁 I tried reaching out to the admins but haven’t gotten any replies for weeks now (which is expected tbh :").

If anyone has done the web scraping, would you mind sharing the code with me here? I tried doing that, but I can’t seem to get the data into a neat and clean CSV format 🙁

Any help and leads would be highly appreciated!

submitted by /u/Sufficient_Bad7829

Dataset For Contract Analysis/Verifying Costs And Which Vendor To Keep Utilizing Or Not? Need To Practice For An Interview.

Howdy folks, hope all is well.

I’ve been contacted by a local recruiter for a data role that seems to be oriented around contract analysis. I’ll be working with a technology organization that’s basically a research consortium (I believe), and I’ll essentially have to look through their contracts with organizations and vendors and verify which ones are valuable and which ones aren’t that good anymore.

I’ll have to use tools like SQL, Tableau/Power BI, Microsoft SQL (Studio and SSRS/SSAS/SSIS) and Excel.

Does anyone know a dataset that I could use to do this? Or possibly a good YouTube walkthrough of a contract analysis dataset? It’d be IMMENSELY helpful!

submitted by /u/WhatsTheAnswerDude

Trouble Understanding Sample Trajectory Data

I’m currently working on a project in which I aim to predict the trajectories of other road users as recorded by a camera mounted on top of a car. I am looking at the ApolloScape sample_trajectory data, which is a text document with 9 headings. This data represents the positions of vehicles in the images from the second downloadable data set (which contains 500 images). I want to identify a specific vehicle (which, I’m guessing, is represented by a unique ID in the second column) and locate it in the images. However, I am unsure what the other columns represent.

Overall, I would like to know for sure what each column of the dataset means with respect to the images so I can plot vehicle locations on top of the images.
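Until the column meanings are confirmed, one way to explore the file is to group rows by the value in the second column and inspect each group. The sketch below is only that: it assumes whitespace-separated rows and that the second column really is an object ID (a guess, as noted above). The sample rows are invented and do not come from the ApolloScape file, and the function name is hypothetical.

```python
from collections import defaultdict

# Invented rows with 9 whitespace-separated values each; the real file's
# column meanings are unknown here, so no meaning is implied by these numbers.
SAMPLE = """\
1 7 3 10.0 20.0 0.0 4.5 1.8 1.5
1 9 3 30.0 22.0 0.0 4.2 1.7 1.4
2 7 3 10.5 20.4 0.0 4.5 1.8 1.5
"""


def group_by_second_column(text):
    """Group parsed rows by the raw value of their second column."""
    groups = defaultdict(list)
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        groups[fields[1]].append([float(value) for value in fields])
    return dict(groups)


tracks = group_by_second_column(SAMPLE)
for object_id, rows in tracks.items():
    print(object_id, len(rows), "rows")
```

If the second column is an ID, each group should look like one object followed over time, which makes it easier to guess which of the remaining columns hold positions and which hold sizes.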

https://apolloscape.auto/trajectory.html#to_structure_href

submitted by /u/Hxnter10

Requesting United States Population By County/State And By Year

As said in the title. Either level of geographic granularity would be fine and I only need 1996-2024.

I’m sorry if there’s a really simple way of going about this but this is my first project so any help/direction would be greatly appreciated.

I saw that previous posts recommended the census website but I’ve been struggling to navigate the site to pull the data that I want! 🙂

submitted by /u/Velox-Corvum