Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Help Needed: Looking For Crime Scene Datasets For A Crime Scene Reconstruction Project 🚔🔍

Hi everyone!

I’m part of a team working on a capstone project focused on crime scene reconstruction and analysis using machine learning and 3D simulations (Blender/Unity).

What we’re doing:

3D Crime Scene Reconstruction: Creating an interactive model that lets investigators explore and “rewind” scenes to see potential sequences of events (e.g., weapon use, bullet trajectories).

Simulated Evidence Analysis: Replaying crime scenes based on data to visualize how evidence such as blood spatter patterns or object placements might have occurred.
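
For the blood spatter part, one widely taught relationship in bloodstain pattern analysis estimates a drop’s impact angle from the elliptical stain it leaves: sin(α) = width / length. The sketch below is a minimal illustration of that formula only, assuming width (minor axis) and length (major axis) are measured in the same units; the function name is hypothetical and not taken from any forensic toolkit.

```python
import math


def impact_angle_deg(width: float, length: float) -> float:
    """Estimate the impact angle (degrees) of a blood drop from its stain.

    Applies sin(alpha) = width / length, where width is the stain's
    minor axis and length its major axis (same units for both).
    """
    if width <= 0 or length <= 0:
        raise ValueError("width and length must be positive")
    if width > length:
        raise ValueError("width (minor axis) cannot exceed length (major axis)")
    return math.degrees(math.asin(width / length))


# A circular stain (width == length) implies a perpendicular impact.
print(round(impact_angle_deg(5.0, 5.0), 1))  # 90.0
# A stain twice as long as it is wide implies a 30-degree impact.
print(round(impact_angle_deg(2.0, 4.0), 1))  # 30.0
```

In a 3D scene, per-stain angles like these are the kind of input a reconstruction would combine with stain directionality to approximate where drops came from; how that is wired into Blender or Unity is a separate design decision.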

We’re specifically looking for datasets that contain information related to crime scenes, including data on:

Crime types (especially homicide)

Evidence details (e.g., weapon type, trajectory info, blood spatter)

If anyone has worked on a similar project before or knows where we can find reliable and detailed crime scene datasets, we’d greatly appreciate any guidance! We’re especially curious whether any open-source or academic datasets are available, or whether there are other resources that might be useful for this type of project.

Any other help with any aspect of this project would also be appreciated.

Thanks in advance for any help, suggestions, or shared experiences!

submitted by /u/AdSquare9152

BEA Archive Data Availability Issues

Greetings! I am currently conducting research on the US. To start the analysis, I need BEA data going back to the 1990s (specifically to 1997, when NAICS was introduced). I am pretty new to the BEA website, so I may be lost. The data I need is county-level. When I go to the archive for GDP by county and metro area, the only data available dates back to 2017. Am I doing something wrong? Where can I find older county and metro data? I may also need county-level data from other categories on the website. Is there a website like NHGIS, but for BEA data?

submitted by /u/tasyaaaaa

France Inflation Data (per Department, Index Type, Index Variation, Household, And Product Type)

Hi!

I struggled a lot to find inflation data for France from an official source. I either found monthly inflation articles from INSEE (National Institute for Statistics and Economic Studies) that linked to the underlying data, and even then it was only a subset of all the data for that month, or I found auxiliary websites that didn’t cite the source of their data.

I also looked for official APIs but didn’t find anything that directly provided the consumer price index (inflation index) or a preprocessed version of it (year-over-year variation, for example). But I randomly stumbled upon this https://www.insee.fr/fr/statistiques/series/102342213 (it’s an official source, the INSEE), whose title might be confusing: it suggests that the data there is grouped by products and detailed products (under a special nomenclature named COICOP).

I preprocessed it here https://github.com/ReinforcedKnowledge/france-inflation-data-cleaned (includes raw data, preprocessing scripts and preprocessed data). The README is in French, but it explains the data a bit and how I derived granular datasets from that big raw data. I found it messy and confusing when I first started looking at it, but I was able to extract every unique combination of the modalities (region/department, index type, index variation, whether the product is under the COICOP nomenclature, household type).
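
The splitting step described above amounts to grouping a long-format table by its categorical columns and keeping one time series per combination. This is a minimal standard-library sketch, not the repository’s actual script; the column names (`region`, `index_type`, `variation`, `coicop`, `household`, `period`, `value`) are hypothetical stand-ins for the real INSEE labels, and the rows are made-up placeholders.

```python
from collections import defaultdict

# Hypothetical dimension columns; the real INSEE export uses other labels.
DIMENSIONS = ("region", "index_type", "variation", "coicop", "household")


def split_by_modalities(rows):
    """Group rows by every unique combination of the dimension columns.

    Returns a dict mapping each combination (tuple of modality values)
    to its (period, value) observations, sorted by period.
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[dim] for dim in DIMENSIONS)
        groups[key].append((row["period"], float(row["value"])))
    return {key: sorted(obs) for key, obs in groups.items()}


def make_row(region, period, value):
    # Placeholder rows: every other dimension is fixed for brevity.
    return {"region": region, "index_type": "CPI", "variation": "yoy",
            "coicop": "01", "household": "all",
            "period": period, "value": value}


rows = [make_row("A", "2024-02", "1.0"),
        make_row("A", "2024-01", "2.0"),
        make_row("B", "2024-01", "3.0")]

for combination, series in split_by_modalities(rows).items():
    print(combination, series)
```

Each resulting series could then be written to its own file, which matches the “granular datasets” idea; the actual naming and layout in the repository may differ.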

I hope it helps anyone looking for this data or trying to understand it, because it really took me some time and effort to find it and make sense of it.

submitted by /u/ReinforcedKnowledge

Regression And Classification Datasets

Hello everyone, I am currently taking a class that requires me to use a classification dataset and a regression dataset that are not from the UCI ML repository. I want to do my project on something in the social sciences (I have a poli sci background), but I’ve been struggling to find datasets that align with what I’m looking for. Does anyone have good recommendations for places to find the kind of datasets I want?

submitted by /u/jeanxette

Are There Any Recipe Datasets For Commercial Use?

I’m looking for a dataset/database of good-quality (NO AI) food recipes with PICTURES that go alongside the instruction steps, for commercial use. I would like to use it in an app I’m creating.

I don’t mind paying for it, preferably a one-time payment rather than a subscription.

I would have to translate the instructions anyway, so what I’m really worried about are the pictures, because of copyright issues.

And NO APIs, I want to store the database locally.
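
Since the plan is to store everything locally, a small relational schema can keep recipes, their ordered steps, and per-step pictures together, with a license and source column so each image’s copyright status travels with it. A minimal sketch using Python’s built-in `sqlite3`; all table and column names are hypothetical, and the sample recipe is a placeholder.

```python
import sqlite3

SCHEMA = """
CREATE TABLE recipe (
    id    INTEGER PRIMARY KEY,
    title TEXT NOT NULL
);
CREATE TABLE step (
    recipe_id   INTEGER NOT NULL REFERENCES recipe(id),
    position    INTEGER NOT NULL,   -- 1-based step order
    instruction TEXT NOT NULL,      -- text to be translated later
    PRIMARY KEY (recipe_id, position)
);
CREATE TABLE image (
    id        INTEGER PRIMARY KEY,
    recipe_id INTEGER NOT NULL,
    position  INTEGER NOT NULL,     -- which step the picture illustrates
    path      TEXT NOT NULL,        -- local file path of the picture
    license   TEXT NOT NULL,        -- terms under which the image may be used
    source    TEXT NOT NULL,        -- where the rights were obtained
    FOREIGN KEY (recipe_id, position) REFERENCES step(recipe_id, position)
);
"""


def open_db(path=":memory:"):
    """Open a SQLite database with foreign keys enforced and the schema created."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def steps_with_images(conn, recipe_id):
    """Return (position, instruction, image path or None) in step order."""
    return conn.execute(
        """SELECT s.position, s.instruction, i.path
           FROM step AS s
           LEFT JOIN image AS i
             ON i.recipe_id = s.recipe_id AND i.position = s.position
           WHERE s.recipe_id = ?
           ORDER BY s.position""",
        (recipe_id,),
    ).fetchall()


conn = open_db()
conn.execute("INSERT INTO recipe (id, title) VALUES (1, 'Pancakes')")
conn.executemany("INSERT INTO step VALUES (?, ?, ?)",
                 [(1, 1, "Mix the batter."), (1, 2, "Fry until golden.")])
conn.execute("INSERT INTO image (recipe_id, position, path, license, source) "
             "VALUES (1, 2, 'img/pancakes_2.jpg', 'purchased', 'vendor invoice')")
print(steps_with_images(conn, 1))
# [(1, 'Mix the batter.', None), (2, 'Fry until golden.', 'img/pancakes_2.jpg')]
```

Keeping the license on the image row rather than on the recipe makes it possible to mix images from different sources and still audit each one.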

Thank you

submitted by /u/3prisms

Are There Any Open Source Recipe Datasets For Commercial Use?

I’m looking for a dataset/database of good-quality (NO AI) food recipes with PICTURES that go alongside the instruction steps, for commercial use. I would like to use it in an app I’m creating.

I don’t mind paying for it, preferably a one-time payment rather than a subscription-type thing.

I would have to translate the instructions anyway, so what I’m really worried about are the pictures, because of copyright issues.

And NO APIs, I want to store the database locally.

Thank you

submitted by /u/AdministrativePie300

Can You Suggest An (AI) Tool That Can Read A Spreadsheet And Produce A Word/PDF Document That Summarizes The Data Into Formatted Text, Tables, And Figures?

I’m trying to figure out how to automate the production of a monthly data report with nice, clean visuals and written summaries based on the Excel spreadsheets that are provided. I’m not sure whether ChatGPT is best for this, another AI tool, or some combination of Python code and something else. Any advice would be appreciated!
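
If the spreadsheets are exported to CSV, the deterministic part (totals, month-over-month changes, a summary table) can be scripted without any AI tool, leaving an LLM, if used at all, for polishing the wording. A minimal standard-library sketch, assuming hypothetical columns `month`, `category` and `amount`; turning the Markdown output into Word or PDF (for example with Pandoc) would be a separate step.

```python
import csv
import io
from collections import defaultdict


def monthly_totals(csv_text):
    """Sum `amount` per (month, category) from CSV text with a header row."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[(row["month"], row["category"])] += float(row["amount"])
    return totals


def render_report(csv_text, current, previous):
    """Build a Markdown summary comparing two months given as YYYY-MM strings."""
    totals = monthly_totals(csv_text)
    categories = sorted({cat for (_, cat) in totals})
    lines = [f"# Monthly report: {current}", "",
             "| Category | Previous | Current | Change |",
             "|---|---:|---:|---:|"]
    for cat in categories:
        prev = totals.get((previous, cat), 0.0)
        cur = totals.get((current, cat), 0.0)
        # Relative change; undefined when the previous month is zero.
        change = f"{(cur - prev) / prev:+.1%}" if prev else "n/a"
        lines.append(f"| {cat} | {prev:,.2f} | {cur:,.2f} | {change} |")
    grand = sum(totals.get((current, c), 0.0) for c in categories)
    lines += ["", f"Total for {current}: {grand:,.2f}."]
    return "\n".join(lines)


# Placeholder data for illustration only.
data = """month,category,amount
2024-01,Sales,100
2024-02,Sales,125
2024-01,Returns,10
2024-02,Returns,5
"""
print(render_report(data, "2024-02", "2024-01"))
```

Charts could be added in the same script (for example with a plotting library) and referenced from the Markdown as images.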

submitted by /u/Arfusman

A Tool To Create Datasets From Research Papers Using Augmented LLMs: Would This Be Helpful?

I’ve developed a program that uses multiple language models that talk to each other to create databases from scientific papers. I’m looking to use it to build custom datasets for medicinal neural networks. I’m considering deploying it as a website to see if it could be useful for others, but I’m looking for input on how to make it more robust and accessible for broader use.
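
One common way to make a multi-model extraction pipeline more robust is to accept a value only when independent extraction passes agree, and to route disagreements to human review. A minimal sketch of that consensus step, assuming each model’s output has already been parsed into a dict of field to value; the field names and outputs below are hypothetical stand-ins, not real LLM calls or the author’s actual design.

```python
from collections import Counter


def reconcile(extractions, min_agree=2):
    """Merge per-model extractions into accepted fields and review items.

    extractions: list of dicts, one per model, mapping field -> value.
    Values are normalized (stripped, lowercased) before voting, so the
    accepted value is the normalized form. A field is accepted when at
    least `min_agree` models agree; otherwise all its values go to review.
    """
    fields = {f for ex in extractions for f in ex}
    accepted, review = {}, {}
    for field in sorted(fields):
        values = [str(ex[field]).strip().lower()
                  for ex in extractions if field in ex]
        value, count = Counter(values).most_common(1)[0]
        if count >= min_agree:
            accepted[field] = value
        else:
            review[field] = values
    return accepted, review


# Hypothetical outputs from three extraction passes over the same paper.
model_outputs = [
    {"compound": "Compound X", "dose_mg": "10", "outcome": "improved"},
    {"compound": "compound x", "dose_mg": "10", "outcome": "no change"},
    {"compound": "Compound X", "dose_mg": "100"},
]
accepted, review = reconcile(model_outputs)
print(accepted)
print(review)
```

Logging which source passage each accepted value came from would be a natural next step toward the auditability researchers tend to ask for.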

For those with experience in dataset creation, AI applications in medicine, or similar fields, what features or improvements would make this tool more valuable or realistic for researchers and practitioners? Any insights would be greatly appreciated!

submitted by /u/chiralneuron

Pitchbook Access Request Help Please

Hello everyone. I’m an undergrad student currently writing a thesis on VC-funded firms. I found that PitchBook may have a lot of the information (financials) I need for my paper, but it’s really pricey. I’d like to see if anyone in the community can share access with me or pull the data for free 😅 This would really help me kickstart my research. Help this broke student graduate!

submitted by /u/Apprehensive_Stick55

Is Using News APIs Legal And Reliable?

I need a source of information for a data science project (academic research). Specifically, I need to retrieve a historical record of news about a certain topic, so I am thinking of using a news API instead of web scraping, because these APIs seem to return the kind of data I am looking for.

I’ve come across some of them, such as newsdata.io, newsapi.org and newsapi.ai, but I am wondering whether using them is legal and reliable. I mean, are the services themselves legal? And if so, am I inherently allowed to use them for my personal (academic) purposes?

Their Terms & Conditions say this:

“We don’t have the right to authorise any user to use the data for their personal and professional purposes. However, the users can use the data for their personal or professional purposes”

I mean, should I have any concerns about this? It’s not like Twitter’s or Reddit’s APIs, where the data belongs to them and they deliberately give it to you. (In fact, I’m asking because I planned to extract data from those platforms, but I’ve just realized it’s not possible at all, so I am wondering if there’s an alternative I can use to meet my requirements.)

Well… in essence, my questions are: Are these platforms/tools (APIs) legitimate and meant for data science? Or, in other words: is it common practice to use these kinds of “news APIs” for data science?

I didn’t even know they existed. Have you ever tried them? Should I do web scraping instead, or is there another alternative you would recommend?

I’d appreciate your help.

submitted by /u/Sondre_BJ

Source Of Historical Company Profiles By Date?

It’s easy to find APIs that return the current “company profile”, which includes fields such as:

{
  "symbol": "AAPL",
  "price": 231.41,
  "beta": 1.239,
  "companyName": "Apple Inc.",
  "cik": "0000320193",
  "exchangeShortName": "NASDAQ",
  "industry": "Consumer Electronics",
  "website": "https://www.apple.com",
  ...
}

But I’m looking to compare the current profiles to historical profiles. Is there a source to get the historical profiles info by date?
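
Whatever the historical source turns out to be, the comparison itself is a field-by-field diff of two snapshots of the same profile. A minimal sketch, assuming both snapshots are dicts shaped like the profile above; the function name is hypothetical and the snapshots use made-up placeholder values, not real historical data.

```python
def diff_profiles(old, new, ignore=("price",)):
    """Return {field: (old_value, new_value)} for fields that changed.

    Fields present in only one snapshot appear with None on the other side.
    Volatile fields such as the price are skipped by default.
    """
    changes = {}
    for field in sorted(set(old) | set(new)):
        if field in ignore:
            continue
        before, after = old.get(field), new.get(field)
        if before != after:
            changes[field] = (before, after)
    return changes


# Placeholder snapshots for illustration only.
old = {"symbol": "XYZ", "companyName": "Example Corp", "industry": "Software",
       "exchangeShortName": "NYSE", "price": 10.0}
new = {"symbol": "XYZ", "companyName": "Example Holdings Inc.", "industry": "Software",
       "exchangeShortName": "NASDAQ", "price": 12.0, "website": "https://example.com"}
print(diff_profiles(old, new))
# {'companyName': ('Example Corp', 'Example Holdings Inc.'), 'exchangeShortName': ('NYSE', 'NASDAQ'), 'website': (None, 'https://example.com')}
```

Keying snapshots by a stable identifier such as the CIK rather than the ticker symbol avoids mismatches when a company changes its symbol.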

submitted by /u/vicegripper

European Cities Population Data Set

Hello, I’m building an ML model that uses a city’s infrastructure as features to predict its population.
With an OSM library I was able to easily extract the infrastructure data; however, I can’t find a dataset with enough European cities. So far, every dataset I’ve encountered contains only 50–80 European cities, and the rest are Asian cities.

I’ve tried using population density and city area to compute the population myself, but the numbers I got were terribly wrong.

If anyone has an idea of how to get this data, I would love the help.
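
The estimate itself is just population = density × area, so large errors usually come from the inputs rather than the formula. Two plausible causes (assumptions, not a diagnosis of this specific case) are mismatched units, such as an area computed in m², or in degrees from unprojected OSM geometry, multiplied by a density per km², and a density and an area that describe different boundaries, such as an administrative municipality versus the built-up urban area. A minimal sketch that makes the unit conversion explicit; the function name and figures are illustrative.

```python
# How many of each area unit make up one square kilometre.
UNITS_PER_KM2 = {"km2": 1.0, "ha": 100.0, "m2": 1_000_000.0}


def estimate_population(density_per_km2: float, area: float, area_unit: str = "km2") -> float:
    """Estimate population as density (people per km^2) x area, converting area to km^2 first."""
    if area_unit not in UNITS_PER_KM2:
        raise ValueError(f"unsupported area unit: {area_unit!r}")
    return density_per_km2 * (area / UNITS_PER_KM2[area_unit])


# The same hypothetical city with its area expressed in three units.
print(estimate_population(4000, 25, "km2"))         # 100000.0
print(estimate_population(4000, 2_500, "ha"))       # 100000.0
print(estimate_population(4000, 25_000_000, "m2"))  # 100000.0
# Forgetting the conversion (treating m^2 as km^2) inflates the result a millionfold.
print(estimate_population(4000, 25_000_000))        # 100000000000.0
```

Checking a handful of cities with known official populations this way is a quick test of whether the area and density sources are compatible before building the full dataset.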

submitted by /u/Top_Hyena1923

Mortgage Loan Application Data Sample For A Scorecard

I’m planning to build an application scorecard for home loans for my bachelor’s thesis at university.

One of the main concerns for me (and my academic supervisor) is having a reliable dataset, or rather a dataset from a reliable source. One of the big questions I’m likely to be challenged on in such a thesis is the dataset’s reliability, so it can’t come from somewhere like Kaggle, but a source like Experian or Equifax would be okay. I work at a bank and deal with such models, but unfortunately I can’t use any company data (even if it gets anonymized). So far I’ve seen some promising data on the FFIEC’s website, but I would like some additional sources so I can make a more educated decision.

Roughly I would need the data to contain these fields:

Age

Job

Income

Education

Marital Status

Information about previous defaults (e.g., a Y/N flag for whether the applicant has defaulted on a loan in the last 5 years)

Type of property that would be purchased with the loan

Some other fields that I could potentially exclude in further analysis
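
When comparing candidate sources, it can help to check each one’s column list against the fields above before committing to it. A minimal sketch, assuming hypothetical normalized field names and a made-up column list for a candidate source; real column names (for example in the FFIEC files) will differ and would need to be mapped onto these first.

```python
# Hypothetical normalized names for the fields listed above.
REQUIRED = {
    "age", "job", "income", "education", "marital_status",
    "prior_default_5y", "property_type",
}


def coverage(columns, required=REQUIRED):
    """Report which required fields a candidate dataset has and lacks."""
    present = required & set(columns)
    return {
        "present": sorted(present),
        "missing": sorted(required - present),
        "share": len(present) / len(required),
    }


# Made-up column list for an imaginary candidate source.
candidate = ["age", "income", "property_type", "loan_amount"]
report = coverage(candidate)
print(report["missing"])
print(f"{report['share']:.0%} of required fields covered")
```

Running the same check over every shortlisted source gives a simple, documentable basis for the choice, which may also help when defending the dataset in the thesis.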

submitted by /u/JesusBreakdancing

Seeking VO2max Test Data For Research Training

Hello everyone!

I’m a researcher-in-training working on exercise physiology, and I’m currently looking for datasets on VO2max or incremental exercise tests that include VO2 and, ideally, blood lactate measures. My goal is to practice determining ventilatory and lactate thresholds to refine my analytical skills in these areas.
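
As one example of the threshold work described here, a simple fixed-concentration lactate threshold (the workload at which blood lactate reaches 4 mmol/L, often called OBLA) can be found by linear interpolation between test stages. This is a minimal sketch with made-up stage values; other methods, such as Dmax for lactate or V-slope for the ventilatory threshold, need curve fitting or segmented regression instead.

```python
def fixed_lactate_threshold(workloads, lactates, target=4.0):
    """Interpolate the workload at which lactate first reaches `target` mmol/L.

    workloads and lactates are paired stage values in test order.
    Returns None if lactate never reaches the target.
    """
    if len(workloads) != len(lactates):
        raise ValueError("workloads and lactates must have the same length")
    if not lactates:
        return None
    if lactates[0] >= target:
        return workloads[0]  # already at or above target at the first stage
    for i in range(1, len(lactates)):
        lo, hi = lactates[i - 1], lactates[i]
        if lo < target <= hi:
            w0, w1 = workloads[i - 1], workloads[i]
            # Linear interpolation between the two bracketing stages.
            return w0 + (w1 - w0) * (target - lo) / (hi - lo)
    return None


# Made-up incremental test: power (W) and blood lactate (mmol/L) per stage.
power = [100, 150, 200, 250, 300]
lactate = [1.4, 1.9, 2.8, 4.6, 7.1]
print(round(fixed_lactate_threshold(power, lactate), 1))  # 233.3
```

A fixed 4 mmol/L criterion ignores individual differences in lactate kinetics, which is one reason practice datasets are useful for comparing it against individualized methods.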

If you have access to any anonymized data or know of open-source datasets, I’d be very grateful for any pointers! I’ve checked platforms like OSF and PhysioNet but haven’t found exactly what I need, so any help would be highly appreciated.

Thank you in advance!

submitted by /u/jeanpauleti