Datasets For Large Language Models: A Comprehensive Survey Of 444 Datasets

submitted by /u/cavedave

Washington State Employment And Wages Dataset From 2016 – 2019

Hey! I need the dataset named in the title and am having a hard time finding one. I need it for a data project in my CS class, and so far I have only found versions that are already graphed rather than tables I could open in Excel.

submitted by /u/ShuTheShinobi

Body Measurements (Age, Height, Weight, Sex)

Hello, I am trying to find a dataset with the abovementioned fields. Ideally, at least 1000 samples from adults in the US. Any ideas?

submitted by /u/Ghost4113

Does Discogs Allow A Data Export From The “whole” Marketplace?

I’m working on building a dashboard for an interview at a sales org. I figured sales data/price tracking with Discogs would be a good fit, and is something I can speak to and am interested in.

It looks like the only exportable data they can give is from my own marketplace/sales data. Unfortunately, I haven’t done enough sales on Discogs to build anything interesting/compelling. Is there a way I can get a CSV with the sales data across the whole Discogs marketplace? Or another way to gather a wider swath of metrics from their sales?

(I’m also open to hearing other suggestions, if you can think of a place that has CSV data in multiple tables pertaining to sales that I can easily download/import.)

submitted by /u/eulerpony

Dataset For Serial Killers For Data Science Project

Hello all, I’m searching for a dataset on serial killers that could help with preventing them, recognizing them, or using advancing technology to help the government stop them. It’s for a data science project, and I’ve been searching for a while but couldn’t find anything worthwhile. If anyone has any suggestions, please help.

submitted by /u/Toottootyarabamoot

Role Of Interoperability In End-to-End Data Governance: As Implemented By Data Developer Platforms

submitted by /u/growth_man

GPS Dataset Columns Interpretations.

Hey Data Scientists, I’ve been working with a GPS dataset for vehicle routing, but I’m having trouble interpreting some of the columns. The dataset doesn’t have column names, but I’ve managed to figure out some of them:

First column: Vehicle ID
Second column: Timestamps
Third column: Longitude
Fourth column: Latitude
Seventh column: Speed (I’ve determined this through patterns in the data)

However, I’m still unsure about the remaining columns:

Fifth column: This column starts with a value of 319 and generally keeps increasing, even while the vehicle is stationary. I noticed that the value stays constant when speed is constant.
Sixth column: This column starts at 0 (the vehicle is stationary), moves up to 303 once the vehicle starts moving slightly, and goes back to 0 when the vehicle is stationary. It also stays constant when speed is constant.
Eighth column: This column changes with location change, similar to the speed column. However, when the longitude and latitude remain constant, the values are 0.

Any ideas on what this column signifies?
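
One way to test guesses about unnamed columns like these is to derive quantities from the columns that are already known and compare them side by side. The sketch below is only an illustration under assumptions: it takes each row as a sequence with longitude at index 2 and latitude at index 3 (the third and fourth columns described above), and it computes the step distance, the compass bearing, and the running distance between consecutive fixes. Whether any unknown column actually holds a heading, a distance, or something else is not known from the post; the function names are made up for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def bearing_deg(lat1, lon1, lat2, lon2):
    """Initial compass bearing in degrees (0 = north, 90 = east)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlmb = math.radians(lon2 - lon1)
    x = math.sin(dlmb) * math.cos(p2)
    y = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlmb)
    return math.degrees(math.atan2(x, y)) % 360


def derive_motion(rows):
    """Return (step_m, bearing_or_None, cumulative_m) for each consecutive pair.

    Assumes rows are ordered by time for a single vehicle, with longitude at
    index 2 and latitude at index 3. Bearing is None when the fix did not move.
    """
    out = []
    total = 0.0
    for prev, cur in zip(rows, rows[1:]):
        lon1, lat1 = prev[2], prev[3]
        lon2, lat2 = cur[2], cur[3]
        step = haversine_m(lat1, lon1, lat2, lon2)
        total += step
        heading = bearing_deg(lat1, lon1, lat2, lon2) if step > 0 else None
        out.append((step, heading, total))
    return out
```

Printing these derived values next to the fifth, sixth, and eighth columns for a few hundred rows should show quickly whether any of them tracks a derived quantity, possibly after a unit conversion.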

submitted by /u/ziade_e

Casia NIR-VIS 2.0 Dataset For Facial Images

The Casia NIR-VIS 2.0 dataset is for infrared and corresponding visible images. But when I email the address given to request access, it says the domain is not available.

Is there anyone who knows about its availability and how to access it?

Or is there any other large-scale, publicly available dataset for NIR-VIS images?

Thanks in advance!

submitted by /u/Ok_Championship9256

Data Set Needed For Predictive/Classification Model Building

Can anyone suggest a dataset for predictive analysis/classification (not image-based)?
Not from Kaggle.

submitted by /u/FrontCryptographer59

Could Someone Please Recommend A Source For Data On Dietary Intake For Many Different Countries?

I want to analyze the prevalence of dementia across different countries (compared with the proportion of elderly for reference) and thought it would be interesting if I compared it with something dietary, just to see if there is any correlation at all that I could look into further. Where can I find datasets on specific dietary aspects for many different countries? For example: saturated fat or omega-3 fatty acid intake. Anything is good! It doesn’t have to be relating to fat specifically, though that would be cool. The most important thing I think is that it covers many different countries.

submitted by /u/Relevant_Engineer442

Looking For Dataset Of Songs Sorted By Repetitiveness

Hi. I’m a desperate psychology PhD student looking for experimental stimuli for one of my experiments. I am studying how repetition in music is linked to cognitive mechanisms and how it affects aesthetic appraisal.
As the title says, I am looking for a dataset or database of songs/melodies/auditory stimuli that can be sorted from the most repetitive to the least repetitive. Looked everywhere but could not find one that suits my needs. Stumbled upon FMA but I am a bit lost in all the programming lingo and I don’t seem to find what I need in there.
Any lead would be appreciated, thanks in advance!

submitted by /u/AlexrooXell

Looking For Electric Panel Capacity Data

I’m looking for data on what types of panels (i.e., how many amps) people have (in California in particular), both in single-family and multi-family homes. I plan to use it to build estimates for home electrification. Does anyone have any recommendations?

submitted by /u/dalberts

Looking For A Simple Country/territory Dataset With Population And Area

Hi all, subject says it all. I was using the countryinfo package in Python but it is incomplete and outdated, I think. Any format would be fine, REST API form of some kind would be the very best though. Thank you in advance. I apologize if this is common knowledge or asked frequently.

edit: when I say territory, I mean that distinct/separate areas should be separate in the dataset, for instance Puerto Rico should be separate from the 50 states; British territories in the Caribbean should be separate, etc

submitted by /u/Beneficial_Order1050

Looking For A Dataset Of Photoshopped Images Of People

I’m looking for an image dataset of photos of people that have been photoshopped. The quality of the photoshopping is not particularly important, but ideally it would have a range of good and bad photoshopping. Does such a dataset exist? I’m new to this area and to this sub, so apologies in advance if I am somehow breaching expected conduct! Thank you so much.

submitted by /u/spring_beauty

Looking For A CBCT Annotated Teeth Dataset

Does anyone have a dataset of CBCT scans with annotated teeth?

CTooth fits the bill, but I’ve been trying to reach out to the devs of the dataset and no one has answered or given any official news for about 5 months now.

Thanks in advance.

submitted by /u/MrSpuriz

A Growing Database Of InfoSec/Cybersecurity Salaries For 2024 (Open Data)

Hi all,
This is the InfoSec/Cybersecurity Index for 2024 – released in the Public Domain!

You can download the data here (including previous years!): https://infosec-jobs.com/salaries/download/
Or check out some aggregated stats and an overview here: https://infosec-jobs.com/salaries/

Hope it helps, have fun playing around with the dataset 🙂

Cheers

submitted by /u/infosec-jobs

Looking For A Dataset For Business Scheduling

I’m currently hitting a roadblock on a project of mine: I can’t seem to find a dataset for business schedules. The precise data I need is the times and dates of business meetings and/or employee tasks.

A fallback I’m starting to consider is mocking the data. In that case, I’d love to hear about any constraints I should introduce to make the data at least quasi-realistic (like avoiding weekends and off-work hours, concurrent meetings/tasks, etc.).
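
For the mocking route, a minimal sketch of the constraints mentioned above might look like the following. Everything in it is an assumption made for illustration: the 09:00 to 17:00 office hours, the half-hour start slots, the 30/60/90 minute meeting lengths, and the function and field names. It keeps meetings on weekdays, inside working hours, and free of overlaps for the same employee.

```python
import random
from datetime import date, datetime, time, timedelta

WORK_START = time(9, 0)   # assumed office hours, adjust as needed
WORK_END = time(17, 0)
SLOT_MINUTES = 30         # meetings start on half-hour boundaries


def workdays(start, n_days):
    """Return the Monday-to-Friday dates among n_days calendar days from start."""
    days = (start + timedelta(days=i) for i in range(n_days))
    return [d for d in days if d.weekday() < 5]


def overlaps(a_start, a_end, b_start, b_end):
    """True when two half-open intervals [start, end) intersect."""
    return a_start < b_end and b_start < a_end


def mock_meetings(employees, start, n_days, per_day=3, seed=0):
    """Generate up to per_day non-overlapping meetings per employee per workday."""
    rng = random.Random(seed)  # fixed seed keeps the mock data reproducible
    rows = []
    for day in workdays(start, n_days):
        day_open = datetime.combine(day, WORK_START)
        day_close = datetime.combine(day, WORK_END)
        n_slots = int((day_close - day_open).total_seconds() // 60) // SLOT_MINUTES
        for emp in employees:
            booked = []
            for _ in range(per_day):
                length = timedelta(minutes=rng.choice([30, 60, 90]))
                begin = day_open + timedelta(minutes=SLOT_MINUTES * rng.randrange(n_slots))
                end = begin + length
                if end > day_close:
                    continue  # would run past closing time, drop this attempt
                if any(overlaps(begin, end, s, e) for s, e in booked):
                    continue  # clashes with a meeting this employee already has
                booked.append((begin, end))
                rows.append({"employee": emp, "start": begin, "end": end})
    return rows
```

Calling `mock_meetings(["ana", "ben"], date(2024, 3, 1), 14)` returns a list of dictionaries that can be written out with `csv.DictWriter`. Public holidays, lunch breaks, and meetings shared by several attendees are not modelled here and would be natural next constraints.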

Any help would be highly appreciated!

submitted by /u/pyrocitor02

Dataset For Customer Acceptance Rates On Financial Offers For Master’s Thesis

Hi everyone!
I’m working on my master’s thesis and urgently need a dataset on customer acceptance rates for insurance/private loans/mortgage offers. My collaboration with a bank has been delayed, and I’m exploring other options to test my models. I’ve checked Kaggle and Google Dataset Search with no luck.
Does anyone have leads on where I can find real data from banks or insurance companies? Any help or direction would be greatly appreciated!
Thanks!

submitted by /u/Intrepid_Patience_37

Dataset Needed For My Tableau Project

Hey everyone, I have been taking a Data Visualisation & Storytelling course. The curriculum covered data cleaning techniques, representing data to build a story around it, and Tableau & PowerBI. Now that we are at the end of the course, we need to choose a dataset and work on it using Tableau. I would say we were taught an intermediate level of Tableau. Please suggest a few datasets that you think would be well suited to working with in Tableau. TIA!

submitted by /u/riteshzd

Cosmopedia Is A Dataset Of Synthetic Textbooks, Blogposts, Stories, Posts And WikiHow Articles

submitted by /u/yaph

Are There Any English Medical Datasets?

My company asked me to test MedicalGPT; they just want to know its capabilities and take it for a test run.

The problem is that they provide only a very small English medical dataset, which is not very useful. Their real dataset is in Chinese, and I can’t work with Chinese: how will I be able to know whether the model gets the questions or answers right if I don’t understand the dataset?

And the dataset is too big to translate; ChatGPT and Google Translate can’t handle it because of its size.

I’m looking for clean, structured data, since I’d prefer not to waste time cleaning it. A paid dataset is fine if the price is okay; the company would pay.

submitted by /u/lynob

A Growing Database Of AI/ML/DS Salaries For 2024 (Open Data)

submitted by /u/ai_jobs

[self-promotion] Free Nasdaq Price/Volume Data From Cybersyn & Databento On Snowflake

Available free of charge for internal use
Dataset: https://app.snowflake.com/marketplace/listing/GZTSZAS2KF7/cybersyn-inc-financial-economic-essentials
Docs: https://docs.cybersyn.com/getting-started/concepts/stock_prices_trading_volumes

submitted by /u/aiatco2

In Dire Need Of 3 Csv Files That Have A Similar Column To Merge On, Can Be Any Topic

^^^
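
For reference, the kind of merge described in the title can be sketched with the standard library alone. The three tables, their contents, and the shared `id` column below are made up for illustration; any three CSV files that share a key column would work the same way.

```python
import csv
import io


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Keep only rows whose key value appears in both tables."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined


# Made-up sample data standing in for three real CSV files.
PEOPLE = "id,name\n1,Ana\n2,Ben\n3,Cy\n"
ORDERS = "id,item\n1,book\n2,lamp\n"
CITIES = "id,city\n1,Oslo\n3,Lima\n"

merged = inner_join(
    inner_join(read_rows(PEOPLE), read_rows(ORDERS), "id"),
    read_rows(CITIES),
    "id",
)
```

Only `id` 1 appears in all three tables, so `merged` holds a single combined row. With pandas installed, the equivalent is chaining `DataFrame.merge(..., on="id")` twice.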

submitted by /u/hpisces

Real Estate Agents By Location, (and Production?)

Howdy Howdy,

I am doing market research and looking for specific data sets. Specifically, I would like to 1.) Identify all licensed real estate agents in the United States by geography (by county, at a minimum).

Some states (California) allow the entire state’s list of licensees to be downloaded. Other states are slightly more challenging (Florida allows search by county with 50 record results per page). This data will be used to create heat maps showing the concentrations of these agents that will be overlaid with other data sets.

State of Florida https://www.myfloridalicense.com/LicenseDetail.asp?SID=&id=DD459E87706F08CE93C23892B24FDAC4

I’m sure this could be scraped, but it also seems like something that would be for sale somewhere already.
Additionally, I’d LOOOVE to see the production of these agents to create a bell curve.

Thoughts, suggestions are welcome and appreciated!

submitted by /u/LotsaProperty

Seeking Advice On Customer Segmentation For E-commerce

I’m currently embarking on a project to revamp customer segmentation for an e-commerce company.
We’ve got lots of data already, but I’m not sure what exactly I need to make this work well. Figuring out customer groups helps us make shopping better for everyone.
Here’s what I’m wondering:
1. Important Data Stuff: What kind of information should we have in our data to understand our customers better?
2. Fixing Data: How can we make sure the data we have is good enough to help us understand our customers?
3. Good Ways to Sort Customers: Do you know any good tricks or tools to help us figure out what groups our customers belong to?
4. Checking if it Works: Once we have our groups, how can we tell if they’re helping us make shopping better?
We’ve got loads of data, but making sense of it all is tough. I’d really appreciate any advice you can give. Whether it’s from your job, what you’ve learned, or just good ideas, I’m all ears. Thanks a bunch for your help!
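
As one possible starting point for questions 1 and 3, a recency/frequency/monetary (RFM) summary is a common first pass at segmentation; it is a swapped-in illustration here, not something the post itself proposes. The sketch assumes the data contains at least a customer ID, an order date, and an order amount. The thresholds and segment names are arbitrary placeholders.

```python
from collections import defaultdict
from datetime import date


def rfm_table(orders, today):
    """Summarise orders given as (customer_id, order_date, amount) tuples."""
    stats = defaultdict(lambda: {"last": None, "frequency": 0, "monetary": 0.0})
    for cust, day, amount in orders:
        s = stats[cust]
        s["last"] = day if s["last"] is None else max(s["last"], day)
        s["frequency"] += 1
        s["monetary"] += amount
    return {
        cust: {
            "recency_days": (today - s["last"]).days,  # days since last order
            "frequency": s["frequency"],               # number of orders
            "monetary": s["monetary"],                 # total spend
        }
        for cust, s in stats.items()
    }


def label(row, recent_days=30, loyal_orders=3):
    """Assign a coarse segment; both thresholds are placeholder assumptions."""
    if row["recency_days"] <= recent_days and row["frequency"] >= loyal_orders:
        return "loyal"
    if row["recency_days"] <= recent_days:
        return "recent"
    return "lapsed"
```

For question 4, one simple check is whether the resulting groups behave differently on a metric that was not used to build them, such as return rate or response to a campaign.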

submitted by /u/Appropriate_Union_58

Seeking Help: FIVB Volleyball Men’s World Cup 2022 Attendance Data In Slovenia

Hey r/datasets community!

I hope this post finds you all well. I’m reaching out to this amazing community because I’m currently working on a sports analysis project focused on the FIVB Volleyball Men’s World Cup 2022, specifically looking into the attendance figures for matches held in Slovenia.

I’ve been scouring various sources for this data, but unfortunately, information on the number of people who attended each match in Slovenia seems to be quite elusive. The limited availability of this data is proving to be a significant challenge for my analysis.

If any of you were fortunate enough to have access to reliable sources, I would greatly appreciate your help. It would be fantastic to get accurate attendance figures for every match played in Slovenia during the FIVB Volleyball Men’s World Cup 2022.

Whether you have personal experiences, know someone who attended, or have stumbled upon some hidden gems of data, any information you can provide would be incredibly valuable for my project.

Additionally, if you have tips on where I could potentially find this data or if there are any local sources in Slovenia that might have compiled such information, please let me know.

Thank you so much for taking the time to read this, and I truly appreciate any assistance or guidance you can offer. Let’s work together to make this analysis a slam dunk!

Looking forward to your responses! 🏐🌍

submitted by /u/nejcGo3

Is There Any Dataset That Contains All The Facebook Groups And Subreddits?

Is there any dataset that contains all the Facebook groups and subreddits?

submitted by /u/Icy_Ad_8248

Where Can I Find Datasets Relating To Genetics And Diseases?

For instance, data on how changes in a certain genetic locus impacted the rates of Alzheimer’s disease, or any other disease. Or how a certain non-genetic lifestyle factor, e.g., omega-3 in the diet, related to rates of Alzheimer’s disease. I’m doing a project for a statistics class where we use the program R to calculate summary statistics and analyze the data. The problem is, I have no idea where to actually find data! I’m pretty new to this. Does anyone have any suggestions? It doesn’t have to be this specific, either. It can be about anything, really. I mostly just want to know some good sources.