Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, i’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Stock Market News Dataset – 2008 Or Later

Hello,

I’m working on a machine learning project, and need a large dataset of financial news. Specifically, I’m looking for news on companies that have a medium market cap or lower, and from a period of 2008 until now… or any interval of time over this period.

Is anyone aware of such a dataset? Or any websites where I can query historical financial news – ideally free?

Thank you.

submitted by /u/JustinPooDough
[link] [comments]

Enlighten Me About These Project’s Dataset.

I have a school project which involves creating an Ingredient-Based Recipe Generator Chatbot for Bicol Cuisine Main Dishes. The chatbot should generate recipes based on user commands, but these commands must contain a minimum of three ingredients. I plan to use fine-tuning with OpenAI’s language model. Since this is my first AI project, I’m a bit confused about how to begin creating the dataset. Can someone help me by explaining how I should go about creating the dataset?

submitted by /u/akameaoi
[link] [comments]

Want A Huge Dataset Of All English Songs

i want to train my AI on songs and poems, so i want a huge dataset of all english songs and poems, any suggestions on websites , i can scrape to get a large set of english songs only i heard of azlyrics but it contains other languages romanized versions too that makes it hard to get english songs only

submitted by /u/innocentboy0000
[link] [comments]

Providing Datasets, Leads As Needed. US Healthcare Available.

Hey all! 👋
👩‍⚕️ Healthcare Datasets Expertise:
Been diving into USA healthcare datasets for a year now 🏥✨
🔧 Services:
Web scraping, data management, and cleaning – I’ve got your data needs covered. Let’s tidy up those datasets and make them shine! 🌟
🌐 Tech Stack:
Python, Node.js, Puppeteer, Scrapy, Selenium, BS4 – name it, I’ve conquered it! 🚀
💬 Let’s Connect:
Ready to boost your projects with quality data? DM me, let’s chat and cook up something awesome together! 📬🤝

submitted by /u/purplepyramid7
[link] [comments]

Looking For Datasets: ClickStream, HealthCare, IOT, Agri, Edtech,Sales

I’m looking for raw datasets either session based or user based, (NOT THE AGGREGATED)

Here’s what I’m looking for, I’ll pay for any or all of the following, I’m fine either with one or many of these ….

1) IOT: timeseries dataset from individual IOT device, I’m fine with any data in it.

2) HealthCare: timeseries for individual patient or procedule, if you have anything else please let me know, it should not be aggregated

3)Agri: Individual sensors or any other device data along with location(perferable)

4)ClickStream: timeseries and session based

5) Sales: timeseries, user or session based along with product and sales cost

6) Edtech: let me know whatever you have.

Please DM me if you can help or point me to some source. I’m fine to pay or free or whatever works.

submitted by /u/Winter-Breadfruit943
[link] [comments]

Help With SPSS Survey Data Set For Grad Student

Struggling grad student here. My advisor is off for the break and I could really use some support with my quantitative analysis. I’m using SPSS on a survey data set I collected. I need to run multiple regression analysis but everything is coming back insignificant. This might be the case, but I would really appreciate a second set of eyes. I’m willing to pay for your time, just wanted to get this knocked out while I’m home for the holidays.

submitted by /u/TiredTiddies
[link] [comments]

Looking For A Comprehensive Sector Categorisation (string) For A Boolean Search On Company Name

Hi,

I’ve got a large list (1M+) of company names which have input by users. I’d like to categorise them by sector, but given the regional bias of some sets (e.g. SIC, NAICS) and the cost of others (Bloomberg) there isn’t a single comprehensive source that I can find.

Does anyone know of one? The end output is career guidance for people getting back into main street work (e.g. mums after kids, veterans leaving the forces).

Thanks

submitted by /u/Early_Respond7150
[link] [comments]

Do You Know Any Dataset Of 3d Human Meshes, Where The Train Images Are Synthetic But The Test Images Are Real?

I need a dataset of human 3d meshes. The most important requirement of this dataset is it to have real test data. The actors of the human 3d meshes must have images in the real scenarios.

The train data can be generated and not be given by the dataset. Since if they provide the meshes with the textures, I can use a software to generate synthetically the train data.

But the test data it must be real.

submitted by /u/henistein
[link] [comments]

I Need A Face+Audio+EEG Dataset For Didactic Purpose

Hello everyone,

I’m a CS student and I’m trying to approach to the emotion recognition. I played a little bit with this multimodal network for emotion recognition (https://github.com/katerynaCh/multimodal-emotion-recognition). I find it pretty cool, with the network that works very well with the Face+Audio modality. However, I was trying to implement in this network the emotion recognition with EEG (I don’t really know how to do it, but still…) but I cannot find any dataset that contains Face, Audio and EEG data. Actually, I find the PME4 dataset (https://figshare.com/articles/dataset/PME4_Emotion_Recognition_with_Audio_Video_EEG_and_EMG/18737924, Face+Audio+EEG+EMG) but it has a very different structure than the RAVDESS dataset used for the multimodal network that I used in first place and I have no idea on how to adapt it to the network, so I was trying to find other datasets.

submitted by /u/_link23_
[link] [comments]

🧼 SUDS – A Guide To Structuring Unstructured Data [self-promotion]

I’ve spent a decent amount of time indexing and formatting a lot of machine learning datasets that include images, audio, video, and text and wanted to propose a simple format that might help us standardize a format for the data with a little more structure. Wouldn’t say it is ground breaking, but I feel like could be a good practice.

https://blog.oxen.ai/suds-a-guide-to-structuring-unstructured-data/

Let me know what you think!

submitted by /u/FallMindless3563
[link] [comments]

Internet Usage (pre/post Covid) Datasets

For a project, I am looking into country wise data of internet usage (laptop, cellphone usage, use in work and school, factors behind using it at home/cafe etc.) and want to find some trends pre/post covid on data usage. Where can I find relevant datasets for this?

So far I have only found CPS computer and internet use supplement datasets but that’s only for the US and I want data on more countries, especially in the EU and for developing or poor countries like India, Africa etc. Anyone knows any relevant data sources for this? Thanks a ton!

submitted by /u/__pringles__
[link] [comments]

USA Presidential Elections By State In History

I’m looking for a dataset with all state-by-state US presidential election results from at least 1960 onward with all candidates and votes cast. I would need a dataset not only with Dem and Rep, but also the various minor candidates (like Perot, Wallace and so on). I’ve searched everywhere, without success.

submitted by /u/Data___Viz
[link] [comments]

Trying To Find ASMR Speech Instruction Datasets

Hi fellow redditors,

I’m working on a mini-project where I want to build an ASMR text-to-speech model. Due to the lack of ASMR datasets available, I went on to build a small audio dataset from youtube videos (about 85 videos large, wav extracted) and downloaded their transcript or STT using Whisper. But during training, a lot of errors popped up due to variation in size, bitrate, sample rate, etc.
I’d be grateful if you could point me to any existing ASMR dataset with high/medium quality small audio files (<1min) along with text transcripts.

submitted by /u/Available-Deer1723
[link] [comments]

Dataset On Betting – Where To Get For Free

Hello, I want to find a dataset on sports betting. I’m doing a school paper on the correlation between the initial expected win rate and the actual return on investment. Any dataset that includes: a large amount of data points (I don’t know much is big), expected rate, return on investment if won, and who actually won would be sufficient. Sport doesn’t matter, could be pickle ball for all I care. Thanks for any help. I would really prefer not to spend money. However, if I really need to please keep it cheap. Thanks again for any help; data is used for school, so I’m protected a little bit.

submitted by /u/TrainerPug
[link] [comments]

Data On Why We Preserve The Environment

Hi everyone,

I’m looking for data on public attitudes towards the environment; more specifically, data that answers the question “why do we want to preserve the environment?” or “what are our reasons for wanting to preserve the environment?”. Something like that. Thanks!

submitted by /u/Connorleak
[link] [comments]

Scraping Google Trends, And Incomplete Datasets. Help, Please?

Hey all,

I’m trying to scrape Google trends data using Python and proxy API.

But it’s not always returning the data. I have to try 10-15 times sometimes and I don’t get anything at all sometimes.

Say I want to get trends for “holidays in Italy” for the last 5 years, It might bring me 3 years’ worth of data, and the rest of the years will be 0.

But, if you check the data in Trends, it’s not 0 for the last 2 years.

So it’s partial. I’m wondering what’s going on here. Is Google detecting my scraper, or is there a solution to this? It’s driving me nuts.

I’ve tried a bunch of APIs. DataforSEO, Keywords everywhere etc and they all suffer from the exact same issue.

Thanks.

submitted by /u/shapeless69
[link] [comments]

Data Lending Club Needed 2017 – 2022/2023

Dear all,

Currently, I am trying to retrieve datasets from Lending Club, but due to my non-American nationality I cannot enter the database. Therefore, I am hoping that one of you can help me out. The data for my research needs to include:

– Loan origination date (monthly would be perfect)
– Amount
– Status (pending/defaulted/active)
– In which state the loan is originated
– 2017 till 2022 or 2023 Month 11 (if possible)

Any help will be highly appreciated and rewarded with 50 USD (Via PayPal).

submitted by /u/Severe-Decision4013
[link] [comments]