submitted by /u/dabressler
Category: Datatards
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
I made an overview of relevant datasets for training AI Agents in Finance contrasting
Sentiment Analysis: Phrasebank, FiQA Sentiment Analysis, TweetFinSent
Named Entity Recognition
Quantitative Reasoning
All links point to the original sources.
Read my write-up here (Encyclopedia Autonomica)
Question to the community: Are there datasets that I am missing, but should know about?
submitted by /u/d3the_h3ll0w
I am a college student looking to do a project on point shaving in college baseball and I am looking for any historic data on point spreads, preferably going back to at least 2015. I have been able to get all the scores for this period, but I am struggling to find any historic betting odds. If anyone has any advice, please let me know and I appreciate any help.
submitted by /u/ocondr20
I’m a big movie person and my local theaters usually put out a schedule for the month. I’d like to compile this schedule and build an aggregator newsletter tailored to my interests.
I’m seeking sources for information on a movie’s reviews, ratings, and descriptions. Letterboxd is def a source I want to use but it seems their API is not public. The only way is through a scrape? Which isn’t that bad. Has anyone seen some potential sources to use for a project similar to this?
submitted by /u/raz_the_kid0901
Hi there, I work for a large national non-profit as a data analyst for our fundraising campaign. I’ve been asked to provide a “dream budget” for licensing third-party data. B2B is the main focus, but understanding consumer behaviors with a place-based focus is very useful as well. Wealth, income, employment, philanthropic giving, and executive networks are all of interest. I’ve always wanted full access to things like Experian and Dun & Bradstreet, but are there other sets, lists, or databases that I should consider?
submitted by /u/xiancaldwell
I want to start my financial ML project and I’m thinking of doing loan prediction, so if anyone knows any data sources, that would be awesome.
submitted by /u/AdAdventurous5441
I’m currently trying to compile a database of the foodstuff restaurants offer, with my main focus being Melbourne – something of the form [restaurant, location, menuObject], where menuObject is an object containing the items on the menu. I have identified restaurants and extracted metadata using the Google Maps API.
Any ideas for compiling the menu part? I do need fairly good coverage for my study.
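For anyone picturing the [restaurant, location, menuObject] shape, here is a minimal sketch in Python. The class names, the `price_aud` field, and the `menu_coverage` helper are hypothetical and only illustrate one way to hold the records and check how many restaurants already have a menu attached; nothing here comes from the Google Maps API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class MenuItem:
    """One entry on a menu (hypothetical fields for illustration)."""
    name: str
    price_aud: Optional[float] = None  # assumption: price may be unknown


@dataclass
class RestaurantRecord:
    """One row of the form [restaurant, location, menuObject]."""
    restaurant: str
    location: Tuple[float, float]  # (latitude, longitude)
    menu: List[MenuItem] = field(default_factory=list)

    def has_menu(self) -> bool:
        # A record counts as covered once at least one item is attached.
        return len(self.menu) > 0


def menu_coverage(records: List[RestaurantRecord]) -> float:
    """Return the fraction of restaurants that have a non-empty menu."""
    if not records:
        return 0.0
    covered = sum(1 for record in records if record.has_menu())
    return covered / len(records)
```

Tracking coverage this way makes it easy to see how far the menu part is from the "fairly good coverage" the study needs as new sources are added.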
submitted by /u/Wackome
I found a YouGov survey that examines this type of data; however, I cannot find any data on this topic that contains raw observations. Does anyone have any resources for this?
Here’s the YouGov survey: https://docs.cdn.yougov.com/l2y64i4kf5/Subtitles_and_TV_poll_results.pdf
submitted by /u/theamazingnano
For example, the project title could be something like “Do Happy Employees Improve Corporate Performance?” or “The Effect of Gun Control Laws on Crime”.
submitted by /u/Icy-String-2648
Does anyone have any idea where to download the REDD Dataset from? I tried going to the site http://redd.csail.mit.edu but it’s not working anymore. If you can provide this dataset for my research, it would be a big help. Thank you!
submitted by /u/anxrvdh
Hey everyone!
Our organization is gearing up to create some awesome business intelligence solutions tailored specifically for Amazon sellers. We’re currently in the process of putting together a demo architecture, complete with a database and dashboard.
I’ve been assigned the task of sourcing a dataset containing information on Amazon sellers, with a primary focus on orders, returns, and product reviews.
I’ve already taken a look on Kaggle, but unfortunately, I’ve only managed to find datasets related to reviews.
Does anyone happen to have a sample dataset they could share, or perhaps some ideas on where else I might be able to find the data I need? Any help would be greatly appreciated!
submitted by /u/Fun_Signature_9812
I’m a senior stats major and am so utterly burnt out, but my professor wants us to find an interesting dataset that we can apply a GLM to, which I just can’t fathom doing. If anyone knows an easy dataset that would work, you would be a lifesaver :) Extra brownie points if it’s music related because I might actually have some fun working with it lol
submitted by /u/makurroon_
Hello, I am writing a thesis (I am a student at CEMFI, Madrid.)
I have 2 projects to do:
Project 1: Use text data and do something fancy. I would like to study the tradeoff between data privacy and utility, but I did not find any useful datasets.
Project 2:
I am writing a macroeconomic model about the optimal transitional dynamics towards more sustainable energy production. I am looking for a dataset with granular data where I could exploit some variation over the years in some interesting measures in order to calibrate my model.
I’d greatly appreciate any leads or suggestions on where to find relevant datasets for these projects. Thank you!
submitted by /u/Inevitable_Counter94
If someone has access to either of these datasets, could they please reach out and help? I am in need and cannot afford the subscription. Here are their titles and DOIs:
RF JAMMING DATASET FOR VEHICULAR WIRELESS NETWORKS
10.21227/4zwk-yw78
MEDIUM OBSERVATION UNDER JAMMING ATTACKS IN VANETS
10.21227/yvxd-mf03
submitted by /u/ninjaboytoy
I’m looking for a Rotten Tomatoes dataset that has user reviews, critic reviews and movies (it doesn’t necessarily need to have metadata, but that would be preferred) for a recommendation system I’m trying to build. Are there any good datasets that would work for this, or would I need to attempt to scrape it myself (I have 0 experience webscraping)?
submitted by /u/RealHellcharm
Hi,
I am trying to start a project and am looking for a dataset on weight loss drugs and their health effects, or the effects of saunas/cold plunges on health. The only places I know to look are Google datasets and Kaggle, and I haven’t found much.
Could someone point me in the right direction?
submitted by /u/Rough_Count_7135
Where would I find data on online university student enrolments: number of students, term start and finish dates, course name, and course length? I want to produce cohort analysis and scenario analysis on various course lengths and term starts.
submitted by /u/GlitteringActuary693
Ever wondered about the factors that could affect suicide rates in different countries? Check out my complete dataset on Suicide Rates that you can use for your research, learning or projects for FREE.
submitted by /u/AwuorVII
Hello guys, I have created a dataset containing Family Guy dialogues from seasons 1 to 19. Anyone interested in text analysis can use this data on Kaggle. https://www.kaggle.com/datasets/eswarreddy12/family-guy-dialogues-with-various-lexicon-ratings/data
submitted by /u/Content_Drawer_2943
hey there,
after 5 years of building AI models from scratch, I know to the bone how much dataset quality matters for model quality. openai is where it is solely bc of its high-quality datasets.
haven’t seen a good “service” that offers a way to build a dataset (any task: chat, instruct, qa, speech, etc) that’s backed by the community.
thinking of starting a service that helps companies & individuals build a dataset by rewarding contributors w/ a crypto coin as an incentive mechanism. once the dataset is built (data collection finalized), it could be sent to HF or any other service for model training / finetuning.
what’s your feedback, folks? what do you think about this? does the market exist?
submitted by /u/betimd
I have to do a data mining project to complete my degree in statistics.
Can you recommend a specific dataset that isn’t very hard to use for data mining? I know literally nothing about this.
submitted by /u/Aston28
Hi guys, I am trying to analyse the credit and payment behaviour of Indian customers. For this, I am looking for substantial public external data on customer demographics, affluence, and spend at the location level.
TIA
submitted by /u/nerdy-oged
Hey all, if anyone wants a dataset that would be obtainable through web scraping, send me a request in a comment and I’ll scrape the data for you. Obviously I can’t just scrape anything, but I have quite a bit of scraping experience.
One rule: the data you want scraped can’t be behind a paywall or a login.
submitted by /u/k7r7f80d
What are some good large PET-scan datasets containing PET scans of patients? They do not need to be full-body; any kind of PET scan of any part of the body is fine.
submitted by /u/MrShikaslad
What are some good large PET-scan datasets containing PET scans of patients? They do not need to be full-body; a PET scan of any part of the body would be just fine.
submitted by /u/MrShikaslad
To analyse spontaneous but comparable speech samples, researchers often use task-oriented corpora, like the Montclair Map Task Corpus. These are, naturally, focused on location/answering the question ‘where are you?’
Is there anything like this, but focused on determining ‘how much’? Basically, sets of dialogues where speakers have to communicate quantities (price, size, number of marbles, etc)?
It doesn’t have to be just quantities; it could include location or other information, too. It’s just that the map corpora have very few explicit mentions of distances; it’s mostly direction/environment descriptions.
submitted by /u/dennu9909