Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Looking For Updated Dataset On Hofstede’s 6 Dimensions Of Culture

Hi, I am trying to use the most recent data from Hofstede’s 6 dimensions for my thesis on how culture impacts AI innovation. I found the data I need here: https://www.hofstede-insights.com/country-comparison-tool but it is not in an Excel format, and typing it over would take a lot of time. Online I could only find datasets from 2015. Is there a more recent version publicly available?

submitted by /u/Electronic-Boat5375

I’m Struggling To Find A Resource That’ll Give Me A List Of Songs That Released Each Year For The Past Decade

I’m conducting a research project where I compare music from before and after the advent of TikTok to see if TikTok really changed how people make music.

I have been looking far and wide for a library, package, API or database that can give me a reliable list of the songs released each year from 2010 to 2023.

Could y’all recommend the most reliable source to get this type of data?

Thanks

submitted by /u/reddit_turtleking

Historical Sale/coupon/promotional Prices At Grocery Chains

Hello! I’m looking for a dataset of historical grocery store item prices, specifically at the promotional / sale / rewards card price (hopefully including details like if it was BOGO, 2 for 1, minimum or maximum purchase requirements, etc.).

I see quite a few price histories but nothing that specifies if it was on sale and what the deal was. And I’m sure grocery chains wouldn’t share this information.

Maybe the best path would be to scrape this data myself going forward?

Thoughts? TIA!

submitted by /u/secondcupoftea

Other Examples Of Websites Like NYC’s Data Visualization?

NYC’s “Open Data” website allows you to quickly visualize the datasets right within your web browser. This includes a tabular view along with customizable graphs and charts:

https://data.cityofnewyork.us/d/k397-673e/visualization

Are there other websites that offer something similar for their respective public (and open source) datasets? I’m curious about the overall UI and UX these websites provide in hopes of drawing some inspiration for a website of my own one day.

submitted by /u/TheCodingCyclist

Explore The Ultimate UFC Dataset On Kaggle!

Hey everyone,

Just wanted to share this awesome find on Kaggle: “The Ultimate UFC Archive (1993-Present)” dataset. It’s a treasure trove of UFC data covering events, fights, fighters, and referees.

What’s Inside:

Event details, fight outcomes, fighter statistics

Why It’s Cool:

Detailed fight data, in-depth fighter profiles, constantly updated

Whether you’re a data enthusiast, a die-hard fan or just curious about MMA, this dataset has something for everyone. Check it out and dive into the world of the UFC!

UFC dataset

Enjoy exploring!

submitted by /u/ShockOk4912

Hotel Data – I’m Building A Hotel Availability App

Does anyone know where hotel apps (for example, Hotels.com) get their data from? I’m looking to gather availability dates and inventory.

I know that most apps use APIs. I want to see if there is a single system I can connect an app to that will pull hotel data from around the world: inventory and availability dates.

submitted by /u/FreeeRide-

Needed Fb, Insta And Twitter Comment Dataset For Sentiment Analysis

I’m currently working on a project to develop an application that can fetch the most recent posts from a provided company’s Facebook, Instagram, and Twitter profiles. The application also needs to perform sentiment analysis on the comments for these posts and create a notification system to alert users if any negative comments are detected.

I need to train the model on a dataset from Facebook, Instagram and Twitter, but I can’t find what I need on GitHub or Kaggle.

submitted by /u/DeVoe69

AI Books4 Dataset For Training LLMs Further

What?

More than 400,000 fiction and non-fiction book full-texts. Multiple languages, curated, deduplicated.

More than 6,000,000 scholarly publications, magazines, and manuals full-texts. Multiple languages, curated, deduplicated.

150,000,000 metadata records

Format

Zstd compressed file, JSON lines, one per book/publication.

abstract, content – description and content in markdown format

issued_at – time of issuing of the object (not of the record itself)

metadata – ISBNs, publishers, series etc

id – identifier in external systems, if applicable (e.g. DOI)

other fields should be self-descriptive
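For anyone who wants to inspect a file in this layout, reading it could look roughly like the sketch below. It assumes the third-party `zstandard` Python package for decompression; the function names `open_zst_lines` and `iter_records` and the sample records are made up for illustration, and only the field names listed above come from the post.

```python
import io
import json


def open_zst_lines(path):
    """Open a Zstd-compressed text file for line-by-line reading.

    Assumes the third-party `zstandard` package (pip install zstandard).
    """
    import zstandard  # imported lazily so the parser below works without it

    raw = open(path, "rb")
    reader = zstandard.ZstdDecompressor().stream_reader(raw)
    return io.TextIOWrapper(reader, encoding="utf-8")


def iter_records(lines):
    """Yield one dict per non-empty JSON line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Tiny made-up sample standing in for the decompressed stream.
sample = io.StringIO(
    '{"id": "doi-placeholder", "abstract": "short description", "content": "# Title"}\n'
    "\n"
    '{"id": null, "issued_at": 0, "metadata": {"publisher": "example"}}\n'
)
records = list(iter_records(sample))
print(len(records))
```

With a real file, `iter_records(open_zst_lines(path))` would stream records one at a time instead of loading the whole archive into memory.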

Download:

magnet:?xt=urn:btih:a904e660355c49006b2e7d43893d31bf3c2be9cc&dn=libstc2.jsonl.zst&tr=udp://tracker.opentrackr.org:1337/announce&tr=https://tracker1.ctix.cn:443/announce&tr=udp://open.demonii.com:1337/announce

submitted by /u/JohnTheMelancholic

I Will Create Free Data Pipeline + Analytics Dashboard For You

I am an experienced data engineer and I have three free days next week.

If you have a dataset for which you would like to create a data pipeline for continuous ingestion, and you would like a dashboard built and/or AI-based Q&A on top of that, I am available to help. I will take on the project if it is interesting enough and if you can benefit from it – for free :).

The dashboard/Q&A would be made available at dataflick.dev ‘s free tier.

Let us see if there are some interesting use cases.

submitted by /u/Such-Cartographer750

Looking For Synonym Database In Sqlite

Hi all,
I’m looking to program a fun CLI tool in Rust that will take a string and then replace all of the words with a random synonym. I plan on implementing this using a sqlite3 package to make queries to an already existing (SQLite) database containing a bunch of synonyms.
The only issue now is that I can’t seem to find a page for said database, and writing one by hand sounds like a terribly daunting task 😅

Would somebody be able to help me find this?
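To make the idea concrete, the lookup-and-replace step might look something like the sketch below. It is written in Python rather than Rust for brevity, and the `synonyms` table, its columns, and the sample rows are all hypothetical, since no actual database has been found yet.

```python
import random
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical `synonyms` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE synonyms (word TEXT NOT NULL, synonym TEXT NOT NULL)")
    conn.executemany(
        "INSERT INTO synonyms (word, synonym) VALUES (?, ?)",
        [("big", "large"), ("big", "huge"), ("fast", "quick")],
    )
    return conn


def replace_with_synonyms(conn, text, rng=random):
    """Replace each word that has synonyms with a randomly chosen one."""
    out = []
    for word in text.split():
        rows = conn.execute(
            "SELECT synonym FROM synonyms WHERE word = ?", (word.lower(),)
        ).fetchall()
        # Words with no entry in the table are kept unchanged.
        out.append(rng.choice(rows)[0] if rows else word)
    return " ".join(out)


conn = build_demo_db()
result = replace_with_synonyms(conn, "a big fast car")
print(result)
```

Punctuation handling and case preservation are left out here; the same parameterized query should carry over to a Rust SQLite binding.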

submitted by /u/7turtlereddit

Finding Industry Employment Data Broken Down By Age

I’m trying to find info on employment by sector and age but am having a hard time finding it.

I’d like to get a breakdown of where young people work in Austin, TX to compare to El Paso, TX, just to try to get some ideas on why El Paso loses so many young people to other cities and what kinds of industries are attracting them.

I’ve found good data on different job sectors from the US Bureau of Labor Statistics, but it doesn’t break down by age range: https://www.bls.gov/oes/current/oes_12420.htm

submitted by /u/asarcosghost

Looking For Car Theft Data Either City, State, Or National

Hi I’m looking for a dataset that has car theft data. I’m looking for make/model, time of theft, location, recovered(y/n), and details if possible. This is for a school project that I hope becomes a helpful tool to mitigate car thefts.

I reached out to the FBI and local PD, but haven’t received a response. I don’t care much for the location of the dataset but am prioritizing location of thefts.

submitted by /u/iamaguesttoo

Popular Streaming Services (e.g. Netflix, Amazon Prime, Disney+, Etc.) Metadata

I’m looking to do a Python-based data analysis and visualisation project. I want to focus on the data and metadata of most, if not all, available movies and TV series provided by the most popular streaming services.

I see most online projects use this Kaggle source: https://www.kaggle.com/datasets/shivamb/netflix-shows/data

As nice as it is, it’s not as up to date as I would have liked, as it only goes up to 2021.

Is anyone aware of any other public, free dataset similar to the above which could fit my purpose?

I’m aware there are many sites such as https://flickmetrix.com/ and https://flixable.com/ which seem to have a large amount of movie data, but I can’t find their source or whether they have shared it publicly.

Thank you

submitted by /u/the_forgettable

Open Source Data Sharing Project For Research Labs / Individuals

Hey guys! I have noticed that there is not much in the realm of open source data-sharing services, so I created a Django REST / React app that allows for upload, download, reviewing, etc., of files. Not sure if it would be useful to people. Also, please feel free to add features. This is meant to be an open source project that allows research labs / people to share and review datasets without needing to pay for any online subscriptions. https://github.com/lxaw/DataDock

submitted by /u/AGenericBackup

Data Labeling In Spreadsheets Vs Labeling Software?

Was talking with some of my classmates from undergrad and discussing our jobs/research. Something that we all still complain about is labeling data in spreadsheets.

Looked around online and found a whole host of data labeling tools from open source options (LabelStudio) to more advanced enterprise SaaS (Snorkel AI, Scale AI). Yet, no one I knew seemed to be using these solutions.

I kinda get it from an ease of use/cost standpoint – as an undergrad researcher, it was way easier to just paste data into a spreadsheet and send it to my lab. But I’m currently considering doing a much larger body of work. Would love to hear people’s experiences with these other tools, and what they liked/didn’t like.

For context, doing a bunch of Large Language Model output labeling in the medical space (n = ~2000?).

submitted by /u/ninepancakez

Recommend Me A Dataset For Hands On Project

Hey there, I am learning Apache Spark and AWS cloud. I am planning to make a project, basically an ETL project using Glue. I want to perform transformations using Spark, but I haven’t come across any good dataset. It’s not that there are no datasets, but I want a big dataset of thousands of rows and under 10 columns. I have found some myself, like UFO, World Bank, etc., but each is either too big or does not have a good source. Are there any fellow redditors who have worked on something similar, or do you just have a good recommendation?

submitted by /u/datastoner

Need A College Dataset For An AI I’m Making

Hello!

I have spent hours looking for a dataset that includes information on college courses plus a brief description of each course.

I have had some luck, having found thorough datasets specific to certain colleges. Perhaps I can just use those and call it good; I assume most colleges have roughly the same courses, though some differ slightly.

But before I continue my journey I just wanted to see if this community knows of any decent datasets in regards to college information including, but definitely not limited to, the majors and a brief description of the majors?

submitted by /u/sumanila

[self-promotion] ICYMI: You Can Now Get Notified When Any New Code Is Released For A Given Paper Or Topic!

ICYMI: You can now get notified when any new code is released for a given paper or topic! Just install the code finder extension (Chrome: https://chromewebstore.google.com/detail/ai-code-finder-for-papers/aikkeehnlfpamidigaffhfmgbkdeheil | Firefox: https://addons.mozilla.org/en-US/firefox/addon/code-finder-catalyzex/ | Edge: https://microsoftedge.microsoft.com/addons/detail/get-papers-with-code-ever/mflbgfojghoglejmalekheopgadjmlkm), click on any bell/alert icon you come across while browsing the web, and follow the next steps on the screen 🙂

Also, with alerts, get the latest developments in your area of interest delivered straight to your inbox. Author’s newest work: be the first to know when an author releases new papers.

submitted by /u/fullerhouse570