Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Analytics Of Most Successful Youtube Channels

I’ve previously seen reports from people analyzing, say, the top 500 earning YouTube videos/channels. They go into thumbnail, video title, genre, audience, etc. I can’t find anything when I Google it, though; I keep getting ‘YouTube Analytics’ for your own individual channel.

Anybody have any idea? Thanks

submitted by /u/TaoTeCha

Looking For Technical Employment Dataset With Real Data

I’m looking for a dataset targeting technical roles that includes elements such as industry, location, job title, whether the role is managerial/supervisory/has direct reports, gender, salary, and company size. I’ve tried a number of places including data.world, Kaggle and O*NET but haven’t been able to find anything like it. My goal is to identify technical managers (regardless of job title) for further analysis. Can anyone point me at a good source, or good datasets?

submitted by /u/_AriC

Large Song Dataset With Artist Similarity, Genres And Song Mood

I am searching for a Large Song Dataset including mood and similarities between artists. I found the Million Song Dataset but it seems that they don’t have valence in the fields, so I would need to query Spotify.

However, it seems like there is no way currently to go from Echo Nest ID to Spotify ID.

Does anybody know a Large Dataset I could use which would have everything I need? Or a way to link the Million Song Dataset with Spotify API?

submitted by /u/MusicAIPerson

Isolated Instruments Dataset For Source Separation?

Dataset recommendation request:

I’m looking for any existing publicly available datasets with many examples of isolated instruments being played with no accompaniment and minimal ambient noise.

I need isolated instruments to train individual instrument source separation and detection models for [bar,ts,as,ss,tp,cl,dm,b,etc., etc.] – basically all of the most commonly found instruments in jazz sessions with the exception of piano (which I have no problem sourcing isolated recordings of).

I can probably source sufficient material from YouTube, but am hoping there are some new datasets I haven’t heard of yet with isolated instruments.

submitted by /u/returnstack

Looking For Twitter Dataset For A Research Project On Use Of Social Media And Online Mobilization

Hello,

I’m very new to this, so I’m extremely sorry for any beginner terminology used here. I plan to do my bachelor’s thesis on the use of Twitter for online mobilization during the “Dalit Lives Matter” movement (a movement based in India, very similar to the Black Lives Matter movement, in case that is not obvious; the tweets I need span 2016 to 2021). I am planning to do a content or sentiment analysis of the tweets.

I was looking for ways to access such datasets. I have heard that X’s API has been put behind a paywall and that the free version cannot support archival search. I contacted a third party for access to the tweets, and they are charging a hundred dollars for it.

Please let me know what is the best way to go about this, if required I can connect in DMs to give out additional details.

Thank You 🙂

submitted by /u/Prestigious_Aioli140

Looking For A Dataset About Cerebral Palsy.

In particular, I am looking for a dataset that covers the academic success of students (any level) with cerebral palsy vs healthy students (i.e., how well they do in school, dropout rates, etc.).

Other data (healthy people vs people with cerebral palsy) about employment rates, or any other indicators of success in life, is OK too.

It can even be a dataset about people with disabilities in general vs healthy individuals.

submitted by /u/ExcellentWrap1208

Is There A Market For Selling Datasets?

I’m working on a marketplace for selling datasets and decided to discuss the idea with the community here. The goal is to connect ML teams/researchers with the exact datasets that they need. These would be high quality and like any other marketplace would be quality controlled via reviews/comments.

Would any of you find a need for this if the selection was robust enough and quality was good? Would you pay for it? Or are you finding what you need mostly free in the public domain? Curious to get your thoughts.

submitted by /u/brequinn89

Dataset For Social Media Post Tagging (e.g. “Apple Just Released The IPhone 15 Pro” -> Tag: “technology”)

I am building a social platform and I want to use AI to predict some of the user’s interests. I imagined that when you post something on the platform, an AI model would tag the post with, for example, “funny”, “politics”, “technology”, “entertainment”, “other”, etc. Now I need a dataset where each example is a post paired with a tag, e.g. “politics”. Do you know of any datasets that would meet these requirements?
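As a rough illustration of the record shape described above, here is a minimal Python sketch. The class name `LabeledPost`, its field names, and the fixed `TAGS` tuple are assumptions made for this example; they do not come from any existing dataset.

```python
from dataclasses import dataclass

# Tag vocabulary taken from the examples in the post; a real dataset
# would likely define its own, larger set.
TAGS = ("funny", "politics", "technology", "entertainment", "other")


@dataclass(frozen=True)
class LabeledPost:
    """One training example: the post text and a single tag (hypothetical shape)."""

    text: str
    tag: str

    def __post_init__(self) -> None:
        # Reject tags outside the known vocabulary so bad labels fail early.
        if self.tag not in TAGS:
            raise ValueError(f"unknown tag: {self.tag!r}")


example = LabeledPost("Apple Just Released The IPhone 15 Pro", "technology")
```

Any dataset that can be mapped onto (text, tag) pairs like this would fit the request, whatever its original column names are.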

submitted by /u/RokKuz3

[PAID $200+] AI Startup Requesting Datasets From SMBs! Will Pay $200+ For All Kinds Of Datasets

Hey r/datasets! I’m working on a startup and will pay $200+ for datasets from small & medium businesses!

All kinds of datasets related to SMBs are welcome — timesheets, balance sheets, payroll, expenses, etc.

Along with the dataset, please submit 15 questions which can be answered using your dataset. For example: “What was the best selling item in January 2022? Who is the top performing salesperson in this dataset? How many products were purchased in this dataset?”

Please comment if you’re interested — thank you so much in advance.

submitted by /u/mewolove

[Self-promotion] Dataset Translation Script: Is This A Problem You Commonly Face?

Is translating data something you have to deal with often? How do you typically solve this? I tried to build something that automates dataset translation, and I’m curious to understand if other folks struggle with this often. Would love to get your thoughts and input on the topic.

What is it: A script that automatically translates any dataset to your language of choice, using the Google Cloud Translation API. The example uses a dataset with dummy customer data, which gets translated from English to German.

Why use it: To create reports and dashboards in multiple languages. The output feeds directly into an embedded BI tool (in the project, I used Luzmo), and the script can be run on any dataset out of the box. With heavier modifications to the script, you could also store the translated data in a database, data warehouse or other destination.

Who it’s for: Software developers, product managers or data engineers who are working on multi-lingual apps, especially for analytical features, dashboards or reports.

How it works: There’s a GitHub repo you can clone, and a tutorial to walk you through the full set-up. Once you have the script up and running, you can run it repeatedly on any dataset, with any language.
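For readers who want a feel for the idea before cloning anything, here is a minimal Python sketch of translating one column of a dataset. It is not the script from the repo. The function names `translate_column` and `make_google_translator` are hypothetical, and the sketch assumes the `google-cloud-translate` package (its `translate_v2` client) and valid Google Cloud credentials are available when the real translator is used.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


def translate_column(
    rows: List[Row], column: str, translate: Callable[[str], str]
) -> List[Row]:
    """Return copies of `rows` with `column` translated.

    Repeated values are translated once and cached, which keeps the
    number of API calls down for categorical columns.
    """
    cache: Dict[str, str] = {}
    translated: List[Row] = []
    for row in rows:
        text = row[column]
        if text not in cache:
            cache[text] = translate(text)
        new_row = dict(row)  # copy so the input rows stay untouched
        new_row[column] = cache[text]
        translated.append(new_row)
    return translated


def make_google_translator(target_language: str) -> Callable[[str], str]:
    """Build a translator backed by the Cloud Translation basic (v2) client.

    Assumption: google-cloud-translate is installed and credentials are
    configured. The import is local so the rest of the sketch runs without it.
    """
    from google.cloud import translate_v2

    client = translate_v2.Client()

    def _translate(text: str) -> str:
        result = client.translate(text, target_language=target_language)
        return result["translatedText"]

    return _translate
```

With credentials in place, usage would look roughly like `translate_column(rows, "status", make_google_translator("de"))`, where `rows` is a list of dicts such as the output of `csv.DictReader`. Passing the translator in as a plain function also makes it easy to swap in a stub for testing.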

Would love to get your feedback on whether this is useful, as well as any improvements that could make it better!

submitted by /u/InsightScripter