Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Rethinking Data Access: A Dive Into Decentralized Data Protocols

In today’s AI-driven world, data reigns supreme, fueling innovation and propelling technological advancements. However, a pressing challenge persists: the fragmented nature of data sources. Despite the abundance of data generated daily, accessing high-quality and diverse datasets remains a daunting task, impeding progress in AI/ML training and development.

The current situation of data sources is characterized by siloed datasets, proprietary restrictions, and limited accessibility. While large corporations and tech giants may have access to extensive datasets, smaller organizations and researchers often struggle to find relevant and comprehensive data for their projects. This scarcity of data not only impedes innovation but also exacerbates inequalities in the AI landscape, favoring those with access to privileged data sources.

Compounding this issue is the lack of compensation for data contributors, creating a lose-lose situation for all parties involved. However, platforms like Ocean, Streamr, and the emerging Nuklai are changing the game by offering compensation for data contributors and providing decentralized marketplaces for data enthusiasts.

Ocean Protocol leads the charge with its decentralized data exchange protocol, enabling secure and privacy-preserving data sharing. Through Ocean Market, users can discover, publish, and consume data assets transparently and in a decentralized manner, addressing the challenge of fragmented data by facilitating seamless data exchange across ecosystems.

On the other hand, Nuklai emerges as a disruptive force, leveraging blockchain technology to create a transparent and inclusive ecosystem for data storage, sharing, and monetization. By empowering data contributors to retain control over their data and receive fair compensation, Nuklai fosters more interaction and metadata availability, especially within data consortiums.

Meanwhile, Streamr stands out for its emphasis on real-time data monetization, providing a decentralized marketplace where users can stream and sell their data streams. With a focus on IoT (Internet of Things) data, Streamr enables devices to securely share data and receive instant compensation. Its data marketplace fosters innovation by providing a platform for buyers and sellers to engage in data transactions, thereby addressing the growing demand for timely and actionable data insights.

While all of these platforms offer unique features and strengths, they collectively contribute to the broader goal of democratizing data access and driving innovation in the AI/ML space. By fostering collaboration, transparency, and fair compensation, these decentralized data protocols are reshaping the data landscape and paving the way for a more inclusive and equitable data economy.

submitted by /u/kuonanaxu

Looking For Data Set On Fitness Programs

Hello, this is my first time in the subreddit. I’m looking for a data set that I find interesting to use for a project, and I’m pretty into fitness (more so on the muscle-gaining / bodybuilding side). My idea is to work on a data set with data on the results / success of different training programs. I’ve been on Kaggle and Awesome Public Datasets, but haven’t found anything yet. If anyone has any recommendations, I would really appreciate it!

submitted by /u/noeffortnoreward

Where To Find Sub-industry Classification Of Stocks?

I’ve been looking all over but have not been able to find it anywhere. The best I can find is the list of S&P 500 companies with their GICS sub-industry classification. Other than that, only the Sector and Industry classification of thousands of stocks is readily obtainable.

Have you found a free resource that has the list of everything GICS classified? If not free, a paid resource is fine as long as it’s not crazy.

Thanks!

submitted by /u/AceDenied

Copy Content From ChatGPT/Bard/BingAI To Anywhere Without Losing Formatting

Our developers have just created this amazing plugin called “MassiveMark” that allows users to input any markdown and render it to HTML.
So you no longer have to spend hours formatting and editing the content which you directly copied from ChatGPT/Bard/Bing etc.
It also renders equations and formulae (mathematics, physics, chemistry), tables, code blocks, quotes, headings, bold, italics, underline, and whatever other formatting the output contains.
Please check it out on MassiveMark playground at https://www.assignmenthelp.net/massivemark and provide us your feedback, thank you.

(update: We now allow you to download the output as a .Docx file for convenience)

submitted by /u/Professional-Dig-669

Looking For Photos/datasets Of The Nests Of Solenopsis Invicta (Fire Ant) Or Just Ant Nests

Hi, we are a small group of three students trying to train an AI to detect this specific kind of nest with cameras. Does anyone have a lot of photos of the nests of Solenopsis invicta (fire ant)? This project is for educational purposes only.

Any dataset containing ant nests would also fit our needs.

We have already tried to contact the authors of some papers from China who have trained an AI on this specific nest, but we have not yet been able to obtain the images.

Thank you all, any help is welcome.

submitted by /u/Beksito

RedditMods: Moderators Of Top-25’000 Subreddits

RedditMods is a dataset that anonymously lists the moderators of the 25’834 largest and most popular communities on Reddit. The dataset is ideal for studying Reddit as a bipartite graph, where a moderator-node and a community-node are connected if the associated user moderates that subreddit. Clustering can then be performed to identify groups of subreddits with a particular leaning, or to recommend similar communities.

The data was publicly available and collected on 06 Feb 2024. All usernames were anonymised by hashing with SHA256, so that they cannot be linked to the moderators’ Reddit accounts.
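As a rough illustration of the two ideas above (hashing usernames and treating the data as a bipartite graph), here is a minimal Python sketch. The function names `anonymise` and `shared_moderators`, the sample usernames and subreddits, and the choice of unsalted hashing are all assumptions made for the example; the post does not say how the dataset itself was built beyond "SHA256".

```python
import hashlib
from collections import defaultdict
from itertools import combinations


def anonymise(username: str) -> str:
    """Return the SHA-256 hex digest of a username (unsalted: an assumption)."""
    return hashlib.sha256(username.encode("utf-8")).hexdigest()


def shared_moderators(edges):
    """Project moderator-subreddit edges onto pairs of subreddits.

    edges: iterable of (moderator_hash, subreddit) pairs.
    Returns {(sub_a, sub_b): number of moderators the two subreddits share}.
    """
    subs_by_mod = defaultdict(set)
    for mod_hash, subreddit in edges:
        subs_by_mod[mod_hash].add(subreddit)

    weights = defaultdict(int)
    for subs in subs_by_mod.values():
        # Every pair of subreddits with a common moderator gets one more link.
        for a, b in combinations(sorted(subs), 2):
            weights[(a, b)] += 1
    return dict(weights)


# Made-up moderator/subreddit pairs, for illustration only.
raw = [
    ("alice", "r/datasets"), ("alice", "r/dataisbeautiful"),
    ("bob", "r/datasets"), ("bob", "r/dataisbeautiful"),
    ("bob", "r/python"),
]
edges = [(anonymise(user), sub) for user, sub in raw]
weights = shared_moderators(edges)
for pair, n_shared in sorted(weights.items()):
    print(pair, n_shared)
```

The resulting weighted subreddit-to-subreddit graph is one possible input for the clustering or recommendation ideas mentioned in the post.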

Visualisations using this data have garnered interest.

submitted by /u/OmOshIroIdEs

(Beginner Question) Having Trouble Obtaining Population Data By State

Hello! I’m not sure if this is the right place to ask, but I was given some feedback on my dashboard (https://public.tableau.com/views/UFOSightingsintheUS_17069361456020/Dashboard3?:language=en-US&publish=yes&:display_count=n&:origin=viz_share_link) to incorporate a metric that accounts for the population of each state to show sightings per capita instead of just highlighting areas with larger populations. I’m trying to get population data for the years 1990-2014, so I can create a map of the populations by state and then layer the number of sightings on top of this map.

However, I’ve been having an extremely difficult time doing this. I think I may be overthinking it, but I’ve tried to look for the data (Population by State) on the US Census website and haven’t been able to get any dataset for any of the years I want. I did find this dataset on GitHub, which I believe I can use (https://github.com/aaronpenne/data_visualization/blob/master/population/data/USA_Population_of_States_US_Census_Intercensal_Tables_1917-2017.csv) but from here, how do I create a map out of it and connect it to my UFO sightings data? This dataset also doesn’t get properly imported when I try to upload it in Tableau, so I’m also having that issue.
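For what it’s worth, the per-capita step itself is just a join on state and year followed by a division. Below is a minimal Python sketch of that idea; the function name `sightings_per_100k`, the dictionary layout, and all the numbers are made up for illustration and are not taken from the linked CSV or the dashboard.

```python
def sightings_per_100k(sightings, population):
    """Join sightings and population on (state, year) and return rates.

    sightings:  {(state, year): number of sightings}
    population: {(state, year): number of residents}
    Returns {(state, year): sightings per 100,000 residents}.
    """
    rates = {}
    for key, count in sightings.items():
        people = population.get(key)
        if people:  # skip state/years with no population figure
            rates[key] = count / people * 100_000
    return rates


# Made-up numbers, for illustration only.
sightings = {("CA", 2010): 500, ("WY", 2010): 10}
population = {("CA", 2010): 37_000_000, ("WY", 2010): 560_000}

rates = sightings_per_100k(sightings, population)
for key, rate in sorted(rates.items()):
    print(key, round(rate, 2))
```

In this toy example the smaller state ends up with the higher rate even though it has far fewer sightings, which is the effect the per-capita metric is meant to reveal.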

Sorry if any of this sounds confusing; I can clarify if needed. I just don’t know what to do. I’ve tried asking ChatGPT and looking through Reddit and the Tableau Community, but I’m still lost and need to submit this dashboard today :/

Thank you!

submitted by /u/communityboyfriend

Casia-Face-Africa Request And Recommendations

I’m struggling to find any datasets focused on Black people’s faces. I’m trying to find something similar to Labeled Faces in the Wild (LFW) that includes several identities and a bunch of images for each identity. AFAIK CASIA-Face-Africa (http://www.cripacsir.cn/dataset/casia-face-africa/) is the only dataset meeting these criteria, but they don’t seem to be responding to access requests.

Could anyone share CASIA-Face-Africa? Or do you know of any similar datasets?

Thanks!

submitted by /u/Kimy31

I Desperately Need The ToN-IoT Dataset (No Longer Available)

Hi there!
As a cybersecurity fellow researching IoT attacks, I’ve been looking into various datasets such as CIC-IoT, IIoT, and Aposemat-23. However, I’m still in need of a dataset that includes both telemetry and network data.

I came across the ToN IoT dataset (https://research.unsw.edu.au/projects/toniot-datasets), which seems to be a perfect fit for my research needs. Unfortunately, it seems that the cloud storage previously used for this dataset has been decommissioned. (They should update this because it’s linked to the DOI, but so far it is still down.)
I tried to contact them but did not receive a response (I was brutally ghosted). If any of you Redditors happen to have downloaded the whole dataset, I would really appreciate it if we could arrange to exchange the data. Please comment, and I will contact you!

submitted by /u/Azakamar

Dataset With List Of All Mountain Climbers Who Have Died While Climbing.

I was reading about the first woman to summit Mount Everest without supplemental oxygen and I started down the Wikipedia rabbit hole.

I found this on Wikipedia:

https://en.wikipedia.org/wiki/List_of_people_who_died_climbing_Mount_Everest

I was wondering if there’s a master dataset in CSV of all the people who have died climbing any mountain, along with their demographic info and cause of death?

I found this too

https://en.wikipedia.org/wiki/List_of_deaths_on_eight-thousanders

But I don’t want to have to wrangle the data myself. It usually takes me ten times as long to wrangle data as it does to do any data analysis.

I’m planning to model the cause of death as a function of the demographic variables with a logistic classifier.

submitted by /u/Many-Wasabi9141

Football/Soccer Game Dataset With Worn Jerseys

Hi all,

I have been trying to find a dataset, but with no luck.

I am looking for a way to see game statistics and associate them with the jersey color worn by the players and the goalkeepers. Unfortunately, it seems that almost all of the databases include only game results and statistics but no information about the jerseys.

Are you aware of any such dataset? Or can you point me to a website that has the jersey information, which I could then merge with another set of data that includes the statistics?

Thank you all in advance

submitted by /u/stephdaedalus

Dataset Of People’s Habits With Hour-by-hour Info

Hey everyone,

I’m looking for a dataset that has information about people’s habits, hour by hour. That is, it should have a column such as hour_of_day, with values from 0-23 or 1-24; the other variables can be things such as TV watching, headphone usage, when someone goes for a walk, etc. (basically 1 or 0).

I am basically looking for a dataset where I can predict when people will do a certain action given the time of day.

Can be synthetic or mock.
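Since synthetic data is acceptable, a mock dataset in this shape can be generated in a few lines. The sketch below is only one possible way to do it: the activity names, the "likely hours" for each activity, and the probabilities 0.8 and 0.05 are all invented for illustration.

```python
import random

# Hypothetical activities and the hours of the day in which each is likely.
ACTIVITIES = {
    "tv_watching": range(19, 23),
    "headphone_usage": range(8, 10),
    "walk": range(17, 19),
}


def make_mock_rows(n_people, seed=0):
    """Return one row per person per hour with 0/1 activity flags."""
    rng = random.Random(seed)  # seeded so the mock data is reproducible
    rows = []
    for person_id in range(n_people):
        for hour in range(24):
            row = {"person_id": person_id, "hour_of_day": hour}
            for name, likely_hours in ACTIVITIES.items():
                # Assumed probabilities: high inside the likely window, low outside.
                p = 0.8 if hour in likely_hours else 0.05
                row[name] = 1 if rng.random() < p else 0
            rows.append(row)
    return rows


rows = make_mock_rows(n_people=3)
print(len(rows), "rows; columns:", list(rows[0]))
```

Each row has an hour_of_day value from 0 to 23 plus one binary column per activity, so a classifier can be trained to predict an activity flag from the hour.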

submitted by /u/162739

Looking For A Political Compass Questionnaire Dataset

Hello, I’m looking for a dataset which holds answers to the questions asked in the political compass test or another similar test.

I’m building a fuzzy associative rules generator which would basically find strong correlations between subsets of the columns of the dataset, e.g.

[Strong Agree] I’d always support my country, whether it was right or wrong. => [Strong Disagree] No one chooses their country of birth, so it’s foolish to be proud of it.

which could be interpreted as that if someone were to answer [Strong Agree] to the question on the left, they will most likely answer [Strong Disagree] on the question on the right. This might seem fairly obvious for this simple example but things get interesting quickly once you realize that any subset of columns may belong to the LHS or the RHS.
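To make the rule notation concrete, here is a minimal Python sketch that scores one such rule by its support and confidence. It is a crisp (non-fuzzy) simplification of what the post describes, and the function name `rule_stats`, the column names `q_country` and `q_birth`, and the four sample answers are all hypothetical.

```python
def rule_stats(rows, lhs, rhs):
    """Support and confidence of the rule lhs => rhs.

    rows: list of {column: answer} dicts, one per respondent.
    lhs, rhs: {column: required answer} dicts; each may hold several columns.
    """
    def matches(row, items):
        return all(row.get(col) == val for col, val in items.items())

    n_lhs = sum(matches(r, lhs) for r in rows)
    n_both = sum(matches(r, lhs) and matches(r, rhs) for r in rows)
    support = n_both / len(rows) if rows else 0.0
    confidence = n_both / n_lhs if n_lhs else 0.0
    return support, confidence


# Made-up answers, for illustration only.
answers = [
    {"q_country": "Strong Agree", "q_birth": "Strong Disagree"},
    {"q_country": "Strong Agree", "q_birth": "Strong Disagree"},
    {"q_country": "Strong Agree", "q_birth": "Agree"},
    {"q_country": "Disagree", "q_birth": "Agree"},
]
support, confidence = rule_stats(
    answers,
    lhs={"q_country": "Strong Agree"},
    rhs={"q_birth": "Strong Disagree"},
)
print(f"support={support:.2f} confidence={confidence:.2f}")
```

Because `lhs` and `rhs` are dictionaries, any subset of columns can sit on either side of the rule, which is where the search space grows quickly.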

submitted by /u/Play4u

[Q] Can Anyone Point Me To A Database Covering Statistics For The Number Of Ecstasy Pills Consumed Annually In The USA?

Most measures for ecstasy, and probably a few other drugs too, collect information on the total number of users, i.e. users per year and the number of people who have used ecstasy over the course of their lives.
I’m trying to find data on the number of pills (and other delivery system forms of ecstasy) consumed in the USA every year.

For instance, 90% of the total annual users might have consumed only 1 or 2 pills that year, but there are also frequent consumers of the drug. If, say, only 5% of total annual users are considered frequent consumers and take an average of 10 pills a year, then that 5% might account for a considerable share of the total pills consumed, possibly a quarter or more.

To look at it in tabular form, I have made some general guesstimates of pills consumed by users to gauge the potential total number of ecstasy pills consumed per year in the USA.

Statistics indicate that, on average, about 0.8% of Americans have consumed ecstasy in the past year.

Number of pills consumed in a year (n) | Group average n | Percentage make up of total annual users | Number of pills consumed (representing total users) | Note
1 | 1 | 60 | 0.6 |
2 | 2 | 20 | 0.4 | Very common to consume 2 pills in one session and may not represent users consuming ecstasy in different sessions
3 – 5 | 4 | 10 | 0.4 | Might do this 3 or so times per year
6 – 20 | 12 | 10 | 0.4 | Once per month
21 – 40 | 30 | 3 | 0.9 | Uses most weekends
Total number of pills consumed (representing total users): Average of 3.1 pills consumed per user

Table summary:

If 0.8% of Americans consume an average of 3.1 pills per year, then the number of pills equates to roughly 2.5% of the USA population (0.8% × 3.1 ≈ 2.5%).

The total number of ecstasy pills consumed in the USA per year would be 8.3 million based on a country population of 332 million.
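The arithmetic behind these figures can be checked with a short Python sketch. It uses only numbers stated in the post (0.8%, 3.1 pills, 332 million, and the 90% / 5% example), except for the 1.5-pill average for casual users, which is an assumed midpoint of "1 or 2 pills"; the function names are hypothetical.

```python
def total_pills(population, past_year_share, avg_pills_per_user):
    """Estimated pills per year = population x share of users x pills per user."""
    return population * past_year_share * avg_pills_per_user


def share_of_pills(groups, index):
    """Fraction of all pills consumed by one group.

    groups: list of (average pills per user, percentage of users) pairs.
    """
    pills = [avg * pct for avg, pct in groups]
    return pills[index] / sum(pills)


# Figures stated in the post.
estimate = total_pills(332_000_000, 0.008, 3.1)
print(f"{estimate / 1e6:.1f} million pills per year")

# The earlier example: 90% casual users versus 5% frequent users at 10 pills.
# 1.5 pills for casual users is an assumed midpoint of "1 or 2 pills".
frequent = share_of_pills([(1.5, 90), (10, 5)], index=1)
print(f"frequent users account for {frequent:.0%} of these pills")
```

Multiplying the unrounded figures gives about 8.2 million pills; the 8.3 million in the post comes from rounding 0.8% × 3.1 up to 2.5% before multiplying by the population.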

Note:

As I said, my calculations are guesses, but I would imagine statistics would be broken up in a similar way.

submitted by /u/Bishopfruiting