Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Seeking Website With Reviews On Place Accessibility For People With Disabilities

I’m currently searching for a website that provides user reviews and data on the accessibility of places for people with disabilities. I’m interested in finding information about the accessibility of places like airlines, hotels, and restaurants.

Are there any websites that offer comprehensive reviews, ratings, or details about the accessibility of various locations? If the website has an API to access the data, that would be great.

Thank you in advance.

submitted by /u/19datascientist

Ethical Big Data Providers/Sources – Request For Help!

Apologies for cross-posting, but I can’t seem to find answers anywhere. I work as a researcher and have been wondering if anyone knows of big data providers/sources that are nationally representative (country agnostic) and work within ethical collection parameters.
We use YouGov a lot. Outside of panel providers, we’ve tried market mapping, but we find that most companies rely on ML/AI-aided profile scraping that classifies people by gender, race, etc. That is problematic: it amounts to physiognomy/face reading, is subject to its own biases, and reproduces the inequalities we’re working hard to lessen. I’ve been searching but can’t seem to find anything conclusive in the way of providers or datasets.
Does anyone know of anything that could help?
TL;DR – There are a lot of structural inequalities around collecting data (as per Data Feminism), and I was wondering how I can collect data with greater sensitivity to ethics, but at relative scale.

submitted by /u/Mundane-Mark2403

What Is Data Trust? How Do You Build Trust With Data Across Your Company?

Data is becoming an integral part of company operations, and trust in data is becoming an increasingly fraught issue, especially as companies gather more information.

The first step in building trust in data is ensuring that everyone in the data supply chain is aware of the critical role data plays in every operation. The second step is to focus on accuracy and to remove restrictions on the volume of data needed for accurate, well-performing actions. The third step is ensuring that all analytics, and the actions taken on them, can be replicated and proven.

The power of a data-driven world is likely to outweigh the risks and encourage trust in the technology that makes innovation possible. For organizations to walk the walk on data, they should start trusting their analytics, promoting the kind of data-focused operation that every business needs.

Continue Reading – https://us.sganalytics.com/blog/what-is-data-trust-and-how-do-you-build-trust-with-data/

submitted by /u/Beautiful-Ad-7743

Accessing The CFEE Dataset For Emotion Recognition

I’m trying to access the CFEE Dataset to classify compound facial emotions from this website: http://cbcsl.ece.ohio-state.edu/dbform_compound.html. I have filled out the request form and have been granted access. However, when I click the download dataset link, the page times out. I am fairly new to these kinds of things, so I apologize if my question comes off as kind of lame.

I’ve tried contacting the people mentioned on their website, but to no avail. Is there something I can do?

submitted by /u/grumpyowlgirl

Suitable Aligners To Create A Three-language Parallel Corpus?

Hey everyone! I’m currently working on my MA dissertation on Anthony Burgess’ “A Clockwork Orange”. My supervisor asked me to create a parallel corpus for the versions of the book which I’m analysing (source text in English, target text 1 in German, and target text 2 in Russian). The aim is to analyse the fictional language “Nadsat” and its translations into German and Russian. However, I have no previous experience with corpus linguistics, and therefore don’t know how to create a parallel corpus for three languages. I’ve been using Sketch Engine, for which I’ve started to align the texts manually, but it’s obviously taking ages, so I was wondering whether you could recommend any more efficient ways to align the three texts?

submitted by /u/mehhloni

Looking For A Free Instagram Dataset

I’m looking for an Instagram dataset. For each account, I need every post along with the number of followers at the time of the post and the number of people reached by each publication.
I don’t need personal information, only numeric values. The project is to try to predict the number of people reached with the help of the other data.
Thanks.

submitted by /u/yannis_heguy

What Is The Best Way To Get Information Off Of A Wiki For Natural Language Processing?

So far I’m using two Python libraries:

https://pypi.org/project/wikitextparser/
https://mwclient.readthedocs.io/en/latest/

to get pages from categories on a MediaWiki-based website (https://nethackwiki.com). However, the parser that I’m using does not offer the ability to expand (interpolate) the templates.

So I’m either stuck with plain text that strips out all the templates along with valuable data, or I have the raw contents that still contain all of the templating syntax.

I have no desire to write a template-expansion engine. Is my only option to go in and strip the syntax manually?
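
One possible way around writing an expansion engine is to let the wiki expand the templates itself through the MediaWiki Action API's `expandtemplates` module. The sketch below is only an illustration under assumptions: the endpoint path `/api.php` is assumed for this wiki and may differ or be restricted, and the helper names (`build_expand_query`, `extract_expanded`, `expand_templates`) and the User-Agent string are made up for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the wiki serves the standard MediaWiki API at /api.php.
API_URL = "https://nethackwiki.com/api.php"


def build_expand_query(wikitext, title=None):
    """Build the parameters for an action=expandtemplates request."""
    params = {
        "action": "expandtemplates",
        "format": "json",
        "prop": "wikitext",  # ask for the expanded wikitext only
        "text": wikitext,
    }
    if title is not None:
        # Optional page title, used as context for magic words like {{PAGENAME}}.
        params["title"] = title
    return params


def extract_expanded(response):
    """Pull the expanded wikitext out of a decoded JSON response."""
    return response["expandtemplates"]["wikitext"]


def expand_templates(wikitext, title=None, api_url=API_URL):
    """POST raw wikitext to the wiki and return it with templates expanded."""
    data = urllib.parse.urlencode(build_expand_query(wikitext, title)).encode("utf-8")
    request = urllib.request.Request(
        api_url,
        data=data,
        headers={"User-Agent": "template-expander-sketch/0.1"},  # hypothetical UA
    )
    with urllib.request.urlopen(request, timeout=30) as reply:
        return extract_expanded(json.load(reply))
```

The expanded text returned by `expand_templates` could then be handed to wikitextparser for the remaining markup. For many pages it would be worth adding a delay between requests so the wiki is not hammered.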

submitted by /u/ArthurFischel

Villages, Cities, States, Countries Database Of The World, And Crops Grown In That Country

I have found a few DBs,
https://github.com/dr5hn/countries-states-cities-database

https://simplemaps.com/data/world-cities

But I was wondering whether better DBs exist for this, especially for the crops grown in a specific country. The FAO one is very broadly defined: for example, fruits and vegetables are just classified as fruits and vegetables, but I want the list to be exhaustive.

submitted by /u/P_H_i_X

Movie’s Explicit Content – Scraped Data From VidAngel

https://www.kaggle.com/datasets/benjameeper/movie-violencesexprofanity-data

I scraped and aggregated content filters for 1,700 movies from VidAngel. I think there is some good potential in this data to evaluate how well movie ratings (PG, PG-13, R, etc.) describe how much explicit content a movie contains.

My data analysis skills only took me so far, so I would love to see what insights other people can dig up. Let me know if you think more granularity in the data is needed (number of f-word occurrences, etc.).

submitted by /u/stringofsense

Building A Dataset Indexing Platform – Love To Get Feedback

Hi, I am currently building a dataset indexing platform. The purpose is to enable users to list and find datasets more easily than with existing options such as Kaggle and Google Dataset Search. As a dataset owner, you can freely list your valuable data; as a dataset user, you can have an effective and exploratory search experience.

I would love to get feedback from this community and/or schedule a 1:1 session to find out more about how you currently list or search for datasets, and to share our idea with you, which is to tokenize the dataset and store the dataset’s attributes as metadata for easy indexing. I am also looking for early adopters – applicable to anyone who has data or is searching for data!

Anyone who is keen to explore further, please let me know. Thank you.

submitted by /u/bdx_cbtan

Severe Lack Of Data For My Research Project, Wind And Solar Including Coordinates

Hi guys,

I’ve never posted here before, but I’m pretty desperate right now. I’m doing a research project where I’m using ML algorithms to classify sites for renewable energy potential. I’ve searched everywhere and even tried writing an API requester in Python, but with the amount of data I need (50-100k rows) it would take way too long. So I’m here to ask if anyone has a dataset with, at minimum, lat and lon, wind speed, and wind direction. Pressure and temp would be nice as well if possible. For solar, GHI and DNI, and maybe lateral tilt. But I want it to have random lat and lon coordinates, not all in one spot.

Please guys, I need your help.

DM me if you need more info.
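
If no ready-made dataset turns up, one fallback is to generate the random coordinates yourself and look up the weather variables for each point afterwards. The sketch below only covers the sampling step; the function name `random_coordinates` and the seed parameter are made up for illustration, and it makes no attempt to exclude ocean points or to call any particular weather API.

```python
import math
import random


def random_coordinates(n, seed=None):
    """Return n (lat, lon) pairs in degrees, spread evenly over the globe."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        lon = rng.uniform(-180.0, 180.0)
        # Taking asin of a uniform value in [-1, 1] makes latitudes uniform
        # by surface area, so points do not cluster near the poles.
        lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        points.append((lat, lon))
    return points
```

Calling `random_coordinates(100_000, seed=42)` would give a reproducible list in the size range mentioned above; restricting the ranges to a bounding box is a small change if only one region matters.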

submitted by /u/phoenixducky1

Cement Factory Energy Emission, Electrical

I am doing research on the energy emissions of cement plants and I need data on this. Where can I find it?

I need energy emissions data suitable for any sectoral distribution. When I searched the subreddit, I found only one website, but if there is a higher-quality data set, I would like to obtain it as well.

submitted by /u/hyyperi

Finding 3D Non-Image Datasets Online

Recently, I’ve been exploring the area of 3-dimensional data in machine learning. By that, I mean arrays with shape (x, x, x). As an example:

All the numbers are randomized, but hopefully, this will give you a gist of what I’m looking for
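
For illustration, an array with shape (x, x, x) like the one described can be sketched in plain Python as a nested list filled with random numbers. The helper names `random_cube` and `shape` are made up for this example; with NumPy installed, a single array of the same shape would be the more usual representation.

```python
import random


def random_cube(x, seed=None):
    """Build an x-by-x-by-x nested list of random floats in [0, 1)."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(x)] for _ in range(x)] for _ in range(x)]


def shape(cube):
    """Return the (depth, rows, columns) of a non-empty nested list."""
    return (len(cube), len(cube[0]), len(cube[0][0]))


cube = random_cube(3, seed=0)
print(shape(cube))  # (3, 3, 3)
```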

I have only encountered image datasets in my search, which I am not looking for. In addition, I want to find data already in three dimensions instead of two-dimensional time series data that can be made into three-dimensional data. Where could I find datasets like the ones I’m looking for?

Links or search terms would be greatly appreciated.

submitted by /u/Figsups

Common Aisles To Find Grocery Store Item

As the title suggests, I’m looking for a dataset that provides the grocery item and the most common aisle it’s found in, followed potentially by the next most common aisle.

Ideally it’s something like item, category, image, aisle_1, aisle_2.

If something like that doesn’t exist, an acceptable alternative would be in paragraph form like the example below.

Tahini
In most grocery stores, tahini is either in the aisle with other condiments like peanut butter or in the aisle with international foods. You can also find it at a specialty or Middle Eastern grocery. It is sold shelf-stable in glass or plastic jars and is not refrigerated.
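
To make the tabular layout concrete, here is a small sketch of the item, category, image, aisle_1, aisle_2 format filled in with the tahini example. The category value and the image filename are placeholders invented for illustration, and `load_items` and `likely_aisles` are hypothetical helper names.

```python
import csv
import io

# One sample row in the desired layout. The category and image values are
# made-up placeholders; the two aisles come from the tahini paragraph.
SAMPLE = """item,category,image,aisle_1,aisle_2
Tahini,Condiments,tahini.jpg,Condiments,International foods
"""


def load_items(text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def likely_aisles(row):
    """Return the aisles for an item, most common first, skipping blanks."""
    return [row[key] for key in ("aisle_1", "aisle_2") if row.get(key)]


rows = load_items(SAMPLE)
print(rows[0]["item"], likely_aisles(rows[0]))
```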

submitted by /u/yankpat9

Are There Any Arbitrage Opportunities For Datasets?

Doing some research for a project I am working on and started thinking:

What are the different types of proprietary data that can be accessed more cheaply in other geographies?

Why is it hard to access that data in the US/UK and not anywhere else? Is it because the data creator has a monopoly? Or are there regulatory issues? Is the cost too high to gather and store?

Any advice, leads, or tips would be greatly appreciated!!

submitted by /u/young-litty