Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Suitable Aligners To Create A Three-language Parallel Corpus?

Hey everyone! I’m currently working on my MA dissertation on Anthony Burgess’ “A Clockwork Orange”. My supervisor asked me to create a parallel corpus for the versions of the book which I’m analysing (source text in English, target text 1 in German, and target text 2 in Russian). The aim is to analyse the fictional language “Nadsat” and its translations into German and Russian. However, I have no previous experience with corpus linguistics, and therefore don’t know how to create a parallel corpus for three languages. I’ve been using Sketch Engine, for which I’ve started to align the texts manually, but it’s obviously taking ages, so I was wondering whether you could recommend any more efficient ways to align the three texts?

submitted by /u/mehhloni

Looking For A Free Instagram Dataset

I’m looking for an Instagram dataset. For each account, I need every post along with the number of followers at the time of the post and the number of people reached by that post. I don’t need personal information, only numeric values. The project is to try to predict the number of people reached using the other data. Thanks.

submitted by /u/yannis_heguy

What Is The Best Way To Get Information Off Of A Wiki For Natural Language Processing?

So far I’m using two Python libraries:

https://pypi.org/project/wikitextparser/
https://mwclient.readthedocs.io/en/latest/

to get pages from categories on a MediaWiki-based website (https://nethackwiki.com). However, the parser that I’m using cannot expand (interpolate) the templates.

So I’m stuck either with plain text that strips out all the templates, and valuable data along with them, or with the raw contents that still contain all of the templating syntax.

I have no desire to write a template-expansion engine. Is my only option to go in and strip the syntax manually?
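One route that might avoid writing an expansion engine is to let the wiki expand its own templates through the standard MediaWiki `expandtemplates` API action. The following is only a minimal sketch: it assumes the site exposes the API at `https://nethackwiki.com/api.php` (not verified here) and that the server accepts requests from plain `urllib`; the function names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the wiki serves the standard MediaWiki API at this path.
API_URL = "https://nethackwiki.com/api.php"


def build_expand_params(wikitext: str) -> dict:
    """Build the parameters for the MediaWiki `expandtemplates` action."""
    return {
        "action": "expandtemplates",
        "text": wikitext,      # raw wikitext, templates still unexpanded
        "prop": "wikitext",    # ask for the expanded wikitext back
        "format": "json",
    }


def expand_templates(wikitext: str, api_url: str = API_URL) -> str:
    """POST the wikitext to the wiki and return it with templates expanded."""
    body = urllib.parse.urlencode(build_expand_params(wikitext)).encode("utf-8")
    request = urllib.request.Request(api_url, data=body)
    with urllib.request.urlopen(request) as response:
        data = json.load(response)
    return data["expandtemplates"]["wikitext"]
```

Calling `expand_templates(raw_page_text)` would then return wikitext in which the templates have been replaced by their output, which can be passed on to the parser for plain-text extraction. The same parameters can presumably be sent through mwclient's generic `Site.api` call instead of `urllib`.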

submitted by /u/ArthurFischel

Villages, Cities, States, Countries Database Of The World, And Crops Grown In That Country

I have found a few DBs:
https://github.com/dr5hn/countries-states-cities-database

https://simplemaps.com/data/world-cities

But I was wondering whether better DBs exist for this, especially for the crops grown in a specific country. The FAO one is very broadly defined: for example, fruits and vegetables are just classified as “fruits and vegetables”, but I want the list to be exhaustive.

submitted by /u/P_H_i_X

Movie’s Explicit Content – Scraped Data From VidAngel

https://www.kaggle.com/datasets/benjameeper/movie-violencesexprofanity-data

I scraped and aggregated content filters for 1,700 movies from VidAngel. I think there is some good potential in this data to evaluate how well movie ratings (PG, PG-13, R etc) describe how much explicit content a movie contains.

My data analysis skills only took me so far, so I would love to see what insights other people can dig up. Let me know if you think more granularity in the data is needed (number of f-word occurrences, etc.).

submitted by /u/stringofsense

Building A Dataset Indexing Platform – Love To Get Feedback

Hi, I am currently building a dataset indexing platform. The purpose is to let users list and find datasets more easily than with existing options such as Kaggle and Google Dataset Search. As a dataset owner, you can freely list your valuable data; as a dataset user, you can have an effective and exploratory search experience.

I would love to get feedback from this community and/or schedule a 1:1 session to find out more about how you currently list or search for datasets, and to share our idea with you: tokenizing the dataset and storing its attributes as metadata for easy indexing. I am also looking for early adopters, which means anyone who has data or is searching for data!

If you are keen to explore further, please let me know. Thank you.

submitted by /u/bdx_cbtan

Severe Lack Of Data For My Research Project, Wind And Solar Including Coordinates

Hi guys,

I’ve never posted here, but I’m pretty desperate right now. I’m doing a research project where I’m using ML algorithms to classify sites by renewable energy potential. I’ve searched everywhere and even tried writing an API requester in Python, but with the amount of data I need (50-100k rows) it would take way too long. So I’m here to ask if anyone has a dataset with, at minimum, lat and lon, wind speed, and wind direction. Pressure and temperature would be nice as well if possible. For solar, I need GHI and DNI, and maybe lateral tilt. But I want it to have random lat and lon coordinates, not all in one spot.
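To make the "random lat and lon, not all in one spot" requirement concrete, here is a minimal sketch of how spread-out sample points could be drawn before looking them up in whatever dataset or API turns up. The function name is hypothetical, and the sampling covers the whole globe (oceans included), which is an assumption rather than something stated above.

```python
import math
import random
from typing import List, Optional, Tuple


def random_coordinates(n: int, seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Return n (lat, lon) pairs in degrees, spread uniformly over the sphere."""
    rng = random.Random(seed)
    points = []
    for _ in range(n):
        # Sampling sin(lat) uniformly keeps points from bunching up at the poles.
        lat = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        lon = rng.uniform(-180.0, 180.0)
        points.append((lat, lon))
    return points


sites = random_coordinates(5, seed=42)
for lat, lon in sites:
    print(f"{lat:.4f}, {lon:.4f}")
```

Drawing latitude directly from a uniform range would oversample high latitudes, which is why the sketch goes through `asin`.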

Please, guys, I need your help.

DM me if you need more info.

submitted by /u/phoenixducky1

Cement Factory Energy Emission, Electrical

I am doing research on the energy emissions of cement plants and I need data on this. Where can I find it?

I need energy emissions data suitable for any sectoral distribution. When I searched the subreddit, I found only one website, but if there is a higher-quality data set, I would like to obtain it as well.

submitted by /u/hyyperi

Finding 3D Non-Image Datasets Online

Recently, I’ve been exploring the area of 3-dimensional data in machine learning. By that, I mean arrays with shape (x, x, x). As an example:

All the numbers are randomized, but hopefully this will give you the gist of what I’m looking for.
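For illustration, an array of that shape can be generated as follows. This is only a sketch using nested lists from the standard library, with a hypothetical helper name and x = 3 picked arbitrarily; in NumPy, `numpy.random.rand(3, 3, 3)` would give an equivalent array.

```python
import random


def random_cube(x, seed=None):
    """Return an x-by-x-by-x nested list filled with random floats in [0, 1)."""
    rng = random.Random(seed)
    return [[[rng.random() for _ in range(x)] for _ in range(x)] for _ in range(x)]


cube = random_cube(3, seed=0)

# Print one 3x3 layer at a time so the third dimension is visible.
for layer in cube:
    for row in layer:
        print([round(value, 2) for value in row])
    print()
```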

I have only encountered image datasets in my search, which I am not looking for. In addition, I want to find data already in three dimensions instead of two-dimensional time series data that can be made into three-dimensional data. Where could I find datasets like the ones I’m looking for?

Links or search terms would be greatly appreciated.

submitted by /u/Figsups

Common Aisles To Find Grocery Store Item

As the title suggests, I’m looking for a dataset that provides the grocery item and maybe the most common aisle it’s found in, followed potentially by the next most common aisle.

Ideally it’s something like item, category, image, aisle_1, aisle_2.

If something like that doesn’t exist, an acceptable alternative would be in paragraph form like the example below.

Tahini
In most grocery stores, tahini is either in the aisle with other condiments like peanut butter or in the aisle with international foods. You can also find it at a specialty or Middle Eastern grocery. It is sold shelf-stable in glass or plastic jars and is not refrigerated.
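To show how the paragraph form would map onto the column layout above, here is a minimal sketch that writes the tahini example as one CSV row. The `category` and `image` values are not given in the example, so they are left blank, and the ordering of the two aisles is an assumption (the text only says "either ... or").

```python
import csv
import io

# Column layout taken from the post: item, category, image, aisle_1, aisle_2.
FIELDS = ["item", "category", "image", "aisle_1", "aisle_2"]

rows = [
    {
        "item": "Tahini",
        "category": "",  # not stated in the example, left blank
        "image": "",     # not stated in the example, left blank
        "aisle_1": "condiments",           # assumed to be the most common aisle
        "aisle_2": "international foods",  # assumed to be the next most common
    }
]


def to_csv(records):
    """Serialize the records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


print(to_csv(rows))
```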

submitted by /u/yankpat9

Are There Any Arbitrage Opportunities For Datasets?

Doing some research for a project I am working on and started thinking:

What are the different types of proprietary data that can be accessed more cheaply in other geographies?

Why is it hard to access that data in the US/UK and not anywhere else? Is it because the data creator has a monopoly? Or are there regulatory issues? Is the cost too high to gather and store?

Any advice, leads, or tips would be greatly appreciated!!

submitted by /u/young-litty

Unlocking The Power Of Data Management Analytics Services 🗝️📊

Hey, Reddit community!

I stumbled upon a game-changer for businesses striving to harness the full potential of their data – Data Management Analytics Services! SG Analytics has put together an insightful article shedding light on how these services can revolutionize the way organizations handle and utilize their data.

🔗 Link: Data Management Analytics Services

In this comprehensive blog post, you’ll explore:

🗝️ The key components of robust data management strategies.
📊 How analytics-driven data management can optimize decision-making processes.
💼 Real-life examples of companies benefiting from data-driven insights.
🌐 The role of data management in enhancing overall business efficiency.

Whether you’re a data enthusiast, a business owner, or an aspiring analyst, this read will undoubtedly provide valuable knowledge and fresh perspectives.

Let’s engage in a discussion about the significance of data management in today’s fast-paced world. Share your thoughts, questions, and experiences in the comments below. Don’t forget to upvote if you find this topic as exciting as I do – let’s bring this valuable information to more people’s attention!

Stay curious and data-driven! 🗝️📊

submitted by /u/David_starc150