Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for datasets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Big Data Analytics And Its Significance For Top Edtech Organizations

What is Big Data Analytics for the Edtech Industry?

Big data refers to extensive volumes of data containing texts, images, sounds, audiovisuals, and program-specific files. Businesses manage these ever-expanding databases using data warehouses or data lakes. Big data analytics involves identifying recurring patterns in the structured, semi-structured, and unstructured data objects held in those repositories.

Meanwhile, educational technology, or Edtech, encompasses all the software and hardware innovations that teachers, corporate trainers, and students use to streamline academic or professional training activities. So brands working in automated translation, virtual reality (VR) laboratories, or e-libraries also count as Edtech businesses.

Still, the way edtech tools and big data analytics are combined varies from company to company. For instance, a smartboard developer will likely use analytical models to explore how educators and learners interact with its hardware, while a company offering remote skill-development courses can use marketing analytics to attract more students. https://us.sganalytics.com/blog/top-edtech-companies-using-big-data-analytics/

submitted by /u/Beautiful-Ad-7743

[self-promotion] Online Multi Video Saver | Comprehensive Video Organization

I have created a free, centralized, lightweight program that lets you save your favorite internet video links so that you can easily retrieve them later. You can use it to keep track of any type of link you need, so you do not need separate accounts and playlists for each tube site you visit. From the URL you specify, the software captures an image of the website you want to save, and you can then give it a searchable title. For simpler navigation, the stored videos can be shuffled and reversed. You can name the videos whatever you like, which makes them easier to find than relying on the original titles. And it is still a better playlist even if you only use it for one site.

https://www.adult-video-saver.com/

submitted by /u/Luktred
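The features described above (save a link under your own searchable title, then shuffle or reverse the list) come down to a very small data structure. A minimal Python sketch, assuming an in-memory list; `SavedLink`, `LinkStore`, and their methods are hypothetical names, not taken from the actual program, and the page image it stores is left out here.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, field


@dataclass
class SavedLink:
    url: str
    title: str  # user-chosen title; search uses this, not the site's own title


@dataclass
class LinkStore:
    links: list[SavedLink] = field(default_factory=list)

    def add(self, url: str, title: str) -> None:
        """Save a link under a custom, searchable title."""
        self.links.append(SavedLink(url, title))

    def search(self, query: str) -> list[SavedLink]:
        """Case-insensitive substring match on the custom titles."""
        q = query.lower()
        return [link for link in self.links if q in link.title.lower()]

    def reversed_order(self) -> list[SavedLink]:
        """Newest-first view; the saved order itself is not modified."""
        return list(reversed(self.links))

    def shuffled(self, seed: int | None = None) -> list[SavedLink]:
        """Shuffled copy for random playback; the saved order is not modified."""
        copy = list(self.links)
        random.Random(seed).shuffle(copy)
        return copy
```

For example, after `store.add("https://example.com/watch?v=abc", "Pasta recipe")` and `store.add("https://example.org/v/123", "Guitar lesson, week 1")`, calling `store.search("guitar")` returns only the second link. Returning copies from `shuffled` and `reversed_order` keeps the user's saved order intact.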

Accessing The CFEE Dataset For Compound Emotion Recognition

I’m trying to access the CFEE Dataset to classify compound facial emotions from this website: http://cbcsl.ece.ohio-state.edu/dbform_compound.html. I have filled out the request form and have been granted access. However, when I click the download dataset link, the page times out.

I’ve tried contacting the people mentioned on their website, but to no avail. Is there something I can do?

submitted by /u/grumpyowlgirl
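When a download times out in the browser, a script with a longer timeout, automatic retries, and HTTP `Range` resumption can sometimes get further, although nothing client-side helps if the server itself is down. A minimal standard-library Python sketch; the timeout, retry count, and backoff values are arbitrary assumptions, and whether the CFEE server honours `Range` requests is unknown, so the code restarts the file whenever it gets a full response instead of `206 Partial Content`. Pass it the download link you were granted.

```python
import http.client
import os
import socket
import time
import urllib.error
import urllib.request


def download_with_retries(url: str, dest: str, attempts: int = 5,
                          timeout: float = 120.0, backoff: float = 2.0) -> str:
    """Download url to dest, retrying on network errors and resuming when possible."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        # Bytes already saved by an earlier, interrupted attempt
        have = os.path.getsize(dest) if os.path.exists(dest) else 0
        request = urllib.request.Request(url)
        if have:
            # Ask the server to send only the missing tail of the file
            request.add_header("Range", f"bytes={have}-")
        try:
            with urllib.request.urlopen(request, timeout=timeout) as response:
                # 206 Partial Content means the server resumed; any other
                # success status means it sent the whole file, so start over
                mode = "ab" if response.status == 206 else "wb"
                with open(dest, mode) as out:
                    while chunk := response.read(1 << 16):
                        out.write(chunk)
            return dest
        except (urllib.error.URLError, socket.timeout, ConnectionError,
                http.client.HTTPException) as exc:
            if attempt == attempts:
                raise
            wait = backoff ** attempt  # exponential backoff between attempts
            print(f"attempt {attempt} failed ({exc!r}); retrying in {wait:.0f}s")
            time.sleep(wait)
```

Writing the file in chunks means a partially downloaded archive survives a timeout, so the next attempt only asks for the rest.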

LLM Training With PHP Improved Using Txt Datasets!

Hi guys, how are you doing?
Last week I shared the first version of this simple language-model training project written in PHP.

For those who missed it: it uses a simple Markov chain to calculate the probabilities of the next word based on the previous words.

Now I have improved the training dataset and the next-word selector.

Here’s the link:

https://github.com/AcidBurn86/LM-nGram-with-php/

It’s a good way to start understanding how big LLMs work. And of course, I know this could never perform like GPT or Llama.

It’s just educational code for PHP fans.

Shares and GitHub stars are welcome!

submitted by /u/OficialPimento
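The core idea the post describes (count which word follows each run of previous words, then pick the next word from those counts) fits in a few lines. A minimal Python sketch of an order-n word-level Markov chain; it is not a port of the linked PHP repository, and the lowercased whitespace tokenization and weighted random choice of the next word are assumed defaults.

```python
from __future__ import annotations

import random
from collections import Counter, defaultdict


def train(text: str, order: int = 2) -> dict[tuple[str, ...], Counter]:
    """Count how often each word follows each run of `order` previous words."""
    words = text.lower().split()
    model: dict[tuple[str, ...], Counter] = defaultdict(Counter)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        model[context][words[i + order]] += 1
    return model


def next_word_probabilities(model, context) -> dict[str, float]:
    """P(next word | previous words), estimated from the counts."""
    counts = model.get(tuple(context))
    if not counts:
        return {}  # context never seen in training
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def generate(model, seed, length: int = 10, rng: random.Random | None = None) -> str:
    """Extend `seed` word by word, sampling each next word by its probability."""
    rng = rng or random.Random()
    out = [w.lower() for w in seed]
    if not model:
        return " ".join(out)
    order = len(next(iter(model)))  # all keys have the same length
    for _ in range(length):
        counts = model.get(tuple(out[-order:]))
        if not counts:
            break  # dead end: this context was never followed by anything
        words, weights = zip(*counts.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)
```

Trained on "the cat sat on the mat the cat ran", the context ("the", "cat") is followed once by "sat" and once by "ran", so each gets probability 0.5. Raising `order` makes the output copy the training text more closely; lowering it makes it more random.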

Seeking Website With Reviews On Place Accessibility For People With Disabilities

I’m currently searching for a website that provides user reviews and data on the accessibility of places for people with disabilities. I’m interested in finding information about the accessibility of places like airlines, hotels, and restaurants.

Are there any websites that offer comprehensive reviews, ratings, or details about the accessibility of various locations? If the website has an API to access the data, that would be great.

Thank you in advance.

submitted by /u/19datascientist

Ethical Big Data Providers/Sources – Request For Help!

Apologies for cross-posting, but I can’t seem to find answers anywhere. I work as a researcher and have been wondering whether anyone knows of big data providers/sources that are nationally representative (country agnostic) and work within ethical collection parameters.
We use YouGov a lot. Outside of panel providers, we’ve tried market mapping, but we find that most companies use ML/AI-aided profile scraping that classifies people by gender, race, etc. That is problematic: it’s essentially physiognomy/face reading, it is subject to its own biases, and it reproduces the inequalities we’re working hard to lessen. I’ve been searching but can’t find anything conclusive in the way of providers or datasets.
Does anyone know of anything that could help?
TL;DR – There are a lot of structural inequalities around collecting data (as per Data Feminism), and I’m wondering how I can collect data with greater sensitivity to ethics, but at relative scale.

submitted by /u/Mundane-Mark2403

Accessing The CFEE Dataset For Emotion Recognition

I’m trying to access the CFEE Dataset to classify compound facial emotions from this website: http://cbcsl.ece.ohio-state.edu/dbform_compound.html. I have filled out the request form and have been granted access. However, when I click the download dataset link, the page times out. I am fairly new to these kinds of things, so I apologize if my question comes off as kind of lame.

I’ve tried contacting the people mentioned on their website, but to no avail. Is there something I can do?

submitted by /u/grumpyowlgirl

What Is Data Trust? How Do You Build Trust With Data Across Your Company?

Today, data is becoming an integral part of a company’s operations, and trust in that data is becoming an increasingly fraught issue as companies gather more of it.

The first step in building trust in data is making sure everyone in the data supply chain is aware of the critical role data plays in every operation. The second step is to focus on accuracy and to remove restrictions on the volume of data needed to reach the accuracy and performance that actions require. The third step is to make sure that all analytics, and the actions taken on them, can be replicated and proven.

The power of a data-driven world is likely to outweigh the risks and encourage trust in the technology that makes innovation possible. For organizations to walk the walk on data, they should start trusting their analytics and promote the kind of data-focused operation that every business needs.

Continue Reading – https://us.sganalytics.com/blog/what-is-data-trust-and-how-do-you-build-trust-with-data/

submitted by /u/Beautiful-Ad-7743

Suitable Aligners To Create A Three-language Parallel Corpus?

Hey everyone! I’m currently working on my MA dissertation on Anthony Burgess’ “A Clockwork Orange”. My supervisor asked me to create a parallel corpus of the versions of the book I’m analysing (the English source text, target text 1 in German, and target text 2 in Russian). The aim is to analyse the fictional language “Nadsat” and its translations into German and Russian. However, I have no previous experience with corpus linguistics, so I don’t know how to create a parallel corpus for three languages. I’ve been using Sketch Engine and have started aligning the texts manually there, but it’s obviously taking ages. Could you recommend any more efficient ways to align the three texts?

submitted by /u/mehhloni
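One common way to build a three-language alignment without aligning all three texts at once is to use the English source as a pivot: align English-German and English-Russian separately (with whatever sentence aligner you prefer), then join the two results on the English sentences. A minimal Python sketch of that join step, assuming each pairwise alignment has been exported as a list of (English sentence indices, target sentence indices) pairs; this input format is hypothetical, not Sketch Engine's, and beads with no English sentence (pure additions in a translation) are rejected rather than handled.

```python
from __future__ import annotations

# A "bead" is one alignment unit: (English sentence ids, target sentence ids),
# e.g. ([3, 4], [5]) means English sentences 3 and 4 became target sentence 5.
Bead = tuple[list[int], list[int]]


def pivot_join(en_de: list[Bead], en_ru: list[Bead]) -> list[tuple[list[int], list[int], list[int]]]:
    """Merge EN-DE and EN-RU alignments into EN-DE-RU triples via the English side."""
    parent: dict[int, int] = {}

    def find(x: int) -> int:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a: int, b: int) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as root so groups sort in reading order
            parent[max(ra, rb)] = min(ra, rb)

    # English sentences that share a bead in either alignment belong together
    for en, _ in en_de + en_ru:
        if not en:
            raise ValueError("beads with no English sentence are not handled here")
        for e in en:
            union(en[0], e)

    groups: dict[int, tuple[set[int], set[int], set[int]]] = {}
    for slot, alignment in ((1, en_de), (2, en_ru)):
        for en, target in alignment:
            g = groups.setdefault(find(en[0]), (set(), set(), set()))
            g[0].update(en)
            g[slot].update(target)
    return [tuple(sorted(s) for s in groups[root]) for root in sorted(groups)]
```

The merge step matters when the two translations split sentences differently: if German merges English sentences 1 and 2 but Russian keeps them apart, both end up in one English-German-Russian row rather than producing a misaligned pair.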

Looking For A Free Instagram Dataset

I’m looking for an Instagram dataset. For each account, I need at least every post, with the number of followers at the moment of the post and the number of people reached by each publication.
I don’t need personal information, only numeric values. The project is to try to predict the number of people reached with the help of the other data.
Thanks.

submitted by /u/yannis_heguy
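For the stated goal (predict reach from the other numbers), an ordinary least-squares line of reach against follower count is a simple first baseline before trying richer models. A minimal pure-Python sketch; it assumes you already have paired (followers, reach) values per post, and the single-feature linear model is an assumed simplification, not something the post specifies.

```python
from __future__ import annotations

import math


def fit_line(followers: list[float], reach: list[float]) -> tuple[float, float]:
    """Ordinary least squares for reach ≈ slope * followers + intercept."""
    n = len(followers)
    if n != len(reach) or n < 2:
        raise ValueError("need two equally long lists with at least two posts")
    mean_x = math.fsum(followers) / n
    mean_y = math.fsum(reach) / n
    sxx = math.fsum((x - mean_x) ** 2 for x in followers)
    if sxx == 0:
        raise ValueError("follower counts must not all be identical")
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(followers, reach))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(followers: list[float], reach: list[float],
              slope: float, intercept: float) -> float:
    """Share of the variance in reach explained by the fitted line."""
    mean_y = math.fsum(reach) / len(reach)
    ss_res = math.fsum((y - (slope * x + intercept)) ** 2 for x, y in zip(followers, reach))
    ss_tot = math.fsum((y - mean_y) ** 2 for y in reach)
    return 1.0 - ss_res / ss_tot if ss_tot else 1.0
```

If reach grows roughly in proportion to followers across accounts of very different sizes, fitting the same line on `math.log1p` of both columns may give a better baseline.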

What Is The Best Way To Get Information Off Of A Wiki For Natural Language Processing?

So far I’m using two Python libraries to get pages from categories on a MediaWiki-based website (https://nethackwiki.com):

https://pypi.org/project/wikitextparser/
https://mwclient.readthedocs.io/en/latest/

However, the parser I’m using does not offer the ability to interpolate (expand) the templates.

So I’m either stuck with plain text that removes all the templates and removes valuable data, or I have the raw contents that still have all of the templating syntax.

I have no desire to write an interpolation parsing engine. Is my only option to go in and strip the syntax manually?

submitted by /u/ArthurFischel
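One way to avoid writing a template engine is to let the wiki expand the templates itself: the MediaWiki Action API has an `expandtemplates` module that returns wikitext with the templates substituted, which can then go through wikitextparser as before. A minimal standard-library sketch; the `api.php` path for nethackwiki.com is an assumption (check the site's Special:Version page for the real one), the `User-Agent` string is made up, and the response shape assumes `prop=wikitext` with `formatversion=2`. The `fetch` parameter only exists so the request can be swapped out in testing.

```python
from __future__ import annotations

import json
import urllib.parse
import urllib.request
from typing import Callable

# Assumed endpoint path: confirm it on the wiki's Special:Version page
API_URL = "https://nethackwiki.com/mediawiki/api.php"


def post_json(api_url: str, params: dict[str, str]) -> dict:
    """POST form-encoded parameters to a MediaWiki API endpoint and decode the JSON."""
    data = urllib.parse.urlencode(params).encode()
    request = urllib.request.Request(
        api_url, data=data, headers={"User-Agent": "wiki-nlp-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)


def expand_templates(wikitext: str, title: str, api_url: str = API_URL,
                     fetch: Callable[[str, dict[str, str]], dict] = post_json) -> str:
    """Have the wiki itself substitute every {{template}} in `wikitext`."""
    params = {
        "action": "expandtemplates",
        "format": "json",
        "formatversion": "2",
        "prop": "wikitext",
        "text": wikitext,
        "title": title,  # context page, so {{PAGENAME}} etc. resolve sensibly
    }
    return fetch(api_url, params)["expandtemplates"]["wikitext"]
```

With the raw page text from mwclient (for example `page.text()`), something like `wikitextparser.parse(expand_templates(raw, page.name)).plain_text()` should then keep the template output instead of dropping it. Expanded templates can contain HTML tables, so some cleanup may still be needed. POST is used so long pages do not hit URL length limits.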

Villages, Cities, States, Countries Database Of The World, And Crops Grown In That Country

I have found a few DBs:
https://github.com/dr5hn/countries-states-cities-database

https://simplemaps.com/data/world-cities

But I was wondering whether better DBs exist for the same purpose, especially for the crops grown in a specific country. The FAO one is very broadly defined: for example, fruits and vegetables are just classified as “fruits” and “vegetables”, but I want the list to be exhaustive.

submitted by /u/P_H_i_X

Movie’s Explicit Content – Scraped Data From VidAngel

https://www.kaggle.com/datasets/benjameeper/movie-violencesexprofanity-data

I scraped and aggregated content filters for 1,700 movies from VidAngel. I think there is some good potential in this data to evaluate how well movie ratings (PG, PG-13, R etc) describe how much explicit content a movie contains.

My data analysis skills only took me so far; I would love to see what insights other people can dig up. Let me know if you think more granularity in the data is needed (number of f-word occurrences, etc.).

submitted by /u/stringofsense
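One way to test how well ratings describe explicit content is a rank correlation between the rating, treated as ordered, and a per-movie content count. A minimal pure-Python Spearman sketch with average ranks for ties; the G < PG < PG-13 < R < NC-17 ordering is an assumed encoding, and the caller supplies plain lists because the Kaggle dataset's actual column names are not assumed here.

```python
from __future__ import annotations

import math

# Assumed ordering of ratings from least to most restrictive
RATING_ORDER = {"G": 0, "PG": 1, "PG-13": 2, "R": 3, "NC-17": 4}


def average_ranks(values: list[float]) -> list[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(xs: list[float], ys: list[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined when one side is constant")
    return cov / math.sqrt(vx * vy)


def spearman_rating_vs_content(ratings: list[str], counts: list[float]) -> float:
    """Spearman's rho between rating level and an explicit-content count."""
    codes = [RATING_ORDER[r] for r in ratings]
    return pearson(average_ranks(codes), average_ranks(counts))
```

A value near 1 means higher ratings reliably go with more of that content; computing it separately for violence, sexual content, and profanity would show which category the ratings track best.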

Building A Dataset Indexing Platform – Love To Get Feedback

Hi, I am currently building a dataset indexing platform. The purpose is to let users list and find datasets more easily than with existing options such as Kaggle and Google Dataset Search. As a dataset owner, you can freely list your valuable data; as a dataset user, you get an effective, exploratory search experience.

I’d love to get feedback from this community and/or schedule a 1:1 session to find out more about how you currently list or search for datasets, and to share our idea with you: tokenize the dataset and store the dataset’s attributes as metadata for easy indexing. I am also looking for early adopters, which applies to anyone who has data or is searching for data!

Anyone who is keen to explore further, please let me know. Thank you.

submitted by /u/bdx_cbtan
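Tokenizing each dataset’s attributes and storing them as metadata for indexing amounts to an inverted index. A minimal Python sketch, assuming each dataset is described by a dict of metadata fields (title, column names, and so on); the field names, the ASCII-only lowercase tokenizer, and the AND-only search are hypothetical simplifications, not the platform’s actual design.

```python
from __future__ import annotations

import re
from collections import defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")  # ASCII-only tokens, a simplifying assumption


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not an ASCII letter or digit."""
    return TOKEN_RE.findall(text.lower())


class DatasetIndex:
    """Inverted index: token -> ids of datasets whose metadata contains it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[str]] = defaultdict(set)
        self._metadata: dict[str, dict[str, object]] = {}

    def add(self, dataset_id: str, metadata: dict[str, object]) -> None:
        self._metadata[dataset_id] = metadata
        for value in metadata.values():
            # Fields may hold a single value or a list (e.g. column names)
            items = value if isinstance(value, (list, tuple)) else [value]
            for item in items:
                for token in tokenize(str(item)):
                    self._postings[token].add(dataset_id)

    def search(self, query: str) -> list[str]:
        """Ids of datasets matching every query token (AND semantics)."""
        tokens = tokenize(query)
        if not tokens:
            return []
        hits = set.intersection(*(self._postings.get(t, set()) for t in tokens))
        return sorted(hits)

    def metadata(self, dataset_id: str) -> dict[str, object]:
        return self._metadata[dataset_id]
```

A production search would usually rank results (for example with TF-IDF or BM25) instead of returning a plain sorted AND match, but the posting-list structure stays the same.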