Where Do U Guys Find Datasets For Ur Projects?

Hi everyone! I’m new to the data field and quite confused about where to find datasets. I know about Kaggle and Google Dataset Search, but what other resources do you use to get datasets?

submitted by /u/Emergency_Island_668

B2B Partnership For Building An Awesome AI Video Training Data Set

We’re exploring partnerships with companies that already have massive video data catalogues, covering everything from news content to home shopping, YouTube influencers, and sports. People-oriented content is our focus – think talking to the camera and engaging conversations.

submitted by /u/C0tt

Dataset For The Titles Of Reddit Posts

I refuse to believe there is no dataset for the titles of Reddit posts. One caveat: the titles should have been captured contemporaneously.

Is there a dataset that shows, day by day, or hour by hour, the titles in context for some top subs?

submitted by /u/OH-YEAH

Big Data Analytics And Its Significance For Top Edtech Organizations

What is Big Data Analytics for the Edtech Industry?

Big data is an extensive data volume containing texts, images, sounds, audiovisuals, and program-specific files. Businesses manage these ever-expanding databases using data warehouses or lakes. Big data analytics involves determining a recurring pattern based on those repositories’ structured, semi-structured, and unstructured data objects.

Meanwhile, educational technology, or edtech, encompasses all the software and hardware innovations that teachers, corporate trainers, and students employ to streamline academic or professional training activities. Therefore, brands working in automated translation, virtual reality (VR) laboratories, or e-libraries are also edtech businesses.

Still, integrating edtech tools and big data analytics trends varies from company to company. For instance, a smartboard developer will likely use analytical models to explore how educators and learners interact with their hardware. If a company offers remote skill development opportunities, it can employ marketing analytics to attract more students. https://us.sganalytics.com/blog/top-edtech-companies-using-big-data-analytics/

submitted by /u/Beautiful-Ad-7743

Looking For Dataset That Contains Food Ingredients

Think of a kitchen counter with salt, butter, chicken, lettuce, etc. on it. I’m looking to train an object detection model that can recognize common ingredients, not complete dishes.

submitted by /u/d_Milt

Dataset For Ships Flashing Morse Code Or Lights On And Off – Help!

I am creating synthetic data by stitching together images of a ship with its lights on or off, for an LSTM model that can decode Morse code. I can’t find any publicly available dataset. Please help!
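
One way to produce the on/off labels for synthetic sequences like this is to expand each message into per-time-unit light states using standard Morse timing (a dot is 1 unit on, a dash is 3 units on, with 1 unit off between symbols, 3 between letters, and 7 between words). This is only a minimal Python sketch: the function name `morse_to_light_states` and the deliberately tiny `MORSE` table are assumptions for illustration, not part of any existing dataset or of the poster's pipeline.

```python
# Small illustrative subset of the Morse alphabet; extend as needed.
MORSE = {"E": ".", "T": "-", "S": "...", "O": "---"}


def morse_to_light_states(message, table=MORSE):
    """Return a list of 1 (light on) / 0 (light off), one entry per time unit."""
    states = []
    for word_index, word in enumerate(message.upper().split()):
        if word_index > 0:
            states.extend([0] * 7)      # gap between words
        for char_index, char in enumerate(word):
            if char_index > 0:
                states.extend([0] * 3)  # gap between letters
            for symbol_index, symbol in enumerate(table[char]):
                if symbol_index > 0:
                    states.append(0)    # gap between dots/dashes of one letter
                states.extend([1] * (1 if symbol == "." else 3))
    return states


print(morse_to_light_states("SOS"))
```

Each `1` could then select the "lights on" image and each `0` the "lights off" image when stitching frames, with the same list doubling as the per-frame label for training.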

submitted by /u/BANANATHEGREAT

[self-promotion] Online Multi Video Saver | Comprehensive Video Organization

I have created a free, lightweight, centralized program that lets you save your favorite internet video links so that you can easily retrieve them later. You can use it to keep track of any type of link, which means you do not need separate accounts and playlists for each tube site you visit. Given the URL you specify, the software captures an image of the website you want to save, and you can then give it a searchable title. For simpler navigation, the stored videos can be shuffled and reversed. You can name the videos whatever you like, which makes them easier to find than relying on the original title. Furthermore, the playlist is superior even if you only use it for one site.

https://www.adult-video-saver.com/

submitted by /u/Luktred

Accessing The CFEE Dataset For Compound Emotion Recognition

I’m trying to access the CFEE Dataset to classify compound facial emotions from this website: http://cbcsl.ece.ohio-state.edu/dbform_compound.html. I have filled out the request form and have been granted access. However, when I click the download dataset link, the page times out.

I’ve tried contacting the people mentioned on their website, but to no avail. Is there something I can do?

submitted by /u/grumpyowlgirl

Data Request For Medicine In The United States

Does anyone know where I could find drug sales data, by quantity sold and with prices, for the United States?

Thanks a lot.

submitted by /u/fuseraga

LLM Training With PHP Improved Using Txt Datasets!

Hi guys, how are you doing?
Last week I shared the first version of this simple language model training with PHP.

For those who missed it, it uses a simple Markov chain to calculate the probabilities for the next word based on the previous words.

Now I have improved the training dataset and the next-word selector.

Here’s the link:

https://github.com/AcidBurn86/LM-nGram-with-php/

It is a good way to start understanding how big LLMs work. And of course I know this could never perform like GPT or Llama.

It is just educational code for PHP fans.
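
For readers who don't use PHP, the Markov chain idea can be sketched in a few lines of Python. This is not the code from the linked repository: it is a simplified illustration that conditions only on the single previous word (a bigram model), and the function names and toy corpus are assumptions made up for the example.

```python
import random
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for every word, how often each other word follows it."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def next_word_probabilities(counts, prev):
    """Turn the raw follower counts for `prev` into probabilities."""
    followers = counts.get(prev, Counter())
    total = sum(followers.values())
    return {word: n / total for word, n in followers.items()} if total else {}


def generate(counts, start, length=10, seed=None):
    """Sample a word sequence by repeatedly drawing the next word."""
    rng = random.Random(seed)
    output = [start]
    for _ in range(length - 1):
        probs = next_word_probabilities(counts, output[-1])
        if not probs:  # dead end: this word was never seen with a follower
            break
        words, weights = zip(*probs.items())
        output.append(rng.choices(words, weights=weights)[0])
    return output


corpus = "the cat sat on the mat and the cat ate the fish"  # toy training text
model = train_bigram_model(corpus)
print(next_word_probabilities(model, "the"))
print(" ".join(generate(model, "the", length=8, seed=0)))
```

Conditioning on two or more previous words, as an n-gram model does, would mean keying the counts on tuples of words instead of a single word.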

Shares and github stars are welcome!

submitted by /u/OficialPimento

Seeking Website With Reviews On Place Accessibility For People With Disabilities

I’m currently searching for a website that provides user reviews and data on the accessibility of places for people with disabilities. I’m interested in finding information about the accessibility of places like airlines, hotels, and restaurants.

Are there any websites that offer comprehensive reviews, ratings, or details about the accessibility of various locations? If the website has an API to access the data, that would be great.

Thank you in advance.

submitted by /u/19datascientist

Ethical Big Data Providers/Sources – Request For Help!

Apologies for cross-posting, but I can’t seem to find answers anywhere. I work as a researcher and have been wondering if anyone knows of big data providers/sources that are nationally representative (country agnostic) and work within ethical collection parameters.
We use YouGov a lot. Outside of panel providers, we’ve tried to do market mapping, but we find that most companies use ML/AI-aided profile scraping that classifies people by gender, race, etc. That is problematic: it’s just physiognomy/face reading, it’s subject to its own biases, and it reproduces the inequalities we’re working hard to lessen. I’ve been searching but can’t seem to find anything conclusive in the way of providers or datasets.
Does anyone know of anything that could help?
TL;DR – There are a lot of structural inequalities around collecting data (as per Data Feminism), and I was wondering how I can collect data with greater sensitivity to ethics, but at relative scale.

submitted by /u/Mundane-Mark2403

Command{Extraction | Transformation ~ Load}

https://github.com/hkpeaks/peaks-consolidation

submitted by /u/100GB-CSV

Accessing The CFEE Dataset For Emotion Recognition

I’m trying to access the CFEE Dataset to classify compound facial emotions from this website: http://cbcsl.ece.ohio-state.edu/dbform_compound.html. I have filled out the request form and have been granted access. However, when I click the download dataset link, the page times out. I am fairly new to these kinds of things, so I apologize if my question comes off as kind of lame.

I’ve tried contacting the people mentioned on their website, but to no avail. Is there something I can do?

submitted by /u/grumpyowlgirl

What Is Data Trust? How Do You Build Trust With Data Across Your Company?

Today, data is becoming an integral part of a company’s operations, and trust in data is becoming an increasingly fraught issue, especially as companies gather more information.

The first step in building trust in data is ensuring that the data supply chain is aware of the critical role data plays in every operation. The second step is to focus on accuracy and eliminate the restrictions on the volume of data required for accuracy and performance needed for actions. The third step involves building trust to ensure that all analytics and actions that are taken can be replicated and proven.

The power of a data-driven world is likely to outweigh the risks and encourage trust in the technology that makes innovation possible. For organizations to walk the walk on data, they should start trusting their analytics, promoting the kind of data-focused operation that every business needs.

Continue Reading – https://us.sganalytics.com/blog/what-is-data-trust-and-how-do-you-build-trust-with-data/

submitted by /u/Beautiful-Ad-7743

A List Of Forums Where The Data Communities Hang Out

Does anyone have a list? Other than r/datasets, of course.

submitted by /u/bdx_cbtan

Suitable Aligners To Create A Three-language Parallel Corpus?

Hey everyone! I’m currently working on my MA dissertation on Anthony Burgess’ “A Clockwork Orange”. My supervisor asked me to create a parallel corpus for the versions of the book which I’m analysing (source text in English, target text 1 in German, and target text 2 in Russian). The aim is to analyse the fictional language “Nadsat” and its translations into German and Russian. However, I have no previous experience with corpus linguistics, and therefore don’t know how to create a parallel corpus for three languages. I’ve been using Sketch Engine, for which I’ve started to align the texts manually, but it’s obviously taking ages, so I was wondering whether you could recommend any more efficient ways to align the three texts?

submitted by /u/mehhloni

How Do You Usually Use The Filter/keywords To Find The Dataset You Want On Sites Like Kaggle And Google Dataset Search?

I’m trying to find out how to get effective results, instead of just the most commonly used datasets.

submitted by /u/bdx_cbtan

Looking For A Free Instagram Dataset

I’m looking for an Instagram dataset that includes, for each account, every post along with the number of followers at the time of the post and the number of people reached by each publication.
I don’t need personal information, only numeric values. The project is to try to predict the number of people reached with the help of the other data.
Thanks.

submitted by /u/yannis_heguy

Does Anyone Have The “Great American Word Mapper” Dataset

The website is defunct but it says “If you’d like to dive deeper, the full dataset of words and their values for each county is available for download here.”

I assume at least a few people have downloaded this full dataset on their hard drives or in a cloud. Is there anyone here who does have it and could share?

submitted by /u/Inevitable-Bath9142

Top 20 European Clubs All Time Players From 2000s Dataset

I’m looking for a dataset of players who played for the top European clubs from the 2000s. Is there any website where I can get the CSVs? Thanks.

submitted by /u/hopefull420

What Is The Best Way To Get Information Off Of A Wiki For Natural Language Processing?

So far I’m using two Python libraries:

https://pypi.org/project/wikitextparser/ https://mwclient.readthedocs.io/en/latest/

to get pages from categories on a MediaWiki-based website (https://nethackwiki.com). However, the parser that I’m using does not offer the ability to interpolate (expand) the templates.

So I’m either stuck with plain text that removes all the templates and removes valuable data, or I have the raw contents that still have all of the templating syntax.

I have no desire to write an interpolation parsing engine. Is my only option to go in and strip the syntax manually?
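
One option that may avoid writing an interpolation engine: the MediaWiki Action API has an `expandtemplates` module that returns wikitext with the templates expanded by the server itself. Below is a minimal sketch, assuming the target wiki exposes the standard API at some `api.php` URL. The `https://example.org/w/api.php` endpoint is a placeholder and the helper names are hypothetical. The functions only build the request URL and read the decoded response, so the actual HTTP call is left to whichever client is already in use.

```python
from urllib.parse import urlencode


def build_expandtemplates_url(api_url, wikitext, title=None):
    """Build a MediaWiki Action API URL asking the server to expand templates."""
    params = {
        "action": "expandtemplates",
        "format": "json",
        "prop": "wikitext",
        "text": wikitext,
    }
    if title is not None:
        # Page context, used by magic words such as {{PAGENAME}}.
        params["title"] = title
    return f"{api_url}?{urlencode(params)}"


def extract_expanded_wikitext(response_json):
    """Pull the expanded wikitext out of a decoded JSON API response."""
    return response_json["expandtemplates"]["wikitext"]


# Placeholder endpoint; replace with the real api.php location of the wiki.
print(build_expandtemplates_url("https://example.org/w/api.php", "{{foo|bar}}"))
```

For long pages the same parameters would be better sent as a POST body than in a URL. The expanded output is still wikitext (links, tables, formatting), so a parser such as wikitextparser would still be needed for the remaining markup.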

submitted by /u/ArthurFischel

Step-by-Step Guide To Preparing Datasets For Object Detection In Video And Images: A Detailed Analysis

submitted by /u/moseich

Villages, Cities, States, Countries Database Of The World, And Crops Grown In That Country

I have found a few DBs:
https://github.com/dr5hn/countries-states-cities-database

https://simplemaps.com/data/world-cities

But I was wondering if better DBs exist for the same purpose, especially for the crops that are grown in a specific country. The FAO one is very broadly defined: for example, fruits and vegetables are just classified as fruits and vegetables, but I want the list to be exhaustive.

submitted by /u/P_H_i_X

Movie’s Explicit Content – Scraped Data From VidAngel

https://www.kaggle.com/datasets/benjameeper/movie-violencesexprofanity-data

I scraped and aggregated content filters for 1,700 movies from VidAngel. I think there is some good potential in this data to evaluate how well movie ratings (PG, PG-13, R etc) describe how much explicit content a movie contains.

My data analysis skills only took me so far, so I would love to see what insights other people can dig up. Let me know if you think more granularity in the data is needed (number of f-word occurrences, etc.).

submitted by /u/stringofsense

[self-promotion] Text And Metadata Of US Patents For Fine-tuning, Training, & Inference Of LLMs

Cybersyn just launched a free dataset that includes text and metadata for patents granted by the USPTO. We’ve continued seeing strong demand for document/text based datasets – more to come.

submitted by /u/aiatco2

Diversify.fyi – A Dashboard Of USA Employee Gender And Race Statistics For 20,000+ Companies

https://www.diversify.fyi

The information is gathered from company-reported diversity reports (mainly EEO-1 data). Most of the raw data displayed on the site originally came from here: https://www.dol.gov/agencies/ofccp/foia/library/Employment-Information-Reports

In full disclosure, I created the site, but it is completely free.

submitted by /u/teamongered

I Need A Dataset For Beat Counting In Songs, Where Can I Get A Labelled Dataset ?

Also, are there any pre-trained models that can do this, which I could access for free or for not a lot of money?

submitted by /u/ProblemGupta

Building A Dataset Indexing Platform – Love To Get Feedback

Hi, I am currently building a dataset indexing platform. The purpose is to enable users to list and find datasets more easily as compared to existing options such as Kaggle and Google Dataset Search. As a dataset owner, you can freely list your valuable data; as a dataset user, you can have an effective and exploratory search experience.

I would love to get feedback from this community and/or schedule a 1:1 session to find out more about how you currently list or search for datasets, and to share our idea with you: tokenize the dataset and store its attributes as metadata for easy indexing. I am also looking for early adopters, which applies to anyone who has data or is searching for data!

If you are keen to explore further, please let me know. Thank you.

submitted by /u/bdx_cbtan

New Tools Added To Our List Of Open Source Tools In Data Centric AI

submitted by /u/AdventurousSea4079