Dataset For NILM Disaggregation Project

Hi, I am searching for datasets for my NILM disaggregation project, but all the links I found are down. Can anyone share a link or send a dataset to me?

submitted by /u/marouska91

Political Party Co-Preference Dataset

I’m running simulations of ranked-choice and other voting methods, and I want to find a survey-supported dataset of related preferences between US political parties, e.g., people who prefer the Green Party have some proportional preference for the Democratic Party. I would also accept a survey-supported metric or a principal component analysis of quantitative or qualitative data, e.g., a political spectrum that captures meaningful variations of preference in survey samples. I would very strongly prefer non-partisan research; however, if that is simply not possible to find, I would at least need studies from multiple partisan organizations to compare.

(I’m also looking to learn more about who is doing research in this area so I can follow and look for more datasets that come up)

submitted by /u/bduxbellorum

Seeking Health-Related Longitudinal Datasets

Hi all,

We’re looking for good sources of longitudinal/time-series datasets in the area of health. The datasets have to include repeated entries (e.g., one person followed over a long time period). The domains we are interested in include:

– exercise decisions (e.g., which days people choose to exercise/run etc)

– gym and fitness class attendance

– male/female birth order (per family) or in a delivery room

– dieting & nutrition (e.g., the order that people consume healthy or unhealthy foods each day)

– pain intensity

– weight development and progression

We have searched quite a bit on common repositories like Kaggle, Data World, and UCI Machine Learning, but we have not had much luck finding data that meets our requirements and forms a decent time series. Any specific suggestions (e.g., organisations or repositories that have publicly available health data) would be very helpful.

Please note that we are excluding datasets that show trends that are monotonically increasing or decreasing. This generally removes broader health domains like disease spread (e.g., Covid case numbers), worldwide health development (e.g., global nutrition), life expectancy, and mortality rates.

Thank you!

submitted by /u/Remarkable_Review327

Dataset With 10-15 Tables For SQL Project

Hi,

I’m a master’s student and need to do a project on SQL.

As part of the project, I need to work on a dataset and perform BCNF or another normalization.

After normalization (preferably BCNF), I should end up with at least 10 tables.

The dataset should contain a minimum of 5,000 rows. It would be better if it is realistic data or a practice commercial dataset from a source like Kaggle.

If anyone knows of datasets like that, can you please share the details?

Thanks 🙂

submitted by /u/Eren_94

Seeking Audio Data For Multilingual Project – 1000 Hours Needed In Various Languages

I hope you guys are doing well. I’m in need of audio data in several languages. Specifically, I’m looking for 1000 hours of data in each of the following languages:

Australian English, Czech, Danish, Finnish, Hungarian, Portuguese, Romanian, Norwegian, Bulgarian, Croatian, Serbian, Iranian Persian, Swedish, Indonesian, Chinese (Taiwan), Chinese (Hong Kong), Tamil, Japanese

The audio data needs to meet the following specifications:

– Audio file format: 16-bit, 16 kHz or higher (or any), WAV, 1 or 2 channels

– Duration: minimum 5 minutes and maximum 7 minutes (if other ranges are available, please provide samples and pricing)

– Transcription file format: JSON or any other suitable format
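
For anyone preparing samples, the specifications above can be checked mechanically. The following is a minimal sketch using only Python’s standard `wave` module. The function names `check_wav_spec` and `make_silent_wav` are hypothetical, and the sketch assumes the stricter reading of the spec (16-bit samples, a sample rate of at least 16 kHz, 1 or 2 channels, 5 to 7 minutes), even though the post also allows "any" rate.

```python
import io
import wave


def check_wav_spec(source, min_seconds=300.0, max_seconds=420.0):
    """Return a list of spec violations for one WAV file (empty list means OK).

    `source` may be a file path or a binary file-like object.
    """
    problems = []
    with wave.open(source, "rb") as wav:
        if wav.getsampwidth() != 2:  # 2 bytes per sample = 16-bit
            problems.append(
                f"sample width is {8 * wav.getsampwidth()}-bit, expected 16-bit"
            )
        if wav.getframerate() < 16000:
            problems.append(
                f"sample rate is {wav.getframerate()} Hz, expected >= 16000 Hz"
            )
        if wav.getnchannels() not in (1, 2):
            problems.append(f"{wav.getnchannels()} channels, expected 1 or 2")
        duration = wav.getnframes() / wav.getframerate()
        if not min_seconds <= duration <= max_seconds:
            problems.append(
                f"duration is {duration:.1f} s, "
                f"expected {min_seconds:.0f}-{max_seconds:.0f} s"
            )
    return problems


def make_silent_wav(seconds, rate=16000, channels=1, sample_width=2):
    """Build an in-memory silent WAV, only for trying the checker out."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        wav.writeframes(b"\x00" * int(seconds * rate) * channels * sample_width)
    buffer.seek(0)
    return buffer


# A short clip is flagged only for its duration under the default 5-7 minute window.
print(check_wav_spec(make_silent_wav(2)))
```

Passing a file path instead of the in-memory buffer works the same way, so the check can be run over a whole delivery folder before any data changes hands.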

Additionally, if you have transcribed files of the same audio data, please provide samples of those as well.

We will be using the data to train an LLM model to recognize events in text, and we will also require validation along with it.

If you have any leads, suggestions, or if you can provide the data yourself, please comment below or send me a direct message. Your assistance would be greatly appreciated.

Thank you in advance for your help!

submitted by /u/Disastrous_Piano7831

Blinkist, Shortform, Instaread, GetAbstract Data [paid]

Book summary data from the sites below is available:

– blinkist

– shortform

– instaread

– getabstract

Data format: text + audio

Text is in epub & pdf format for each book. Audio is in mp3 format.

Last updated: March 2024

Update frequency: approximately every 2-3 months.

DM me for access.

submitted by /u/waqarHocain

Regarding Alexa Topical Chat Dataset

Hi all,

Has anybody tried accessing the Alexa Topical Chat dataset? I don’t have access to the Reddit API at the moment. Is there an alternative way of getting the dataset?

submitted by /u/pickuplimesss

Where/How To Download All Genome Sequence Out Of NCBI?

I am planning to compare genome sequences, but for that I need data. So I came across the National Center for Biotechnology Information (NCBI), which is an awesome organization.

But there is an issue: the sequences have to be downloaded one by one. Is there any way to download the whole thing onto my server at once, i.e., all the available sequences?

I looked into their FTP page as well, but it provided data in different formats, such as gbff, faa, gpff, and fna. And I’m pretty sure there is more data than that, as the download was just 8ish M.

Ref:

https://www.ncbi.nlm.nih.gov/datasets/taxonomy/37653/

https://www.ncbi.nlm.nih.gov/genome/doc/ftpfaq/

Any help or suggestions would be appreciated.

submitted by /u/maifee

Where To Find Big Datasets With “updated_at” Date Column?

Hi,

I want to create a sample SCD Type 2 table. To do so, I am looking for a big dataset (>5 GB) that updates daily and has an “updated_at” date attribute representing the date when a row last changed.

Example:

Today the dataset looks like this:

id  color   updated_at
1   blue    01.01.2024
2   red     01.01.2024

Tomorrow the dataset looks like this:

id  color   updated_at
1   yellow  10.03.2024
2   red     01.01.2024

Do you know where I could find such datasets?
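
To make the target concrete, here is a minimal sketch of how the two snapshots in the example could be folded into an SCD Type 2 history table. It is only an illustration under stated assumptions: `apply_scd2` and the `valid_from`, `valid_to`, and `is_current` columns are hypothetical names, the dates are read as day.month.year, and rows deleted from the source are not handled.

```python
from datetime import date


def apply_scd2(history, snapshot, key="id", tracked=("color",)):
    """Merge one daily snapshot into an SCD Type 2 history table.

    history:  list of dicts with the key, the tracked columns,
              valid_from, valid_to and is_current.
    snapshot: list of dicts with the key, the tracked columns and updated_at.
    """
    current = {row[key]: row for row in history if row["is_current"]}
    for new in snapshot:
        old = current.get(new[key])
        if old is not None and all(old[c] == new[c] for c in tracked):
            continue  # nothing changed for this key
        if old is not None:
            # Close the old version on the day the change was recorded.
            old["valid_to"] = new["updated_at"]
            old["is_current"] = False
        history.append({
            key: new[key],
            **{c: new[c] for c in tracked},
            "valid_from": new["updated_at"],
            "valid_to": None,
            "is_current": True,
        })
    return history


# The two snapshots from the example, with dates read as day.month.year.
today = [
    {"id": 1, "color": "blue", "updated_at": date(2024, 1, 1)},
    {"id": 2, "color": "red", "updated_at": date(2024, 1, 1)},
]
tomorrow = [
    {"id": 1, "color": "yellow", "updated_at": date(2024, 3, 10)},
    {"id": 2, "color": "red", "updated_at": date(2024, 1, 1)},
]

history = apply_scd2([], today)
history = apply_scd2(history, tomorrow)
for row in history:
    print(row)
```

After both snapshots, the history holds three rows: the closed "blue" version of id 1, the unchanged "red" row for id 2, and the new current "yellow" version of id 1. Any source with a reliable "updated_at" column can feed the same loop.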

submitted by /u/Betelgeitze

[request] Can Someone Help Me Get Access To The LUPerson-T Dataset?

I would be very grateful if someone could help me get access to the LUPerson-T dataset. I am unable to access it since I cannot create a Baidu account and the authors are not responding.
https://pan.baidu.com/s/16hrzG6498HQs40gMozGAzA#list/path=%2F
Access code: hmyb

submitted by /u/Ash-11103

A Shared Scorecard To Evaluate Data Annotation Vendors

Evaluating and choosing an annotation partner is not an easy task. There are a lot of options, and it’s not straightforward to know who will be the best fit for a project.
We recently stumbled upon a paper by Andrew Greene titled “Towards a shared rubric for Dataset Annotation”, which describes a set of metrics that can be used to quantitatively evaluate data annotation vendors. So we decided to turn it into an online tool.
A big reason for building this tool is also to bring the welfare of annotators to the attention of all stakeholders.
Until end users start asking for their data to be labeled in an ethical manner, labelers will always be underpaid and treated unfairly, because the competition boils down solely to price. Not only does this “race to the bottom” lead to lower quality annotations, it also means vendors have to “cut corners” to increase their margins.
Our hope is that by using this tool, ML teams will have a clear picture of what to look for when evaluating data annotation service providers, leading to better quality data as well as better treatment of the unsung heroes of AI – the data labelers.
Access the tool here: https://mindkosh.com/annotation-services/annotation-service-provider-evaluation.html

submitted by /u/AdventurousSea4079

Python Function To Return Racial Data From Census.gov

Can someone please help me with this:

I need to write a Python function that takes a location, uses the census.gov API to gather data on the race percentages at that location, and returns them to me.

Thanks
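
One possible shape for such a function is sketched below. It rests on several assumptions that should be checked against the Census Bureau’s own documentation: that the 2020 Decennial Census redistricting dataset (`2020/dec/pl`) is the source, that the `P1_*` codes listed are the right race variables, and that the "location" is given as state and county FIPS codes rather than an address. The function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed variable codes from table P1 ("Race") of the 2020 redistricting file.
# Verify them against the dataset's published variable list before relying on them.
TOTAL = "P1_001N"
RACE_VARIABLES = {
    "P1_003N": "White alone",
    "P1_004N": "Black or African American alone",
    "P1_005N": "American Indian and Alaska Native alone",
    "P1_006N": "Asian alone",
    "P1_007N": "Native Hawaiian and Other Pacific Islander alone",
    "P1_008N": "Some Other Race alone",
    "P1_009N": "Two or more races",
}
BASE_URL = "https://api.census.gov/data/2020/dec/pl"


def build_url(state_fips, county_fips=None, api_key=None):
    """Build the query URL for a whole state or for one county in it."""
    params = {"get": ",".join(["NAME", TOTAL, *RACE_VARIABLES])}
    if county_fips is None:
        params["for"] = f"state:{state_fips}"
    else:
        params["for"] = f"county:{county_fips}"
        params["in"] = f"state:{state_fips}"
    if api_key:
        params["key"] = api_key
    return f"{BASE_URL}?{urlencode(params, safe=',:')}"


def race_percentages(rows):
    """Turn the API's JSON (a header row plus one data row) into {label: percent}."""
    record = dict(zip(rows[0], rows[1]))
    total = int(record[TOTAL])
    return {
        label: round(100 * int(record[code]) / total, 2)
        for code, label in RACE_VARIABLES.items()
    }


def fetch_race_percentages(state_fips, county_fips=None, api_key=None):
    """Query the API over the network and return the race percentages."""
    with urlopen(build_url(state_fips, county_fips, api_key)) as response:
        return race_percentages(json.load(response))
```

Calling `fetch_race_percentages("06", "037")` would request Los Angeles County, California. Turning a street address or city name into FIPS codes is a separate geocoding step that this sketch leaves out.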

submitted by /u/Ineedtoknow777

Looking For A Large Unlabelled Handwritten Text Dataset

I’m looking for a large amount of handwritten text (in image format), and it doesn’t have to be labelled. Simply put, scanned images of handwritten pages: raw, untouched, but lots of them. I’m not even very particular about the language. It would be nice if the images were separated by language, but even a total mess would be acceptable.

The ones I’ve found so far are all labelled and, as a result, do not contain that many samples. I was hoping that, if a dataset is not labelled, it would be easier to find ones with a large number of samples.

These are the ones I’ve found:

CENSUS-HWR (1,812,014 samples)

IAM (16,752 samples)

submitted by /u/Ziadloo

A Dataset Of All Gaming Youtube Channels With Over 1 Million Subscribers

Does it exist?

submitted by /u/AlTiSsS

Are There Any Ways To Get The Data About The Offers From The Various Fast Food Chains?

My project requires information about the menus, prices, ingredients, and offers of some of the fast food chains across the US.

Thanks in advance for your help!

submitted by /u/VastDragonfruit847

Dataset For Multiple Sclerosis Mimics

Hello, where can I find brain MRI data for multiple sclerosis mimics? I have tried sending emails to the authors of the studies I found, but so far no one has replied.

submitted by /u/reynapatata

XGLM-564M – Fine Tuning For Ayacucho Quechua

Hi everyone,

I’m trying to fine-tune an XGLM-564M model for the Ayacucho Quechua language. So far, I’ve found two datasets on Hugging Face that could be used for this:

– wikipedia/wikipedia

– hackathon-pln-es/spanish-to-quechua

I’m having problems with the first one: I can’t download it because of a missing package called apache_beam. I tried installing it, but without success (I’m using the latest Pop!_OS).

For the second dataset, I’m mainly worried about the quality, since I don’t know the language and I’m doing this fine-tuning as part of my uni assignment.

Any help will be greatly appreciated.

Thank you.

submitted by /u/dduka99

Dataset Of Books, Novels, And Other Literary Sources That Have Been Adapted Into Movies/tv Shows

I’m conducting exploratory data analysis on streaming platforms like Netflix, Amazon Prime, and others to guide content acquisition strategies for a new streaming service. Specifically, I’m investigating whether movies and TV shows adapted from literary sources perform better than original content. By ‘perform better,’ I mean whether these adaptations, on average, receive higher ratings on the streaming platforms themselves or on external rating sites such as IMDb.

A similar question was asked before but never received a response: https://www.reddit.com/r/datasets/comments/gscwtz/request_is_there_a_comprehensive_database_of/

I would appreciate any assistance on this!

submitted by /u/2bapesrealm

I Made OMDB, The World’s Largest Downloadable Music Database (154,000,000 Songs)

submitted by /u/OatsCG

Request: USDA 12 Basic Soil Class Dataset For Mapping

Hello,

I am a student researching the precontact cultivation of tobacco by Indian tribes in western North America. I am trying to find a map of the 12 basic soil classes (clay, loam, silt loam, etc.) but am having trouble. This would allow me to note where Nicotiana species have proliferated despite regions being outside of their “natural” range. I am accounting for other geospatial factors as well, but this would be extremely helpful. Any assistance would be greatly appreciated 🙂

submitted by /u/infernoparadiso

[Mock] Ideas For A Dummy Inventory Dataset

I’m about to launch into building a dummy warehouse inventory dataset. I’m trying to come up with a playful type of company and product line upon which to base it. I’m after something whimsical, but meaty enough to build a demo around. I’m thinking at least 400-500 SKUs (products), with a compelling set of product categories (2-3 levels of hierarchy, a few dozen total categories). I’ve thought of things like:

– a surf shop chain, with swimming and snorkeling equipment, T-shirts, beach toys and accessories

– a “Flintstonesque” shop with all sorts of sticks and rocks

– something inspired by Wile E. Coyote’s “ACME” (bird seed, exploding tennis balls, anvils…)

– maybe something inspired by SpongeBob SquarePants (shell emporium…)

Any ideas?

(I realize that this isn’t quite the normal fare here. If it’s not close enough, could you suggest another subreddit?)

submitted by /u/waitak

Need Help In A Timeseries Satellite Dataset For A GAN Based Simulation

Hello all,

I am working on an academic project where I am training a GAN on my synthetic satellite data of a city or vegetated land. I then change the labels (air quality, water supply, urbanization parameters, etc.) to predict what the new image will look like after the feature changes. Because I am currently working with synthetic satellite data, the results are more or less good. However, I want to scale my project to time-series data of either a city or vegetated land so that I can train my model on real time data. Can you point me in the right direction if any such dataset exists?

submitted by /u/ultrainstinctmasters

Help On Finding A Text Summarization Dataset

I’m working on a research idea for summarizing content for different audiences: for example, summaries of a particular company document for marketing, HR, or developers, each highlighting the content most relevant to that audience. Right now I’m having difficulty finding a text summarization dataset that has ground truth for different audiences like this. Can anyone point me in the right direction for finding such a dataset?

submitted by /u/AGENT_SAT

Data Analysis For Survey Responses Help

I’m working on analyzing data and not sure where to start. The data is from a survey. I have the participants’ ages and their selected responses (very often, sometimes, or never) to 14 questions. How do I find out whether there is a correlation between the ages and the responses?
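
One common starting point for ordered answers like these is Spearman’s rank correlation, computed per question between age and the answer coded as an ordinal number. The sketch below uses only the standard library. The coding in `SCALE` is an assumption about the answer order, and the ages and answers are made-up illustration data, not real survey results.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Assumed ordinal coding of the three answer options.
SCALE = {"never": 0, "sometimes": 1, "very often": 2}

# Made-up data for one question, for illustration only.
ages = [19, 23, 31, 35, 42, 58]
answers = ["very often", "very often", "sometimes", "sometimes", "never", "sometimes"]

rho = spearman(ages, [SCALE[a] for a in answers])
print(f"Spearman rho for this question: {rho:.2f}")
```

Repeating this for each of the 14 questions gives one coefficient per question. A value near +1 or -1 suggests answers rise or fall with age, while a value near 0 suggests no monotonic relationship. A significance test would be a separate step.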

submitted by /u/Junior_Band_7503

Looking For American Sign Language (ASL) Video Dataset With Both Gloss And Text Translations

I’m in search of videos featuring individuals using American Sign Language (ASL) to translate spoken English, with the sign language aligning accurately with the English words. The intermediary gloss (sign language transcription) in the videos should hopefully be included as well.

Willing to pay for the video/dataset!

submitted by /u/nobilis_rex_

Any Interest In CSGO Datasets(specifically From HLTV)?

I spent a lot of time accumulating historical match information for all available teams on HLTV. I’d like to know whether this is of any value to fellow researchers. I’d be happy to host it, but I just want to know if the interest is there. For context, I scraped this data to build a Discord bot that does match predictions for CSGO matches. If you want to hear more about the project or dataset, just PM me or add your contact here: https://yhzshsg2ee.us-east-1.awsapprunner.com/

submitted by /u/smackcam20

Geocities Data. Including Unique Buttons

submitted by /u/cavedave

Seeking Dataset: Active US Businesses

Hey everyone,

I’m looking for a free public dataset similar to Data Axle, Buzzfile, or OpenCorporates that provides detailed information on active businesses in the US. Any recommendations? Are there any reliable sources offering such datasets for free?

Thanks!

submitted by /u/madhatter349

[request]: Looking For Dna/rna Similarity Dataset In Pair

I’m looking for a dataset that contains pairs of DNA or RNA sequences as strings, together with the similarity between each pair. Can you please point me to some datasets like this?

Context: Trying to build a machine learning model.
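
For reference, one simple way to define the similarity label for a pair is a normalized edit distance. The sketch below is only an illustration of what a (sequence, sequence, similarity) row could contain. The function names are hypothetical, and real work on sequences usually relies on alignment-based scores, which this does not implement.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Similarity in [0, 1]: 1 minus the edit distance over the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


# Made-up example pairs, for illustration only.
pairs = [("ACGTACGT", "ACGTTCGT"), ("ACGT", "TTTT"), ("AUGGCC", "AUGGCC")]
for left, right in pairs:
    print(left, right, round(similarity(left, right), 3))
```

The quadratic cost of the distance makes this practical only for short sequences.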

submitted by /u/maifee

Looking For Open-source/public Client-therapist Transcripts Dataset

I put out an AI therapy chatbot, and I’ve used a few publicly available transcripts I’ve scraped together from here and there, but nowhere near enough for proper fine-tuning and a real analysis of its ability to approximate ‘real’ therapists. The one place I found, which actually feels extremely convincing, is fiction.

There is the publication by Alexander Street, Counseling and psychotherapy transcripts: volumes 1-3, but access is always restricted to university students/researchers.

Anyone know of alternatives or a way to access that?

submitted by /u/naftalibp

Category: Datatards
