I need a dataset (or datasets) containing recordings from one person: temperature, heart rate, and oxygen saturation, covering 24 hours at 1-minute intervals.
submitted by /u/Lili23data
[link] [comments]
Here you can observe the biggest nerds in the world in their natural habitat, longing for datasets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
Where can I find a labelled spam-tweets dataset that has at least 50,000 rows? I’ve searched everywhere, but there are very few datasets available, and those datasets are way too small. It is impossible to use the API to get the tweets without paying.
I am in urgent need of one, as I will need it for my MSc dissertation, any leads are greatly appreciated.
Thanks
submitted by /u/kingsterkrish
[link] [comments]
I think this is an interesting dataset that I generated from ChatGPT, but I am not sure how to generate visuals for it. Does anyone have any suggestions?
| Height (Wealth Level) | Percentage of Population | Rough Number of People | Rough Wealth Range |
|---|---|---|---|
| 1 inch (Poverty Level) | 10.5% | ~34.8 million | $0 – $10,000 |
| 5 feet (Median Wealth) | 50% | ~165.5 million | $10,000 – $100,000 |
| 6 feet (Affluent) | 25% | ~82.75 million | $100,000 – $1,000,000 |
| 10 feet (Wealthy) | 10% | ~33.1 million | $1,000,000 – $10,000,000 |
| 100 feet (Ultra-Wealthy) | 1% | ~3.31 million | $10,000,000 – $1,000,000,000 |
| 1000 feet (Billionaire Class) | 0.0002% | ~660 individuals | $1,000,000,000 and above |
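In case it helps: since both the "height" and population columns span several orders of magnitude, log-scale bar charts keep everything on one readable figure. A minimal sketch, assuming matplotlib is available; the numbers are copied from the table above and the plotting choices are just one suggestion:

```python
# A minimal visualization sketch for the table above; log scales keep the
# 1-inch and 1000-foot bars on the same readable chart.
import matplotlib.pyplot as plt

levels = ["Poverty", "Median", "Affluent", "Wealthy", "Ultra-Wealthy", "Billionaire"]
heights_ft = [1 / 12, 5, 6, 10, 100, 1000]   # the "height" metaphor, in feet
population = [34.8e6, 165.5e6, 82.75e6, 33.1e6, 3.31e6, 660]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

ax1.bar(levels, heights_ft)
ax1.set_yscale("log")
ax1.set_ylabel("Height (feet, log scale)")
ax1.tick_params(axis="x", rotation=45)

ax2.bar(levels, population)
ax2.set_yscale("log")
ax2.set_ylabel("People (log scale)")
ax2.tick_params(axis="x", rotation=45)

fig.tight_layout()
plt.show()
```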
submitted by /u/eagle_eye_johnson
[link] [comments]
Looking to put together a searchable website where users can search for a product and see the price/quantity differential over the past 2-3 years. Any help pointing me in the right direction would be greatly appreciated.
submitted by /u/mortalhal
[link] [comments]
Anyone have another source to download the REDD dataset? The original source, http://redd.csail.mit.edu/, is not working and can’t be accessed. We badly need the full dataset. Thank you!
submitted by /u/luna_anabanana
[link] [comments]
Hello, everyone! Does anyone here have a dataset for mushroom yield production that includes temperature and humidity data? We need at least 1,500 data points for our simulation as part of our capstone project. Thank you.
submitted by /u/Ill-Moose4794
[link] [comments]
I would really appreciate feedback on a version-control system for tabular datasets that I am building, the Data Manager.
Main characteristics:
- Like DVC and Git LFS, it integrates with Git itself.
- Like DVC and Git LFS, it can store large files on AWS S3 and link them in Git via an identifier.
- Unlike DVC and Git LFS, it calculates and commits diffs only, at row, column, and cell level. For append scenarios, the commit includes the new data only; for edits and deletes, a small diff is committed accordingly. With DVC and Git LFS, the entire dataset is committed again instead: committing 1 MB of new data 1000 times to a 1 GB dataset yields more than 1 TB in DVC (a dataset that grows linearly from 1 GB to 2 GB, committed 1000 times, results in a repository of ~1.5 TB), whereas it sums to 2 GB with the Data Manager (the 1 GB original dataset plus 1000 diffs of 1 MB each); see the sketch after this list.
- Unlike DVC and Git LFS, the diffs for each commit remain visible directly in Git.
- Unlike DVC and Git LFS, the Data Manager allows committing changes to datasets without full checkouts on localhost. You check out kilobytes and can append data to a dataset in a repository of hundreds of gigabytes. Changes on a no-full-checkout branch need to be merged into another branch (on a machine that does operate with full checkouts) to be validated, e.g., against adding a primary key that already exists.
- Since the repositories contain diff histories, snapshots of the datasets at a given commit have to be recreated to be deployable. These can be automatically uploaded to S3 and labeled with the commit hash, via the Data Manager.
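To make the append case concrete, here is a minimal sketch of committing only new rows, assuming CSV datasets; the function and file names are hypothetical and not the Data Manager’s actual API:

```python
# Hypothetical illustration of append-only diffs: only the rows added since
# the last snapshot are written to the commit, never the whole dataset.
import csv

def appended_rows(old_path: str, new_path: str) -> list[list[str]]:
    """Return the rows in new_path beyond the row count of old_path."""
    with open(old_path, newline="") as f:
        old_count = sum(1 for _ in csv.reader(f))
    with open(new_path, newline="") as f:
        return list(csv.reader(f))[old_count:]

# Committing these few rows instead of the full file is why 1000 appends of
# 1 MB to a 1 GB dataset total ~2 GB of history rather than ~1.5 TB.
```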
Links:
No full checkout: https://news.ycombinator.com/item?id=35930895 and https://youtu.be/BxvVdB4-Aqc
General intro: https://news.ycombinator.com/item?id=35806843 and https://youtu.be/J0L8-uUVayM
This paradigm enables hibernating or cleaning up history on S3 for old datasets, if these are deleted in Git and snapshots of earlier commits are no longer needed. Individual data entries can also be removed for GDPR compliance using versioning on S3 objects, orthogonal to git.
I built the Data Manager for a pain point I was experiencing: it was impossible to (1) uniquely identify and (2) make available behind an API multiple versions of a collection of datasets and config parameters, (3) without overburdening HDDs due to small, but frequent changes to any of the datasets in the repo and (4) while being able to see the diffs in git for each commit in order to enable collaborative discussions and reverting or further editing if necessary.
Some background: I am building natural language AI algorithms that are (a) easily retrainable on editable training datasets, meaning changes or deletions in the training data are reflected fast, without traces of past training and without retraining the entire language model (sounds impossible), and (b) able to explain decisions by pointing back to individual training data.
I look forward to constructive feedback and suggestions!
submitted by /u/Usual-Maize1175
[link] [comments]
I am creating a Career Recommender for my Computer Science project and am looking for a dataset that contains information about different careers/jobs and stats about them, like salary, field, and the education needed. Where can I find a dataset that includes all of these, or at least some?
submitted by /u/parrotpo
[link] [comments]
I am working on a project to collect data from different sources (distributors, retail stores, etc.) through different approaches (FTP, API, scraping, Excel, etc.). I would like to consolidate all the information and create dynamic reports, and to add all the offers and discounts suggested by these various vendors.
How do I get all this data? Is there a data provider who can supply it? I would like to start with IT hardware and consumer electronics goods.
Any help is highly appreciated. TIA
submitted by /u/BeGood9170
[link] [comments]
I need a small vehicle-detection dataset with at least 6 classes, including an ambulance class, for my project “smart traffic control”. I am using YOLOv8; see the sketch below.
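For context on how such a dataset would be consumed, a minimal sketch of training YOLOv8 with the ultralytics package; traffic.yaml is a hypothetical dataset config that would list the image paths and the 6 class names (e.g. car, truck, bus, motorcycle, bicycle, ambulance):

```python
# A minimal training sketch, assuming the ultralytics package is installed.
# "traffic.yaml" is a hypothetical config file, not an existing dataset.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")        # small pretrained checkpoint
model.train(data="traffic.yaml",  # dataset config: paths + 6 class names
            epochs=50, imgsz=640)
metrics = model.val()             # evaluate on the validation split
```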
submitted by /u/ibrahim-elsadat
[link] [comments]
Hey everyone!
We are a group of design students currently conducting academic research on an intriguing topic: the democratization of data and its potential to benefit the public. We believe that data can play a vital role in improving people’s lives outside the realm of business, and we would love to hear your thoughts and experiences on this subject.
If you have a moment, we kindly invite you to answer one or more of the following questions either privately or as a comment:
– Please share your most recent experience using datasets for self-worth or public value (non-business purposes), for example a project that makes data accessible or extracts insights that can help the general public.
– While working on the project, what worked and what didn’t? Were there barriers or challenges that you can share?
– Are there any insights or tips you would like to share following the project?
– Do you have any insights or thoughts regarding the use or accessibility of data for the public good?
Your contribution can be as brief or as detailed as you like. We greatly appreciate any answers, thoughts, or perspectives you are willing to share.
Thank you all!
submitted by /u/Direct-Goat-2072
[link] [comments]
Hate-speech data of about 3,000 to 5,000 records is required, on topics such as inflation in Pakistan, power outages in Pakistan, etc.
submitted by /u/Muted_Researcher_785
[link] [comments]
import os
import cv2
import numpy as np

dataset_root = 'path_to_dataset_root_directory'
categories = ['healthy', 'diseased']

# Walk the dataset directory, category by category
for category in categories:
    category_dir = os.path.join(dataset_root, category)
    images = os.listdir(category_dir)
    for image_name in images:
        image_path = os.path.join(category_dir, image_name)
        # Perform further processing on each image

def preprocess_and_detect_disease(image_path):
    # Load the image
    image = cv2.imread(image_path)

    # Preprocess the image
    gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    blurred_image = cv2.GaussianBlur(gray_image, (5, 5), 0)

    # Apply image processing techniques (e.g., thresholding)
    _, thresholded_image = cv2.threshold(blurred_image, 100, 255, cv2.THRESH_BINARY)

    # Find contours in the image
    contours, _ = cv2.findContours(thresholded_image.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)

    # Loop over the contours and detect fruit disease
    for contour in contours:
        # Calculate the area of the contour
        area = cv2.contourArea(contour)
        # Set a threshold for disease detection
        threshold_area = 5000
        if area > threshold_area:
            # Draw a bounding box around the detected fruit
            x, y, w, h = cv2.boundingRect(contour)
            cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

    # Display the output image
    cv2.imshow('Fruit Disease Detection', image)
    cv2.waitKey(0)
    cv2.destroyAllWindows()

image_path = 'path_to_your_image.jpg'
preprocess_and_detect_disease(image_path)
submitted by /u/No-Bad-5051
[link] [comments]
I need a dataset of hardware components like mice, RAM, PCs, laptops, etc., for a school project.
submitted by /u/the_one777777897
[link] [comments]
Your value to Reddit is your free posts and comments, so remember to delete your posts before deleting your account!
I used this Reddit API script to delete 10 years of my comments and posts: https://codepen.io/j0be/full/WMBWOW/
Bye Reddit! It has been fun!
submitted by /u/wickedevil
[link] [comments]
I’ve heard the entire Reddit post and comment dataset is released monthly somewhere, but I can’t find it.
Does anyone know where?
submitted by /u/Fedude99
[link] [comments]
https://forms.microsoft.com/e/5gBkjDhYKN
I will make this public after some time, and I really appreciate anyone who completes it.
submitted by /u/rayofhope313
[link] [comments]
Lots of subs are going to go dark/private because Reddit will raise the price of API calls to them.
/r/datasets is more pro cheap/free data than most subs. What do you think of the idea of going dark? Here is an example explanation from another sub:
https://old.reddit.com/r/redditisfun/comments/144gmfq/rif_will_shut_down_on_june_30_2023_in_response_to/
submitted by /u/cavedave
[link] [comments]
Hi friends, I’m looking for a dataset of videos that can be used to understand human talking/chatting. Any suggestions are very much appreciated. Thanks.
submitted by /u/Character-Size-3083
[link] [comments]
I’m currently in search of datasets that contain historical cyberattacks and their features. More specifically, I am looking at these columns: Type of Malware, Attack Vector, Purpose, Attacker or Groups, Damages Done (in USD or number of people affected), Type of Sector, and Size of the Organization Affected. Any recommendations or sources where I can find such datasets?
submitted by /u/Much_Pineapple_6027
[link] [comments]
Hi, I am planning to create a personalized learning system as the major project for my computer science degree and am facing difficulty finding proper datasets. I need 10,000+ records on students (preferably in higher education) for the project.
submitted by /u/No_Development2058
[link] [comments]
Looking for business-related datasets for Tableau practice. Not too worried if I have to pay for access; just looking for something high quality that I can pair with some research. The ultimate goal is to showcase data visualization skills with business-related data.
submitted by /u/ChoiceChicken
[link] [comments]
Looking for a dataset containing cyclone/storm damage to apply machine learning. All the damage data that I can find is a single number for each event. Ideally, I would like to know the damage for each event split by region (this could be by postcode/zip code, suburb, etc.). To specifically describe what I am after:
- Time period: at least the last 20 years, but the more the better.
- Country: preferably Australia, but happy for it to be any other country if that country has the required data available.
- Event: as mentioned in the title, interested in cyclones and storms. Note, I use the term cyclone to include events such as hurricanes, typhoons, etc.
- Damage: this could be total economic damage, recovery cost, lives lost, casualties, or any other reasonable metric.
- Granularity: this is the most important feature I am after. The more granular the better; ideally the damage data would be by postcode/zip code, though perhaps that is too much to hope for, so I will take what I can get.
Thanks in advance!
submitted by /u/Nanoputian8128
[link] [comments]
Hi guys.
I have developed a website: https://twitter.cworld.ai
You can search for a Twitter user and download all of their tweets there.
The tweets can be downloaded in three formats:
- rawtxt: every tweet separated by two newlines ("\n\n")
- alpaca: an Alpaca-format JSON file; the instruction is fixed ("play a role") and the input is the user's name, possibly including their intro
- origin tweets json: the original tweets JSON file
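For reference, a sketch of what one Alpaca-style record might look like, going by the description above; the exact field contents the site produces may differ:

```python
# A hypothetical Alpaca-format record (instruction/input/output), assuming
# the standard Alpaca JSON layout; contents are illustrative only.
import json

record = {
    "instruction": "play a role",                   # fixed, per the description
    "input": "elonmusk (bio: ...)",                 # user name, possibly with intro
    "output": "Text of one of the user's tweets.",  # the tweet itself
}
print(json.dumps(record, indent=2))
```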
submitted by /u/Separate-Awareness53
[link] [comments]
Just looking for a cool dataset I can throw into Python and run a multiple regression on, ideally just to add to my GitHub for a job application. What would you do? This is for an entry-level DS position, and they want to see a couple of projects.
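As one possible shape for such a project, a minimal sketch of a multiple regression on a built-in dataset, assuming scikit-learn; any tabular dataset with several numeric predictors would work the same way:

```python
# A minimal multiple-regression sketch on scikit-learn's diabetes dataset;
# swap in any tabular dataset with several numeric predictors.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)  # 10 numeric predictors, 1 target
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", model.score(X_test, y_test))
```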
submitted by /u/PSKGM
[link] [comments]
I want to get hold of threaded communication that happens at work.
I have taken a look at:
- Mailing lists, but emails are elaborate, and I specifically want to train a model on shorter day-to-day conversations.
- IRC archives, but they don't contain information about which message is being replied to.
Are there any open platforms/datasets you have come across where I can find regular day-to-day chats?
submitted by /u/lambainsaan
[link] [comments]
Hi everyone, in case you’re working on projects based on web-scraped data from e-commerce fashion websites, you can buy it on databoutique.com for a few dollars. Available websites: Zara, Mango, and H&M for fast fashion; Gucci, Prada, Balenciaga, Farfetch, and more for luxury.
submitted by /u/Pigik83
[link] [comments]