Common Anti-bot (anti-scraping) Measures On Websites And How To Bypass Them

submitted by /u/9millionrainydays_91
[link] [comments]

0

DiffusionDB: A Large-scale Text-to-image Prompt Gallery Dataset Based On Stable Diffusion

submitted by /u/yaph
[link] [comments]

0

Occupations With Demographic Information Dataset

Hello!

I’m looking for a dataset about occupations by race and ethnicity, gender and age (or any related demographic information) in the state of California. If there is a national dataset that would work too.

Thank you!

submitted by /u/htxastrowrld
[link] [comments]

0

Number Of Direct Flights Between Domestic Airports

Hello,

I’m looking for air traffic data for a personal project. I’d like to be able find the number of direct flights (ideally passengers too, but this is probably too granular) between two US airports. The way I’m envisioning it, the data for a given period of time would look something like the matrix below, where departure airports are column headers, destination airports are row headers, and the values are the number of flights (or passengers):

Airport ATL DFW DEN … ATL XXX 12 8 … DFW 11 XXX 10 … DEN 9 10 XXX … … … … … XXX

I know the data needed to create something like this must exist somewhere, but the closer to the end product displayed above the better. Thanks!

submitted by /u/dobby_bodd
[link] [comments]

0

Struggling With Finding A Use Case To Work On For My Course Work

Hi,
I am a seasoned market researcher and got intrigued with data science and machine learning since most of my job is about dealing with data. I am currently pursuing my MSc in Data Science. Before this we were provided with datasets to work on. It was initially a struggle to define my own use case based on the data that was shared, however, I was able to deliver with average results.
However, for my next coursework we should be using our own datasets which should be supervised learning in nature and they cannot be from Kaggle or UCI (we lose 30 points if we use any of these sources for our datasets. I have spent about a week to look for datasets and I am a bit confused and also unable to understand which dataset to use or what kind of use cases should I look at. I did explore data.gov but I kind of just freeze because I am unable to understand what use case I can create of the database. I can’t use clustering problem because that would be unsupervised in nature.

I tried a couple of regional sites for web scraping – use cases tried were second hand car and predicting their price and rental price prediction based on area selected. However the websites did not allow web scraping and I would like to respect that.
Would you know any publicly available datasets that I can potentially explore for my supervised machine learning coursework?
Do let me know if you have any idead that I can explore and thanks in advance.

submitted by /u/jknotra
[link] [comments]

0

Starting A New Job, I Am Required To Work On Old XLSX Datasets From 1990s. Need Help!

Hello everyone,

I am starting a new job for a company, my role is Data Specialist and I will be responsible for working on helping the team with Data migration.

The task is to do data migration from legacy system to a modern data structure on SharePoint. I am very much aware of the steps involved in this process, however, I have been out of touch with the tools and techniques as last time I studied Python, SSIS, visualization and Excel tools was a few years back and I think it will be difficult for me to contribute immediately as I join them. I am starting my work next week. I wanted to ask you professionals if I would have a hard time at my work with the skills I don’t possess right now and what are the steps I need to take to make sure my employer can count on me going forward.

PS: This is my first day working for an IT company and I have no idea how an IT project works.

Thanks!

submitted by /u/shanke_y8
[link] [comments]

0

HR Datasets For SQL Exploratory Data Analysis

Hello everyone!

So I wanted to do an EDA project on SQL but I don’t know where to start. I would prefer to work with an HR related dataset as I’m more familiar with HR and find the data already interesting.

Can someone point to a good dataset for this? Thanks!

submitted by /u/cam171811
[link] [comments]

0

Chef’s Special – Documenting Equipment Losses During The 2023 Wagner Group Mutiny

submitted by /u/cavedave
[link] [comments]

0

US Ethnic Demographics By County And State

Hi,
I’m working on my thesis and I wonder if anyone has this data or can lead me to relevant governmental web page that contains this?
Cheers!

submitted by /u/rolanddes1
[link] [comments]

0

Dataset With Large Amount Of Mixed Or Categorical For Cluster Analysis?

I’m looking for a dataset that contains a large amount of features (100+) for clustering purposes. I’m applying sparse clustering, which means that my method removes features that are not important for the clusters. Either a data set that only contains categorical variables, or a mix of numerical and categorical variables is fine.

Does anyone have any ideas?

submitted by /u/y_zh
[link] [comments]

0

I Need 24 Hours (temperature /heart Rate /oxygen Saturation) Dataset????

I need a dataset(s) that contains the recordings of one person: -temperature -heart rate -oxygen saturation.

For 24 hours with a timestamp of 1 min

submitted by /u/Lili23data
[link] [comments]

0

Twitter Spam Dataset Needed With Actual Tweet Text

Where can I find labelled spam tweets dataset that has at least 50000 rows. I’ve searched everywhere, but there are very few datasets available, and those datasets are way too small. It is impossible to use the API to get the tweets without paying 🙁
I am in urgent need of one, as I will need it for my MSc dissertation, any leads are greatly appreciated.

Thanks

submitted by /u/kingsterkrish
[link] [comments]

0

A Dataset Contrasting US Wealth In The Context Of Height

I think this is an interesting dataset that I generated from ChatGPT, but I am not sure how to generate visuals for it. Does anyone have any suggestions?

Height (Wealth Level) Percentage of Population Rough Number of People Rough Wealth Range 1 inch (Poverty Level) 10.5% ~34.8 million $0 – $10,000 5 feet (Median Wealth) 50% ~165.5 million $10,000 – $100,000 6 feet (Affluent) 25% ~82.75 million $100,000 – $1,000,000 10 feet (Wealthy) 10% ~33.1 million $1,000,000 – $10,000,000 100 feet (Ultra-Wealthy) 1% ~3.31 million $10,000,000 – $1,000,000,000 1000 feet (Billionaire Class) 0.0002% ~660 individuals $1,000,000,000 and above

submitted by /u/eagle_eye_johnson
[link] [comments]

0

In(greed)flation: Comparing Grocery Food Skus Year-over-year

Looking to put together a searchable website where users can search a product and see the price / quantity deferential over the past 2-3 years. Any help pointing me in the right direction would be greatly appreciated.

submitted by /u/mortalhal
[link] [comments]

0

REDD Dataset (A Public Data Set For Energy Disaggregation) New Link?

Anyone have another source to download the REDD dataset? The original source http://redd.csail.mit.edu/, is not working and cant access. We badly need the full dataset. Thank you!

submitted by /u/luna_anabanana
[link] [comments]

0

Dataset For Mushroom Yield Prediction Using 3 Predictive Model

Hello, everyone! Does anyone here has data set for mushroom yield production that includes temperature and humidity data? We need at least 1,500 data for our simulation as part of our capstone project. Thank you.

submitted by /u/Ill-Moose4794
[link] [comments]

0

[self-promotion] Feedback Needed: Building Git For Data That Commits Only Diffs (for Storage Efficiency On Large Repositories), Even Without Full Checkouts Of The Datasets

I would really appreciate feedback on a version control for tabular datasets I am building, the Data Manager.

Main characteristics:

Like DVC and Git LFS, integrates with Git itself. Like DVC and Git LFS, can store large files on AWS S3 and link them in Git via an identifier. Unlike DVC and Git LFS, calculates and commits diffs only, at row, column, and cell level. For append scenarios, the commit will include new data only; for edits and deletes, a small diff is committed accordingly. With DVC and Git LFS, the entire dataset is committed again, instead: committing 1 MB of new data 1000 times to a 1 GB dataset yields more than 1 TB in DVC (a dataset that increases linearly in size between 1 GB and 2 GB, committed 1000 times, results in a repository of ~1.5 TB), whereas it sums to 2 GB (1 GB original dataset, plus 1000 times 1 MB changes) with the Data Manager. Unlike DVC and Git LFS, the diffs for each commit remain visible directly in Git. Unlike DVC and Git LFS, the Data Manager allows committing changes to datasets without full checkouts on localhost. You check out kilobytes and can append data to a dataset in a repository of hundreds of gigabytes. The changes on a no-full-checkout branch will need to be merged into another branch (on a machine that does operate with full checkouts, instead) to be validated, e.g., against adding a primary key that already exists. Since the repositories will contain diff histories, snapshots of the datasets at a certain commit have to be recreated to be deployable. These can be automatically uploaded to S3 and labeled after the commit hash, via the Data Manager.

Links:

https://news.ycombinator.com/item?id=35930895 [no full checkout] https://youtu.be/BxvVdB4-Aqc https://news.ycombinator.com/item?id=35806843 [general intro] https://youtu.be/J0L8-uUVayM

This paradigm enables hibernating or cleaning up history on S3 for old datasets, if these are deleted in Git and snapshots of earlier commits are no longer needed. Individual data entries can also be removed for GDPR compliance using versioning on S3 objects, orthogonal to git.

I built the Data Manager for a pain point I was experiencing: it was impossible to (1) uniquely identify and (2) make available behind an API multiple versions of a collection of datasets and config parameters, (3) without overburdening HDDs due to small, but frequent changes to any of the datasets in the repo and (4) while being able to see the diffs in git for each commit in order to enable collaborative discussions and reverting or further editing if necessary.

Some background: I am building natural language AI algorithms (a) easily retrainable on editable training datasets, meaning changes or deletions in the training data are reflected fast, without traces of past training and without retraining the entire language model (sounds impossible), and (b) that explain decisions back to individual training data.

I look forward to constructive feedback and suggestions!

submitted by /u/Usual-Maize1175
[link] [comments]

0

Is There Any Jobs And Careers Dataset And Where Can I Find Them?

I am creating Career Recommender for my Computer Science Project and I am looking for a dataset that contains information about all the different Careers/Jobs and stats about them like salary, field, and Education needed for it. Where can I find a dataset which would include these are at least some?

submitted by /u/parrotpo
[link] [comments]

0

Market Distribution Data Analytics Report

I am working on a project to collect data from Different sources (distributors, retail stores, etc.) thru different approaches (ftp, api, scrapping, excel, etc.). I would like to consolidate all the information and create dynamic reports. I would like to add all the offers and discounts suggested by these various vendors.

How do I get all this data? Is there a data provider who can provide the data? I would like to start with IT hardware and IT Electronic Consumers goods.

Any help is highly appreciated. TIA

submitted by /u/BeGood9170
[link] [comments]

0

Hey!! Please Help Us Create A Dataset With This Survey For Out School Project!!!!!!!!!!

https://docs.google.com/forms/d/e/1FAIpQLSdcIXGYOzN6HPDxz4cFlcJ-YNUmybVGF0urslilmuvehl_FLw/viewform?usp=sf_link

submitted by /u/JaisMur
[link] [comments]

0

Dataset Of Vehicles Small Dataset Of Vehicles Detection Have At Least 6 Classes Contain Ambulance Class

i need a small dataset of vehicles detection have at least 6 classes contain ambulance class for my project “smart traffic control” , I am use YOLO v8

submitted by /u/ibrahim-elsadat
[link] [comments]

0

Exploring The Potential Of Data For The Public Good: Share Your Insights!

Hey everyone!

We are a group of design students currently conducting academic research on an intriguing topic: the democratization of data and its potential of data to benefits the public. We believe that data can play a vital role in improving people’s lives outside the realm of business, and we would love to hear your thoughts and experiences on this subject.

If you have a moment, we kindly invite you to answer one or more of the following questions either privately or as a comment:

– Please share your most recent experience using datasets for self– worth or public value (non-business purposes). For example, a project that makes data accessible or extracts insights that can help the general public?

– Working on the project, what worked and what didn’t work? Were there barriers and challenges that you can share?

– Are there any insights or tips you would like to share following the project?

– Do you have any insights or thoughts regarding the use or accessibility of data for the public good?

Your contribution can be as brief or as detailed as you like. We greatly appreciate any answers, thoughts, or perspectives you are willing to share.

Thank you all!

submitted by /u/Direct-Goat-2072
[link] [comments]

0

Required Pakistani Data On Hate Speech Analysis ?

Any data on hate speech of about 3000 to 5000 records such as: on inflation in pakistan, power outage in pakistan etc is required

:

submitted by /u/Muted_Researcher_785
[link] [comments]

0

Tutorial: Getting Vegetation Time-series Data (NDVI) From Sentinel-2 Satellite Using Python [self-promotion]

submitted by /u/purple_manta_ray
[link] [comments]

0

Need A Dataset For Fruit Disease Detection I Will Show The Code Below And Pls Tell Me A Dataset And How To Use It. I’m A Total Idiot When It Comes To This. I Learnt Theory ABT It But Not That Much Practicals..can Somebody Help

import os import cv2 import numpy as np

Data organization

dataset_root = ‘path_to_dataset_root_directory’ categories = [‘healthy’, ‘diseased’]

for category in categories: category_dir = os.path.join(dataset_root, category) images = os.listdir(category_dir) for image_name in images: image_path = os.path.join(category_dir, image_name) # Perform further processing on each image

Preprocessing and disease detection

def preprocess_and_detect_disease(image_path): # Load the image image = cv2.imread(image_path)

# Preprocess the image gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY) blurred_image = cv2.GaussianBlur(gray_image, (5, 5), 0) # Apply image processing techniques (e.g., thresholding) _, thresholded_image = cv2.threshold(blurred_image, 100, 255, cv2.THRESH_BINARY) # Find contours in the image contours, _ = cv2.findContours(thresholded_image.copy(), cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE) # Loop over the contours and detect fruit disease for contour in contours: # Calculate the area of the contour area = cv2.contourArea(contour) # Set a threshold for disease detection threshold_area = 5000 if area > threshold_area: # Draw a bounding box around the detected fruit x, y, w, h = cv2.boundingRect(contour) cv2.rectangle(image, (x, y), (x+w, y+h), (0, 255, 0), 2) # Display the output image cv2.imshow(‘Fruit Disease Detection’, image) cv2.waitKey(0) cv2.destroyAllWindows()

Example usage

image_path = ‘path_to_your_image.jpg’ preprocess_and_detect_disease(image_path)

submitted by /u/No-Bad-5051
[link] [comments]

0

Looking For Hardware Dataset (computer/mouse/keyboard)

i need a dataset of hardware componnent like mouse ,ram,pc,laptop etc…. for a school project

submitted by /u/the_one777777897
[link] [comments]

0

Bye Reddit! Remember To Delete Your Posts Before Deleting Your Account!

Your value to Reddit is your free posts and comments so remember to delete your posts before deleting your account!

I used this Reddit API script to delete 10 years of my comments and posts: https://codepen.io/j0be/full/WMBWOW/

Bye Reddit! It has been fun!

submitted by /u/wickedevil
[link] [comments]

0

Entire Reddit Dataset Released Somewhere?

I’ve heard the entire reddit post and comment dataset is released monthly somewhere but can’t find it.
Does anyone know where?

submitted by /u/Fedude99
[link] [comments]

0

[Form] Could You Please Take 5 Minutes To Answer This For About Electric Vehicles Concerns?

https://forms.microsoft.com/e/5gBkjDhYKN

I will make this public after some time, and I really appreciate it for anyone who complete it.

submitted by /u/rayofhope313
[link] [comments]

0

Reddit API Changes. What Do You Think?

Lots of subs are going to go dark/private because reddit will raise the price of api calls to them.

/r/datasets is more pro cheap/free data than most subs. What do you think of the idea of going dark? Example explanation from another sub.
https://old.reddit.com/r/redditisfun/comments/144gmfq/rif_will_shut_down_on_june_30_2023_in_response_to/

submitted by /u/cavedave
[link] [comments]

0

Category: Datatards

Common Anti-bot (anti-scraping) Measures On Websites And How To Bypass Them

DiffusionDB: A Large-scale Text-to-image Prompt Gallery Dataset Based On Stable Diffusion

Occupations With Demographic Information Dataset

Number Of Direct Flights Between Domestic Airports

Struggling With Finding A Use Case To Work On For My Course Work

Starting A New Job, I Am Required To Work On Old XLSX Datasets From 1990s. Need Help!

HR Datasets For SQL Exploratory Data Analysis

Chef’s Special – Documenting Equipment Losses During The 2023 Wagner Group Mutiny

US Ethnic Demographics By County And State

Dataset With Large Amount Of Mixed Or Categorical For Cluster Analysis?

I Need 24 Hours (temperature /heart Rate /oxygen Saturation) Dataset????

Twitter Spam Dataset Needed With Actual Tweet Text

A Dataset Contrasting US Wealth In The Context Of Height

In(greed)flation: Comparing Grocery Food Skus Year-over-year

REDD Dataset (A Public Data Set For Energy Disaggregation) New Link?

Dataset For Mushroom Yield Prediction Using 3 Predictive Model

[self-promotion] Feedback Needed: Building Git For Data That Commits Only Diffs (for Storage Efficiency On Large Repositories), Even Without Full Checkouts Of The Datasets

Is There Any Jobs And Careers Dataset And Where Can I Find Them?

Market Distribution Data Analytics Report

Hey!! Please Help Us Create A Dataset With This Survey For Out School Project!!!!!!!!!!

Dataset Of Vehicles Small Dataset Of Vehicles Detection Have At Least 6 Classes Contain Ambulance Class

Exploring The Potential Of Data For The Public Good: Share Your Insights!

Required Pakistani Data On Hate Speech Analysis ?

Tutorial: Getting Vegetation Time-series Data (NDVI) From Sentinel-2 Satellite Using Python [self-promotion]

Need A Dataset For Fruit Disease Detection I Will Show The Code Below And Pls Tell Me A Dataset And How To Use It. I’m A Total Idiot When It Comes To This. I Learnt Theory ABT It But Not That Much Practicals..can Somebody Help

Data organization

Preprocessing and disease detection

Example usage

Looking For Hardware Dataset (computer/mouse/keyboard)

Bye Reddit! Remember To Delete Your Posts Before Deleting Your Account!

Entire Reddit Dataset Released Somewhere?

[Form] Could You Please Take 5 Minutes To Answer This For About Electric Vehicles Concerns?

Reddit API Changes. What Do You Think?

Recent Posts

Recent Comments

18+ Content

Data organization

Preprocessing and disease detection

Example usage

Recent Posts

Recent Comments