submitted by /u/cavedave
Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.
Explore the data: https://app.gigasheet.com/spreadsheet/OpenFoodFacts-org-Products-Database/9a056567_9b41_4dda_a673_37fe1d3526b5
submitted by /u/n1nja5h03s
There seem to be about 350 million registered domain names (if not more), but I can’t find any source that offers this data for download.
I am only interested in root domains, e.g. dailynews.com. I came across this repo: https://github.com/tb0hdan/domains, but after filtering for root domains I end up with about 150 million. There are also paid services such as zonefiles.io that offer about 260 million domains. Does anyone know of any other sources that provide the complete set?
Thanks in advance.
P.S. Is it worth setting up your own crawlers for this type of thing?
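For the root-domain filtering step, a minimal Python sketch follows. It assumes the input is a plain list of hostnames. The function names and the three-entry suffix set are hypothetical placeholders, and a real run would need the full Public Suffix List to handle suffixes such as co.uk correctly.

```python
from typing import Iterable, List, Optional

# Illustrative sample only (an assumption): load the full Public Suffix List
# in real use, otherwise most multi-label suffixes will be misclassified.
MULTI_LABEL_SUFFIXES = {"co.uk", "com.au", "co.jp"}


def root_domain(hostname: str) -> Optional[str]:
    """Return the registrable (root) domain of a hostname, or None if it has none."""
    labels = hostname.strip().strip(".").lower().split(".")
    if len(labels) < 2 or "" in labels:
        return None
    # Keep three labels when the last two form a known multi-label suffix.
    if ".".join(labels[-2:]) in MULTI_LABEL_SUFFIXES:
        return ".".join(labels[-3:]) if len(labels) >= 3 else None
    return ".".join(labels[-2:])


def unique_root_domains(hostnames: Iterable[str]) -> List[str]:
    """Collapse hostnames into a sorted list of distinct root domains."""
    roots = {root_domain(h) for h in hostnames}
    roots.discard(None)
    return sorted(roots)
```

With hundreds of millions of rows, the in-memory set may become too large, in which case sorting and deduplicating on disk is probably a better fit.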
submitted by /u/activelearning23
Hello again! I came across this dataset and found it interesting. It covers major felony crimes, mostly in the D.C. area, between 2010 and 2020. It also includes gender, race, year, felony charge, offense, time served, and a lot more!
Click here to view the dataset: https://app.gigasheet.com/spreadsheet/Felony-Sentence-2010-2020-csv/71dbef04_e629_43ca_b8c4_007de9244fd6
It looks like “drug” charges are usually the most common over the 10 years, and 2012 was the worst year for crime between 2010 and 2020.
Dataset Source: https://opendata.dc.gov/datasets/DCGIS::felony-sentences/explore
submitted by /u/sheetheadd
For a data analysis project, I’m looking for a reliable dataset about how sleep quality is affected by different genetic and lifestyle factors.
Things like: gender, age, caffeine/alcohol consumption, exercise frequency, etc.
Something with labels like this one would be optimal: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency – however I can’t confirm the authenticity of this data.
Any resources would be greatly appreciated!
submitted by /u/Ok_Afternoon_1720
Hello,
I am building property management software, and I want to know how I could migrate data from one software product to another to make customer onboarding easier.
Is there a third-party tool to help with this? If so, how would it work?
Thank you.
submitted by /u/Substantial-Art-9322
GitHub: https://github.com/radi-cho/datasetGPT
It can generate texts by varying input parameters and using multiple backends. Personally, though, conversation dataset generation is my favorite feature: it can produce dialogues between two ChatGPT agents.
Possible use cases include:
Constructing textual corpora to train/fine-tune detectors for content written by AI.
Collecting datasets of LLM-produced conversations for research purposes, analysis of AI performance/impact/ethics, etc.
Automating a task that an LLM can handle over large amounts of input text. For example, using GPT-3 to summarize 1000 paragraphs with a single CLI command.
Leveraging APIs of especially big LLMs to produce diverse texts for a specific task and then fine-tuning a smaller model with them.
What would you use it for?
submitted by /u/radi-cho
I’m looking for a dataset similar to the Amphibians dataset from UCI for an undergraduate data science project. It should be a classification problem, i.e. presence/absence of a species dependent on habitat characteristics such as temperature, type of vegetation, size of water reservoir, amount of rainfall, distance to roads/civilisation, etc.
It should include:
more than 15 numerical and categorical features
more than 300 observations
temporal and/or spatial data if possible, so I can play around with some heat maps or time series analysis.
Any hints are highly appreciated as I’m a beginner and I’ve been scrolling my eyes out on kaggle etc. all weekend.
submitted by /u/apex—-predator
Where might I find public or purchasable data on recent school enrollment (K-12, postsecondary, etc.) in the USA? I need both public schools and private or charter schools, by state and by grade if possible. I’ll take what I can get!
submitted by /u/LongDrawn
Hello! I’m a senior electronics engineering student. My friend is trying to make an assistant for blind people that helps them tell apart objects with the same shape, such as Coca-Cola vs. Sprite. He designed the hardware around an ESP8266 and uses a cloud service to store datasets. We created a dataset by taking photos of Cokes, but it is hard to do this for every kind of object. Is there any solution or resource for finding datasets of everyday objects? We have dug through a lot of open datasets (CIFAR, Berkeley, Kaggle, COCO, MNIST), but we need 224×224-pixel images for our ML model.
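On the 224×224 requirement, one possible workaround is to resize images from existing datasets to the input size the model expects. The sketch below is a dependency-free nearest-neighbour resize on a nested-list image. The function name is hypothetical, and a real pipeline would more likely use an image library. Upscaling small images (CIFAR images are 32×32, for example) adds no real detail, so results may be poor.

```python
def resize_nearest(image, out_h=224, out_w=224):
    """Resize a 2-D grid of pixels (a list of rows) with nearest-neighbour sampling.

    Each pixel may be a number (grayscale) or a tuple such as (r, g, b);
    pixels are copied as-is, never blended.
    """
    in_h, in_w = len(image), len(image[0])
    return [
        # Map every output coordinate back to its source pixel.
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
```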
submitted by /u/yagmurxyildiz
So I have been looking around for some time and haven’t been able to find a dataset, with either global or local coverage.
I am looking for either raw or clean data that I can use in a project. There are a lot of statistics out there, but where is the raw data coming from?
Thanks for your help
submitted by /u/zmkarakas
Hi
I’m wondering where I can buy machine learning data directly for my project/product. Let’s say it’s a music or allergy app. I would like to connect a chat/predictor which, based on a few data points, can indicate a certain percentage of something. However, training such algorithms takes large amounts of data. Where can you actually buy it?
submitted by /u/jackoborm
Hi, I am looking for a data set that is specific to cartel-related violent crimes in Mexico. I know that INEGI has some crime data, but I can’t seem to find anything related to the cartels.
By violent crimes, I mean homicides, kidnappings, robberies, etc.
Any help would be great.
submitted by /u/commanderzemm
Hi there!
I just put up a new dataset on Kaggle. It’s cryptically titled “The largest diamond dataset currently on Kaggle”.
It has just under 220,000 diamonds and 25 columns of data, making it about 3x larger than the next largest. I think it’s perfect for regression models, and there is an attached notebook.
This is my first submission to Kaggle so I’d be very much interested in any feedback you might have.
Thanks!
submitted by /u/hrokrin
Any ideas where I can find data for hybrid vehicle market share in the US over the last five years? Google has been most unhelpful.
submitted by /u/RefrigeratorTall1093
Hello everyone,
I am currently pursuing a career as a Senior Business Analyst, and I know that having a strong understanding of SQL is essential for this role. However, there are so many aspects of SQL to learn, and I’m not sure where to focus my attention.
I would like to know from those who work as Senior Business Analysts, or those who have experience working with them, what are the best aspects of SQL to learn for this position? Which SQL skills do you use the most in your day-to-day work, and which ones have been the most valuable for you?
I appreciate any insights or advice you can offer, and I look forward to learning from your experiences. Thank you!
submitted by /u/LampRunner
Hello everyone,
I am currently working on creating a chatbot that can recommend solutions to log errors that occur in Java applications. To do this, I need a dataset that contains examples of log errors along with their corresponding solutions. I am hoping to find a dataset that is large enough to train a machine learning model to accurately suggest solutions based on the log error message.
If anyone knows of a dataset that would be helpful for this project or has any suggestions on where to find one, I would greatly appreciate it. Any information or assistance would be extremely valuable to me.
Thank you for your time and consideration.
submitted by /u/Farjou69
I’m looking for data that can help me see whether COVID encouraged more people to take hobby flying lessons. I could use any of the following as a proxy for the impact:
– # of people who signed up for classes
– # of takeoffs/landings of smaller aircraft like Cessnas
– # of PPLs/CPLs issued
submitted by /u/Eeshoo
Financial thematic data package, pertaining to banking
https://app.snowflake.com/marketplace/listing/GZTSZAS2KF7/cybersyn-inc-financial-data-package
Includes data from:
Federal Deposit Insurance Corporation (FDIC)
Federal Reserve Economic Data (FRED)
Federal Financial Institutions Examination Council (FFIEC)
Consumer Financial Protection Bureau (CFPB)
submitted by /u/aiatco2
Here is a simple spreadsheet of several thousand battles. I am working (slowly) to get a ton of information on each battle. Please critique and notify me of errors. Cheers.
submitted by /u/UnlimitedRed
Hello there, I have a medical dataset in which some features are numeric and others are categorical. By “categorical” I mean that these features are natively stored with ordinal integer encoding, so that each possible value is represented by an incrementing integer. The dataset comes from a survey, so each categorical value corresponds to an answer such as “never”, “sometimes”, “a lot of the time” and so on. I have to apply an MLP to this data, and I know that I first need to scale it. The question is: do I scale all features, categorical ones included, or do I scale only the numerical variables and apply one-hot encoding to the others? I was also wondering whether one-hot encoding the categorical columns is necessary at all, or whether I can leave them as they are and standardize only the numerical variables.
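One of the options raised above (standardize the numeric columns, one-hot encode the survey answers) can be sketched in plain Python as follows. This is an illustration under assumptions, not a recommendation: the function names and the column layout (dicts mapping a column name to a list of values, at least one numeric column, codes starting at 0) are hypothetical. Because the answers are ordered, keeping the integer codes and scaling them like numeric columns is another defensible choice that this sketch does not show.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale a numeric column to zero mean and unit variance."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(codes, n_levels):
    """Turn integer codes 0..n_levels-1 into one-hot vectors."""
    return [[1.0 if c == level else 0.0 for level in range(n_levels)] for c in codes]


def preprocess(numeric_cols, categorical_cols, levels):
    """Build one feature row per sample: scaled numerics, then one-hot blocks.

    levels maps each categorical column name to its number of answer options.
    """
    scaled = [standardize(col) for col in numeric_cols.values()]
    encoded = [one_hot(col, levels[name]) for name, col in categorical_cols.items()]
    n_rows = len(next(iter(numeric_cols.values())))
    rows = []
    for i in range(n_rows):
        row = [col[i] for col in scaled]
        for block in encoded:
            row.extend(block[i])
        rows.append(row)
    return rows
```

In practice the mean and standard deviation would be computed on the training split only and then reused for validation and test data.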
submitted by /u/NathanDrake27
Came across this pretty popular dataset on Maryland Crashes from 2016-2022. Check it out here:
From these findings, it’s pretty clear that:
Baltimore County (not city) has the highest number of crashes at 156K incidents, with 2018 being the highest year for accidents.
The Baltimore Beltway seems to be the top location for these incidents, with 2.2K incidents occurring over the course of 2016-2022. Yikessss.
The Capital Beltway has the highest # of incidents, sitting at 22K.
Marylanders tend to hit other cars and objects on the road the most but have the fewest incidents at U-turns (surprising!).
The county with the fewest crashes is Kent County.
Source: https://opendata.maryland.gov/Public-Safety/Maryland-Statewide-Vehicle-Crashes/65du-s3qu
submitted by /u/sheetheadd
I am looking for a good source of historical intraday stock data for individual (Norwegian) stocks. Maximum timeframe 30min. Any good databases/APIs?
submitted by /u/waleed3011
Interesting dataset pulled from Boston’s Official Government Site. I definitely heard about the spike in crime that occurred during the height of Covid, so I decided to merge the two CSVs from 2021 and 2020. It also helps depict/infer the safest streets in Boston.
Curious, is anyone else interested in a specific location/city and its crime data? I see tons of datasets like this online. Would love to share and see some interesting ones!
Click here to view the dataset: https://app.gigasheet.com/spreadsheet/2020-2021-Covid-Crime-in-Boston/94982770_3c8c_48fb_9176_efeb72becdd8
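For anyone wanting to repeat the merge of the two yearly CSVs, here is a minimal sketch using only the standard library. It assumes both files share exactly the same header. The function name and the added SOURCE_YEAR column are hypothetical and not part of the original dataset.

```python
import csv


def merge_csv(sources, year_labels):
    """Concatenate CSV inputs that share a header, tagging each row with its year.

    sources: open text files (or any iterables of CSV lines), one per year.
    Returns (header, rows), where rows are dicts keyed by column name.
    """
    merged_rows = []
    header = None
    for source, year in zip(sources, year_labels):
        reader = csv.DictReader(source)
        fieldnames = list(reader.fieldnames or [])
        if header is None:
            header = fieldnames
        elif fieldnames != header:
            raise ValueError("CSV headers differ; align the columns before merging")
        for row in reader:
            # Record which file the row came from (hypothetical extra column).
            row["SOURCE_YEAR"] = year
            merged_rows.append(row)
    return (header or []) + ["SOURCE_YEAR"], merged_rows
```

The result can be written back out with csv.DictWriter, passing the returned header as fieldnames.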
submitted by /u/sheetheadd
Given the bias of most face detection algorithms against marginalized groups such as people of color, I’m working on a skin disease project for people of color and would like to know where I can find a dataset of people of color. Thanks
submitted by /u/Think_Huckleberry299
Data from pulsating stars explained in this data story: https://www.marpledata.com/data-stories/disco-balls-in-space-how-pulsating-stars-work
Data itself is available at https://archive.stsci.edu/
submitted by /u/mbaerto