Data On 2.4M Foods From OpenFoodFacts.org – Ingredients, Nutrition, Allergens

Open Food Facts is a database of food products with ingredients, allergens, nutrition facts and all the tidbits of information we can find on product labels.

Explore the data: https://app.gigasheet.com/spreadsheet/OpenFoodFacts-org-Products-Database/9a056567_9b41_4dda_a673_37fe1d3526b5

Source: https://world.openfoodfacts.org/data

submitted by /u/n1nja5h03s

[self-promotion] Fuzzy Match, Highlight And Remove Duplicates In Google Sheets

submitted by /u/getflookup

Looking For Active List Of Domain Names

There seem to be about 350 million registered domain names (if not more), but I can’t find any source that offers this data for download.

I am only interested in root domains, e.g. dailynews.com. I came across this repo: https://github.com/tb0hdan/domains, but after filtering for root domains I end up with about 150 million. There are also paid services such as zonefiles.io that offer about 260 million domains. Does anyone know of any other sources that provide the complete set?
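Filtering a hostname list down to root domains means finding each hostname's public suffix (such as `com` or `co.uk`) and keeping exactly one label to its left. The sketch below assumes a tiny, hypothetical `PUBLIC_SUFFIXES` set standing in for the real Public Suffix List, and it ignores that list's wildcard and exception rules, so treat it as an illustration rather than a production filter.

```python
from typing import Optional

# Hypothetical, tiny stand-in for the Public Suffix List (publicsuffix.org).
# A real pipeline should load the full list, including wildcard/exception rules.
PUBLIC_SUFFIXES = {"com", "org", "net", "io", "co.uk", "com.au"}


def root_domain(hostname: str) -> Optional[str]:
    """Return the registrable root domain (e.g. 'dailynews.com') for a hostname."""
    labels = hostname.lower().rstrip(".").split(".")
    # Candidate suffixes get shorter as i grows, so the longest match wins
    # (e.g. 'co.uk' is tried before 'uk').
    for i in range(1, len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in PUBLIC_SUFFIXES:
            # Keep the suffix plus one more label to its left.
            return ".".join(labels[i - 1:])
    return None  # unknown suffix, or the hostname is itself just a suffix


def unique_root_domains(hostnames):
    """Collapse a list of hostnames into a sorted list of distinct root domains."""
    roots = {root_domain(h) for h in hostnames}
    roots.discard(None)
    return sorted(roots)
```

For example, `www.dailynews.com` and `dailynews.com` both collapse to `dailynews.com`, which is why a raw hostname list shrinks after this step.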

Thanks in advance.

P.S. Is it worth setting up your own crawlers for this type of thing?

submitted by /u/activelearning23

Panel Of Cleaned, Pared-down IHME US State Life Expectancy Data 1990-2019. Includes Life Expectancy At Birth, Age 25, And Age 65 With Breakdowns By Sex And Race And Ethnicity.

submitted by /u/dreaded_python

Washington D.C. 2010-2020 Felony Offense And Sentence Overview

Hello again! I came across this dataset and found it interesting. It includes major felony crimes, mostly in the D.C. area, between 2010 and 2020. The information also includes gender, race, year, felony charge, offense, time served, and a lot more!

Click here to view the dataset: https://app.gigasheet.com/spreadsheet/Felony-Sentence-2010-2020-csv/71dbef04_e629_43ca_b8c4_007de9244fd6

It looks like “drug” charges are usually the top category over the course of the 10 years, and 2012 was the worst year for crime between 2010 and 2020.

Dataset Source: https://opendata.dc.gov/datasets/DCGIS::felony-sentences/explore
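Observations like "drug charges are usually on top" and "2012 was the worst year" come down to grouping records by year and counting charge categories. Here is a minimal sketch using only the standard library; the column names `YEAR` and `OFFENSE_TYPE` are hypothetical, so check the real export's header. Since this is a sentencing dataset, the yearly count reflects sentences recorded rather than necessarily crimes committed.

```python
import csv
import io
from collections import Counter, defaultdict


def summarize(csv_text: str, year_col: str = "YEAR", charge_col: str = "OFFENSE_TYPE"):
    """Return (most common charge per year, year with the most records)."""
    per_year = defaultdict(Counter)  # year -> Counter of charge categories
    totals = Counter()               # year -> total number of records
    for row in csv.DictReader(io.StringIO(csv_text)):
        year = int(row[year_col])
        per_year[year][row[charge_col]] += 1
        totals[year] += 1
    top_charge = {y: c.most_common(1)[0][0] for y, c in sorted(per_year.items())}
    busiest_year = totals.most_common(1)[0][0]
    return top_charge, busiest_year
```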

submitted by /u/sheetheadd

Any Sleep Quality Datasets Based On Lifestyle Factors?

For a data analysis project, I’m looking for a reliable dataset about how sleep quality is affected by different genetic and lifestyle factors.

Things like: gender, age, caffeine/alcohol consumption, exercise frequency, etc.

Something with labels like this one would be optimal: https://www.kaggle.com/datasets/equilibriumm/sleep-efficiency – however I can’t confirm the authenticity of this data.

Any resources would be greatly appreciated!

submitted by /u/Ok_Afternoon_1720

Dataset Of Medical Case Scenarios And Appropriate Diagnosis

I’m looking for a dataset that contains medical case examples and the diagnosis presented in each case.

Example of what I’m talking about:

[“Bob has been having issues with excessive thirst and blurry vision. He has elevated levels of glucose in the urine and blood.”, “Diabetes”]

I’m not too picky about the format as long as the diagnosis is separate from the scenario and the formatting is consistent.

Artificial datasets are okay, maybe even preferred, as long as they’re accurate.
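The requirement that "the diagnosis is separate from the scenario and the formatting is consistent" is easy to check mechanically. A minimal sketch, assuming the data arrives as a JSON array of `[scenario, diagnosis]` pairs like the example above; the output field names `scenario` and `diagnosis` are illustrative choices.

```python
import json


def load_cases(text: str):
    """Parse a JSON array of [scenario, diagnosis] pairs and validate each one."""
    cases = []
    for i, item in enumerate(json.loads(text)):
        # Every record must be exactly two non-empty strings: scenario, then diagnosis.
        if not (isinstance(item, list) and len(item) == 2
                and all(isinstance(s, str) and s.strip() for s in item)):
            raise ValueError(f"record {i} is not a [scenario, diagnosis] pair")
        scenario, diagnosis = item
        cases.append({"scenario": scenario, "diagnosis": diagnosis})
    return cases
```

A malformed record (a missing diagnosis, an extra field, a non-string value) raises `ValueError` with its index instead of silently slipping into the training data.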

submitted by /u/flavorfulcherry

[Synthetic] DatasetGPT – A Command-line Tool To Generate Datasets By Inferencing LLMs At Scale. It Can Even Make Two ChatGPT Agents Talk With One Another.

GitHub: https://github.com/radi-cho/datasetGPT

It can generate texts by varying input parameters and using multiple backends. Personally, though, conversation dataset generation is my favorite feature: it can produce dialogues between two ChatGPT agents.

Possible use cases include:

- Constructing textual corpora to train/fine-tune detectors for content written by AI.
- Collecting datasets of LLM-produced conversations for research purposes, analysis of AI performance/impact/ethics, etc.
- Automating a task that an LLM can handle over large amounts of input text. For example, using GPT-3 to summarize 1000 paragraphs with a single CLI command.
- Leveraging APIs of especially big LLMs to produce diverse texts for a specific task and then fine-tuning a smaller model with them.

What would you use it for?

submitted by /u/radi-cho

Suggestions For Ecology Dataset For Classification

I’m looking for a dataset similar to the Amphibians dataset from UCI for an undergraduate data science project. It should be a classification problem, e.g. the presence/absence of a species depending on habitat characteristics such as temperature, type of vegetation, size of water reservoir, amount of rainfall, distance to roads/civilisation, etc.

It should include:

- more than 15 numerical and categorical features
- more than 300 observations
- temporal and/or spatial data if possible, so I can play around with some heat maps or time series analysis.

Any hints are highly appreciated, as I’m a beginner and I’ve been scrolling my eyes out on Kaggle etc. all weekend.

submitted by /u/apex—-predator

Looking For Data On Enrollment And Education In The USA

Where might I find some public or purchasable data on recent school enrollment (K-12, postsecondary, etc.) in the USA? I need both public schools and private or charter schools, by state and by grade if possible. I’ll take what I can get!

submitted by /u/LongDrawn

Finding Datasets For Computer Vision

Hello! I’m a senior electronics engineering student. My friend is trying to make a blind assistant that helps blind people distinguish objects with the same shape, such as Coca-Cola vs. Sprite. He designed the hardware around an ESP8266 and uses the cloud to store datasets. We created a dataset by taking photos of Cokes, but it’s hard to do this for everything. Is there any solution or resource for finding datasets of everyday objects? We have dug through a lot of open datasets (CIFAR, Berkeley, Kaggle, COCO, MNIST), but our ML model requires 224×224-pixel images.
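A fixed 224×224 input does not rule out datasets stored at other sizes: images are usually resized to the model's input shape before training. The sketch below shows nearest-neighbour resizing on a plain 2-D grid of pixels (a list of rows, where each pixel can be an RGB tuple) to illustrate the idea; in practice a library such as Pillow does this via `Image.resize`. Note that upscaling small images like CIFAR's 32×32 does not add detail, so downscaling larger images (e.g. from COCO) is generally the better starting point.

```python
def resize_nearest(image, out_h=224, out_w=224):
    """Resize a 2-D grid of pixels (list of rows) using nearest-neighbour sampling."""
    in_h, in_w = len(image), len(image[0])
    # Output pixel (y, x) copies the input pixel at the proportional position,
    # scaled with integer arithmetic so indices stay within the input bounds.
    return [
        [image[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
        for y in range(out_h)
    ]
```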

submitted by /u/yagmurxyildiz

[REQUEST] I Am Looking For A Dataset Regarding Hair Transplants, Either Global Or Local

So I have been looking around for some time and haven’t been able to find a dataset, whether with global or local coverage.

I am looking for either raw or clean data so that I can use it in a project. There are a lot of statistics out there, but where is the raw data coming from?

Thanks for your help

submitted by /u/zmkarakas

[Request] Any Datasets For Counties In California And Their Party Alignment?

submitted by /u/gend3rplasma

Where Can We Actually Buy Big Data For A Company?

Hi

I’m wondering where I can buy machine learning data directly for my project/product. Let’s say it’s a music or allergy app. I would like to connect a chat/predictor which, based on a few data points, is able to indicate a certain percentage of something. However, large amounts of data are needed to train such algorithms. Where can you actually buy them?

submitted by /u/jackoborm

Looking For A Dataset On Cartel Violence In Mexico

Hi, I am looking for a data set that is specific to cartel-related violent crimes in Mexico. I know that INEGI has some crime data, but I can’t seem to find anything related to the cartels.

By violent crimes, I mean Homicides, Kidnappings, Robberies, etc.

Any help would be great.

submitted by /u/commanderzemm

The Largest Dataset Of Graded Diamonds On Kaggle

Hi there!

I just put up a new dataset on Kaggle. It’s cryptically titled “The largest diamond dataset currently on Kaggle”.

It has just under 220,000 diamonds and 25 columns of data, making it about 3x larger than the next largest. I think it’s perfect for regression models, and there is an attached notebook.
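As a starting point for the regression use case, here is ordinary least squares with a single feature, fitted in closed form. This is a sketch, not the approach in the attached notebook; pairing it with hypothetical `carat` and `price` columns is an assumption, and a real model would likely use more of the 25 columns.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); both share the 1/n factor, so it cancels.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Usage would look like `fit_line(carats, prices)`, where both lists come from the dataset's rows.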

This is my first submission to Kaggle so I’d be very much interested in any feedback you might have.

Thanks!

submitted by /u/hrokrin

Hybrid Vehicle Market Share Data Over The Last 5 Years?

Any ideas where I can find data for hybrid vehicle market share in the US over the last five years? Google has been most unhelpful.

submitted by /u/RefrigeratorTall1093

How To Migrate Data From One Software To Another?

Hello,

I am building property management software, and I wanted to know how I could migrate data from one software product to another to make customer onboarding easier.

Is there a third-party tool to help with this? If so, how would it work?
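One common pattern (and roughly what migration tools automate) is: export from the old system as CSV, map each old column to a field in the new schema, normalize values, and report rows that fail instead of dropping them silently. A minimal sketch of that pattern, where the column names in `FIELD_MAP` and the cents-based rent field are hypothetical:

```python
import csv
import io
from decimal import Decimal, InvalidOperation

# Hypothetical mapping: legacy export column -> field name in the new system.
FIELD_MAP = {
    "Tenant Name": "tenant_name",
    "Unit #": "unit_number",
    "Monthly Rent": "rent_cents",
}


def to_cents(value: str) -> int:
    """Convert a money string such as '$1,250.00' to integer cents."""
    cleaned = value.replace("$", "").replace(",", "").strip()
    return int((Decimal(cleaned) * 100).to_integral_value())


def migrate(export_csv: str):
    """Map exported rows onto the target schema; collect rows that fail."""
    migrated, errors = [], []
    reader = csv.DictReader(io.StringIO(export_csv))
    # start=2 because line 1 of the file is the header row
    for line_no, row in enumerate(reader, start=2):
        try:
            record = {new: row[old].strip() for old, new in FIELD_MAP.items()}
            record["rent_cents"] = to_cents(record["rent_cents"])
        except (KeyError, AttributeError, InvalidOperation, ValueError) as exc:
            errors.append((line_no, repr(exc)))
            continue
        migrated.append(record)
    return migrated, errors
```

Returning the failed line numbers lets onboarding staff fix a handful of rows by hand rather than rerunning the whole import.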

Thank you.

submitted by /u/Substantial-Art-9322

Crimes In Boston During Covid-19 (2020-2021)

Interesting dataset pulled from Boston’s official government site. I had definitely heard about the spike in crime during the height of Covid, so I decided to merge the 2020 and 2021 CSVs. The merged data also helps depict/infer the safest streets in Boston.
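Merging the two yearly CSVs and ranking streets can be sketched with the standard library. This assumes both files share the same header and have a `STREET` column; check the actual export. Raw incident counts do not account for street length or foot traffic, so a low count suggests, rather than proves, a safer street.

```python
import csv
import io
from collections import Counter


def merge_csvs(*csv_texts):
    """Concatenate CSV exports that share the same header into one list of rows."""
    header, rows = None, []
    for text in csv_texts:
        reader = csv.DictReader(io.StringIO(text))
        if header is None:
            header = reader.fieldnames
        elif reader.fieldnames != header:
            raise ValueError("CSV files have different columns; align them first")
        rows.extend(reader)
    return header, rows


def incidents_per_street(rows, street_col="STREET"):
    """Count incidents per street, skipping rows with no street recorded."""
    return Counter(row[street_col] for row in rows if row[street_col])
```

`incidents_per_street(rows).most_common()` lists streets from most to fewest incidents, so the tail of that list is where the "safest" candidates sit.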

Curious: is anyone else interested in a specific location/city and its crime data? I see tons of datasets like this online. Would love to share and see some interesting ones!

Click here to view the dataset: https://app.gigasheet.com/spreadsheet/2020-2021-Covid-Crime-in-Boston/94982770_3c8c_48fb_9176_efeb72becdd8

submitted by /u/sheetheadd

Skin Dataset For People Of Color For A Skin Disease.

Given the bias of most face detection algorithms against marginalized groups such as people of color, I’m working on a skin disease project for people of color and would like to know where I can find a dataset of people of color. Thanks

submitted by /u/Think_Huckleberry299

[self-promotion] Pulsating Stars Data

Data from pulsating stars explained in this data story: https://www.marpledata.com/data-stories/disco-balls-in-space-how-pulsating-stars-work

Data itself is available at https://archive.stsci.edu/

submitted by /u/mbaerto

Does Anyone Know Where I Can Find A Reliable Dataset That Lists All Airports With Geolocation?

Hey everyone,

I’m working on a map project that needs a list of all airports worldwide along with their geolocation coordinates. I’ve searched online, but I’m having trouble finding a reliable, up-to-date source.

Does anyone here know of a dataset that has this? It would be great if the data included each airport’s IATA code and latitude/longitude coordinates.

If anyone has any suggestions or recommendations, I’d greatly appreciate it.
Thank you in advance!
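Whatever source turns up, airport lists typically include many small airfields without an IATA code, so some filtering is usually needed before mapping. A minimal sketch, assuming a CSV with hypothetical `iata`, `lat`, and `lon` columns (rename to match the real file):

```python
import csv
import io


def airports_with_iata(csv_text, iata_col="iata", lat_col="lat", lon_col="lon"):
    """Return {IATA code: (lat, lon)} for rows with a 3-letter code and valid coordinates."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        code = (row.get(iata_col) or "").strip().upper()
        if len(code) != 3 or not code.isalpha():
            continue  # no usable IATA code
        try:
            lat, lon = float(row[lat_col]), float(row[lon_col])
        except (TypeError, ValueError):
            continue  # missing or non-numeric coordinates
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            result[code] = (lat, lon)
    return result
```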

submitted by /u/px07x

Where Can I Find Reliable Datasets For Luxury Shopping Trends And Consumer Behavior Analytics?

I’m a beginner in data analytics applying for a data analyst trainee position in Dubai, and I need to create a portfolio showcasing skills in SQL, Excel & Tableau. The company I’m applying to focuses on luxury goods consumer behavior and shopping in the Middle East and GCC countries.

All help will be much appreciated. Thanks

submitted by /u/Fit-Bird-1601

Does Anyone Have A VIX Futures Dataset, Or Know Where To Find It?

submitted by /u/MrZwink

Poker Hands (with Labels For Raise, Check And Fold)

I was wondering if anybody knows where I could get a dataset with the structure described in the title. I’m looking to create a supervised learning classification model that takes a poker hand (hold ’em style, I think) and predicts raise, check or fold based on the cards presented. If it were trained on a dataset from professional poker players, I’d imagine it would make plays very similar to theirs, and as such it could be rather successful.

My only other option for gathering this data, I thought, would be to host a simple web app that shows the user 5 cards and asks whether they want to raise, check or fold, post it on forums (here?), and gather the responses into a large database. However, this may result in bad plays from users who don’t know how to play poker, as well as bogus answers, so I’d rather stay away from that.
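Whichever way the data is collected, each training example needs the same shape: the visible cards plus a raise/check/fold label. A minimal sketch of such a record with a 52-slot one-hot encoding for a classifier; the two-character card notation (`"Ah"` = ace of hearts, `"T"` = ten) and all names here are illustrative assumptions. A real dataset would probably also need context such as position and betting history, since the right action depends on more than the cards.

```python
from dataclasses import dataclass
from enum import Enum

RANKS = "23456789TJQKA"
SUITS = "cdhs"


class Action(Enum):
    FOLD = "fold"
    CHECK = "check"
    RAISE = "raise"


@dataclass(frozen=True)
class LabeledHand:
    """One training example: the visible cards plus the action the player chose."""
    cards: tuple  # e.g. ("Ah", "Kd", "Ts", "9c", "2h")
    action: Action

    def __post_init__(self):
        # Reject malformed or duplicated cards at construction time.
        for card in self.cards:
            if len(card) != 2 or card[0] not in RANKS or card[1] not in SUITS:
                raise ValueError(f"bad card {card!r}")
        if len(set(self.cards)) != len(self.cards):
            raise ValueError("duplicate card")


def encode(hand: LabeledHand):
    """Return a 52-slot one-hot feature vector and an integer class label."""
    features = [0] * 52
    for card in hand.cards:
        features[RANKS.index(card[0]) * 4 + SUITS.index(card[1])] = 1
    return features, list(Action).index(hand.action)
```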

submitted by /u/ryanward02

Looking For Galaxy Dataset Containing Celestial Object Location For A Snapshot In Time

Hi, I’m looking for a space dataset about a specific galaxy. Any galaxy will do. It needs to have spatial information for each celestial body (planet, star, black hole) for a snapshot in time, so I’m thinking an x, y, z value. I want to know each object’s location in the galaxy. It would also be nice if the dataset contained what each object is (star, planet, black hole). It could also go into more specifics about the class of object, like dwarf star or gas planet, and the size or radius of the object. I’m planning on using this dataset for an art project for one of my classes. Thank you.

submitted by /u/michaelbschulte21

[Self-promo] Carbon Removal & Intensity Data From CDR.fyi And Our World In Data On Snowflake

Cybersyn data available on Snowflake Marketplace: https://app.snowflake.com/marketplace/listing/GZTSZAS2KEU/cybersyn-inc-environmental-tracking

Data sourced from CDR.fyi and Our World in Data.

Our World in Data publishes the carbon intensity of electricity, in grams of CO2e per kWh, by country and year from 2000 onward. This measures how much CO2 is emitted to produce a given amount of electricity. Use it to determine which countries have improved their carbon footprint over time and to compare which countries are the most efficient in terms of carbon emissions from electricity use.
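"Which countries have improved over time" can be answered by comparing each country's earliest and latest carbon-intensity values. A minimal sketch, assuming the data has been read into `(country, year, grams_co2e_per_kwh)` tuples; the tuple layout is an assumption, not the listing's actual schema. A negative change means lower intensity, i.e. an improvement.

```python
def intensity_change(rows):
    """Map each country to (latest value - earliest value) in g CO2e/kWh."""
    first, last = {}, {}
    for country, year, value in rows:
        # Track the earliest and latest observation seen for each country.
        if country not in first or year < first[country][0]:
            first[country] = (year, value)
        if country not in last or year > last[country][0]:
            last[country] = (year, value)
    return {c: last[c][1] - first[c][1] for c in first}
```

Sorting the resulting dict by value puts the biggest improvers first.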

cdr.fyi consolidates purchases, deliveries, and verification of carbon removed and stored for 100+ years. Carbon dioxide removal (CDR) is the process of removing CO2 from the atmosphere and durably storing it to create negative emissions. This dataset shows activity in the marketplace for carbon credits, including CDR sales, deliveries, and price. The data shows which buyers and suppliers are most active in the CDR market as well as which types of CDR are gaining and losing share. Note that all deals have CO2 tonnage associated with them, but only a subset of deals have dollar sales and price.
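Because only some deals report dollars, any price figure has to be computed over that subset while tonnage can use every deal. A minimal sketch, assuming deals are given as `(tonnes, dollars)` pairs with `dollars` set to `None` when undisclosed; this pair layout is an assumption about how the data might be loaded.

```python
def cdr_summary(deals):
    """Return (total tonnes over all deals, volume-weighted $/tonne over priced deals).

    The price is None when no deal reports a dollar amount.
    """
    total_tonnes = sum(t for t, _ in deals)
    priced = [(t, d) for t, d in deals if d is not None and t > 0]
    if not priced:
        return total_tonnes, None
    # Weighting by volume: total dollars divided by total priced tonnes.
    price = sum(d for _, d in priced) / sum(t for t, _ in priced)
    return total_tonnes, price
```

Dividing total dollars by total priced tonnes weights large deals more heavily than a simple average of per-deal prices would.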

About Us: Cybersyn is a DaaS (data-as-a-service) company, whose mission is to make the world’s economic data transparent to governments, businesses, and entrepreneurs and enable a new generation of decision makers.

submitted by /u/aiatco2

[REQUEST] Annotated Images Of Ambulances

For my university project, I am looking for images of ambulances in which the “ambulance” sticker is annotated. Images of ambulances alone will work fine as well, since I can annotate them later. Thanks for the help in advance!

submitted by /u/IshanDandekar

Val Dataset Of ImageNet For My Research

Hi everyone,

Can someone help me get the val folder of ImageNet (validation dataset only)? Thank you very, very much!

submitted by /u/zardeb36

A Deep-learning Search For Technosignatures From 820 Nearby Stars

Keras code (and link to data) at https://github.com/PetchMa/ML_GBT_SETI/blob/4096_pipeline/test_bench/VAE_NEW_ACCELERATED-BLPC1-8hz-1.ipynb

And Nature Paper https://www.nature.com/articles/s41550-022-01872-z

submitted by /u/cavedave