Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Complete EA Sports FC (formerly FIFA) 24 Dataset Available On Kaggle

Hi r/datasets,

In case anyone is interested in analysing and exploring the latest EA Sports FC 24 dataset, I uploaded a set of CSV files at the following link that let you compare the sofifa player data from FIFA 15 through the latest EA Sports FC 24:

https://www.kaggle.com/datasets/stefanoleone992/ea-sports-fc-24-complete-player-dataset/

Here is an analysis of players and teams that could serve as a starting point for seeing how the files can be read and used:

https://www.kaggle.com/code/stefanoleone992/ea-sports-fc-24-players-lineup-visualizations/

Have fun, and please do not hesitate to let me know of any further improvements to the files.

Any feedback would be very much appreciated.

Thanks in advance!

submitted by /u/stexo92

Morningstar Direct: Excess Return As A Time Series

Hello!

Does anyone know how to get excess return in Morningstar Direct? I can find the variable, but I need it as a monthly time series.

For context: we are looking at the relationship between fund size and return, and need to run a monthly regression. We can find both the assigned benchmarks and excess return, but only measured from one point to another, not repeating each month.
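
One possible workaround, as a rough sketch only: if monthly total returns for each fund and for its assigned benchmark can be exported separately, the monthly excess return can be rebuilt by subtracting the benchmark return from the fund return for the same month. This assumes a simple arithmetic excess return; the function name, the month keys, and all the numbers below are made up for illustration and are not part of Morningstar Direct.

```python
from typing import Dict


def monthly_excess_returns(
    fund: Dict[str, float], benchmark: Dict[str, float]
) -> Dict[str, float]:
    """Return fund minus benchmark return for every month present in both series."""
    common_months = sorted(set(fund) & set(benchmark))
    return {month: fund[month] - benchmark[month] for month in common_months}


# Made-up monthly returns keyed by "YYYY-MM" (illustration only).
fund_returns = {"2023-01": 0.031, "2023-02": -0.012, "2023-03": 0.008}
benchmark_returns = {"2023-01": 0.025, "2023-02": -0.020, "2023-03": 0.010}

excess = monthly_excess_returns(fund_returns, benchmark_returns)
for month, value in excess.items():
    print(month, round(value, 4))
```

The resulting series has one value per month, which is the shape a monthly regression against fund size would need.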

submitted by /u/J-Stonks

Fracking Registry – By State And Operating Company

This data comes via FracFocus, the largest registry of hydraulic fracturing chemical disclosures in the US. The database, available to explore online and download in bulk, contains disclosures from fracking operators; it details the location, timing, and water volume of each fracking job, plus the names and amounts of chemicals used. The project is managed by the Ground Water Protection Council, “a nonprofit 501(c)6 organization whose members consist of state ground water regulatory agencies.”

Here I’ve extracted and combined 28 individual files into one master file for ease of use:

https://app.gigasheet.com/spreadsheet/Fracking-By-State-and-Company—via-FracFocus/600536fd_66b4_4408_9ba6_7deca045ce71

Raw data files:

https://fracfocus.org/data-download
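
For anyone who would rather rebuild a master file from the raw downloads, the combining step can be sketched roughly as follows. This is a minimal sketch, not the exact process used for the file above: it assumes the raw files are CSVs with mostly overlapping headers, and the file names and column names in the demo are hypothetical.

```python
import csv
import tempfile
from pathlib import Path


def combine_csv_files(paths, out_path):
    """Concatenate CSV files into one, using the union of their column names."""
    fieldnames = []
    # First pass: collect column names in first-seen order.
    for path in paths:
        with open(path, newline="", encoding="utf-8") as f:
            for name in csv.DictReader(f).fieldnames or []:
                if name not in fieldnames:
                    fieldnames.append(name)

    rows_written = 0
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        # Columns missing from a given file are written as empty strings.
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval="")
        writer.writeheader()
        # Second pass: stream rows so no file is held in memory whole.
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.DictReader(f):
                    writer.writerow(row)
                    rows_written += 1
    return rows_written


# Tiny demo with two hypothetical files and made-up columns.
with tempfile.TemporaryDirectory() as tmp:
    part_1 = Path(tmp) / "part_1.csv"
    part_2 = Path(tmp) / "part_2.csv"
    part_1.write_text("State,Operator\nTexas,Acme\n", encoding="utf-8")
    part_2.write_text("State,Operator,Volume\nOhio,Beta,100\n", encoding="utf-8")
    master = Path(tmp) / "master.csv"
    total_rows = combine_csv_files([part_1, part_2], master)
    combined_text = master.read_text(encoding="utf-8")

print(total_rows)
```

Reading the files twice keeps memory use flat, which matters more than speed when the inputs are large bulk downloads.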

submitted by /u/n1nja5h03s

Dataset Needed For College Capstone Project

I’m looking for an uncleaned image dataset of plant leaves (to detect plant diseases using deep learning) so that I can clean and classify it myself. I’ve been looking on Google, but most of the datasets are already cleaned and separated into healthy and unhealthy classes. Thank you in advance.

submitted by /u/OwnDot8238

Synthetic.mostly.ai – Unlock The Power Of Synthetic Data With MostlyAi – Revolutionizing AI Training! [Synthetic]

Hey Reddit community,

I wanted to share an exciting innovation in the world of AI and data science – MostlyAi, an Austrian startup that’s making waves with its cutting-edge synthetic data solutions.

What is synthetic data, you ask?

Synthetic data is a game-changer for AI development. It’s artificial data generated to mimic real-world data, allowing you to train and test your AI models without compromising privacy or data integrity.

Why MostlyAi?

🚀 Revolutionary Technology: MostlyAi’s synthetic data generation technology is at the forefront of the industry. It’s reshaping how AI models are trained.

🔒 Privacy First: With synthetic data, you can work with sensitive information without the risks. Privacy compliance is a breeze.

💡 Accelerate AI Development: Speed up your AI projects by reducing data collection and cleaning time. Focus on what matters most – innovation.

🌐 Versatile Applications: MostlyAi’s solutions are applicable across various industries – healthcare, finance, e-commerce, and more.

🌟 Trusted by Top Companies: Major players in the tech world are already leveraging MostlyAi to enhance their AI capabilities.

How to Get Started?

Visit MostlyAi’s website here to learn more about their synthetic data solutions, case studies, and the impact they’ve had on AI development.

Have questions or want to try the platform for free? No problem: https://synthetic.mostly.ai/ is the place to start.

submitted by /u/devops_captain

[REQUEST] Transactional Email Dataset

I’m looking for a transactional email dataset. By “transactional email” I’m referring to those emails that you get when, for example, you make a purchase on eBay, get an update on an Amazon order, reset your password, register for an event, get comments on a Reddit post, etc.

It’s totally fine if the email content contains HTML tags. It would be extra-nice if the dataset has an “email subject” field.

And please, don’t mention the Enron dataset!! Those are mostly conversations; NOT automatic transactional emails.

Any suggestions?

submitted by /u/AshkanArabim

I Need Help To Download Cerebras/SlimPajama-627B Datasets, Please.

Hello guys, I’m currently doing research with the Llama model from Mainland China, but I’ve run into a problem with the dataset. It is about 800GB of data, and at the moment we can only download it from HuggingFace, which is blocked in China. Has anyone downloaded it, and would you be willing to share a direct link with me? Maybe via torrent or something similar. I would really appreciate it; thanks in advance.

submitted by /u/Dandelion_puff_

Datasets With Indicators On Primary Healthcare And Prevention

Hi everyone,

I have been looking for a dataset (or several) that contains information about primary healthcare, particularly about some areas of prevention such as digital health, community engagement in designing healthcare prevention strategies, and embedded prevention in general.

In an ideal world I would like the dataset to include information from as many countries as possible although I would take whatever I can get (if there is anything out there at all).

I have been looking for a while but so far I have found nothing with these specific indicators. Sources I have searched so far: ourworldindata, WHO, World Bank and some AI tools to find datasets.

Any help would be greatly appreciated.

Thank you!

submitted by /u/Experience_Designer

Seeking Graduate School Admissions Data

I’ve found troves of data from the Department of Education on undergraduate admissions: school acceptance rates, ACT/SAT scores, etc.

Is there any such data for graduate schools or programs? For example, GRE / GMAT data, or simply acceptance rates. Any help would be greatly appreciated!

submitted by /u/crimefog

Cybersecurity Breach Data Set With Over 10k Records

Hi everyone,

I’m hoping someone can point me in the right direction. I’m trying to find a cybersecurity breach data set with 10k or more records. I’ve found several incomplete data sets regarding breaches, but nothing that exceeds 10k records.

Here’s a good example of what I’m looking for: https://docs.google.com/spreadsheets/d/1i0oIJJMRG-7t1GT-mr4smaTTU7988yXVz8nPlwaJ8Xk/edit#gid=2

Does anyone know of a similar data set with at least 10k records?

Thanks in advance!!

submitted by /u/clueless-coder

How To Access MEVA Activities Dataset

Hey guys, I am currently working on a human activity classification project, and I have found a dataset that I believe will be very useful: the MEVA (Multiview Extended Video with Activities) dataset. I would like to start by downloading a small portion of it to my laptop, but I do not know the proper procedure. If anyone has worked with this dataset and downloaded it, I would be very grateful if you could explain how to access it.

submitted by /u/Demonking6444

Dataset Of Concert Ticket Sales For Prediction Model

Can anyone help me find data sets or sites that provide data on concert ticket price history? I am trying to build a dynamic ticket pricing prediction model for concerts. I did try looking into the Ticketmaster API and the SeatGeek API, but I’m a bit confused, as I’m still learning the ropes when it comes to collecting data through APIs. I’d appreciate any pointers you can give me for this problem.

submitted by /u/LumeaHeatherWest

👾 Yachay.ai Has Launched A Machine Learning Hackathon Focused On Advancing The Development Of Text-to-location Models

Yachay.ai has launched a Machine Learning Hackathon focused on advancing the development of text-to-location models.

🛠 Participation instructions

🗓 Deadline — November 25th

💎 Reward — $500

The Yachay.ai team has built and open-sourced an architecture for geotagging models that take textual inputs and produce coordinates as outputs. The codebase is here. Currently, they are actively seeking ways to enhance validation metrics.

If you are looking for ideas on leveraging geotagging models in your projects, join our discord discussion 👾

Yachay.ai plans to release a downloadable model for Spanish texts and provide more free text/coordinates datasets in the near future. Stay tuned for updates!

submitted by /u/yachay_ai

I Have Real Estate Data For Sale – Buyers And Sellers Data 200k+ Lines Including Names And Telephone/ Email

Hi there people, I’m wondering if anyone can point me in the right direction as to where I can sell data I have obtained.

I have a comprehensive set of data for sale but don’t know where to sell it.

I have data relating to the purchase and sale of real estate in Dubai: a buyer and seller database including the names of buyers and sellers and all the details needed to prospect leads.

The data contains area, property/building name, seller/buyer name, unit number, sub-region, listing/purchase price, date of purchase/sale, seller/buyer contact number, seller/buyer email address, and seller/buyer ID details.

Data available: 200k+ total lines, in Excel or Google Sheets format, covering:

ALL DUBAI MARINA

JVC

JVT

BUSINESS BAY

PALM JUMEIRAH

All DAMAC

SPRINGS

ALL MAJOR APARTMENT/VILLA COMPLEXES, INCLUDING SIGNATURE VILLAS

Up to date as of July 2023

Regards

submitted by /u/naughtynatasha93

Large Retail Or Manufacturing Datasets

Does anyone here know of any large datasets containing mostly transactional retail or manufacturing data? Preferably multiple tables that are related to each other by primary and foreign keys.

I’m assuming there must be some companies that sell this data to market research companies that we could buy it from if there’s nothing out there for free?

submitted by /u/khaili109

Can I Do An Analysis For You For Free?

Does anyone have data they would like turned into a Power BI report for free? One stipulation: I want to make a tutorial during the process, so you shouldn’t mind the data being shown. I’d like the dataset to come with “The question I am trying to answer or insight to gain from all this is…”

submitted by /u/Bombdigitdy

Global Dataset For Air Quality Index And Pollutant By Country (and City/state If Possible) Over The Years

Hi! I’m trying to look for a dataset for my university assignment.

I’d prefer if the dataset contains different pollutants such as PM2.5, PM10, O3, NO2, SO2, CO etc. The ones I found are usually either pollutants or AQI only, and in different formats so I can’t combine them easily.

(Optional) It would also be great if the dataset included contextual data like temperature, wind speed, humidity, source of pollution, etc.

This would be a great help, thank you so much!!

submitted by /u/jyvenyu

How To Create An Image Dataset For Indian Railways Signals?

Hi everyone, I am working on a project that involves machine learning and computer vision. I want to train a model that can recognize and classify different types of signals used by the Indian railways. For this, I need a large and diverse image dataset of railway signals from various locations, angles, lighting conditions, etc.
I have searched online for existing datasets, but I could not find any that suit my needs. So I wish to create my own dataset from scratch. However, I am not sure how to go about it. What are the best practices and tools for creating an image dataset? How do I collect, label, and organize the images? How do I ensure the quality and consistency of the data?

submitted by /u/Responsible-Diver226

PubMed Papers & Annotated MESH Terms Dataset?

I’m interested in working on PubMed/NIH data. I am looking for a dataset of all Medical Subject Headings (MeSH) terms over all PubMed articles (or at least the past few decades of indexed citations), i.e., the MeSH terms associated with each article on PubMed, for all available articles, at the level of individual articles. Is this available? (Preferably without needing to download and write parsing code for the full PubMed DB XML dump, which is huge and complex to parse; using the API per article or term would take forever and be incredibly inefficient.)

The ideal would be a CSV file or DB dump with the associated terms, article ID, and publication date. Large-scale coverage is crucial.
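
If no ready-made dump turns up, the parsing fallback may be less painful than it sounds when the XML is streamed instead of loaded whole. Here is a minimal sketch, assuming the dump files use the PubmedArticle / MedlineCitation / MeshHeadingList / DescriptorName layout; the element paths should be checked against the official DTD, the function name is my own, and the sample record is entirely made up.

```python
import csv
import io
import xml.etree.ElementTree as ET


def iter_mesh_rows(xml_source):
    """Yield (pmid, year, mesh_descriptor) tuples, one per MeSH heading."""
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "PubmedArticle":
            continue
        pmid = elem.findtext("MedlineCitation/PMID", default="")
        # Some records may lack a plain Year element; those get an empty string.
        year = elem.findtext(
            "MedlineCitation/Article/Journal/JournalIssue/PubDate/Year",
            default="",
        )
        for descriptor in elem.iterfind(
            "MedlineCitation/MeshHeadingList/MeshHeading/DescriptorName"
        ):
            yield pmid, year, descriptor.text or ""
        # Drop this article's children once it has been processed.
        elem.clear()


# Made-up sample record (illustration only, not real PubMed data).
SAMPLE = b"""<PubmedArticleSet>
<PubmedArticle><MedlineCitation><PMID>1</PMID>
<Article><Journal><JournalIssue><PubDate><Year>2020</Year></PubDate></JournalIssue></Journal></Article>
<MeshHeadingList>
<MeshHeading><DescriptorName>Example Term A</DescriptorName></MeshHeading>
<MeshHeading><DescriptorName>Example Term B</DescriptorName></MeshHeading>
</MeshHeadingList></MedlineCitation></PubmedArticle>
</PubmedArticleSet>"""

rows = list(iter_mesh_rows(io.BytesIO(SAMPLE)))

# Write the long-format table (one row per article/term pair) as CSV.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["pmid", "year", "mesh_term"])
writer.writerows(rows)
print(buffer.getvalue())
```

The long format (one row per article and term) keeps the output a plain CSV and is easy to load into a database afterwards.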

Bonus points if it includes other structured ontology sources per paper, e.g. the associated GO terms.

Thanks very much!

submitted by /u/ddofer