Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Looking For A Dataset For The Kaplan–Meier Estimator

Hi!

I have searched the web and can’t find a good dataset for the Kaplan–Meier method, which I need for schoolwork. I’m looking for datasets where each entry describes an individual and records the start and end of some measured event. In principle, I don’t care what the data is about, but I’d prefer that it isn’t about the survival rate of people.

So far I have searched for:

Marriages (but there is usually no label for the end of the marriage)
Unemployment duration
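
For readers unsure what shape such a dataset needs, here is a minimal sketch of the Kaplan–Meier product-limit estimate in plain Python. It assumes each individual is reduced to a duration plus a flag saying whether the end event was actually observed (a spell still running when observation stopped is censored). The function name `kaplan_meier` and the unemployment spells below are made up for illustration and are not real data.

```python
from collections import Counter


def kaplan_meier(durations, observed):
    """Return [(time, survival_probability)] for each time with an observed event."""
    events = Counter(t for t, seen in zip(durations, observed) if seen)
    exits = Counter(durations)  # events and censored cases both leave the risk set
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Product-limit step: multiply by the share that "survives" time t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve


# Hypothetical unemployment spells in months; False means the person was
# still unemployed when observation ended (right-censored).
durations = [2, 3, 3, 5, 8]
observed = [True, True, False, True, False]
curve = kaplan_meier(durations, observed)
for time, probability in curve:
    print(f"month {time}: S(t) = {probability:.2f}")
```

Any dataset that can be reduced to these two columns (a duration and an "event observed" flag) is usable, which is why the marriage data without end labels falls short.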

submitted by /u/HBlackwooder

[Data Cleaning] Time Data Missing Values

Hello, I’d like your help with an issue in a data science project. In the missing-value handling step, I replace missing continuous values with the mean, but I don’t think that is the right approach for time data. I found several options: forward fill (ffill()), backward fill (bfill()), and linear interpolation. However, I’m still not sure which one to use, because this is the first time I’m dealing with null values in time data.
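
To make the comparison concrete, here is a small sketch of the three options using pandas, on the assumption that "time data" means a series indexed by timestamps. The daily readings are invented for illustration.

```python
import pandas as pd

# Hypothetical daily readings; None marks a missing value.
index = pd.date_range("2024-01-01", periods=6, freq="D")
readings = pd.Series([10.0, None, None, 16.0, None, 20.0], index=index)

# Forward fill: carry the last known value forward.
forward = readings.ffill()

# Backward fill: pull the next known value backward.
backward = readings.bfill()

# Linear interpolation weighted by the actual time gaps in the index.
linear = readings.interpolate(method="time")

print(pd.DataFrame({"raw": readings, "ffill": forward,
                    "bfill": backward, "linear": linear}))
```

Roughly speaking, forward fill suits values that stay constant until the next change, backward fill uses information from the future (which can leak into a forecasting model), and interpolation suits quantities that change smoothly between observations.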

submitted by /u/t_abdessamad

Looking For Social Platform Trends Data

Hey all,

I’ve been looking for a good source of pre-sanitized, collated social platform data organized by topic to run my LLM on. Wondering how people find such datasets (Google, Reddit, scholarly articles, etc) / if anyone has had luck with any specific providers recently. Thanks!

submitted by /u/mstahl23

How Does One Create A Dataset For An LLM AI Based Off Specific Content From A Website.

I’ve started playing around with custom AI models because I was bored and it looked fun from things I’ve seen on YouTube. I’ve created characters, tested different models and had loads of fun learning and playing. But now I want to “fine tune” the local model I’m using on specific data for it to pull from.

The overall goal is to have this chatbot assist me in writing wiki articles and events for an online roleplay thing. I want it to have access to all 7,567 articles the community has already created, so it can pull information from them and enhance my writing and suggestions with canon responses.

How would I do that? That is, how do I get the data and put it in a format that can be used for fine-tuning? The YouTube tutorials I’ve seen generally focus on “reverse engineering” Midjourney prompts or medical questions.
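
As a rough sketch of the formatting half of the question: many fine-tuning tools accept JSON Lines, one training record per line. The example below assumes the articles have already been exported as title/text pairs (fetching them from the wiki is not shown). The function name `articles_to_jsonl`, the `prompt`/`completion` field names, the prompt wording, and the chunk size are all assumptions; the schema your fine-tuning tool expects may differ.

```python
import json


def articles_to_jsonl(articles, max_chars=2000):
    """Turn [{'title': ..., 'text': ...}] into JSON Lines training records."""
    lines = []
    for article in articles:
        text = article["text"].strip()
        # Split long articles so each record stays within a manageable size.
        for start in range(0, len(text), max_chars):
            record = {
                "prompt": f"Write a wiki passage about: {article['title']}",
                "completion": text[start:start + max_chars],
            }
            lines.append(json.dumps(record, ensure_ascii=False))
    return "\n".join(lines)


# Hypothetical article, standing in for one of the community's pages.
sample = [{"title": "Example Article", "text": "First part. Second part."}]
print(articles_to_jsonl(sample, max_chars=12))
```

Writing the returned string to a `.jsonl` file gives one record per line, which is easy to inspect by hand before training.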

submitted by /u/Jakob4800

How Can I Get Data From StatCan With Characteristics?

I usually work with data that is already available on a server, so I have no idea how to get it from StatCan. I read their website, but perhaps because I’m not a developer, I still have no clue.

For instance, I want the report on persistence and graduation of doctoral degree students within Canada, by student characteristics (including sex, age, marital status, father’s/mother’s occupation, scholarship, funding, location, household income…) for a given period.

Where can I get all the tables I need? I would prefer flat CSV files.

I downloaded files from the website, but the data is not the same as what I got from Kaggle.

TIA!

submitted by /u/Whatswrongwithman

IMDB Vs TMDB – Advice For Recommender System

I’m building a film recommendation system. I have a large CSV file with film data scraped from the IMDB dataset, which I plan to use to build the machine learning model. At the same time, I’m using the TheMovieDB API to get extra film details such as plot summaries.

I’m using around 300,000 films from IMDB, and some records are missing certain data, like editor, cinematographer etc., and I’m not sure how much more data each dataset has on a film compared to the other.

Would it be better to consistently use the TMDB API to display film data on the frontend and only use IMDB to build the ML model, or to use the IMDB CSV throughout my system, both for the model and for displaying film details? Alternatively, I could cross-reference both sources, but I’m wary of conflicting data between the two datasets.
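
One way to judge the cross-referencing option is to measure how often the two sources actually disagree. The sketch below assumes both sources have been loaded as lists of dicts sharing an `imdb_id` key (TMDB movie details include an `imdb_id` field). The function name `merge_film_records`, the other field names, and the sample rows are hypothetical.

```python
def merge_film_records(imdb_rows, tmdb_rows):
    """Fill gaps in IMDB rows from TMDB and report fields where they disagree."""
    tmdb_by_id = {row["imdb_id"]: row for row in tmdb_rows}
    merged, conflicts = [], []
    for row in imdb_rows:
        other = tmdb_by_id.get(row["imdb_id"], {})
        combined = dict(row)
        for field, value in other.items():
            if combined.get(field) in (None, ""):
                combined[field] = value  # IMDB value missing: take TMDB's
            elif value not in (None, "") and combined[field] != value:
                # Both sources have a value and they differ: keep IMDB, log it.
                conflicts.append((row["imdb_id"], field, combined[field], value))
        merged.append(combined)
    return merged, conflicts


# Made-up sample rows for illustration.
imdb_rows = [
    {"imdb_id": "tt0000001", "title": "Film A", "editor": None},
    {"imdb_id": "tt0000002", "title": "Film B", "editor": "E. Two"},
]
tmdb_rows = [
    {"imdb_id": "tt0000001", "title": "Film A", "editor": "E. One", "overview": "Plot A"},
    {"imdb_id": "tt0000002", "title": "Film B (Remastered)", "overview": "Plot B"},
]
merged, conflicts = merge_film_records(imdb_rows, tmdb_rows)
print(len(conflicts), "conflicting field(s)")
```

If the conflict list stays short on a sample of films, cross-referencing is probably safe; if it is long, sticking to one source for display avoids showing users inconsistent details.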

Any advice is appreciated.

submitted by /u/wobowizard

In Need Of A Dataset That Has Over 1000 Rows

I’m currently doing a school project that requires a dataset with over 1000 rows that can be imported into Google Sheets. I’m on a Mac, so I was wondering if anyone could reply with links to datasets that can also be opened on this device. Thanks!

submitted by /u/formithica

Need Help To Access The IAM Handwriting Dataset

I need help with the IAM handwriting dataset, as I cannot access it from anywhere. I don’t even have an account from which I could download it remotely.

Can anybody please provide a working link to the dataset (gdrive, mega, anything)? If you have ever downloaded it and still have it on your drive, could you please share it?

This is the link to the dataset: https://fki.tic.heia-fr.ch/databases/iam-handwriting-database

submitted by /u/Chiragrvijay

Will A Tool Like This Help You In Visualisation?

We are working on Mokkup.ai (https://www.mokkup.ai/), a dashboard wireframing tool that helps create high-fidelity wireframes in minutes, even for people with no design acumen.

We are targeting data analysts, PMs, developers, HR, and other business teams and stakeholders. It’s super simple to use, with drag-and-drop elements and 150+ pre-built templates spanning industries and several custom use cases.

Would a tool like this help you mock up a dashboard and translate your ideas before moving on to working with real data sets?

I’d love to hear your reviews and thoughts about what we are creating! This year we have geared up to do some mad business, so every insight and comment would be incredibly valuable. Thank you!

submitted by /u/Hamburgerleader

Looking For Multivariate Data For Assignment About Microbiology?

Hi everyone, for my doctoral training, I am following a multivariate statistics course. For the exam, we need to complete an assignment in which we analyse a multivariate dataset of our choice using different methods (such as PCA, discriminant analysis, factor analysis, biplot, cluster analysis…).

Do you have recommendations for interesting data sets to analyse that are available online? It would be cool if it could be about microbiology (or bacteriophage research), since this is what my doctoral research is about.

Many thanks and happy new year!

submitted by /u/Subject-Extent5978

Is There A Dataset Of Fake/fraudulent/pseudoscientific Illnesses And Medical Conditions?

There’s a system that allows users to add their medical conditions from a list. I found that there are some non-existent conditions in the list, things like autistic enterocolitis.

I need a list of conditions that have been claimed to exist but are not recognised by mainstream medicine, so I can make a script to detect the overlap.
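
Once such a list is in hand, the overlap check itself is short. The sketch below assumes both lists are plain lists of condition names and compares them after normalising case and punctuation. The function names are made up, and the only unrecognised condition used is the one mentioned above.

```python
def normalize(name):
    """Lowercase a condition name and collapse punctuation and extra spaces."""
    cleaned = "".join(ch if ch.isalnum() else " " for ch in name.casefold())
    return " ".join(cleaned.split())


def find_overlap(system_conditions, unrecognised_conditions):
    """Return the entries of the system list that appear in the unrecognised list."""
    flagged = {normalize(name) for name in unrecognised_conditions}
    return [name for name in system_conditions if normalize(name) in flagged]


# Hypothetical system list; the reference list holds the example from the post.
system_list = ["Asthma", "Autistic Enterocolitis", "Type 2 diabetes"]
reference_list = ["autistic enterocolitis"]
print(find_overlap(system_list, reference_list))
```

Exact matching after normalisation will miss spelling variants and synonyms, so a fuzzy comparison may be worth adding if the system list is inconsistent.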

Does such a list exist?

submitted by /u/Defiant-Snow8782