Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for datasets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

AI Solutions For Preprocessing Messy CSV Files

I’m dealing with a multitude of CSV files where the formats and structures vary widely, with mixed styles, inconsistent headers, and sometimes even headers smack in the middle of the data. It’s a nightmare for any machine learning endeavor.

Manually cleaning and preprocessing these files would be impossible because there are too many small tables, so I’m wondering if there’s an out-of-the-box AI or deep learning solution that can help. Ideally, I’m looking for something that can, among other preprocessing steps:

- Identify and standardize headers
- Split tables if there’s an unexpected header in the middle
- Fill in missing values
- Turn these chaotic CSVs into clean, ML-friendly tables
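Two of the steps listed above, standardizing headers and splitting a file where a header reappears mid-data, can be handled with plain rule-based code before any AI is involved. Below is a minimal Python sketch using only the standard library. It assumes the first non-empty row of a file is the header and that a mid-file header repeats it, ignoring case and surrounding whitespace. Headers that differ in wording would need fuzzier matching. The function names are hypothetical.

```python
import csv
import io


def standardize_header(header: list[str]) -> list[str]:
    """Normalize header cells, e.g. '  Unit Price ' -> 'unit_price'."""
    return ["_".join(cell.strip().lower().split()) for cell in header]


def split_on_repeated_header(text: str) -> list[list[list[str]]]:
    """Split CSV text into tables, starting a new table whenever a row
    matches the first row (assumed to be the header), ignoring case/spacing."""
    # Drop rows that are completely empty.
    rows = [row for row in csv.reader(io.StringIO(text)) if any(c.strip() for c in row)]
    if not rows:
        return []
    header = [cell.strip().lower() for cell in rows[0]]
    tables: list[list[list[str]]] = []
    current: list[list[str]] = []
    for row in rows:
        normalized = [cell.strip().lower() for cell in row]
        if normalized == header and current:
            # A repeated header closes the previous table and opens a new one.
            tables.append(current)
            current = []
        current.append(row)
    tables.append(current)
    return tables
```

Each returned table keeps its own header row, so `standardize_header(table[0])` can be applied per table before the data is loaded for training.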

Has anyone encountered a tool or model that can handle such tasks? Any recommendations or advice would be a lifesaver!

Thanks in advance for your help!

submitted by /u/Apprehensive_View366

Not Sure If This Is The Right Sub Or Not… But Any Suggestions On Software I Can Use To Automatically Calculate And Display (via Visual Graphs) The Profits/losses Of Crypto Purchases And Sales?

I have a list of data consisting of “Buy” and “Sell” entries from the beginning of 2022 to the current date.
Each Buy entry records which coin was bought, its price, the quantity bought, and the total amount spent; each Sell entry records the same fields. My goal is to compile it all into a visual representation that shows the profits and losses for each quarter of 2022 and the first two quarters of this year.
Are there any resources online that can help me with this?
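Before picking software, it may help to see the underlying calculation. Below is a minimal Python sketch. It assumes each entry carries a date, a side ("Buy" or "Sell"), the coin, the quantity, and the total amount. It uses the average-cost method to value the coins being sold; FIFO or other methods give different numbers, and tax rules may require a specific one. The `Trade` class and the function names are hypothetical. The per-quarter totals it returns could then be fed into any charting tool.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Trade:
    day: date
    side: str        # "Buy" or "Sell"
    coin: str
    quantity: float  # number of coins bought or sold
    total: float     # total amount spent (Buy) or received (Sell)


def quarter_label(d: date) -> str:
    """Map a date to a label such as '2022-Q3'."""
    return f"{d.year}-Q{(d.month - 1) // 3 + 1}"


def realized_pnl_by_quarter(trades: list[Trade]) -> dict[str, float]:
    """Realized profit/loss per quarter, valuing sold coins at their average cost."""
    held = defaultdict(float)   # coin -> quantity currently held
    cost = defaultdict(float)   # coin -> total cost basis of the coins held
    pnl = defaultdict(float)    # quarter label -> realized profit (+) or loss (-)
    for t in sorted(trades, key=lambda trade: trade.day):
        if t.side == "Buy":
            held[t.coin] += t.quantity
            cost[t.coin] += t.total
        elif t.side == "Sell":
            if held[t.coin] <= 0 or t.quantity > held[t.coin]:
                raise ValueError(f"selling more {t.coin} than held on {t.day}")
            avg_cost = cost[t.coin] / held[t.coin]
            pnl[quarter_label(t.day)] += t.total - avg_cost * t.quantity
            held[t.coin] -= t.quantity
            cost[t.coin] -= avg_cost * t.quantity
        else:
            raise ValueError(f"unknown side {t.side!r}")
    return dict(pnl)
```

Quarters with no sales do not appear in the result, and gains or losses on coins still held (unrealized P/L) are not counted.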

submitted by /u/Uprising_Downfall

A Request For Help On Where To Find Free Data

Hello, I’m not a statistician and I’m unfamiliar with searching for data, so any help on where to find free data would be appreciated. I’m specifically looking for monthly data on quantity sold (units rather than sales revenue) for the Sporting Goods Retailers industry in the United States over the past five years. Thank you in advance!

submitted by /u/QDibie

Can You Recommend An App For Managing Large Lists Of Contact Information?

Hi r/datasets

I’m learning how to manage upwards of 500,000 customer profiles, which would include fields like:

Name, profession, email, phone, address, business address, Instagram, TikTok, Twitter, YouTube

I’ll need to be able to search this database by specific criteria (e.g., all influencers, all male customers, all customers in Texas) as well as export and share lists.

This could obviously be done in Excel or Google Sheets, but I’m looking for something with a more modern UX and a built-in focus on contact management.
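Whatever app is chosen, the criteria searches described above map directly onto a small relational database. As an illustration only, here is a sketch using Python’s built-in `sqlite3` module. The table layout is hypothetical, with `gender` and `state` columns added because the example searches need them. Only whitelisted columns can be searched, so user input never becomes part of the SQL text.

```python
import sqlite3

# Columns that callers may filter on; anything else is rejected.
SEARCHABLE = {"profession", "gender", "state"}


def make_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the contacts table plus an index on each searchable column."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS contacts (
               id INTEGER PRIMARY KEY,
               name TEXT, profession TEXT, gender TEXT,
               email TEXT, phone TEXT, state TEXT
           )"""
    )
    # Indexes help equality lookups stay fast as the table grows.
    for column in sorted(SEARCHABLE):
        conn.execute(f"CREATE INDEX IF NOT EXISTS idx_{column} ON contacts({column})")
    return conn


def search(conn: sqlite3.Connection, **criteria: str) -> list[tuple[str, str]]:
    """Return (name, email) for contacts matching every column=value criterion."""
    unknown = set(criteria) - SEARCHABLE
    if unknown:
        raise ValueError(f"unsupported search fields: {sorted(unknown)}")
    where = " AND ".join(f"{column} = ?" for column in criteria) or "1 = 1"
    sql = f"SELECT name, email FROM contacts WHERE {where} ORDER BY name"
    return conn.execute(sql, list(criteria.values())).fetchall()
```

Query results are plain tuples, so a list can be exported for sharing by writing the rows out with the `csv` module.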

Any direction appreciated.

submitted by /u/Old-Act3456

Looking For Datasets On SuperCell And Mobile Games

Hi there, I’m currently working on a group project in which we were asked to build a dashboard for a specific mobile app developer. We were assigned SuperCell, but we are finding it difficult to obtain reliable, free datasets. Kaggle has some options, but we need many more. Specifically, we are looking for data suitable for a dashboard, such as revenue, download counts, review data, and active users. PS: More general datasets about the mobile gaming industry are also useful. Thanks in advance.

submitted by /u/A_Succulent_Eggplant

Chemical Reaction Datasets Or Websites To Scrape

Hey there, I know this has been asked a couple of times before, but I couldn’t find a good source in those threads, and my request is perhaps simpler.

I am looking for a dataset of chemical reactions, as simple as possible, to construct an interaction graph. For example, from the reaction 2H + O -> H2O, I would construct two edges: (H, H2O) and (O, H2O). Is there a database with a large number of reactions of any kind that I could use?
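If a source provides reactions as plain equation strings, building the edge list described above takes only a few lines. Below is a minimal Python sketch. It assumes reactions are written as `reactants -> products`, with species separated by ` + ` (spaces around the plus sign, so charges such as `Fe3+` are not split). It also assumes any leading number is a stoichiometric coefficient to be dropped. The function names are hypothetical.

```python
import re
from collections import defaultdict

# An optional leading coefficient (e.g. the "2" in "2H2O") followed by the species name.
TERM = re.compile(r"^(\d+)?(\S+)$")


def species(side: str) -> list[str]:
    """Split one side of an equation, e.g. '2H + O', into species without coefficients."""
    names = []
    for term in re.split(r"\s+\+\s+", side.strip()):
        match = TERM.match(term)
        if match is None:
            raise ValueError(f"cannot parse term {term!r}")
        names.append(match.group(2))
    return names


def reaction_edges(reaction: str) -> list[tuple[str, str]]:
    """One (reactant, product) edge for every reactant/product pair."""
    left, right = reaction.split("->")
    return [(r, p) for r in species(left) for p in species(right)]


def build_graph(reactions: list[str]) -> dict[str, set[str]]:
    """Adjacency sets from each reactant to the products it appears with."""
    graph: dict[str, set[str]] = defaultdict(set)
    for reaction in reactions:
        for reactant, product in reaction_edges(reaction):
            graph[reactant].add(product)
    return dict(graph)
```

For example, `reaction_edges("2H + O -> H2O")` yields the two edges `("H", "H2O")` and `("O", "H2O")`.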

Alternatively, if you know a website whose HTML could be scraped, I could also work with that.

Thanks

submitted by /u/qotsalo

Need Help With Bangladesh’s Car Market Data

My boss asked me to define target groups (TGs) for our new products: an entry-level sedan, a five-seat SUV, and a full-size seven-seat SUV. I have to build one TG for each of these three models. I’ve collected data on Bangladesh’s population by age group and on how many people live in each urban area, and I’m trying to relate it to the population’s income levels. However, it is very hard to quantify how many people can actually afford our products. Can anyone help with suggestions or solutions?

submitted by /u/drdoctor98

Need Help With Indian F&O Data Collection From NSE

I have a small question I’m hoping you can clarify. I’ve been trying to collect the daily F&O bhav copies from NSE for 2011 to 2022. I managed to do so for 2016 onwards using some Python libraries.

However, many people online, myself included, have had trouble downloading bhav copies from before 2016, because the new NSE website is pretty shitty that way: it stores the CSV file inside a ZIP archive, so the API can’t access the CSV directly.
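Once an archive has been downloaded, the ZIP wrapper itself is easy to handle in Python. The standard `zipfile` module can open the archive in memory and pass each member to the `csv` module without writing anything to disk. Below is a minimal sketch. It assumes you already have the archive’s bytes and that the CSV is UTF-8 encoded. How to request those bytes from NSE, including any URL patterns or headers the site requires, is not covered here. The function name is hypothetical.

```python
import csv
import io
import zipfile


def rows_from_zipped_csv(zip_bytes: bytes) -> list[dict[str, str]]:
    """Read every .csv member of an in-memory ZIP archive into a list of dict rows."""
    rows = []
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".csv"):
                continue  # skip non-CSV members
            with archive.open(name) as member:
                # Decode the binary member stream as text for the csv module.
                text = io.TextIOWrapper(member, encoding="utf-8", newline="")
                rows.extend(csv.DictReader(text))
    return rows
```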

If you have some time to spare, could you help me out? It’s for a research project I’m working on!

Thank you in advance 🙂

submitted by /u/jingbolosodabama

Dataset Related To Sustainable Development Goals (SDG)

I’m working on a data mining school assignment focused primarily on Quality Education (SDG 4) and Decent Work and Economic Growth (SDG 8). However, I’m open to exploring datasets related to the other SDGs as well.

I’m looking for two datasets with the following criteria:

- At least 10,000 observations and six variables per dataset
- The two datasets must be mergeable
- Each must be related to one of the SDGs

I’ve already searched Kaggle but haven’t found suitable datasets. Any suggestions, or tips on filtering search results more effectively, would be much appreciated.
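“Mergeable” usually means both datasets share key columns, for example country and year, with overlapping values. A quick way to check a candidate pair is to join them on those keys and see how many rows match. Below is a minimal standard-library sketch. The key and column names are hypothetical. With pandas, the same join is `DataFrame.merge(..., on=[...], how="inner")`.

```python
from collections import defaultdict


def key_of(row: dict, keys: list[str]) -> tuple:
    """The values of the key columns, as a hashable tuple."""
    return tuple(row[k] for k in keys)


def inner_join(left: list[dict], right: list[dict], keys: list[str]) -> list[dict]:
    """Combine rows from both datasets that agree on every key column."""
    index: dict[tuple, list[dict]] = defaultdict(list)
    for row in right:
        index[key_of(row, keys)].append(row)
    merged = []
    for row in left:
        for match in index.get(key_of(row, keys), []):
            # On a non-key column name clash, the right-hand value wins.
            merged.append({**row, **match})
    return merged


def match_rate(left: list[dict], right: list[dict], keys: list[str]) -> float:
    """Share of left-hand rows that have at least one partner in the right dataset."""
    if not left:
        return 0.0
    right_keys = {key_of(row, keys) for row in right}
    return sum(key_of(row, keys) in right_keys for row in left) / len(left)
```

A low match rate on a candidate pair suggests the datasets cover different countries or years and may not satisfy the merge requirement.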

submitted by /u/Lyn03