Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Effective Method For Finding Common Colleges In Two Excel Sheets Despite Inconsistent Formatting

I have two Excel sheets, both containing a huge set of college names in different formats and abbreviations. I want to find the colleges common to both sheets, but because the names are formatted inconsistently this is proving very tedious and difficult. Kindly suggest the most effective method for this.
Is there any way to do this in Excel, either with the help of some other tool or with Excel’s built-in tools? I have already tried sorting, filtering, and find and replace.
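
One possible approach outside Excel, sketched in Python with only the standard library: normalize each name (lowercase, strip punctuation, expand abbreviations), then fuzzy-match the normalized forms with `difflib`. The abbreviation table, the stop-word list, the 0.85 cutoff, and the sample names are all assumptions for illustration; the real sheets would need their own table and a manual review of borderline matches.

```python
import difflib
import re

# Hypothetical abbreviation table; extend it with whatever the real sheets contain.
ABBREVIATIONS = {
    "univ": "university",
    "coll": "college",
    "inst": "institute",
    "engg": "engineering",
    "tech": "technology",
}
# Filler words that differ between spellings but carry little meaning (assumed list).
STOPWORDS = {"of", "the", "and"}


def normalize(name: str) -> str:
    """Lowercase, strip punctuation, expand abbreviations, drop filler words."""
    text = name.lower().replace("&", " and ")
    text = re.sub(r"['’]", "", text)          # "Xavier's" -> "xaviers"
    text = re.sub(r"[^a-z0-9 ]", " ", text)   # remaining punctuation -> space
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]
    return " ".join(w for w in words if w not in STOPWORDS)


def common_colleges(sheet_a, sheet_b, cutoff=0.85):
    """Return (name_in_a, name_in_b) pairs whose normalized forms are similar."""
    normalized_b = {}
    for name in sheet_b:
        normalized_b.setdefault(normalize(name), name)
    keys = list(normalized_b)
    matches = []
    for name in sheet_a:
        hit = difflib.get_close_matches(normalize(name), keys, n=1, cutoff=cutoff)
        if hit:
            matches.append((name, normalized_b[hit[0]]))
    return matches


# Made-up sample rows standing in for one column from each sheet.
sheet_a = ["St. Xavier's College", "Indian Inst. of Technology, Delhi", "Univ. of Mumbai"]
sheet_b = ["University of Mumbai", "St Xaviers College", "Delhi Public School"]
pairs = common_colleges(sheet_a, sheet_b)
for a, b in pairs:
    print(f"{a!r} <-> {b!r}")
```

Lowering the cutoff catches more variants but also more false matches, so the cutoff is worth tuning on a sample that has been checked by hand.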

submitted by /u/Darkness-of-Light

Looking For Dataset, Consisting Of Invoices And Receipts With The Corresponding General Ledger/ERP Entries

Dear community, I’m in search of a comprehensive dataset that includes Receipt Data and Invoice Data, with more than 100,000 item-lines in formats such as PDF, JPG, etc. Additionally, I need the corresponding general ledger/ERP entries, including the chosen account according to the chart of accounts, VAT, and so on.
I haven’t been able to find anything on the web. Does anyone know where I can obtain such datasets?

submitted by /u/Altruistic-Box-5744

Workout Logs (Strength Training) – Exercise, Weight, Reps

Hey everybody,

I’m currently building something that relates molecular biology, time-series algos and more to optimize muscle and strength building.

For that I need data in the form of workout logs from people. They should look something like this:

Deadlift 180kg 1×3

Squats 100kg 3×12

Lying Hamstring curls 50kg 3×8
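
Lines in this shape can be turned into structured records with a small parser. The following is a minimal Python sketch, assuming every line follows `<exercise> <weight>kg <sets>×<reps>`; reading the first number as sets and the second as reps is an assumption, and the `LogEntry` type is hypothetical.

```python
import re
from typing import NamedTuple


class LogEntry(NamedTuple):
    exercise: str
    weight_kg: float
    sets: int   # assumed: the number before the "×"
    reps: int   # assumed: the number after the "×"


# Accepts "×", "x" or "X" as the separator and an optional decimal weight.
LINE = re.compile(
    r"^(?P<exercise>.+?)\s+(?P<weight>\d+(?:\.\d+)?)\s*kg\s+"
    r"(?P<sets>\d+)\s*[×xX]\s*(?P<reps>\d+)$"
)


def parse_line(line: str) -> LogEntry:
    """Parse one log line, raising ValueError if it does not fit the pattern."""
    m = LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized log line: {line!r}")
    return LogEntry(m["exercise"], float(m["weight"]), int(m["sets"]), int(m["reps"]))


log = """Deadlift 180kg 1×3
Squats 100kg 3×12
Lying Hamstring curls 50kg 3×8"""

entries = [parse_line(line) for line in log.splitlines()]
for entry in entries:
    print(entry)
```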

Would help me out immensely if you have such a dataset / know someone who does and are willing to share it!

In return, everyone who contributes is invited to use the beta version for free, of course! :)

Cheers,

Tim

submitted by /u/Biotential

Better Way Of Preparing Datasets For Finetuning With Large Text In Each Example???

Is there a better way to prepare datasets?

I have my datasets in this format:

text : length 19k

extracted entity 1 : list of entity 1 extracted

extracted entity 2 : list of entity 2 extracted

Does anyone have an idea how to fine-tune an open-source model with this kind of data?

Is fine-tuning the better option, given that the model (LLM) has to learn to extract items from the text and the text is so long?

Example: say I have to train an LLM to look at the whole text of a book and extract author names, place names, and people’s names. I now have data for 100 books. How can I prepare datasets to fine-tune an LLM to be very good at this extraction? Also consider that I have supervised data: book text together with the author, people, and place names extracted from the whole text.
How can I fine-tune a good model? Let me know.
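
One common way to handle examples this long is to split each text into overlapping chunks and keep, for each chunk, only the labeled entities that actually occur in it. The Python sketch below is an illustration under stated assumptions, not a recommended recipe: it chunks by characters (the post does not say whether "19k" means characters or tokens), the `prompt`/`completion` field names and the instruction wording are hypothetical, and matching entities by exact substring is a simplification.

```python
import json


def chunk_text(text: str, size: int = 4000, overlap: int = 200):
    """Yield overlapping character windows; requires size > overlap."""
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        yield text[start:start + size]


def build_examples(text: str, entities: dict, size: int = 4000, overlap: int = 200):
    """Turn one labeled document into one training example per chunk.

    entities maps a label to the values extracted from the whole text,
    e.g. {"people": [...], "place": [...]}.
    """
    examples = []
    for chunk in chunk_text(text, size, overlap):
        # Keep only the entities that literally appear in this chunk.
        found = {
            label: [value for value in values if value in chunk]
            for label, values in entities.items()
        }
        examples.append({
            "prompt": "Extract the labeled entities from the text.\n\n" + chunk,
            "completion": json.dumps(found),
        })
    return examples


# Tiny made-up document so the windows are easy to inspect.
text = "Alice met Bob in Paris. " * 5
labels = {"people": ["Alice", "Bob"], "place": ["Paris", "Rome"]}
examples = build_examples(text, labels, size=40, overlap=10)
print(len(examples), examples[0]["completion"])
```

The overlap only guarantees that an entity shorter than the overlap lands whole in at least one chunk; longer spans would need a larger overlap.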

submitted by /u/Guilty-Tea6607

Looking For Historical Dataset(s) On Monthly Gas And Electricity Prices And Caps In Bristol (or UK Regions)

Hi, I am doing a university project, part of which is creating a bill prediction service, and I am having a really tough time finding good sources for what I need. I’m focusing on Bristol at the moment to help with initial development, but datasets broken down by region would work too.

I need the average monthly cost of electricity and gas (separately) in Bristol from at least 2018/19 through 2024, ideally with upper and lower values. I also need the price cap data (unit rates, standing charges) for those periods, which typically changes every three months and is published by Ofgem; however, I cannot seem to find any sources for previous years, only the current year.

I’d really appreciate any help. As I’ve said, I am really struggling to find valid datasets.

submitted by /u/JamesCompSci

Seeking Data For Analyzing Niches And Growth Trends In The Data Analytics Industry

Hey Everyone!

I’m currently working on a project focused on analyzing different niches and growth trends within the data analytics industry. My objective is to gain insights into emerging trends, market opportunities, and career prospects within various niche segments of the data analytics field.

I’m reaching out to this community to seek assistance in gathering relevant datasets for my analysis. Specifically, I’m looking for datasets that include information on:

Market size and growth rates of different niche segments within the data analytics industry.

Job demand, postings, and salary trends for various data analytics roles.

Emerging technologies, tools, and applications in specialized areas of data analytics.

Industry reports, research studies, or surveys providing insights into niche markets and trends.

I’m open to suggestions and recommendations for reliable sources or datasets that could contribute to my analysis. Any publicly available datasets, research reports, or academic publications related to the data analytics industry would be greatly appreciated.

Your assistance in finding suitable datasets for this project would be invaluable to my research efforts. Thank you in advance for your help and contributions.

submitted by /u/vizwha

Looking For Datasets On Environmental Health

My project partner and I would like to analyze the association between air pollution, floods, and other environmental concerns and health outcomes like respiratory diseases, prenatal health, premature birth, etc. I’ve been looking for datasets for this specific aim but haven’t found one. There are multiple studies on this topic, but I can’t seem to access the datasets.

submitted by /u/Introvertedwin

MySQL Error While Importing Data (importing Csv But Getting Error)

I am trying to import a CSV file into my MySQL localhost server, but I get this error:
Unhandled exception: 'charmap' codec can't decode byte 0x8d in position 4887: character maps to <undefined>
I’ll link the CSV file too. Please do try to import it, and if you are successful then PLEASE HELPPPPPP!
link: https://drive.google.com/file/d/16s54EfGnKFeedkD0Z-JItt_piqKPA370/view?usp=sharing
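
A "charmap" decode error like this usually means the file was read with a Windows code page such as cp1252, in which byte 0x8d is undefined, while the file itself is in another encoding (often UTF-8). A minimal Python sketch of one workaround, assuming that is the cause here: decode the raw bytes with a short list of candidate encodings and re-save the file as UTF-8 before importing. The candidate list and function names are hypothetical, and `latin-1` is only a last resort because it accepts any byte and can silently produce wrong characters.

```python
from pathlib import Path


def decode_with_fallback(raw: bytes, candidates=("utf-8-sig", "cp1252", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def rewrite_as_utf8(src: str, dst: str) -> str:
    """Re-save src as UTF-8 at dst and report which encoding was detected."""
    text, encoding = decode_with_fallback(Path(src).read_bytes())
    # newline="" keeps the original line endings untouched.
    with open(dst, "w", encoding="utf-8", newline="") as out:
        out.write(text)
    return encoding


# The UTF-8 encoding of "ō" contains byte 0x8d, which cp1252 cannot decode.
sample = "Tōkyō,1\n".encode("utf-8")
print(decode_with_fallback(sample))
```

If the import tool lets you choose the file encoding, setting it to the detected one may be enough without rewriting the file.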

submitted by /u/Swat_Sam2

Help I Need Datasets For My Stats Class!

I am stuck on part 2! I am unable to find similar datasets with at least 100 values. I’m hoping for 150 to 200. Please help! I can do parts 3 & 4; I just can’t find the data! At this point, I don’t care what the data pertains to!

The Project (Outlined Below)

The project must be submitted as a single PDF file after completing the following tasks.

Only one group member needs to submit the report. Note that if you submit a DOCX or XLSX file for your project, you will receive a score of 0.

Download two data sets you can compare containing similarly quantifiable information (such as stock prices, economic indicators, sports analytics, and weather forecasts) that have at least 100 data values each. If you downloaded a .csv file, save it as a .xlsx file. You can find data sets on datasetsearch.research.google.com, data.gov, or by simply Googling “public data sets”.

Set up the file with two data sets of equal size (at least 100 data values each).

Create a frequency distribution table and frequency polygons of both data sets.

Use the minimum value in the data set as your lowest class limit.

Compute the mean, median, variance, standard deviation, and coefficient of variation of each dataset.
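
For reference, the statistics listed in the assignment can be computed with Python's standard library. This is a sketch under assumptions the assignment does not settle: it uses the sample (n-1) variance and standard deviation, equal-width classes, and made-up values; the class count is a free parameter.

```python
import statistics


def summarize(values):
    """Mean, median, sample variance, sample standard deviation, and CV in percent."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample (n-1) form; use pstdev for population
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),
        "stdev": stdev,
        "cv_percent": stdev / mean * 100,
    }


def frequency_table(values, num_classes=5):
    """Equal-width classes starting at the minimum value, as the assignment asks.

    Returns (lower_limit, upper_limit, count) per class; the maximum value
    is counted in the last class.
    """
    low, high = min(values), max(values)
    if high == low:
        return [(low, high, len(values))]
    width = (high - low) / num_classes
    counts = [0] * num_classes
    for v in values:
        index = min(int((v - low) / width), num_classes - 1)
        counts[index] += 1
    return [(low + i * width, low + (i + 1) * width, counts[i]) for i in range(num_classes)]


# Made-up values; a real data set would have at least 100.
values = [2, 4, 4, 4, 5, 5, 7, 9]
summary = summarize(values)
table = frequency_table(values, num_classes=7)
print(summary)
for lower, upper, count in table:
    print(f"{lower:g} to {upper:g}: {count}")
```

The class midpoints, (lower + upper) / 2, paired with the counts give the points of the frequency polygon.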

submitted by /u/Amokittenss

Help With Data Analysis Project (mysql Online Server Help)

I have to create a Power BI project using data that must live on an online-hosted MySQL server. The problem is that my data is 2 tables with 130k rows each (CSV files). I made a MySQL server on freemysqlhosting.net, but there are 2 problems: first, it has a 5 MB limit for the database; second, each row takes about 4 seconds to upload. At this speed I think it’ll take 6 days just to upload 1 table.

Is there any other way to do this? Maybe I could build the database on my local MySQL server, where loading the tables doesn’t take much time, and then somehow make that server publicly accessible? Please help 🥲
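
The six-day estimate checks out: 130,000 rows at 4 seconds each is 520,000 seconds, or about 6.02 days. Row-by-row uploads are often slow because each row pays a full network round trip, so sending rows in batches usually helps. The Python sketch below shows the batching pattern with the DB-API `executemany` call; it uses the standard-library `sqlite3` module as a stand-in so it runs anywhere, and the `items` table, its columns, and the batch size of 1000 are hypothetical. A MySQL driver would need its own connection setup and typically uses `%s` placeholders instead of `?`. Batching does not address the 5 MB database limit.

```python
import sqlite3
from itertools import islice

# Upload time at the reported speed: one row every 4 seconds.
seconds = 130_000 * 4
days = seconds / 86_400  # about 6.02 days


def batches(rows, size):
    """Yield lists of up to `size` rows from any iterable."""
    iterator = iter(rows)
    while batch := list(islice(iterator, size)):
        yield batch


def load(conn, rows, batch_size=1000):
    """Insert rows in batches, committing once per batch; returns rows loaded."""
    total = 0
    for batch in batches(rows, batch_size):
        conn.executemany("INSERT INTO items (id, name) VALUES (?, ?)", batch)
        conn.commit()
        total += len(batch)
    return total


# In-memory SQLite database standing in for the real server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER, name TEXT)")
loaded = load(conn, ((i, f"row {i}") for i in range(2500)))
print(f"{days:.2f} days row by row; loaded {loaded} rows in batches")
```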

submitted by /u/Swat_Sam2

Datasets Or Pre-trained Models For Banner Ad / Marketing Text Classification?

I am trying to find good datasets for classifying web images as ads, so that I can train an image classification model for filtering out ads and downloading only useful image content from websites. I would also be interested in sets for classifying marketing/ad text, to help with filtering out ad captions as well. I suspect that copyright issues might be preventing people from releasing ad sets publicly, but I’m hoping that something is out there.
I found this dataset on PapersWithCode, and several sets that use old banner ads from the 90s/early 2000s, but I am wondering if there are any other publicly available web ad datasets with more recent data.
Does anyone have suggestions on good quality public datasets or preexisting classification models for ad detection?

submitted by /u/jferments

Any Tips On Healthy Lymph Node WSI Image Dataset?

Hi all, hope you guys are doing well!

I am doing a project on lymphoma detection using WSI images of lymph node tissue. I am a bit stuck, as I cannot find any control dataset for this project. I am looking for a dataset containing WSI images of healthy lymph node tissue that I can use for the classification model.

Please leave any tips or suggestions that might be helpful.

submitted by /u/gauravvvvvv