Category: Datatards

Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?

Better Way Of Preparing Datasets For Finetuning With Large Text In Each Example???

Is there a better way to prepare datasets?

My datasets are in this format:

text : length 19k

extracted entity 1 : list of entity 1 extracted

extracted entity 2 : list of entity 2 extracted

Does anyone have an idea of how to fine-tune an open-source model with this kind of data?

Is fine-tuning the better option, given that the model (LLM) has to learn to extract items from the text and the text is so long?

Example: I want to train an LLM to look at the whole text of a book and extract the author name, place names, and people’s names. I now have data for 100 books. How can I prepare datasets to fine-tune an LLM to be very good at this extraction? Note that I have supervised data: each book’s text along with the author, people, and place names extracted from the whole text.
How can I fine-tune a good model? Let me know.
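One common way to handle examples this long is to split each text into overlapping chunks and give each chunk only the labels that actually occur in it. The sketch below is a minimal illustration of that idea, not a recommended pipeline: the `prompt`/`completion` field names, the chunk sizes, and the case-insensitive substring matching are all assumptions made for the example, and an entity longer than the overlap can still be cut at a chunk boundary.

```python
import json

def chunk_text(text, chunk_chars=4000, overlap=400):
    """Split text into overlapping character windows."""
    if overlap >= chunk_chars:
        raise ValueError("overlap must be smaller than chunk_chars")
    step = chunk_chars - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_chars])
        if start + chunk_chars >= len(text):
            break  # the last window already reaches the end of the text
    return chunks

def build_examples(text, entities, chunk_chars=4000, overlap=400):
    """Build one training record per chunk.

    entities maps a field name (hypothetical, e.g. "author") to the list
    of strings extracted from the whole text. Each chunk keeps only the
    values that literally appear in it (case-insensitive substring match).
    """
    examples = []
    for chunk in chunk_text(text, chunk_chars, overlap):
        lowered = chunk.lower()
        target = {
            field: [v for v in values if v.lower() in lowered]
            for field, values in entities.items()
        }
        examples.append({
            "prompt": "Extract the entities from the text below as JSON.\n\n" + chunk,
            "completion": json.dumps(target, ensure_ascii=False),
        })
    return examples

def to_jsonl(examples):
    """Serialize records as JSON Lines, one record per line."""
    return "\n".join(json.dumps(e, ensure_ascii=False) for e in examples)
```

At inference time the same chunking would be applied and the per-chunk outputs merged and de-duplicated; whether that beats a long-context model on the full text is something to check on held-out books.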

submitted by /u/Guilty-Tea6607

Looking For Historical Dataset(s) On Monthly Gas And Electricity Prices And Caps In Bristol (or UK Regions)

Hi, I am doing a university project, and part of it is creating a bill prediction service. I am having a really tough time finding good sources for what I need. I’m focusing on Bristol at the moment to help with initial development, but datasets broken down by region would work too.

I need the average monthly cost of electricity and gas (separately) in Bristol from at least 2018/19 to 2024, ideally with upper and lower values. I also need the price cap data (unit rates, standing charges) for those periods. The caps typically change every three months and are published by Ofgem; however, I cannot seem to find any sources for previous years, only for the current year.

I’d really appreciate any help, as I’ve said, I am really struggling to find valid datasets.

submitted by /u/JamesCompSci

Seeking Data For Analyzing Niches And Growth Trends In The Data Analytics Industry

Hey Everyone!

I’m currently working on a project focused on analyzing different niches and growth trends within the data analytics industry. My objective is to gain insights into emerging trends, market opportunities, and career prospects within various niche segments of the data analytics field.

I’m reaching out to this community to seek assistance in gathering relevant datasets for my analysis. Specifically, I’m looking for datasets that include information on:

Market size and growth rates of different niche segments within the data analytics industry.

Job demand, postings, and salary trends for various data analytics roles.

Emerging technologies, tools, and applications in specialized areas of data analytics.

Industry reports, research studies, or surveys providing insights into niche markets and trends.

I’m open to suggestions and recommendations for reliable sources or datasets that could contribute to my analysis. Any publicly available datasets, research reports, or academic publications related to the data analytics industry would be greatly appreciated.

Your assistance in finding suitable datasets for this project would be invaluable to my research efforts. Thank you in advance for your help and contributions.

submitted by /u/vizwha

Looking For Datasets On Environmental Health

My project partner and I would like to analyze the association between air pollution, floods, and other environmental concerns and health outcomes like respiratory diseases, prenatal health, premature birth, etc. I’ve been looking for datasets for this specific aim but haven’t found one. There are multiple studies on this topic, but I can’t seem to access the datasets.

submitted by /u/Introvertedwin

MySQL Error While Importing Data (importing Csv But Getting Error)

I am trying to import a CSV file into my local MySQL server, but I get this error:
Unhandled exception: ‘charmap’ codec can’t decode byte 0x8d in position 4887: character maps to <undefined>
I’ll link the CSV file too. Please try to import it, and if you are successful, then PLEASE HELP!
link: https://drive.google.com/file/d/16s54EfGnKFeedkD0Z-JItt_piqKPA370/view?usp=sharing
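That message is the one Python raises when it decodes a file with a Windows code page such as cp1252, in which byte 0x8d is unassigned. It usually means the file is in a different encoding than the importer assumed. The real encoding of this CSV is not known here, so the following is only a sketch: it tries a few candidate encodings (the candidate list and the function name are assumptions) and rewrites the file as UTF-8, which can then be selected explicitly in the import tool.

```python
def reencode_to_utf8(src, dst, candidates=("utf-8-sig", "cp1252", "latin-1")):
    """Rewrite src as UTF-8 at dst and return the encoding that worked.

    Candidates are tried in order. latin-1 accepts every byte, so it acts
    as a last resort and may produce wrong characters if the guess is off.
    """
    with open(src, "rb") as f:
        raw = f.read()
    for enc in candidates:
        try:
            text = raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot represent the bytes; try the next
        # newline="" keeps the original line endings untouched
        with open(dst, "w", encoding="utf-8", newline="") as f:
            f.write(text)
        return enc
    raise ValueError("none of the candidate encodings could decode the file")
```

It is worth opening the converted file and checking a few rows with accented characters before importing, since a wrong guess decodes without error but yields garbled text.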

submitted by /u/Swat_Sam2

Help I Need Datasets For My Stats Class!

I am stuck on part 2! I am unable to find similar datasets with at least 100 values; I’m hoping for 150 to 200. Please help! I can do parts 3 and 4, I just can’t find the data. At this point, I don’t care what the data pertains to!

The Project (Outlined Below)

The project must be submitted as a single PDF file after completing the following tasks.

Only one group member needs to submit the report. Note that if you submit a DOCX or XLSX file for your project, you will receive a score of 0.

Download two data sets you can compare containing similarly quantifiable information (such as stock prices, economic indicators, sports analytics, and weather forecasts) that have at least 100 data values each. If you downloaded a .csv file, save it as a .xlsx file. You can find data sets on datasetsearch.research.google.com, data.gov, or by simply Googling “public data sets”.

Set up the file with two data sets of equal size (at least 100 data values each).

Create a frequency distribution table and frequency polygons of both data sets.

Use the minimum value in the data set as your lowest class limit.

Compute the mean, median, variance, standard deviation, coefficient of variation of each dataset.
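For anyone checking their spreadsheet results, the frequency table and summary statistics from the task list above can be sketched in a few lines of Python. This is an illustration under assumptions the assignment does not state: equal-width classes, five classes by default, and sample (n − 1) variance rather than population variance. The function names are made up for the example.

```python
import statistics

def frequency_table(values, num_classes=5):
    """Equal-width classes starting at the minimum value.

    Returns a list of (lower_limit, upper_limit, count) tuples. The
    maximum value is counted in the last class.
    """
    lowest = min(values)
    width = (max(values) - lowest) / num_classes
    if width == 0:
        raise ValueError("all values are equal; no classes can be formed")
    counts = [0] * num_classes
    for v in values:
        idx = min(int((v - lowest) / width), num_classes - 1)
        counts[idx] += 1
    return [
        (lowest + i * width, lowest + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]

def summarize(values):
    """Mean, median, sample variance, sample standard deviation, and CV (%)."""
    mean = statistics.mean(values)
    stdev = statistics.stdev(values)  # sample standard deviation (n - 1)
    return {
        "mean": mean,
        "median": statistics.median(values),
        "variance": statistics.variance(values),  # sample variance (n - 1)
        "stdev": stdev,
        "cv_percent": stdev / mean * 100,  # undefined when the mean is 0
    }
```

If the course expects population formulas, `statistics.pvariance` and `statistics.pstdev` are the corresponding functions.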

submitted by /u/Amokittenss

Help With Data Analysis Project (mysql Online Server Help)

I have to create a Power BI project using data hosted on an online MySQL server. The problem is that my data consists of 2 tables with 130k rows each (CSV files). I set up a MySQL server on freemysqlhosting.net, but there are 2 problems: first, it has a 5 MB limit for the database; second, each row takes about 4 seconds to upload. At this speed, I think it’ll take 6 days just to upload 1 table.

Is there any other way to do this? Maybe I could create the database on my local MySQL server, where loading the tables doesn’t take much time, and then somehow make that server publicly accessible? Please help🥲
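The 6-day figure checks out: 130,000 rows × 4 s is about 520,000 s, or roughly 6 days. If most of those 4 seconds are spent on the network round trip, sending many rows per statement should cut the time sharply. Below is a minimal sketch of that batching idea. The helper names are hypothetical, the `%s` placeholder style is the one used by common Python MySQL drivers such as PyMySQL and mysql-connector-python, and the time estimate assumes a constant cost per round trip, which may not hold for large batches. Batching does nothing about the 5 MB database limit.

```python
from itertools import islice

def batched(rows, size):
    """Yield lists of up to `size` rows from any iterable."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def insert_statement(table, columns, n_rows):
    """Build one parameterized multi-row INSERT.

    Table and column names are interpolated directly, so they must come
    from trusted code, never from user input.
    """
    row = "(" + ", ".join(["%s"] * len(columns)) + ")"
    return (
        f"INSERT INTO {table} ({', '.join(columns)}) VALUES "
        + ", ".join([row] * n_rows)
    )

def estimated_hours(total_rows, seconds_per_round_trip, batch_size=1):
    """Rough upload time, assuming each statement costs one round trip."""
    round_trips = -(-total_rows // batch_size)  # ceiling division
    return round_trips * seconds_per_round_trip / 3600

# With a DB-API cursor (assumed, not shown), each batch would be sent as:
#   params = [value for row in batch for value in row]
#   cursor.execute(insert_statement("my_table", columns, len(batch)), params)
```

Under those assumptions, `estimated_hours(130000, 4)` is about 144 hours, while `estimated_hours(130000, 4, batch_size=1000)` is under 10 minutes.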

submitted by /u/Swat_Sam2

Datasets Or Pre-trained Models For Banner Ad / Marketing Text Classification?

I am trying to find good datasets for classifying web images as ads, so that I can train an image classification model to filter out ads and download only useful image content from websites. I would also be interested in sets for classifying marketing/ad text, to help with filtering out ad captions as well. I suspect that copyright issues may be preventing people from releasing ad sets publicly, but I’m hoping that something is out there.
I found this dataset on PapersWithCode, and several sets that use old banner ads from the 90s/early 2000s, but I am wondering if there are any other publicly available web ad datasets with more recent data.
Does anyone have suggestions on good quality public datasets or preexisting classification models for ad detection?

submitted by /u/jferments

Any Tips On Healthy Lymph Node WSI Image Dataset?

Hi all, hope you guys are doing well!

I am doing a project on lymphoma detection using WSI images of lymph node tissue. I am a bit stuck, as I cannot find any control dataset for this project. I am looking for a dataset containing WSI images of healthy lymph node tissue that I can use as the negative class for the classification model.

Please leave any tips or suggestions that could be helpful.

submitted by /u/gauravvvvvv

Need Help Finding Insurance Claim Dataset

Hi everyone,

I’m working on a project to create a dashboard for visualizing and analyzing insurance claims processing efficiency, and I’m in search of a suitable dataset to fuel this endeavor.

I’m aiming to develop a comprehensive dashboard that tracks metrics such as claims cycle time, processing costs, and customer satisfaction scores. To achieve this, I need a dataset containing diverse information including individual insurance claims data, policyholder demographics, adjuster reports, customer feedback, and operational performance metrics.

Does anyone know where I can find such a dataset or recommend reliable sources for insurance claims processing data?
Any suggestions or leads would be greatly appreciated! Thank you in advance for your help.

submitted by /u/No_Track9088

Dataset For Books Published By Genre Over Time

Hello, hoping to identify a dataset that shows the number of books published by year by genre (e.g., 100K fantasy books published in 2018 vs 90K in 2017), or another proxy for popularity (e.g., sales). Particularly indexed on the (1) Fantasy and (2) Romance genres.

I have tried a few angles:

Library Datasets – Seattle Public Library reports checkouts by year by title; however, this seems to be the exception, and other major libraries do not report this same data.

ISBNDB – Based on the ‘database’ page, it does not appear to include genre in the dataset (closest is Dewey decimal for select rows).

Fine with leveraging a paid database / report to improve approachability of the dataset.

Thank you for any guidance you can provide.

submitted by /u/Acrobatic_Scheme4448

Looking For A National Budget Dataset

Hi everybody, I am writing a paper about the effects of politics on military spending, and I found a website with an amazing Excel spreadsheet that had data for each country from the 1940s to the present. It had various tabs with GDP, national budget, military spending, etc. I used it for my data sets in Stata, but I found it on a library computer and forgot to save the link or write down the website. Now I am looking everywhere for it so I can cite it in my bibliography, and I cannot find it. If anyone knows what spreadsheet I’m talking about or could help me find it, I would be extremely grateful!

submitted by /u/Responsible_Ear_279