submitted by /u/growth_man
Category: Datatards
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting, I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
Hi everyone,
I’m working on a project to create a dashboard for visualizing and analyzing insurance claims processing efficiency, and I’m in search of a suitable dataset to fuel this endeavor.
I’m aiming to develop a comprehensive dashboard that tracks metrics such as claims cycle time, processing costs, and customer satisfaction scores. To achieve this, I need a dataset containing diverse information including individual insurance claims data, policyholder demographics, adjuster reports, customer feedback, and operational performance metrics.
Does anyone know where I can find such a dataset or recommend reliable sources for insurance claims processing data?
Any suggestions or leads would be greatly appreciated! Thank you in advance for your help.
submitted by /u/No_Track9088
Does anyone know if Common Crawl includes YouTube video metadata?
If yes, which metadata does it include? Subtitles?
submitted by /u/panqueca_frita
Hello, I am hoping to identify a dataset that shows the number of books published by year and genre (e.g., 100K fantasy books published in 2018 vs 90K in 2017), or another proxy for popularity (e.g., sales). I am particularly interested in the (1) Fantasy and (2) Romance genres.
I have tried a few angles:
* Library datasets: Seattle Public Library reports checkouts by year by title; however, this seems to be the exception, and other major libraries do not report the same data
* ISBNDB: based on its ‘database’ page, it does not appear to include genre in the dataset (the closest is Dewey decimal for select rows)
Fine with leveraging a paid database / report to improve approachability of the dataset.
Thank you for any guidance you can provide.
submitted by /u/Acrobatic_Scheme4448
Hi, I’m looking for datasets to train an LLM. Hopefully someone can recommend a dataset of healthy / diabetic-friendly meal recipes so I can make a chatbot that recommends meals.
submitted by /u/jrvbwr34bhcmdl
I have a large dataset of Facebook posts that I would like to make public for educational purposes. How would I go about doing that, and are there any legal issues?
I am based in the EU.
submitted by /u/fjender
I’m currently planning on starting a project to detect or classify drug addicts based on the way they talk or text. Is there any dataset that contains the texts of drug addicts?
submitted by /u/karthic2811
Hi everybody, I am writing a paper about the effects of politics on military spending and found a website with an amazing Excel spreadsheet that had data for each country from the 1940s to the present. It had various tabs with GDP, national budget, military spending, etc. I used it for my datasets in Stata, but I found it on a library computer and forgot to save the link or write down the website. Now I am looking everywhere for it so I can cite it in my bibliography, and I cannot find it. If anyone knows what spreadsheet I’m talking about or could help me find it, I would be extremely grateful!
submitted by /u/Responsible_Ear_279
If you have any ideas or have a dataset like this please help me
submitted by /u/YigitTheResearcher
By far the most comprehensive dataset available anywhere. Complete and completely independent of any cybersecurity company. In other words, 100% complete, with not one instance omitted, including all product software and hardware vulnerabilities, zero-days, PoCs, etc. All are reported in as much detail as possible.
Each of the millions of items is verified, categorized into one or more of 19,000 tags, correlated, and given a criticality indication by human risk analysts. Fully up to date, with between 100 and 400 new risk items inserted daily. 24x7x365x17. Not a single day missed. All information about just about every risk event is organized and readable in English, in text format.
Items with non-English text are translated. The dataset covers not only all cybersecurity risks, threats, and events, but also all industry-specific business risks, such as those specific to banking, casinos, industrial computing, energy, military, healthcare, transportation, critical infrastructure, ICS, OT, IoT, etc.
In addition, it covers all global threats; complete information on all malware, ransomware, and APTs from nation states, all collected, sorted, and categorized; new threats to human health; all illegal drugs and developments therein; and all global product recalls (issues with vehicles, medical devices, risks to families and children, and so on).
Not just one item per event, but all available information on each topic, organized by date (latest first) and correlated to one or more tags: scams, geopolitical issues, global and government intelligence departments, initiatives, and much more.
We can also make specific sections, or the entire database, available as a unique AI Machine Learning platform. The most complete and comprehensive dataset for cyber and business risks available worldwide.
For more information:
[more-info@brica.de](mailto:more-info@brica.de)
https://brica.de
submitted by /u/CologicNZ
I am working on a project that detects the gender and age of customers and analyzes their purchases. I am looking for a dataset containing footage (videos or pictures) of customers and a dataset of their purchases.
If anyone has any idea, please share it with me.
Much appreciated.
submitted by /u/ZookeepergameOk5683
Hi everyone, I am building an AI model for budgeting financial expenses.
The AI model should analyze expenses and analyze a budgeting plan like a financial advisor.
All I have found so far is this, and I can’t seem to find others:
https://catalog.data.gov/dataset/percent-change-in-consumer-spending-january-2020-through-the-present
submitted by /u/Extreme-Guarantee-80
Are there any publicly available datasets for social media sentiment analysis that also have the comment geolocation attached?
I’m thinking of doing an analysis of how green spaces in urban areas affect mental health, so I want to take satellite images of each comment’s location and check how they correlate with the comment’s sentiment.
submitted by /u/Log_Dogg
Title basically says it. Does anyone know of a dataset to detect powerlines from aerial imagery? That’s basically the requirement. An additional requirement would be that the powerlines are labeled by voltage, but it’s fine without.
I’m trying to create a drone that can avoid powerlines. When and if I get that working, I want to create a drone that charges from powerlines using induction. A university team did this and I want to replicate it. I have a good amount of experience, so I think it’s doable. At least getting a drone to avoid / control well around the powerlines I think is doable.
Thanks!
submitted by /u/AapoL092
* Time in business
* Revenue/profit per year
* Type of business (more specific than just retail, e.g. high-end men’s fashion)
* Includes private businesses and corporations
It can be anonymized, but it should be accurate.
submitted by /u/techsin101
I’m trying to create an image classifier with a TensorFlow CNN that identifies dinosaur types from an uploaded footprint. Does anyone know where I can get a dataset for this?
submitted by /u/Fail_Educational
submitted by /u/cavedave
Working on object detection in Gaussian splats. Would anyone have a massive library of PLY files?
submitted by /u/lewibs
I am making a database (for uni), and I need data for it. I reaaally don’t quite know where to get the specific shmuck that I need.
Well, its schema is this:
Tournaments
TournamentID (Primary Key), TournamentName, TournamentYear, TournamentCountry, TournamentType, TournamentFormat, TournamentPrizeMoney
Leagues
LeagueID (Primary Key), LeagueName, LeagueCountry, LeagueWebsite, LeagueSponsor, LeaguePromotion, LeagueRelegation
Teams
TeamID (Primary Key), TeamName, TeamCity, TeamCountry, TeamLogo, TeamFounded, TeamStadium, TeamCaptain, LeagueID (Foreign Key)
Players
PlayerID (Primary Key), PlayerName, PlayerAge, PlayerNationality, PlayerHeight, PlayerWeight, PlayerMarketValue, PlayerPosition, TeamID (Foreign Key)
Stadiums
StadiumID (Primary Key), StadiumName, StadiumCity, StadiumCountry, StadiumAddress, StadiumSurface, StadiumRoof, StadiumCapacity
Matches
MatchID (Primary Key), HomeTeamID (Foreign Key), AwayTeamID (Foreign Key), MatchDate, MatchTime, MatchReferee, MatchAttendance, MatchWeather, HomeTeamScore, AwayTeamScore, TournamentID (Foreign Key), StadiumID (Foreign Key)
PlayerMatches
PlayerMatchID (Primary Key), PlayerID (Foreign Key), MatchID (Foreign Key), MinutesPlayed, GoalsScored, Assists, ShotsOnTarget, ShotsOffTarget, Saves, Tackles, PassesCompleted, YellowCards, RedCards, PlayerRating, PlayerManOfTheMatch, PlayerSubstitute
submitted by /u/No_Secretary1128
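Before hunting for data, it can help to create the tables and check that the foreign keys line up. The following is a minimal sketch, assuming SQLite and only a subset of the tables and columns from the post; the column types are assumptions, since the post lists only names and keys, and `create_database` and `table_names` are hypothetical helper names.

```python
import sqlite3

# Column types are assumptions; the post lists only column names and keys.
# Only three of the seven tables are shown, each with a few of its columns.
SCHEMA = """
CREATE TABLE Leagues (
    LeagueID INTEGER PRIMARY KEY,
    LeagueName TEXT,
    LeagueCountry TEXT
);
CREATE TABLE Teams (
    TeamID INTEGER PRIMARY KEY,
    TeamName TEXT,
    TeamCountry TEXT,
    LeagueID INTEGER REFERENCES Leagues(LeagueID)
);
CREATE TABLE Players (
    PlayerID INTEGER PRIMARY KEY,
    PlayerName TEXT,
    PlayerPosition TEXT,
    TeamID INTEGER REFERENCES Teams(TeamID)
);
"""


def create_database(path=":memory:"):
    """Create the tables and return an open connection."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def table_names(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [row[0] for row in rows]
```

With foreign keys enabled, inserting a team whose `LeagueID` does not exist raises `sqlite3.IntegrityError`, which is a quick way to catch mismatched IDs when combining data from several sources.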
Hello everyone, I hope this is the correct place to ask.
As part of a university project I am looking at how Dutch trade with both Japan and China has been impacted positively / negatively by the Japan-China territorial disputes. I want to just get a very general overview of how the trade has varied over time.
But for the life of me, I can’t figure out what indicators or datasets to use for something so seemingly simple. I found BACI and UNCOM, but don’t know which one would be most useful or if they are even relevant.
Thank you very much in advance, and warm regards.
submitted by /u/Special_Bite6093
Hey guys, I am trying to download the datasets from this link https://www.kaggle.com/datasets/debashis74017/stock-market-data-nifty-50-stocks-1-min-data/data/ACC_minute_data_with_indicators.csv for a school project, but I can’t use the Download button since I need to download through a terminal onto another machine. I’m trying to use wget <link>, but it keeps downloading the HTML overview page. How can I download this properly? Any help would be appreciated!!
submitted by /u/Aggressive_Drink_530
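One likely reason wget returns the HTML page is that Kaggle downloads need an authenticated session. A possible route is the official `kaggle` command-line tool, sketched below under these assumptions: the `kaggle` package is installed on the target machine, an API token is configured there (for example in `~/.kaggle/kaggle.json`), and the dataset slug and file name are as read off the URL in the post. `build_download_command` and `download` are hypothetical helper names.

```python
import subprocess

# Assumed from the URL in the post: "<owner>/<dataset-slug>" and the file name.
DATASET = "debashis74017/stock-market-data-nifty-50-stocks-1-min-data"
FILE_NAME = "ACC_minute_data_with_indicators.csv"


def build_download_command(dataset, file_name, dest_dir="."):
    """Return the kaggle CLI invocation that fetches one file from a dataset."""
    return [
        "kaggle", "datasets", "download", dataset,
        "-f", file_name,   # download a single file instead of the whole dataset
        "-p", dest_dir,    # directory to save the file in
    ]


def download(dataset=DATASET, file_name=FILE_NAME, dest_dir="."):
    """Run the download; needs the kaggle package and a configured API token."""
    subprocess.run(build_download_command(dataset, file_name, dest_dir), check=True)
```

Calling `download()` on the remote machine runs the same command you could type directly in the terminal; the function form is only there to make the arguments explicit.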
Anyone have ideas or suggestions on where to find datasets on the commercial cost of power, the prices utilities charge each other, and separately the cost to produce power in the US?
submitted by /u/original_username_4
Hi, for my final-year project at university I am using a dataset that contains LinkedIn job postings and all related data. I’ve used Power BI for dashboards and visualisations; now I want to predict which job is most in demand by selecting from the industries given in the dataset. The data is English text, and I don’t know how to do this or which model I should use. I have learned about some ML models in my ML course, but they all deal with numbers. How can I make predictions from text? Regards
submitted by /u/Parking-Sun-8979
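Models that expect numbers can be used on text once each posting is turned into a numeric vector. One common starting point is a bag-of-words representation, where each position in the vector counts how often a vocabulary word appears. The sketch below uses only the standard library to show the idea; the function names are hypothetical, and in practice a library such as scikit-learn offers ready-made vectorizers that do the same job.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocabulary(documents):
    """Map every distinct token in the documents to a column index."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: index for index, tok in enumerate(tokens)}


def vectorize(text, vocabulary):
    """Turn one text into a list of token counts, one entry per vocabulary word."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # words not seen when building the vocabulary are ignored
            vector[vocabulary[tok]] = n
    return vector
```

The resulting vectors can be fed to the numeric models from an ML course (for example logistic regression or naive Bayes), with the industry or job category as the label.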
I am looking for a dataset with short stories of at least several hundred stories for machine learning purposes. The dataset should also contain a genre for the story and a title.
submitted by /u/Hot_Reach_7138
Hi! I am a final-year computer engineering student and I want to do a TFG (final degree project) on artificial intelligence applied to a more “medical” field: a model that recognizes and predicts brain tumors, brain damage from head injuries, and brain disorders or diseases from images. However, I have been searching platforms like Kaggle and can’t find datasets for this purpose. Do you know of any resource for obtaining images of this type?
submitted by /u/Chard_5151
Hey all, co-founder at Gretel.ai here. We are thrilled to release a high-quality synthetic dataset aimed at helping LLMs improve performance when working with SQL data and queries. Details and links are below; we would love to hear any feedback!
Our blog: https://gretel.ai/blog/synthetic-text-to-sql-dataset
Get the dataset on Hugging Face: https://huggingface.co/datasets/gretelai/synthetic_text_to_sql
The dataset includes:
* 105,851 records partitioned into 100,000 train and 5,851 test records
* ~23M total tokens, including ~12M SQL tokens
* Coverage across 100 distinct domains/verticals
* Comprehensive array of SQL tasks: data definition, retrieval, manipulation, analytics & reporting
* Wide range of SQL complexity levels, including subqueries, single joins, multiple joins, aggregations, window functions, set operations
* Database context, including table and view create statements
* Natural language explanations of what the SQL query is doing
* Contextual tags to optimize model training
submitted by /u/meowterspace42
I need to build a data science project, and I need ideas and a dataset as well.
submitted by /u/FUCKER48
Information, entered manually from my handwritten bird log, includes species and dates. Wondering what is the best way to compile and visualize this data.
I’m not a data scientist, so the simpler the better. Thanks for any tips!
submitted by /u/nyuhqe
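One simple way to compile a log like this is to keep it as a CSV file and count sightings per species and per month, which gives numbers that any spreadsheet or charting tool can plot. This is a minimal sketch, assuming a file with `species` and `date` columns and dates written as `YYYY-MM-DD`; the column names, the date format, and the function name are assumptions, since the post does not describe the file layout.

```python
import csv
import io
from collections import Counter


def count_sightings(csv_text):
    """Count sightings per species and per month from CSV text.

    Assumes a header row with 'species' and 'date' columns and
    dates formatted as YYYY-MM-DD.
    """
    by_species = Counter()
    by_month = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        by_species[row["species"].strip()] += 1
        by_month[row["date"].strip()[:7]] += 1  # keep only "YYYY-MM"
    return by_species, by_month
```

For a file on disk, read it with `open("birds.csv").read()` (the file name is hypothetical) and pass the text to `count_sightings`; `by_species.most_common(10)` then lists the ten most frequently logged species.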
I have been looking for a while for where I can find this data, with no leads. Can someone offer some help?
submitted by /u/Afraid-Reflection-82