I am trying to find information about airports. I want to get the name of the airport, IATA code, latitude, longitude, and maybe the city, country, etc. Any suggestions?
submitted by /u/Gjensidige
Here you can observe the biggest nerds in the world in their natural habitat, longing for data sets. Not that it isn’t interesting; I’m interested. Maybe they know where the chix are. But what do they need it for? World domination?
There are plenty of word frequency lists, but plurals, adjectives, and adverbs of the same word end up in different positions in these lists.
I’m looking for a dataset, or a way to create a dataset, that has all forms of one word clumped together, so it’s less about frequency and more about how familiar the word (and its different forms) is, if that makes sense.
For instance, I have a list where the word “have” is in 25th place, “has” in 39th and “had” in 105th. Clearly, anyone who knows one of these words would know the other two as well.
Apologies if I did not get my point across clearly. Any help is appreciated. Thanks!
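One way to picture the grouping described above is to sum the counts of every surface form under a shared base form (lemma) and then re-rank. This is only a minimal sketch: the `LEMMAS` mapping and all the counts are made up for illustration, and a real list would need a lemmatizer or a published lemma list to supply the mapping.

```python
from collections import defaultdict

# Hypothetical form -> lemma mapping; in practice this would come from
# a lemmatizer or a lemma list, not a hand-written dict.
LEMMAS = {"have": "have", "has": "have", "had": "have", "cat": "cat", "cats": "cat"}


def group_by_lemma(counts, lemmas):
    """Sum the raw count of every surface form under its lemma."""
    totals = defaultdict(int)
    for form, count in counts.items():
        # Forms missing from the mapping are treated as their own lemma.
        totals[lemmas.get(form, form)] += count
    # Highest combined count first; ties broken alphabetically.
    return sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))


# Made-up counts, only to show the shape of the input.
counts = {"have": 900, "has": 600, "had": 300, "cat": 50, "cats": 40, "the": 5000}
ranked = group_by_lemma(counts, LEMMAS)
```

After grouping, “have”, “has” and “had” share a single position in `ranked` instead of three.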
submitted by /u/haskpro1995
Hello guys, I’m looking for international data about imports and exports from around the world. Do you know any websites where I can find it?
submitted by /u/House-hero
Hey Redditors,
I know the cars196 dataset is nothing new, but I wanted to share some label errors and outliers that I found within it.
It’s interesting to note that the primary goal of the original paper that curated/used this dataset was “fine-grained categorization”, meaning discerning the differences between something like a Chevrolet Cargo Van and a GMC Cargo Van. I found numerous images with very nuanced mislabelling, which runs directly counter to the task they sought to research.
Here are a few examples of nuanced label errors that I found:
Audi TT RS Coupe labeled as an Audi TT Hatchback
Audi S5 Convertible labeled as an Audi RS4
Jeep Grand Cherokee labeled as a Dodge Durango
I also found examples of outliers and generally ambiguous images:
multiple cars in one image
top-down style images
vehicles that didn’t belong to any classes
I found these issues to be pretty interesting, yet I wasn’t surprised. It’s pretty well known that many common ML datasets exhibit thousands of errors.
If you’re interested in how I found them, feel free to read about it here.
submitted by /u/cmauck10
This is for all data enthusiasts out there!
We’ve launched Plug & Play Data Templates on Product Hunt today! 🥳
Our data templates are step-by-step walkthroughs for 50+ use cases with pre-baked, interactive SQL queries, covering 5 critical categories: Product Analytics, Customer Analytics, Sales Analytics, Marketing Analytics and Finance Analytics.
Please check us out here! 👉🏻 https://www.producthunt.com/posts/plug-play-data-templates
submitted by /u/AirbookIO
I’m currently doing research on the lack of water access for the ASEAN Data Science Explorers 2023 competition, and any data source or even incomplete datasets would be useful for us. Thank you in advance.
submitted by /u/OliverNguyen1150
The file contains 3.1 million rows, each representing one article observed at one point in time.
The file uses these columns:
timestamp: The time (in UTC) of the fetch. All articles from the same fetch will have the same timestamp.
position: The article’s zero-indexed position in the trending strip, from left to right.
text: The text of the link used to highlight the article. Note: Sometimes the same article is associated with different text at different points in time.
url: The link’s URL. Note: Sometimes (although relatively rarely) the URL for the same underlying article changes over time.
Note: Although the script generally ran every five minutes, there are some gaps in the data, accounting for roughly 3% of the total time period covered. These gaps owe to two main factors: technical complications (such as server downtime) and periods during which the website swapped out the trending strip with breaking news alerts, single-story highlights, or other notices. Unfortunately, I did not have the foresight to collect data that would distinguish between those scenarios.
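For anyone who wants to locate those gaps, here is a rough sketch using only the Python standard library. It assumes the `timestamp` column parses as ISO 8601 (the actual format in the file may differ), and the sample rows, the example.com URLs and the ten-minute threshold are all illustrative choices, not part of the dataset.

```python
import csv
import io
from datetime import datetime, timedelta


def fetch_times(csv_file):
    """Return the sorted, de-duplicated fetch timestamps found in the file."""
    reader = csv.DictReader(csv_file)
    # Every article from one fetch shares a timestamp, so a set collapses them.
    return sorted({datetime.fromisoformat(row["timestamp"]) for row in reader})


def find_gaps(times, max_interval=timedelta(minutes=10)):
    """Return (start, end) pairs where consecutive fetches are more than max_interval apart."""
    return [(a, b) for a, b in zip(times, times[1:]) if b - a > max_interval]


# Made-up rows using the four documented columns.
SAMPLE = """timestamp,position,text,url
2023-01-01T00:00:00,0,Story A,https://example.com/a
2023-01-01T00:00:00,1,Story B,https://example.com/b
2023-01-01T00:05:00,0,Story A,https://example.com/a
2023-01-01T00:30:00,0,Story C,https://example.com/c
"""

times = fetch_times(io.StringIO(SAMPLE))
gaps = find_gaps(times)
```

With the real file you would pass an opened file object instead of the `io.StringIO` sample. The threshold is set above five minutes so that small scheduling jitter is not counted as a gap.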
submitted by /u/brianckeegan
Statistics Canada added new features to enhance the overall data user experience on the Standards, Data Sources and Methods Hub. With its improved design, new frequently asked question section and quick access links to resources, the hub is meant to be a one-stop shop for data users, statisticians and others for:
variables and classifications
survey methodology
key aspects of data quality
direct access to questionnaires.
Explore the hub and tell us what you think, so we can make sure this page meets your needs!
Visit the Standards, Data Sources and Methods Hub.
[We are Canada’s national statistical agency. We are here to engage with Canadians and provide them with high-quality statistical information that matters! Publishing in a subreddit does not imply we endorse the content posted by other redditors.]
***
Improvements have been made to Statistics Canada’s Standards, Data Sources and Methods Hub to make the user experience friendlier. With its improved design, new Frequently Asked Questions section and quick access links to resources, the hub is meant to be a one-stop shop for data users, statisticians and others, who will find everything they need on:
variables and classifications;
survey methodology;
data quality;
direct access to questionnaires.
Explore the Hub and tell us what you think; we want to make sure it meets your needs!
Standards, Data Sources and Methods Hub.
[We are Canada’s national statistical agency. We are here to engage with Canadians and provide them with high-quality statistical information that matters! Publishing in a subreddit does not imply that we endorse the content posted by other Reddit users.]
submitted by /u/StatCanada
Hello,
I’m currently conducting research on a major investment firm and I’m exploring its influence over the media. Could you please provide information on resources that would give the frequency or ratio of media coverage regarding this company? This data would greatly contribute to my analysis. Thank you for your assistance.
I hope my request is clear. Thank you again.
submitted by /u/itskoka
I’m wanting to answer a question about whether companies who run sports-guessing competitions make good predictions in aggregate, despite the fact that there will be many people in the organisation that don’t care about sports at all, and just pick at random.
What I’m looking for is the data from a company that ran a tipping competition for some sports competition where I can analyse the answers. (e.g. if it was the NBA, I can look that up; if it was your internal squash competition, that’s OK as long as it has who won, as well as the predictions of who would win.)
In return I’ll hopefully be able to confirm that there’s a way of maximising your total return on next year’s competition.
submitted by /u/solresol
Hey guys I need some help. I’m using statista and working on my thesis.
I have data that’s 1 and 0 (present and not present) and I’m trying to figure out if the data is statistically significant but there isn’t a normal distribution. I’m not sure what to do. Any help would be appreciated.
submitted by /u/Kapotter
Does anyone have any sample datasets that I might use to interpret unstructured clinical notes from EHR systems? The objective is to analyze notes and look for words that can be used to categorize a condition.
submitted by /u/National_Evidence548
I am looking to see if there is any data on the share of 3G and 4G phone shipments by country / region or anything close to this?
Tried looking myself but could not find the “perfect” data on this.
submitted by /u/xMacadamiaNuTx
Hi, my apologies for possibly asking the wrong questions here. I am a total newbie to all things machine learning, have just discovered Kaggle and such, and I’m a bit stuck with a silly question: I’ve discovered like a million different datasets on there, but I’m just wondering how people are putting these sets to good use. For instance, there’s a big dataset about the Titanic. I can’t fathom a realistic use case where this dataset would prove to be useful. I guess I don’t understand just yet where the ‘machine learning’ aspect in datasets like these comes into play. What is it exactly you are predicting with these?
Can somebody please enlighten me what I’m obviously missing here? I really want to know.
Thank you
submitted by /u/VHS124
I am having trouble finding this: what do people use to store and create these datasets? Not as in ‘JSON’ or relational/non-relational databases, but is there a popular project that streamlines all of this, or should I write my own?
I am a software developer so the scraping and storing of data isn’t an issue, what I don’t want to do is re-invent the wheel. I am just starting to get into this generation of AI tech.
I’d like to find something that can take in data like images and text with ‘tagged’ context for fine-tuning AI models. Something I can write scrapers and parsers for, add the results to a database, then export data for training datasets.
Like I said I am about to just write my own stuff to do this but I feel like this is a common enough problem that I should just use whatever the popular kids are using these days. Trouble is I am just not finding the right words to search.
So does this exist? Am I overcomplicating this?
submitted by /u/drywallfan
Hi all, for a school project I’m looking for a dataset containing business budgets for many companies over the last 10-20 years. We’re Italian, so we would appreciate it if some Italian companies appeared in the dataset. Thanks in advance to anyone who can help.
submitted by /u/niger4
Does anyone know of a publicly available dataset in any language containing formal discursive text along with a “parallel”, less formal text or know of any place where one can create such a dataset (like English Wikipedia articles and corresponding Simple Wikipedia articles)? The GYAFC dataset (Rao et al. 2018) is similar to what I’m looking for.
submitted by /u/geartrains
For example
Consumer product: Liquid detergent
Component chemical: Surfactant
The database should have a list of surfactants present, their concentrations and the overall viscosity.
Thanks
submitted by /u/Matt_LawDT
How often is Commoncrawl updated? On a daily cadence? Or weekly/monthly? If Meghan Markle wears a Versace gown, that becomes a BBC article, and that article shows up on Googling “meghan markle” 2-3 minutes after the publishing of the article by BBC. What is the equivalent time for CC?
And secondly, is there a place where I can see CC coverage level? I mean: which websites they cover fully, which ones they cover partially, whether they cover reuters.com at all, or how much of vice.com they cover, etc.?
submitted by /u/Attitudemonger
I asked the author and filled out the form here but haven’t received any answers from them. Does anyone by chance have the dataset and could share it? Thanks!
submitted by /u/jthat92
Hi everyone, I’m looking for the VR Anatomy Learning Dataset. This dataset was collected by researchers from the University of Glasgow and contains data on the use of virtual reality for teaching human anatomy. The dataset includes performance data, survey responses, and other metrics related to the effectiveness of virtual reality in anatomy education. Kindly let me know where to find the dataset; any research papers (website links) on this topic would also be very helpful.
submitted by /u/AbrarHussain-1234
Hi everyone. Does anyone know how to get the product retail price lists from the big supermarkets in the EU, such as Lidl, Kaufland, Carrefour, etc.? I searched for a few hours and the only source I found was their catalogues, from which you can’t scrape the data using PBI web import, and I don’t know how to web-scrape the data with Python.
Thanks in advance!
submitted by /u/nousernameinspo
Hi everyone, I’m new here and need help with a dataset for a school project. I’m required to generate test data (a mock dataset) of web server logs in an Excel file/CSV. The dataset should include the following columns: country, timestamp, IP address, status, URL, status code, number of website visits, and content/sports viewed. The list should include different sports, and these should be reflected in the URL, e.g. /athletics/videos/200m-final.jpg (minimum of 3000 entries). Please help.
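A mock file like this can be generated with the Python standard library alone. The sketch below is one possible approach, not a required format: the column names in `FIELDS`, the lists of sports and countries, the `clip-N.mp4` URL pattern and the date range are all invented for illustration, and the IP addresses are random numbers rather than real hosts.

```python
import csv
import io
import random
from datetime import datetime, timedelta

# All of these values are illustrative assumptions, not part of the assignment.
SPORTS = ["athletics", "swimming", "football", "tennis"]
COUNTRIES = ["Kenya", "Brazil", "Japan", "Germany"]
STATUS = {200: "OK", 304: "Not Modified", 404: "Not Found", 500: "Internal Server Error"}
FIELDS = ["country", "timestamp", "ip_address", "status", "url",
          "status_code", "visits", "sport"]


def make_rows(n, seed=0):
    """Build n fake log rows; a fixed seed makes the output repeatable."""
    rng = random.Random(seed)
    start = datetime(2023, 1, 1)
    rows = []
    for _ in range(n):
        sport = rng.choice(SPORTS)
        code = rng.choice(list(STATUS))
        offset = timedelta(seconds=rng.randrange(30 * 24 * 3600))  # within 30 days
        rows.append({
            "country": rng.choice(COUNTRIES),
            "timestamp": (start + offset).isoformat(),
            "ip_address": ".".join(str(rng.randint(1, 254)) for _ in range(4)),
            "status": STATUS[code],
            "url": f"/{sport}/videos/clip-{rng.randint(1, 50)}.mp4",  # sport appears in the URL
            "status_code": code,
            "visits": rng.randint(1, 20),
            "sport": sport,
        })
    return rows


def to_csv(rows):
    """Serialise the rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


rows = make_rows(3000)
csv_text = to_csv(rows)
```

Writing `csv_text` to a file with a `.csv` extension gives something Excel can open directly.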
submitted by /u/byron_0001
Hello. I think it was around February 2020 that someone uploaded an amazing IMDb dataset titled “IMDb movies extensive dataset”. I still have the archive file, but I wanted to find a more recent one. I tried making it myself, but IMDb doesn’t provide their complete data for free: you can get the basic info, but what was really interesting to me was the breakdown data on ratings. Here are the columns from the “IMDB ratings.csv” file:
imdb_title_id,weighted_average_vote,total_votes,mean_vote,median_vote,votes_10,votes_9,votes_8,votes_7,votes_6,votes_5,votes_4,votes_3,votes_2,votes_1,allgenders_0age_avg_vote,allgenders_0age_votes,allgenders_18age_avg_vote,allgenders_18age_votes,allgenders_30age_avg_vote,allgenders_30age_votes,allgenders_45age_avg_vote,allgenders_45age_votes,males_allages_avg_vote,males_allages_votes,males_0age_avg_vote,males_0age_votes,males_18age_avg_vote,males_18age_votes,males_30age_avg_vote,males_30age_votes,males_45age_avg_vote,males_45age_votes,females_allages_avg_vote,females_allages_votes,females_0age_avg_vote,females_0age_votes,females_18age_avg_vote,females_18age_votes,females_30age_avg_vote,females_30age_votes,females_45age_avg_vote,females_45age_votes,top1000_voters_rating,top1000_voters_votes,us_voters_rating,us_voters_votes,non_us_voters_rating,non_us_voters_votes
As you can see, it has some juicy information, such as the breakdown by age and gender and, most importantly for me, the top1000_voters, which I think is an extremely underrated data point that is rarely mentioned. It’s very useful when you want to determine whether the rating of a movie is unbiased or not. I have noticed that a lot of highly rated Turkish and Indian movies especially have very biased ratings, and using the top1000_voters you can find which ones.
I was also able to find interesting things, such as which movies and genres females prefer more than males (males are biased more towards westerns, while females are biased more towards the family genre).
So my question is: is it possible to get this info from IMDb without paying? I live in a third world country and have no credit card to my name. I love to do these types of exploratory analysis as a hobby, but I can’t pay IMDb the thousands that they are asking for, and for the life of me I can’t figure out how to web-scrape the data past IMDb’s anti-scraping systems.
Also, on a side note, it appears they have removed the breakdown in rating details from their website. You can only see the breakdown by how many people voted for each score, but not by gender, age or even the top 1000 that was there before.
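To show the kind of check described above, here is a small sketch that flags titles whose overall rating sits well above the rating given by the top 1000 voters. It uses three of the column names listed earlier; the title IDs and ratings in `SAMPLE` are made up, and the two-point threshold is an arbitrary choice, not an established cut-off for bias.

```python
import csv
import io


def rating_gaps(csv_file, min_gap=2.0):
    """Return (title_id, gap) pairs where the overall rating exceeds the
    top-1000-voters rating by at least min_gap, largest gap first."""
    flagged = []
    for row in csv.DictReader(csv_file):
        if not row["top1000_voters_rating"]:
            continue  # skip titles with no top-1000 rating
        gap = float(row["weighted_average_vote"]) - float(row["top1000_voters_rating"])
        if gap >= min_gap:
            flagged.append((row["imdb_title_id"], round(gap, 1)))
    return sorted(flagged, key=lambda pair: -pair[1])


# Made-up rows; only the column names come from the file described above.
SAMPLE = """imdb_title_id,weighted_average_vote,top1000_voters_rating
tt0000001,8.9,5.1
tt0000002,7.4,7.0
tt0000003,9.2,6.9
tt0000004,6.0,
"""

flagged = rating_gaps(io.StringIO(SAMPLE))
```

A large gap is only a hint that a rating deserves a closer look; titles with few top-1000 votes would need extra care.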
submitted by /u/NoHetro
I need some ideas and corresponding datasets for a finance master’s thesis with a more quantitatively focused problem, so something where statistics/time series/machine learning (like sentiment analysis) can be used.
submitted by /u/Adventurous-Quote180
Looking for product data such as SKU, description, image, and specs for both solar products and IT computer/networking products, which can be used for an e-commerce website. Any idea where I could possibly get this?
Thanks
submitted by /u/Foreign_Exercise7060
Would anyone know where I can find a list of ASMR artist channels?
Specifically, I need the URL of each channel but names would be ok as well.
Any format is ok.
Many thanks
submitted by /u/asmrliving
I’ve checked websites such as Yahoo Finance, investing.com, macrotrends, etc., but with no luck. I’ve also used databases such as FactSet and Eikon, but they only have prices from 1972 onwards, even for the companies that went public prior to that.
submitted by /u/esmithlliott
Currently working at a company that wants to build an internal tool to pull these codes for specific client companies we’re working with. Does anyone know of any datasets that exist with this information? All I’ve found thus far are 3rd party products you have to pay for like https://www.naics.com/company-lookup-tool/. But they obviously got their data from somewhere right?
submitted by /u/YnotX
Hi everyone, I have a request for a dataset pertaining to automotive repairs.
I am voluntarily building a free application/platform that anyone can freely use anytime to help the public make informed decisions on where to take their motor vehicles for repairs. My interest in this comes from the fact that I love cars and I hate seeing people get ripped off. I’ve worked on countless cars and helped many people with free repairs. Specifically, this platform would allow users to search for nearby automotive repair shops and they would see a graphical summary view of the quantity of repairs any individual shop has done in a given period of time (X number of brake repairs, Y number of engine oil changes, Z number of front-end alignments, etc.). More features would be added with time but this is the starting point.
I have already done legwork before coming here to make this platform a reality.
I contacted my state’s Department of Motor Vehicles (DMV) and submitted a Freedom of Information Act (FOIA) request to obtain access to the necessary dataset. My state’s DMV has a legal clause that specifically requires all automotive repair shops to retain records of estimates, work orders, invoices, parts purchase orders, and appraisals to be available for inspection by the DMV. The DMV kindly responded to my request and unfortunately, I learned that although all automotive repair shops are required to retain these records, the shops are not obligated to submit these records to the DMV for archival at any point in time. Furthermore, the circumstances under which the DMV would even audit a shop with the intent to inspect these records would be extremely circumstantial and exceptionally rare.
For clarification, my intent is to only depict the values contained in these records through visual means such as graphs and charts. Customer names, cost of repairs, parts vendor names, mechanic names, and any other personally identifiable information (except for the name of the shop doing the repair) would all be obscured.
After hitting this brick wall, I learned about some existing platforms that collect and aggregate automotive repair data (RepairPal, iATN, Mechanic Advisor, AutoMD, CarMD). Although these platforms give users the ability to post reviews like Google Reviews and Yelp, they don’t contain the fundamental data I need to build this free platform. Some also sell products or services to automotive repair shops (namely OEM how-to tutorials for specific make/model cars) and I don’t want to get involved with any financial sponsorships or political bureaucracy.
I have thought about reaching out to local automotive repair shops I have close relations with, but there are fewer than a handful that trust me enough to grant me access to their data, and whose data would be accurate. Networking with each automotive repair shop in my entire state is just not realistic.
Any feedback would be greatly appreciated. Thanks in advance!
submitted by /u/justLURKin220020