Hey Redditors,
I know the cars196 dataset is nothing new, but I wanted to share some label errors and outliers that I found within it.
Interestingly, the primary goal of the original paper that curated and used this dataset was “fine-grained categorization”, meaning discerning the differences between something like a Chevrolet Cargo Van and a GMC Cargo Van. I found numerous images with very subtle mislabeling, which runs directly counter to the task the authors set out to study.
Here are a few examples of nuanced label errors that I found:
- Audi TT RS Coupe labeled as an Audi TT Hatchback
- Audi S5 Convertible labeled as an Audi RS4
- Jeep Grand Cherokee labeled as a Dodge Durango
I also found examples of outliers and generally ambiguous images:
- multiple cars in one image
- top-down style images
- vehicles that didn’t belong to any classes
I found these issues pretty interesting, but not surprising: it’s well known that many common ML datasets contain thousands of errors.
If you’re interested in how I found them, feel free to read about it here.
submitted by /u/cmauck10