Stanford Cars (cars196) Contains Many Fine-Grained Errors

Hey Redditors,

I know the cars196 dataset is nothing new, but I wanted to share some label errors and outliers that I found within it.

It’s interesting to note that the primary goal of the original paper that curated and used this dataset was “fine-grained categorization,” meaning discerning the differences between something like a Chevrolet Cargo Van and a GMC Cargo Van. I found numerous images with very nuanced mislabelling, which runs directly counter to the task the authors set out to research.

Here are a few examples of nuanced label errors that I found:

- Audi TT RS Coupe labeled as an Audi TT Hatchback
- Audi S5 Convertible labeled as an Audi RS4
- Jeep Grand Cherokee labeled as a Dodge Durango

I also found examples of outliers and generally ambiguous images:

- multiple cars in one image
- top-down style images
- vehicles that didn’t belong to any classes

I found these issues to be pretty interesting, yet I wasn’t surprised. It’s pretty well known that many common ML datasets exhibit thousands of errors.

If you’re interested in how I found them, feel free to read about it here.
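For anyone who just wants a rough feel for how issues like these can be surfaced, here is a minimal sketch of one generic heuristic. It is not necessarily the method used for this post. It assumes you already have out-of-sample predicted probabilities from some classifier. An image is flagged when the model is less confident in its given label than it typically is for that class, while being confident in some other class. The function name `flag_label_issues` and the toy numbers are made up for illustration and are not taken from cars196.

```python
def flag_label_issues(pred_probs, labels):
    """Flag examples whose given label looks suspicious.

    pred_probs: list of per-class probability lists (assumed out-of-sample).
    labels: list of integer class indices (the given, possibly noisy labels).
    Returns (index, given_label, suggested_label, self_confidence) tuples,
    sorted so the least confident given labels come first.
    """
    n_classes = len(pred_probs[0])

    # Per-class threshold: average probability the model assigns to class k
    # over the examples that are labeled k.
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for probs, y in zip(pred_probs, labels):
        sums[y] += probs[y]
        counts[y] += 1
    thresholds = [
        sums[k] / counts[k] if counts[k] else 1.0 for k in range(n_classes)
    ]

    issues = []
    for i, (probs, y) in enumerate(zip(pred_probs, labels)):
        # Other classes the model is confident about for this example.
        confident_others = [
            k for k in range(n_classes)
            if k != y and probs[k] >= thresholds[k]
        ]
        if probs[y] < thresholds[y] and confident_others:
            suggested = max(confident_others, key=lambda k: probs[k])
            issues.append((i, y, suggested, probs[y]))

    issues.sort(key=lambda item: item[3])
    return issues


# Toy data (invented): two classes, five images.
pred_probs = [
    [0.90, 0.10],
    [0.80, 0.20],
    [0.10, 0.90],
    [0.20, 0.80],
    [0.05, 0.95],  # labeled class 0, but the model strongly prefers class 1
]
labels = [0, 0, 1, 1, 0]

issues = flag_label_issues(pred_probs, labels)
for index, given, suggested, confidence in issues:
    print(f"image {index}: given={given}, suggested={suggested}, "
          f"self-confidence={confidence:.2f}")
```

Anything flagged this way is only a candidate. You still need to look at the image, since the same signal also fires on outliers and ambiguous shots like the ones listed above.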

submitted by /u/cmauck10