For an assignment, I would like to compare a neural network with a CNN backbone against one with a Vision Transformer (ViT) backbone on two different datasets. The idea is that context is really important for one dataset and less so for the other. My hypothesis is that the ViT will perform better when context matters and the CNN will perform better when it matters less. It's kind of hard to define context in a picture, but one example of pictures with less context might be these facial expressions (https://www.kaggle.com/datasets/juniorbueno/rating-opencv-emotion-images), and an example where context is more important would be these more generic emotion pictures (https://www.kaggle.com/datasets/sanidhyak/human-face-emotions). This combination seems perfect, but the second dataset is too small. Do you know of any datasets that capture the same idea but are larger?
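To make the hypothesis concrete, here is a rough sketch of how I might score it once both models are trained. It treats the hypothesis as an interaction: the ViT's advantage over the CNN should be larger on the high-context dataset than on the low-context one. The names `Result`, `vit_advantage`, and `context_interaction`, the dataset labels, and the numbers are all placeholders I made up for illustration, not measured results.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Test accuracy of one backbone on one dataset (value in [0, 1])."""
    backbone: str   # "cnn" or "vit" (hypothetical labels)
    dataset: str    # "low_context" or "high_context" (hypothetical labels)
    accuracy: float


def vit_advantage(results, dataset):
    """Return ViT accuracy minus CNN accuracy on the given dataset."""
    acc = {r.backbone: r.accuracy for r in results if r.dataset == dataset}
    return acc["vit"] - acc["cnn"]


def context_interaction(results):
    """Return how much larger the ViT advantage is on the high-context dataset.

    A clearly positive value is consistent with the hypothesis;
    a value near zero or below is not.
    """
    return vit_advantage(results, "high_context") - vit_advantage(results, "low_context")


# Placeholder numbers only, NOT real results: replace with measured accuracies.
placeholder = [
    Result("cnn", "low_context", 0.70),
    Result("vit", "low_context", 0.68),
    Result("cnn", "high_context", 0.60),
    Result("vit", "high_context", 0.66),
]
print(round(context_interaction(placeholder), 2))
```

A single number like this says nothing about noise, so in practice I would probably repeat each run over several random seeds before reading anything into the sign.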
submitted by /u/Limp_Award2427