
Contrarianism

Sometimes I like being a contrarian.  This paper (https://arxiv.org/pdf/1904.13132.pdf) suggests that you can train the low layers of a deep learning network with just one image (plus a whole mess of data augmentation approaches, like cropping, rotating, etc.).  That contradicts a widespread belief in the field: that the reason to pre-train on ImageNet is that having a large number of images makes for a really good set of low-level features.

I'm curious: what other assumptions can we attack, and how?

One approach to data augmentation is to take your labelled data and make *more* labelled data by flipping the images left-right and/or cropping them, giving the new images the same label.  Why are these common data augmentation tools?  Because flipping an image left-right (reflecting it), or cropping it slightly, usually produces an image that you'd expect to have the same label.
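As a concrete sketch of that idea (the function name and the ~10% crop margin are my own illustrative choices, and a numpy array stands in for a real image):

```python
import numpy as np

def augment(image, label):
    """Given one labelled image, return extra (image, label) pairs via
    label-preserving transforms: a left-right flip and a slightly
    smaller centre crop, both assumed not to change the label."""
    flipped = np.fliplr(image)            # mirror left-right
    h, w = image.shape[:2]
    m = max(1, h // 10)                   # crop margin: ~10% of each edge
    cropped = image[m:h - m, m:w - m]     # centre crop
    return [(flipped, label), (cropped, label)]

# toy 8x8 "image"
img = np.arange(64, dtype=float).reshape(8, 8)
extra = augment(img, "cat")               # two new labelled examples from one
```

Both new examples keep the original label, which is exactly the assumption the rest of this post pokes at.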

So let's flip that assumption around.  Imagine training a binary image classifier on many images, each labelled either *original* or *flipped*.  Can a deep learning network learn to tell whether something has been flipped left-right?  And if it can, what has it learned?  Here is an in-post test.  For each of these three images (the first three images I saw when I looked at Facebook today), either the top or the bottom has been flipped from the original.  Can you say which is the original in each case?

[(Top, bottom, bottom)]

Answers available by highlighting above.

What cues are available to figure this out?  What did you use?  Could a network learn this?   Would it be interesting to make such a network and ask what features in the image it used to come to its conclusion?
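Here is a minimal sketch of how the training set for such a network could be built (the function name and random seed are illustrative; the actual network that would consume this data is deliberately omitted):

```python
import numpy as np

def make_flip_dataset(images, rng=None):
    """Turn unlabelled images into a binary 'was this flipped?' dataset:
    each image is kept or mirrored left-right at random, with label
    0 = original, 1 = flipped."""
    if rng is None:
        rng = np.random.default_rng(0)
    xs, ys = [], []
    for img in images:
        flip = int(rng.integers(0, 2))    # coin toss per image
        xs.append(np.fliplr(img) if flip else img)
        ys.append(flip)
    return np.stack(xs), np.array(ys)

# six toy 4x4 "images"
imgs = [np.arange(16, dtype=float).reshape(4, 4) + i for i in range(6)]
X, y = make_flip_dataset(imgs)
```

The nice property is that the labels come for free: no human annotation is needed, just the transform itself.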

What about the equivalent version that considers image crops?  (Binary classifier: is this a cropped "normal picture" or not?  Multi-class classifier: is this cropped from the top-left corner of the normal picture?  The top-right corner?  The middle?)
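One way the labelled examples for that multi-class crop task could be generated (the label scheme and the 75% crop fraction are my own assumptions, not anything fixed):

```python
import numpy as np

# Hypothetical label scheme for the crop-position task:
# 0 = uncropped, 1 = top-left crop, 2 = top-right crop, 3 = centre crop.
def make_crop_example(image, cls, frac=0.75):
    """Return the view of `image` corresponding to class `cls`,
    where each crop keeps `frac` of the height and width."""
    h, w = image.shape[:2]
    ch, cw = int(h * frac), int(w * frac)
    if cls == 0:
        return image
    if cls == 1:                          # top-left corner
        return image[:ch, :cw]
    if cls == 2:                          # top-right corner
        return image[:ch, w - cw:]
    if cls == 3:                          # centre
        y, x = (h - ch) // 2, (w - cw) // 2
        return image[y:y + ch, x:x + cw]
    raise ValueError(f"unknown class {cls}")

img = np.arange(64, dtype=float).reshape(8, 8)
corner = make_crop_example(img, 1)        # top-left 6x6 crop of the toy image
```

As with the flip task, the labels are free, and asking the trained network which cues it used might reveal what "being a corner of a normal picture" looks like.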

What are other image transformations that we usually ignore?

