Everything You Wanted to Know about Deep Learning for Computer Vision but Were Afraid to Ask

Moacir Antonelli Ponti, Leonardo Sampaio Ferraz Ribeiro, Tiago Santana Nazare, Tu Bui, John Collomosse
DOI: https://doi.org/10.1109/sibgrapi-t.2017.12
2017-10-01
Abstract: Deep Learning methods are currently the state of the art in many Computer Vision and Image Processing problems, in particular image classification. After years of intensive investigation, a few models have matured and become important tools, including Convolutional Neural Networks (CNNs), Siamese and Triplet Networks, Auto-Encoders (AEs) and Generative Adversarial Networks (GANs). The field is fast-paced, and there is a lot of terminology to catch up on for those who want to venture into Deep Learning waters. This paper aims to introduce the most fundamental concepts of Deep Learning for Computer Vision, in particular CNNs, AEs and GANs, including their architectures, inner workings and optimization. We offer an updated description of the theoretical and practical knowledge needed to work with these models. After that, we describe Siamese and Triplet Networks, which are not often covered in tutorial papers, and review the literature on recent and exciting topics such as visual stylization, pixel-wise prediction and video processing. Finally, we discuss the limitations of Deep Learning for Computer Vision.