SwiDeN : Convolutional Neural Networks For Depiction Invariant Object Recognition

Ravi Kiran Sarvadevabhatla, Shiv Surya, Srinivas S S Kruthiventi, Venkatesh Babu R
DOI: https://doi.org/10.48550/arXiv.1607.08764
2016-07-29
Abstract: Current state-of-the-art object recognition architectures achieve impressive performance but are typically specialized for a single depictive style (e.g. photos only, sketches only). In this paper, we present SwiDeN: our Convolutional Neural Network (CNN) architecture which recognizes objects regardless of how they are visually depicted (line drawing, realistic shaded drawing, photograph, etc.). In SwiDeN, we utilize a novel 'deep' depictive style-based switching mechanism which appropriately addresses the depiction-specific and depiction-invariant aspects of the problem. We compare SwiDeN with alternative architectures and prior work on a 50-category Photo-Art dataset containing objects depicted in multiple styles. Experimental results show that SwiDeN outperforms other approaches for the depiction-invariant object recognition problem.
Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses object recognition across different depictive styles (line drawings, realistic shaded drawings, photographs, etc.). Current state-of-the-art object recognition architectures perform remarkably well on a single depictive style, but they are usually specialized for that style (for example, photographs only or sketches only), which limits them when objects must be recognized across styles. The paper proposes SwiDeN, a new convolutional neural network (CNN) architecture that aims to overcome this limitation and recognize objects regardless of how they are visually depicted. By introducing a novel "deep" depictive style-based switching mechanism, SwiDeN handles both the depiction-specific and the depiction-invariant aspects of the problem, thereby improving recognition performance on datasets containing multiple depictive styles. Experimental results show that SwiDeN outperforms the other approaches it is compared against on the depiction-invariant object recognition problem, especially for non-photographic depictions.
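The general idea of routing an input by depictive style can be illustrated with a toy sketch. This is not the authors' architecture: the text above does not say which layers are style-specific or how the style is determined, so the layout here (one small branch per style feeding a single shared head), the class name `SwitchingClassifier`, the style labels "photo" and "art", the layer sizes, and the random untrained weights are all assumptions made for illustration. Only the 50-category output size comes from the abstract.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Return a rows x cols matrix of small random (untrained) weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def linear(matrix, vector):
    """Matrix-vector product."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def relu(vector):
    return [max(0.0, v) for v in vector]


def softmax(vector):
    peak = max(vector)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in vector]
    total = sum(exps)
    return [e / total for e in exps]


class SwitchingClassifier:
    """Hypothetical toy model: one branch per depictive style, one shared head."""

    def __init__(self, styles, in_dim, hidden_dim, num_classes, seed=0):
        rng = random.Random(seed)
        # Depiction-specific parameters: a separate branch for each style.
        self.branches = {s: make_matrix(hidden_dim, in_dim, rng) for s in styles}
        # Depiction-invariant parameters: a single head shared by all styles.
        self.shared_head = make_matrix(num_classes, hidden_dim, rng)

    def predict(self, features, style):
        # The "switch": select the branch matching the input's style.
        # An unknown style raises KeyError.
        branch = self.branches[style]
        hidden = relu(linear(branch, features))
        return softmax(linear(self.shared_head, hidden))


# Assumed style labels and a made-up 4-dimensional feature vector.
model = SwitchingClassifier(["photo", "art"], in_dim=4, hidden_dim=8, num_classes=50)
x = [0.2, -0.1, 0.7, 0.4]
probs_photo = model.predict(x, "photo")
probs_art = model.predict(x, "art")
print(len(probs_photo), len(probs_art))
```

In this sketch the style label is supplied by the caller. A real system would need to obtain it from somewhere (for example, dataset metadata or a separate style classifier), which is outside what the text above specifies.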