Picking Neural Activations for Fine-Grained Recognition

Xiaopeng Zhang, Hongkai Xiong, Wengang Zhou, Weiyao Lin, Qi Tian
DOI: https://doi.org/10.1109/tmm.2017.2710803
IF: 7.3
2017-01-01
IEEE Transactions on Multimedia
Abstract: Recognizing fine-grained subcategories is challenging because the differences among them are subtle and highly localized. Unlike most previous methods, which rely on object/part annotations, this paper proposes an automatic fine-grained recognition approach that requires no object/part annotation at either the training or the testing stage. The key idea consists of two steps of picking neural activations computed from convolutional neural networks: one for localization and the other for description. The first picking step finds distinctive neurons that respond significantly and consistently to specific patterns. Based on these picked neurons, we initialize positive samples and formulate localization as a regularized multiple instance learning task, which refines the detectors by iteratively alternating between mining new positive samples and retraining the part models. The second picking step pools deep neural activations via a spatially weighted combination of Fisher Vector coding. Activations are selected conditionally for encoding into the final representation, so that the importance of each activation is taken into account. Integrating the above techniques produces a powerful framework, and experiments conducted on several extensive fine-grained benchmarks demonstrate the superiority of the proposed algorithm over existing methods.
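The first picking step can be illustrated with a minimal sketch. The abstract does not give the selection criterion, so the scoring rule below (mean peak response minus a penalty on its standard deviation across images) is an assumption chosen only to make "significantly and consistently" concrete. The function name `pick_distinctive_channels` and the parameters `top_k` and `penalty` are hypothetical and are not taken from the paper.

```python
import numpy as np

def pick_distinctive_channels(activations, top_k=2, penalty=1.0):
    """Rank conv channels by how strongly and how consistently they fire.

    activations: array of shape (n_images, n_channels, H, W) holding
    non-negative convolutional responses (hypothetical input layout).
    Returns the indices of the top_k channels and the score of every channel.
    """
    n_images, n_channels = activations.shape[:2]
    # Peak response of each channel in each image: shape (n_images, n_channels).
    peaks = activations.reshape(n_images, n_channels, -1).max(axis=2)
    mean_peak = peaks.mean(axis=0)  # "significantly": strong on average
    std_peak = peaks.std(axis=0)    # "consistently": little variation across images
    # Assumed scoring rule, not the paper's: reward strength, penalize variation.
    scores = mean_peak - penalty * std_peak
    picked = np.argsort(-scores)[:top_k]
    return picked, scores

# Toy input: 3 images, 3 channels, 2x2 feature maps.
acts = np.zeros((3, 3, 2, 2))
acts[:, 0, 0, 0] = 5.0  # channel 0 fires strongly in every image
acts[0, 1, 1, 1] = 9.0  # channel 1 fires very strongly, but in one image only
acts[:, 2, 0, 1] = 1.0  # channel 2 fires weakly in every image
picked, scores = pick_distinctive_channels(acts, top_k=2)
print(picked, scores)
```

In this toy input, channel 1 has the largest single response but ranks last because it fires in only one image, which is the behavior the "consistently" requirement is meant to capture. In the approach described above, the locations where the picked neurons fire would then serve as initial positive samples for the multiple instance learning stage.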