One net to rule them all: efficient recognition and retrieval of POI from geo-tagged photos

Pai Peng, Xiaoling Gu, Suguo Zhu, Lidan Shou, Gang Chen
DOI: https://doi.org/10.1007/s11042-018-6847-y
IF: 2.577
2019-01-01
Multimedia Tools and Applications
Abstract: In this work, we present DeepCamera, a novel framework that combines visual recognition and spatial recognition to identify places-of-interest (POIs) from smartphone photos. Our framework exploits both the deep visual features and the geographic features of images. For visual recognition, we first design the HashNet model, which extends an ordinary convolutional neural network (ConvNet) by adding a “hash layer” after the last fully connected layer. We then compress multiple pre-trained deep HashNets into a single shallow hash network, named “SHNet”, which outputs semantic labels and compact hash codes simultaneously. As a result, SHNet significantly reduces the time and memory consumed during POI recognition. For spatial recognition, a new layer called the Spatial Layer is appended to a ConvNet to capture spatial information. Finally, by plugging the spatial layer into SHNet, both visual and spatial knowledge contribute to a hybrid probability distribution over all possible POI candidates. Notably, the proposed SHNet model can also be used for general visual recognition and retrieval. Experiments conducted on real-world datasets and classic datasets (MNIST and CIFAR-10) demonstrate the competitive accuracy and run-time performance of our proposed framework.
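To illustrate how compact hash codes can serve retrieval, here is a minimal sketch, not the authors' implementation. It assumes the hash layer is an affine map followed by a sigmoid thresholded at 0.5, and that retrieval ranks stored codes by Hamming distance; the abstract specifies neither detail. All names (`hash_layer`, `hamming_distance`, `retrieve`) and the toy weights are hypothetical.

```python
from typing import Dict, List, Sequence


def hash_layer(features: Sequence[float],
               weights: Sequence[Sequence[float]],
               biases: Sequence[float]) -> List[int]:
    """Hypothetical hash layer: one bit per output unit.

    Assumption: the bit is 1 when sigmoid(w . x + b) >= 0.5,
    which is equivalent to w . x + b >= 0, so no exp() is needed.
    """
    code = []
    for w_row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(w_row, features)) + b
        code.append(1 if z >= 0.0 else 0)
    return code


def hamming_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions at which two equal-length binary codes differ."""
    return sum(x != y for x, y in zip(a, b))


def retrieve(query_code: Sequence[int],
             database: Dict[str, Sequence[int]],
             k: int = 1) -> List[str]:
    """Return the k database keys whose codes are closest to the query."""
    ranked = sorted(database.items(),
                    key=lambda item: hamming_distance(query_code, item[1]))
    return [name for name, _ in ranked[:k]]


# Toy usage: a 2-dimensional feature vector mapped to a 3-bit code.
toy_weights = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
toy_biases = [0.0, 0.0, 0.0]
query = hash_layer([2.0, -1.0], toy_weights, toy_biases)
toy_db = {"poi_a": [1, 0, 0], "poi_b": [0, 1, 1]}
nearest = retrieve(query, toy_db, k=1)
```

Comparing short binary codes with Hamming distance is cheap in both time and memory, which is the general motivation for learning hash codes alongside semantic labels.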