Scalable Face Track Retrieval in Video Archives Using Bag-of-Faces Sparse Representation
Bor-Chun Chen, Yan-Ying Chen, Yin-Hsi Kuo, Thanh Duc Ngo, Duy-Dinh Le, Shin'ichi Satoh, Winston H. Hsu
DOI: https://doi.org/10.1109/tcsvt.2016.2538520
IF: 5.859
2017-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Huge video archives consisting of news programs, dramas, movies, and Web videos (e.g., YouTube) are now part of daily life. In all these videos, humans are usually among the most important subjects. State-of-the-art techniques can efficiently detect and track faces in videos. To organize large-scale collections of face tracks, each a sequence of consecutive detected faces in a video, we propose an efficient method for retrieving human face tracks using bag-of-faces sparse representation (BoF-SR). With the proposed method, a face track is encoded as a single BoF-SR, which allows an efficient indexing method to handle large-scale data. To account for possible variations within a face track, we generalize the method to find multiple SRs, in an unsupervised manner, to represent a bag of faces and to balance the tradeoff between performance and retrieval time. Experimental results on two real-world (million-scale) data sets confirm that the proposed methods achieve significant performance gains compared with different state-of-the-art methods.
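The abstract's central idea, one sparse code per face track that can be placed in an index, can be illustrated with a toy sketch. This is not the authors' algorithm: the abstract does not specify the sparse coding solver, the dictionary, or the index structure. The sketch assumes a top-k correlation step as a stand-in for sparse coding, max pooling as a stand-in for combining the faces of a track, and a plain inverted index keyed by dictionary atom. All function names (`encode_face`, `encode_track`, `build_index`, `search`) and the data are hypothetical.

```python
from collections import defaultdict


def encode_face(feature, dictionary, k=2):
    """Toy sparse code (assumption): keep the k atoms with largest |dot product|."""
    scores = [sum(f * a for f, a in zip(feature, atom)) for atom in dictionary]
    top = sorted(range(len(scores)), key=lambda i: -abs(scores[i]))[:k]
    return {i: scores[i] for i in top if scores[i] != 0.0}


def encode_track(faces, dictionary, k=2):
    """Pool the per-face codes into ONE sparse vector for the whole track.

    Max pooling is an illustrative choice, not the method from the paper.
    """
    pooled = defaultdict(float)
    for face in faces:
        for atom, weight in encode_face(face, dictionary, k).items():
            pooled[atom] = max(pooled[atom], abs(weight))
    top = sorted(pooled, key=lambda i: -pooled[i])[:k]
    return {i: pooled[i] for i in top}


def build_index(track_codes):
    """Inverted index: atom id -> list of (track id, weight)."""
    index = defaultdict(list)
    for track_id, code in track_codes.items():
        for atom, weight in code.items():
            index[atom].append((track_id, weight))
    return index


def search(query_code, index):
    """Score only tracks that share at least one non-zero atom with the query."""
    scores = defaultdict(float)
    for atom, q_weight in query_code.items():
        for track_id, weight in index.get(atom, []):
            scores[track_id] += q_weight * weight
    return sorted(scores.items(), key=lambda item: -item[1])


# Hypothetical 4-atom dictionary (unit vectors) and two tiny face tracks.
dictionary = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
tracks = {
    "track_a": [[0.9, 0.1, 0.0, 0.0], [0.8, 0.3, 0.0, 0.0]],
    "track_b": [[0.0, 0.0, 0.7, 0.2], [0.0, 0.1, 0.9, 0.0]],
}
codes = {tid: encode_track(faces, dictionary) for tid, faces in tracks.items()}
index = build_index(codes)

query = encode_track([[1.0, 0.2, 0.0, 0.0]], dictionary)
results = search(query, index)
print(results)
```

Because each track contributes only a few non-zero atoms, a query touches only the posting lists of its own atoms, which is what makes an inverted index attractive at large scale. Representing a track by several sparse codes instead of one, as the abstract's generalization suggests, would add more postings per track and so trade retrieval time for robustness to variation.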