Finding Celebrities in Billions of Web Images

Xiao Zhang, Lei Zhang, Xin-Jing Wang, Heung-Yeung Shum
DOI: https://doi.org/10.1109/tmm.2012.2186121
IF: 7.3
2012-01-01
IEEE Transactions on Multimedia
Abstract: In this paper, we present a face annotation system to automatically collect and label celebrity faces from the web. With the proposed system, we have constructed a large-scale dataset called Celebrities on the Web, which contains 2.45 million distinct images of 421 436 celebrities and is orders of magnitude larger than previous datasets. Collecting and labeling such a large-scale dataset poses great challenges for current multimedia mining methods. In this work, a two-step face annotation approach is proposed to accomplish this task. In the first step, an image annotation system labels an input image with a list of celebrities. To make use of the noisy textual data, we construct a large-scale celebrity name vocabulary to identify candidate names in the surrounding text. Moreover, we expand the scope of analysis to the surrounding text of webpages hosting near-duplicates of the input image. In the second step, the celebrity names are assigned to the faces by label propagation on a facial similarity graph. To cope with the large variance in facial appearance, a context likelihood is proposed to constrain the name assignment process. In an evaluation on 21 735 faces, both the image annotation system and the name assignment algorithm significantly outperform previous techniques.
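The second step, name assignment by label propagation on a facial similarity graph, can be illustrated with a minimal sketch. This is not the authors' algorithm: it is a generic clamped label propagation in which each unnamed face repeatedly takes the similarity-weighted average of its neighbours' name scores, and it omits the context likelihood described in the abstract. The function name `propagate_labels`, its parameters, and the toy similarity values are all assumptions made for illustration.

```python
def propagate_labels(similarity, seed_labels, names, iterations=50):
    """Generic clamped label propagation (illustrative, not the paper's method).

    similarity  -- symmetric n x n list of non-negative face similarities
    seed_labels -- dict mapping a face index to its known name
    names       -- list of candidate names
    Returns the highest-scoring name for every face.
    """
    n, k = len(similarity), len(names)
    scores = [[0.0] * k for _ in range(n)]
    for face, name in seed_labels.items():
        scores[face][names.index(name)] = 1.0

    for _ in range(iterations):
        updated = []
        for i in range(n):
            neighbours = [j for j in range(n) if j != i]
            total = sum(similarity[i][j] for j in neighbours)
            if i in seed_labels or total == 0:
                # Seed faces stay clamped; isolated faces keep their scores.
                updated.append(scores[i][:])
                continue
            # Similarity-weighted average of the neighbours' name scores.
            updated.append([
                sum(similarity[i][j] * scores[j][c] for j in neighbours) / total
                for c in range(k)
            ])
        scores = updated

    # Ties (including all-zero rows) resolve to the earliest name in `names`.
    return [names[max(range(k), key=lambda c: scores[i][c])] for i in range(n)]


# Toy graph: faces 0 and 1 look alike, as do faces 2 and 3.
toy_similarity = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.2, 0.1],
    [0.1, 0.2, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
assigned = propagate_labels(toy_similarity, {0: "A", 3: "B"}, ["A", "B"])
```

In this toy graph the unnamed faces 1 and 2 inherit the names of the seed faces they most resemble. A constraint such as the paper's context likelihood would further restrict which names each face is allowed to receive.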