Cross-Modal Face Matching: Beyond Viewed Sketches

Shuxin OuYang, Timothy M. Hospedales, Yi-Zhe Song, Xueming Li
DOI: https://doi.org/10.1007/978-3-319-16808-1_15
2014-01-01
Abstract: Matching face images across different modalities is a challenging open problem for several reasons, notably feature heterogeneity and, particularly in the case of sketch recognition, abstraction, exaggeration and distortion. Existing studies have attempted to address this task by engineering invariant features or by learning a common subspace between the modalities. In this paper, we take a different approach and explore learning a mid-level representation within each domain that allows faces in each modality to be compared in a domain-invariant way. In particular, we investigate sketch-photo face matching and go beyond the well-studied viewed sketches to tackle forensic sketches and caricatures, where representations are often symbolic. We approach this by learning a facial attribute model independently in each domain that represents faces in terms of semantic properties. This representation is thus more invariant to heterogeneity and distortions, and more robust to misalignment. Our intermediate-level attribute representation is then integrated synergistically with the original low-level features using CCA. Our framework shows impressive results on cross-modal matching tasks using forensic sketches and the even more challenging caricature sketches. Furthermore, we create a new dataset with approximately 59,000 attribute annotations for evaluation and to facilitate future research.
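As a rough illustration of the kind of pipeline the abstract describes, the sketch below fits a regularized CCA between two modalities and matches held-out pairs by nearest neighbour in the shared space. It is not the authors' implementation. Everything here is an assumption made for illustration: the data are synthetic, the attribute scores are simulated rather than predicted by per-domain attribute classifiers, the low-level features and attribute scores are simply concatenated before CCA, and the names `fit_cca`, `rank1_accuracy`, `make_view` and `demo` are hypothetical.

```python
import numpy as np

def inv_sqrt(C):
    """Inverse matrix square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(C)
    return (V / np.sqrt(w)) @ V.T

def fit_cca(X, Y, k, reg=1e-3):
    """Regularized CCA via whitening followed by an SVD.

    Returns the two means, the two projection matrices and the
    top-k canonical correlations.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Kx = inv_sqrt(Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1]))
    Ky = inv_sqrt(Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1]))
    U, s, Vt = np.linalg.svd(Kx @ (Xc.T @ Yc / (n - 1)) @ Ky)
    return mx, my, Kx @ U[:, :k], Ky @ Vt[:k].T, s[:k]

def rank1_accuracy(probe, gallery):
    """Fraction of probes whose nearest gallery item has the same index."""
    d = ((probe[:, None, :] - gallery[None, :, :]) ** 2).sum(axis=-1)
    return float((d.argmin(axis=1) == np.arange(len(probe))).mean())

def make_view(Z, M, rng, noise=0.1):
    """Noisy linear view of the latent identities Z (synthetic stand-in)."""
    return Z @ M + noise * rng.standard_normal((Z.shape[0], M.shape[1]))

def demo(seed=0):
    rng = np.random.default_rng(seed)
    k = 5  # assumed number of latent identity factors
    # Hypothetical mixing matrices: low-level features and attribute
    # scores for the sketch (s) and photo (p) modalities.
    Ms_low, Ms_att = rng.standard_normal((k, 20)), rng.standard_normal((k, 8))
    Mp_low, Mp_att = rng.standard_normal((k, 20)), rng.standard_normal((k, 8))

    def views(n):
        Z = rng.standard_normal((n, k))
        # Assumption: fuse by concatenating low-level and attribute features.
        sketch = np.hstack([make_view(Z, Ms_low, rng), make_view(Z, Ms_att, rng)])
        photo = np.hstack([make_view(Z, Mp_low, rng), make_view(Z, Mp_att, rng)])
        return sketch, photo

    S_train, P_train = views(200)
    S_test, P_test = views(50)
    mx, my, Wx, Wy, corr = fit_cca(S_train, P_train, k)
    acc = rank1_accuracy((S_test - mx) @ Wx, (P_test - my) @ Wy)
    return corr, acc

corr, acc = demo()
print("canonical correlations:", np.round(corr, 3))
print("rank-1 accuracy on held-out pairs:", acc)
```

With real data, the synthetic views would be replaced by descriptors extracted from sketches and photos together with the outputs of attribute classifiers trained separately in each domain. The regularization strength `reg` and the number of components `k` would need tuning.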