Cross-Modal Face Matching: Tackling Visual Abstraction Using Fine-Grained Attributes

Yichuan Hu, Ke Li, Honggang Zhang
DOI: https://doi.org/10.1109/VCIP.2016.7805451
2016-01-01
Abstract: Despite great strides in face verification, matching facial images across different modalities remains challenging. This is mainly due to the cross-modal gap induced by feature heterogeneity. Much prior work has focused on bridging this feature gap, achieving near-perfect matching accuracy for viewed sketches. Studies on matching unviewed (forensic) sketches and caricatures, however, have only begun in recent years. This is a much harder problem because visual abstraction introduces an additional cross-modal gap. In this paper, we focus on matching facial caricatures with photos by directly addressing the visual abstraction problem. We show that combining a taxonomy of fine-grained visual attributes with part-aware low-level feature extraction effectively bridges the visual abstraction gap and improves overall cross-modal matching accuracy. More specifically, (i) we propose a simple yet effective geometry-based attribute classifier that detects fine-grained attributes at the part level; (ii) we demonstrate how meaningful facial regions can be reliably detected to enable localized feature extraction and attribute detection; and (iii) we show that a common embedding combining part-based low-level features and fine-grained visual attributes can be learned using Canonical Correlation Analysis (CCA). We demonstrate the superiority of the proposed cross-modal strategy by evaluating it on two recent photo-caricature datasets.
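The abstract does not specify the CCA formulation, the feature extractors, or any dimensions. The following is therefore only a minimal sketch of step (iii) under stated assumptions. It uses a generic regularized linear CCA in NumPy and assumes that each modality is represented by concatenating part-based low-level features with attribute scores. All data are synthetic, and every name (`fit_cca`, `embed`, `make_pairs`, `mix`) and size (`LATENT`, `D_LOW`, `D_ATTR`) is hypothetical rather than taken from the paper.

```python
import numpy as np


def inv_sqrt(c):
    """Inverse square root of a symmetric positive-definite matrix."""
    w, v = np.linalg.eigh(c)
    return (v / np.sqrt(w)) @ v.T


def fit_cca(x, y, k, reg=1e-3):
    """Regularized linear CCA; returns centring means, projections, correlations."""
    mx, my = x.mean(axis=0), y.mean(axis=0)
    xc, yc = x - mx, y - my
    n = x.shape[0]
    cxx = xc.T @ xc / (n - 1) + reg * np.eye(x.shape[1])
    cyy = yc.T @ yc / (n - 1) + reg * np.eye(y.shape[1])
    cxy = xc.T @ yc / (n - 1)
    wx, wy = inv_sqrt(cxx), inv_sqrt(cyy)
    # SVD of the whitened cross-covariance gives the canonical directions,
    # with singular values (canonical correlations) sorted in descending order.
    u, s, vt = np.linalg.svd(wx @ cxy @ wy)
    return mx, my, wx @ u[:, :k], wy @ vt.T[:, :k], s[:k]


def embed(feats, mean, proj):
    """Project into the common space and L2-normalise for cosine matching."""
    z = (feats - mean) @ proj
    return z / np.linalg.norm(z, axis=1, keepdims=True)


# Hypothetical synthetic data: a shared latent identity drives both a photo and
# a caricature. Each is described by concatenated part-based low-level features
# and attribute scores. The dimensions are arbitrary illustrative choices.
rng = np.random.default_rng(0)
LATENT, D_LOW, D_ATTR = 8, 40, 12
mix = {name: rng.normal(size=(LATENT, d))
       for name, d in [("p_low", D_LOW), ("p_attr", D_ATTR),
                       ("c_low", D_LOW), ("c_attr", D_ATTR)]}


def make_pairs(n, noise=0.5):
    """Generate n (photo, caricature) feature pairs sharing a latent identity."""
    z = rng.normal(size=(n, LATENT))

    def view(low, attr):
        # Concatenate low-level and attribute blocks, then add modality noise.
        feats = np.hstack([z @ mix[low], z @ mix[attr]])
        return feats + noise * rng.normal(size=feats.shape)

    return view("p_low", "p_attr"), view("c_low", "c_attr")


photo_tr, cari_tr = make_pairs(200)
photo_te, cari_te = make_pairs(50)
mx, my, a, b, corrs = fit_cca(photo_tr, cari_tr, k=LATENT)

# Rank-1 matching: each caricature probe retrieves its most similar photo.
sims = embed(cari_te, my, b) @ embed(photo_te, mx, a).T
acc = float(np.mean(sims.argmax(axis=1) == np.arange(len(cari_te))))
print("canonical correlations:", np.round(corrs, 3))
print("rank-1 accuracy:", acc)
```

Because the canonical correlations come from an SVD, they are already sorted. Keeping the first `k` columns therefore retains the most correlated directions. The small ridge term `reg` keeps the covariance matrices invertible when feature dimensionality approaches the number of training pairs. This is a common practical choice, not one stated in the abstract.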