Appearance Matters, So Does Audio: Revealing the Hidden Face via Cross-Modality Transfer

Chenqi Kong, Baoliang Chen, Wenhan Yang, Haoliang Li, Peilin Chen, Shiqi Wang
DOI: https://doi.org/10.1109/TCSVT.2021.3057457
IF: 5.859
2022-01-01
IEEE Transactions on Circuits and Systems for Video Technology
Abstract: Recently, there has been an exponential increase in the security concerns raised by face forgery (e.g., deepfake), which automatically changes the identity with a specifically learned deep generative model. While numerous approaches have been proposed to identify fake content, much less work has been dedicated to automatically revealing the authentic content that was originally acquired. Here, we propose a new paradigm that seeks to reveal the authentic face hidden behind the fake one by leveraging the joint information of face and audio. More specifically, given the fake face and the audio segment, the cross-modality transfer capability is exploited by learning to generate the feature of the authentic face, based on the underlying clues from both the audio and the fake face appearance. The effectiveness of the proposed scheme is validated through a series of evaluations, and experimental results show that the proposed model achieves promising performance in revealing hidden faces, in terms of reconstruction quality as well as identity and face attribute inference accuracy.