Audio-based Kinship Verification Using Age Domain Conversion

Qiyang Sun, Alican Akman, Xin Jing, Manuel Milling, Björn W. Schuller
2024-10-15
Abstract: Audio-based kinship verification (AKV) is important in many domains, such as home security monitoring, forensic identification, and social network analysis. A key challenge in the task arises from differences in age across samples from different individuals, which can be interpreted as a domain bias in a cross-domain verification task. To address this issue, we design the notion of an "age-standardised domain" wherein we utilise the optimised CycleGAN-VC3 network to perform age-audio conversion to generate the in-domain audio. The generated audio dataset is employed to extract a range of features, which are then fed into a metric learning architecture to verify kinship. Experiments are conducted on the KAN_AV audio dataset, which contains age and kinship labels. The results demonstrate that the method markedly enhances the accuracy of kinship verification, while also offering novel insights for future kinship verification research.
Sound, Artificial Intelligence, Audio and Speech Processing
What problem does this paper attempt to address?
This paper addresses the age-domain shift problem in audio-based kinship verification (AKV). Specifically, age differences between samples from different individuals can be treated as a domain shift in a cross-domain verification task, and this shift reduces the accuracy of kinship verification.

### Main problems

1. **Domain shift caused by age differences**: In audio-based kinship verification, age differences between individuals change voice features, which impairs the model's ability to identify kinship accurately.
2. **Deficiencies of existing methods**: Most existing research focuses on facial images and videos for kinship verification, while AKV has not been fully explored, although audio data has unique advantages in some scenarios (such as telephone calls).

### Solutions

To solve these problems, the authors propose the following methods:

- **Age-standardized domain**: Using the optimized CycleGAN-VC3 network, audio from speakers of different ages is converted to a unified intermediate age group, thereby reducing the impact of age differences on voice features.
- **Feature extraction and metric learning**: Multiple features are extracted from the generated audio dataset and fed into a metric-learning architecture for kinship verification.

### Experimental results

The experimental results show that this method significantly improves the accuracy of kinship verification, especially for cross-age-group relationships (such as father-daughter and mother-daughter). Specifically, using the optimized TripletNet model and Wav2Vec features, the overall weighted accuracy on the generated dataset reaches 71.3%, which is about 5% higher than the baseline method.
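The metric-learning step named under Solutions can be illustrated with a minimal sketch of a triplet margin loss and a distance-based kinship decision. This is not the authors' TripletNet: the function names, the `margin` and `threshold` values, and the toy two-dimensional embeddings are all assumptions made for illustration. In practice the inputs would be feature vectors (for example Wav2Vec embeddings) extracted from the age-converted audio.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss: zero once the kin pair (anchor, positive) is
    closer than the non-kin pair (anchor, negative) by at least `margin`.
    The margin value is an assumed hyperparameter."""
    a, p, n = (l2_normalize(v) for v in (anchor, positive, negative))
    return max(0.0, euclidean(a, p) - euclidean(a, n) + margin)

def is_kin(emb_a, emb_b, threshold=0.5):
    """Verification rule: declare kinship when the normalized embeddings
    are closer than `threshold` (an assumed, untuned value)."""
    return euclidean(l2_normalize(emb_a), l2_normalize(emb_b)) < threshold

# Toy embeddings standing in for real audio features (hypothetical values).
anchor = [1.0, 0.0]      # e.g. a parent's utterance
positive = [0.9, 0.1]    # e.g. that parent's child
negative = [0.0, 1.0]    # e.g. an unrelated speaker

print(triplet_loss(anchor, positive, negative))  # 0.0: triplet already satisfied
print(is_kin(anchor, positive), is_kin(anchor, negative))  # True False
```

During training, a network would be updated to drive this loss toward zero over many triplets; at test time only the distance check is needed. The threshold would normally be tuned on held-out pairs rather than fixed in advance.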
### Summary

This paper effectively alleviates the age-domain shift problem in audio-based kinship verification by introducing age-conversion techniques, and improves the generalization ability and verification accuracy of the model. Future work can further consider other factors such as gender conversion to better deal with complex kinship verification tasks.