Multi-Level Information Aggregation Based Graph Attention Networks Towards Fake Speech Detection

Jian Zhou, Yong Li, Cunhang Fan, Liang Tao, Hon Keung Kwan
DOI: https://doi.org/10.1109/lsp.2024.3408676
2024-06-15
IEEE Signal Processing Letters
Abstract: It is widely acknowledged that the cues distinguishing genuine speech from spoofed speech are spread across various subbands and temporal segments of the speech signal. However, prevailing spoofing detection methods tend to oversimplify the relationships between these cues by employing linear models. In this paper, we introduce multi-level information aggregation Graph Attention Networks (MiaGATs) to generate highly discriminative features for fake speech detection (FSD). In MiaGATs, each subband and each temporal segment of a speech signal is represented as a distinct node. MiaGATs incorporates channel information aggregation within each node to effectively harness the unique spectral and temporal characteristics during the feature encoding stage. In particular, MiaGATs addresses the interactions between nodes through indirect node aggregation and integrates indirect and direct node aggregation through a max-pooling operation. Experimental results on the ASVspoof2019 and ASVspoof2021 LA databases show significant relative improvement over the current state of the art. Compared with the leading integrated spectro-temporal graph attention networks, MiaGATs achieves a performance improvement under various conditions, underscoring its position as a new benchmark in spoofing detection performance.
engineering, electrical & electronic
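The idea of combining direct and indirect node aggregation by max-pooling can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not define "indirect" aggregation, so the sketch assumes it means attending over two-hop neighbours, and it assumes plain dot-product attention as the scoring function. All function names (`softmax_masked`, `attention_aggregate`, `direct_indirect_max`) are hypothetical.

```python
import numpy as np


def softmax_masked(scores, mask):
    # Row-wise softmax restricted to entries where mask is True.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)


def attention_aggregate(h, mask):
    # Assumption: scaled dot-product attention between node features.
    scores = h @ h.T / np.sqrt(h.shape[1])
    return softmax_masked(scores, mask) @ h


def direct_indirect_max(h, adj):
    # h:   (N, D) node features, one row per subband or temporal-segment node.
    # adj: (N, N) boolean adjacency matrix.
    n = adj.shape[0]
    eye = np.eye(n, dtype=bool)
    direct_mask = adj | eye  # one-hop neighbours plus the node itself
    reach2 = (direct_mask.astype(int) @ direct_mask.astype(int)) > 0
    # Assumption: "indirect" = nodes reachable in two hops but not in one.
    # The self-loop keeps every row non-empty.
    indirect_mask = (reach2 & ~direct_mask) | eye
    direct = attention_aggregate(h, direct_mask)
    indirect = attention_aggregate(h, indirect_mask)
    # Element-wise max merges the two aggregated views.
    return np.maximum(direct, indirect)


# Usage on a 4-node chain graph 0-1-2-3 with 8-dimensional features.
rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
adj = np.zeros((4, 4), dtype=bool)
for i in range(3):
    adj[i, i + 1] = adj[i + 1, i] = True
out = direct_indirect_max(h, adj)
print(out.shape)  # (4, 8)
```

The element-wise maximum keeps, for each feature dimension, whichever of the two aggregation paths responds more strongly, which is one simple way to read "integrates by max-pooling"; the published model may differ in how nodes, attention scores, and pooling are defined.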