Abstract: Although many efforts have been made to decrease model complexity for speaker verification, it is still challenging to deploy speaker verification systems with satisfactory results on low-resource terminals. We design a transformation module that performs feature partition and fusion to implement lightweight speaker verification. The transformation module consists of multiple simple but effective operations, such as convolution, pooling, mean, concatenation, normalization, and element-wise summation. It works in a plug-and-play way and can be easily implanted into a wide variety of models to reduce model complexity while maintaining the model error. First, the input feature is split into several low-dimensional feature subsets to decrease the model complexity. Then, each feature subset is updated by fusing it with the inter-feature-subset correlational information to enhance its representational capability. Finally, the updated feature subsets are independently fed into a block (one or several layers) of the model for further processing. The features output by the current block of the model are processed according to the steps above before they are fed into the next block of the model. Experimental data are selected from two public speech corpora, VoxCeleb1 and VoxCeleb2. Results show that implanting the transformation module into three speaker verification models (AMCRN, ResNet34, and ECAPA-TDNN) slightly increases the model error and significantly decreases the model complexity. On the whole, our proposed method outperforms baseline methods in memory requirement and computational complexity, with a lower equal error rate. It also generalizes well across truncated segments of various lengths.
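The three steps described in the abstract (partition, fusion, independent processing) can be illustrated with a minimal sketch in plain Python. This is not the authors' implementation: the abstract does not specify how the inter-subset information is computed, so the sketch assumes it is the element-wise mean over all subsets, added back to each subset by element-wise summation. The names `partition`, `fuse`, and `transform_and_apply` are hypothetical, the `block` argument stands in for one or several model layers, and concatenating the per-subset outputs at the end is likewise an assumption.

```python
from statistics import fmean
from typing import Callable, List

Vector = List[float]


def partition(feature: Vector, num_subsets: int) -> List[Vector]:
    """Split a feature vector into equal-width, low-dimensional subsets."""
    if len(feature) % num_subsets != 0:
        raise ValueError("feature length must be divisible by num_subsets")
    width = len(feature) // num_subsets
    return [feature[i * width:(i + 1) * width] for i in range(num_subsets)]


def fuse(subsets: List[Vector]) -> List[Vector]:
    """Update each subset with information shared across all subsets.

    Assumption: the shared information is the element-wise mean over the
    subsets, fused into each subset by element-wise summation.
    """
    shared = [fmean(column) for column in zip(*subsets)]
    return [[x + s for x, s in zip(subset, shared)] for subset in subsets]


def transform_and_apply(
    feature: Vector, num_subsets: int, block: Callable[[Vector], Vector]
) -> Vector:
    """Partition, fuse, run `block` on each subset independently, then concatenate."""
    updated = fuse(partition(feature, num_subsets))
    outputs = [block(subset) for subset in updated]
    return [x for out in outputs for x in out]


# Toy usage: a 4-dimensional feature, 2 subsets, and a stand-in block that doubles values.
result = transform_and_apply([1.0, 2.0, 3.0, 4.0], 2, lambda v: [2 * x for x in v])
print(result)  # [6.0, 10.0, 10.0, 14.0]
```

Because each subset is processed independently, the block only ever sees inputs of the reduced width, which is where the complexity saving would come from; the fusion step is what lets each subset retain some information from the others. The concatenated output can be passed through `transform_and_apply` again for the next block.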

Speaker Verification using Convolutional Neural Networks

Siamese Network with Wav2vec Feature for Spoofing Speech Detection

Voice Presentation Attack Detection Using Convolutional Neural Networks

Self-Attention Networks for Text-Independent Speaker Verification

Text-Independent Speaker Verification Using Long Short-Term Memory Networks

End-to-End Feature Learning for Text-Independent Speaker Verification

RSKNet-MTSP: Effective and Portable Deep Architecture for Speaker Verification

Modified layer deep convolution neural network for text-independent speaker recognition

Speaker Verification based on Single Channel Speech Separation

Joint speaker encoder and neural back-end model for fully end-to-end automatic speaker verification with multiple enrollment utterances

Contrastive Learning for improving End-to-end Speaker Verification

Universal Pooling Method of Multi-layer Features from Pretrained Models for Speaker Verification

Enhancing speaker verification accuracy with deep ensemble learning and inclusion of multifaceted demographic factors

A Universal Identity Backdoor Attack against Speaker Verification based on Siamese Network

HiddenSpeaker: Generate Imperceptible Unlearnable Audios for Speaker Verification System

Lightweight Speaker Verification Using Transformation Module with Feature Partition and Fusion

Audio Spoofing Verification using Deep Convolutional Neural Networks by Transfer Learning

Audio-Visual Speaker Verification via Joint Cross-Attention

The HCCL Speaker Verification System for Far-Field Speaker Verification Challenge

Towards Speaker Identification with Minimal Dataset and Constrained Resources using 1D-Convolution Neural Network

Speaker verification using attentive multi-scale convolutional recurrent network