Abstract: Speaker modeling plays a crucial role in various tasks, and fixed-dimensional vector representations, known as speaker embeddings, are the predominant modeling approach. These embeddings are typically evaluated within the framework of speaker verification, yet their utility extends to a broad range of related tasks, including speaker diarization, speech synthesis, voice conversion, and target speaker extraction. This paper presents Wespeaker, a user-friendly toolkit for learning speaker embeddings, designed for both research and production. Wespeaker offers scalable data management, state-of-the-art speaker embedding models, and self-supervised training schemes with the potential to leverage large-scale unlabeled real-world data. The toolkit includes structured recipes that have been adopted in winning systems across various speaker verification challenges and deliver highly competitive results. For production-oriented development, Wespeaker integrates CPU- and GPU-compatible deployment and runtime code, supporting mainstream platforms such as Windows, Linux, and Mac, as well as on-device chips such as the Horizon X3 PI. Wespeaker also offers off-the-shelf, high-quality speaker embeddings through various pretrained models, which can be readily applied to different tasks that require speaker modeling. The toolkit is publicly available at https://github.com/wenet-e2e/wespeaker.
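
The abstract notes that speaker embeddings are fixed-dimensional vectors typically evaluated through speaker verification. As a minimal sketch of what that evaluation step can look like, the snippet below scores two embeddings with cosine similarity and compares the score to a threshold. It does not use the Wespeaker API: the function names `cosine_score` and `same_speaker`, the threshold of 0.5, and the toy 4-dimensional vectors are all assumptions made for illustration, and cosine scoring is only one common back-end choice.

```python
import math
from typing import Sequence


def cosine_score(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two fixed-dimensional embeddings."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def same_speaker(enroll: Sequence[float], test: Sequence[float],
                 threshold: float = 0.5) -> bool:
    """Accept the trial if the score reaches the threshold.

    The threshold here is a hypothetical value; in practice it is
    tuned on a development set for the target operating point.
    """
    return cosine_score(enroll, test) >= threshold


# Toy vectors standing in for real embeddings (assumed values).
enroll = [0.9, 0.1, 0.3, 0.2]
test_same = [0.8, 0.2, 0.35, 0.1]
test_diff = [-0.2, 0.9, -0.4, 0.1]

print(same_speaker(enroll, test_same))
print(same_speaker(enroll, test_diff))
```

Real embeddings would come from a trained model and typically have a few hundred dimensions; the scoring logic is unchanged.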

THE 2020 ESPNET UPDATE: NEW FEATURES, BROADENED APPLICATIONS, PERFORMANCE IMPROVEMENTS, AND FUTURE PLANS

ESPnet2-TTS: Extending the Edge of TTS Research

RECENT DEVELOPMENTS ON ESPNET TOOLKIT BOOSTED BY CONFORMER

ESPnet-SE: End-to-End Speech Enhancement and Separation Toolkit Designed for ASR Integration

ESPnet-SE++: Speech Enhancement for Robust Speech Recognition, Translation, and Understanding

ESPnet-EZ: Python-only ESPnet for Easy Fine-tuning and Integration

ESPnet-ST-v2: Multipurpose Spoken Language Translation Toolkit

ESPnet-Codec: Comprehensive Training and Evaluation of Neural Codecs for Audio, Music, and Speech

ESPnet-SPK: full pipeline speaker embedding toolkit with reproducible recipes, self-supervised front-ends, and off-the-shelf models

Muskits-ESPnet: A Comprehensive Toolkit for Singing Voice Synthesis in New Paradigm

ESPnet-ST IWSLT 2021 Offline Speech Translation System

EURO: ESPnet Unsupervised ASR Open-source Toolkit

Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

TEA-PSE 2.0: Sub-Band Network for Real-Time Personalized Speech Enhancement

Deep Learning Based TTS-STT Model with Transliteration for Indic Languages

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

Advancing Speaker Embedding Learning: Wespeaker Toolkit for Research and Production

SpatialNet: Extensively Learning Spatial Information for Multichannel Joint Speech Separation, Denoising and Dereverberation

WeSep: A Scalable and Flexible Toolkit Towards Generalizable Target Speaker Extraction

ESPNetv2: A Light-Weight, Power Efficient, and General Purpose Convolutional Neural Network

Multi-Stage Progressive Speech Enhancement Network