Abstract: A typical phonotactic system for language recognition is parallel phone recognition followed by vector space modeling (PPRVSM). In this system, several phone recognizers are applied in parallel and fused at the score level. Each phone recognizer is trained on a different known language, on the assumption that the recognizers extract complementary information for effective fusion. However, this method requires a large amount of training data with word- or phone-level transcriptions. In addition, score-level fusion is not optimal, because fusion at the feature or model level retains more information. This paper presents a new strategy to build and fuse parallel phone recognizers (PPR): multiple acoustic diversified phone recognizers are trained and fused at the feature level. The phone recognizers are trained on the same speech data but with different acoustic features and model training techniques. For the acoustic features, Mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) are both employed. In addition, a new time-frequency cepstrum (TFC) feature is proposed to extract complementary acoustic information. For model training, we examine the maximum likelihood and feature minimum phone error methods to train complementary acoustic models. We fuse the phonotactic features of the acoustic diversified phone recognizers using a simple linear fusion method to build the PPRVSM system. A novel logistic regression optimized weighting (LROW) approach is introduced to optimize the fusion factors. The experimental results show that fusion at the feature level is more effective than fusion at the score level, and that the proposed system is competitive with the traditional PPRVSM. Finally, the two systems are combined for further improvement.
The best performing system reported in this paper achieves an equal error rate (EER) of 1.24%, 4.98% and 14.96% on the NIST 2007 LRE 30-second, 10-second and 3-second evaluation databases, respectively, for the closed-set test condition.
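
As a rough illustration of the "simple linear fusion" of phonotactic features mentioned in the abstract, the sketch below forms a weighted sum of per-recognizer feature vectors. It assumes that all recognizers share one phone set, so their phonotactic (n-gram) feature vectors have the same dimension; the function name `fuse_phonotactic_features`, the weight normalization, and the toy values are illustrative assumptions, not the authors' implementation. The LROW weight optimization itself is not shown; the weights are simply given.

```python
import numpy as np


def fuse_phonotactic_features(features, weights):
    """Weighted linear combination of per-recognizer feature vectors.

    features: K vectors of equal dimension D, one per phone recognizer
              (assumed layout, not taken from the paper).
    weights:  K non-negative fusion factors (assumed to be given here;
              the paper optimizes them with LROW).
    """
    feats = np.asarray(features, dtype=float)  # shape (K, D)
    w = np.asarray(weights, dtype=float)       # shape (K,)
    if feats.ndim != 2 or w.shape != (feats.shape[0],):
        raise ValueError("expected K feature vectors and K weights")
    w = w / w.sum()                            # normalize so the weights sum to 1
    return w @ feats                           # fused vector, shape (D,)


# Toy example: two recognizers, three n-gram dimensions.
fused = fuse_phonotactic_features([[1.0, 0.0, 1.0],
                                   [0.0, 1.0, 1.0]],
                                  weights=[1.0, 3.0])
print(fused)
```

In such a setup the fused vector would feed a single vector space model classifier, in contrast to score-level fusion, where each recognizer's features are classified separately and only the resulting scores are combined.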

THU-EE System Description for NIST LRE 2015

THUEE system description for NIST 2019 SRE CTS Challenge

THUEE System for NIST SRE19 CTS Challenge.

THUEE system description for NIST 2020 SRE CTS challenge

THUEE system for the Albayzin 2012 language recognition evaluation

THU-EE System Fusion for the NIST 2012 Speaker Recognition Evaluation.

Design and implementation of a speaker recognition system

I4U System Description for NIST SRE'20 CTS Challenge

THUEE Language Modeling Method for the OpenKWS 2015 Evaluation

HLT-NUS Submission for NIST 2019 Multimedia Speaker Recognition Evaluation

UTD-CRSS Systems for 2016 NIST Speaker Recognition Evaluation

The THUEE System for the Openkws14 Keyword Search Evaluation.

Thuee system description for mce 2018

STC speaker recognition systems for the NIST SRE 2021

The SpeakIn System Description for CNSRC2022

The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge

ICT System Description for the 2006 TC-STAR Run #2 SLT Evaluation

Language Recognition Based on Acoustic Diversified Phone Recognizers and Phonotactic Feature Fusion

The THUEE Speaker Identity Verification System for Evalita 2009 SIV-Application

Parallel Absolute-Relative Feature Based Phonotactic Language Recognition.