NN-based Ordinal Regression for Assessing Fluency of ESL Speech.

Shaoguang Mao, Zhiyong Wu, Jingshuai Jiang, Peiyun Liu, Frank K. Soong
DOI: https://doi.org/10.1109/icassp.2019.8682187
2019-01-01
Abstract: Automatic assessment of a language learner's speech fluency is highly desirable for language education, e.g. for English as a Second Language (ESL) learning. In this paper, we formulate fluency assessment as a problem of Ordinal Regression with Anchored Reference Samples (ORARS), where the fluency of a speech utterance is predicted by an ordinal regression neural network (NN) trained with anchored reference samples. ORARS is trained and tested in three steps: (1) picking human-expert-labeled samples in each mean opinion score (MOS) bucket as the anchored reference samples and pairing them with input speech samples as training couplets; (2) training an NN-based binary classifier to determine which sample in a pair is better in fluency; (3) predicting the rank (MOS) of a test sample based upon the posteriors of all binary comparisons between the test sample and all anchored reference samples. Experimentally, our proposed approach outperforms traditional NN-based methods and reaches "human parity", i.e. performance comparable to that of human experts, in its fluency assessment of collected ESL speech. To the best of our knowledge, this is the first attempt to assess speech fluency with an ordinal regression framework where a test input is paired with bucketed and anchored reference samples.
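The three steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how ties are handled, how the posteriors are aggregated into a rank, or what the classifier looks like. The names `make_couplets`, `predict_mos`, and `toy_posterior` are hypothetical, and the sketch assumes consecutive integer MOS buckets, skips tied pairs during training, and estimates the rank as the expected number of buckets the test sample beats. A hand-written sigmoid stands in for the trained NN binary classifier.

```python
import math
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

Features = Sequence[float]
# Stand-in for the trained classifier: P(first sample is more fluent than second).
Posterior = Callable[[Features, Features], float]


def make_couplets(
    samples: List[Tuple[Features, int]],
    anchors: Dict[int, List[Features]],
) -> List[Tuple[Features, Features, int]]:
    """Pair each labeled sample with every anchored reference sample.

    The label is 1 if the sample's MOS is above the anchor's bucket, else 0.
    Assumption: pairs with equal MOS are skipped (the abstract is silent on ties).
    """
    couplets = []
    for feats, score in samples:
        for bucket in sorted(anchors):
            if score == bucket:
                continue
            for ref in anchors[bucket]:
                couplets.append((feats, ref, 1 if score > bucket else 0))
    return couplets


def predict_mos(
    test: Features,
    anchors: Dict[int, List[Features]],
    posterior: Posterior,
) -> float:
    """Estimate a MOS from the posteriors of all test-vs-anchor comparisons.

    Assumption: buckets are consecutive integers, and the estimate is the
    lowest bucket plus the expected number of buckets beaten, minus 0.5
    (a sample in bucket r fully beats r - 1 buckets and ties with its own).
    """
    buckets = sorted(anchors)
    beaten = sum(mean(posterior(test, ref) for ref in anchors[b]) for b in buckets)
    estimate = buckets[0] + beaten - 0.5
    return float(min(max(estimate, buckets[0]), buckets[-1]))


def toy_posterior(a: Features, b: Features, steepness: float = 50.0) -> float:
    """Hypothetical classifier: a sigmoid on the difference of one scalar feature."""
    return 1.0 / (1.0 + math.exp(-steepness * (a[0] - b[0])))
```

With anchors `{1: [[1.0]], 2: [[2.0]], 3: [[3.0]]}`, calling `predict_mos([2.0], anchors, toy_posterior)` compares the test sample against all three buckets and places it in the middle one. In practice `posterior` would be the output of the trained binary classifier, and a different aggregation rule may be used.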