Unsupervised Training With Directed Manual Transcription For Recognising Mandarin Broadcast Audio

K. Yu, M. J. F. Gales, P. C. Woodland
DOI: https://doi.org/10.21437/interspeech.2007-480
2007-01-01
Abstract: The performance of unsupervised discriminative training has been found to be highly dependent on the accuracy of the initial automatic transcription. This paper examines a strategy in which a relatively small amount of poorly recognised data is manually transcribed to supplement the automatically transcribed data. Experiments were carried out on a Mandarin broadcast transcription task using both Broadcast News (BN) and Broadcast Conversation (BC) data. A range of experimental conditions is compared for both maximum likelihood and discriminative training using directed manual transcription. For BC data, fully unsupervised discriminative training obtains only 17% of the reduction in character error rate (CER) achieved by supervised training. Automatically selecting 18% of the data for manual transcription yields 50% of the CER gain from supervised training. The directed approach to selecting data outperforms the use of a random set of data for manual transcription.