The Development of the Cambridge University Alignment Systems for the Multi-Genre Broadcast Challenge.

P. Lanchantin, M. J. F. Gales, P. Karanasou, X. Liu, Y. Qian, L. Wang, P. C. Woodland, C. Zhang
DOI: https://doi.org/10.1109/asru.2015.7404857
2015-01-01
Abstract: We describe the alignment systems developed both for the preparation of data for the Multi-Genre Broadcast (MGB) challenge and for our participation in the transcription and alignment tasks. Captions of varying quality are aligned with the audio of TV shows that range in length from a few minutes to more than six hours. Lightly supervised decoding is performed on the audio, and the output text is aligned with the original text transcript. Reliable split points are found, and the resulting text chunks are force-aligned with the corresponding audio segments. Confidence scores are associated with the aligned data. Multiple refinements, including audio segmentation based on deep neural networks (DNNs) and the use of DNN-based acoustic models, were used to improve performance. The final MGB alignment system had the highest F-measure value on the evaluation data.
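
The split-point step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the `min_run` threshold, and the choice to split at the midpoint of each agreeing run are assumptions made for illustration, and Python's standard `difflib.SequenceMatcher` stands in for whatever text alignment the system actually uses. The idea shown is only that runs of words on which the caption and the decoder output agree are plausible places to cut the caption into chunks for later forced alignment.

```python
from difflib import SequenceMatcher


def find_split_points(caption_words, decoded_words, min_run=3):
    """Return (caption_index, decoded_index) pairs, one per agreeing run.

    A run is a stretch of at least `min_run` consecutive words that is
    identical in the caption and in the decoder output. The split is placed
    at the midpoint of the run (an illustrative choice, not from the paper).
    """
    matcher = SequenceMatcher(a=caption_words, b=decoded_words, autojunk=False)
    splits = []
    for block in matcher.get_matching_blocks():
        # The final block returned by difflib is a size-0 sentinel; the
        # size test below skips it together with short, unreliable runs.
        if block.size >= min_run:
            mid = block.size // 2
            splits.append((block.a + mid, block.b + mid))
    return splits


def chunk_captions(caption_words, splits):
    """Cut the caption word list at the caption-side split indices."""
    bounds = [0] + [a for a, _ in splits] + [len(caption_words)]
    return [caption_words[s:e] for s, e in zip(bounds, bounds[1:]) if e > s]


if __name__ == "__main__":
    # Hypothetical caption text and hypothetical decoder output.
    caption = "good evening and welcome to the news tonight we look at the weather".split()
    decoded = "good evening and welcome to news uh tonight we look at the weather".split()
    splits = find_split_points(caption, decoded)
    for chunk in chunk_captions(caption, splits):
        print(" ".join(chunk))
```

In a full system each chunk would then be force-aligned against the audio segment bounded by the decoder timestamps at the matching split indices; that step needs an acoustic model and is outside the scope of this sketch.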