Abstract: This paper presents work on restoring punctuation in transcripts generated by multilingual ASR systems. The focus languages are English, Mandarin, and Malay, three of the most widely spoken languages in Singapore. To the best of our knowledge, this is the first system that can tackle punctuation restoration for these three languages simultaneously. Traditional approaches usually treat the task as sequence labeling; this work instead adopts a slot-filling approach that predicts the presence and type of punctuation mark at each word boundary. The approach is similar to the masked language modeling objective used to pre-train BERT, but instead of predicting a masked word, our model predicts masked punctuation. Additionally, we find that segmenting Mandarin text with Jieba, rather than relying only on the built-in SentencePiece tokenizer of XLM-R, significantly improves performance on Mandarin transcripts. Experimental results on the English and Mandarin IWSLT2022 datasets and on Malay News show that the proposed approach achieves state-of-the-art results for Mandarin, with a 73.8% F1-score, while maintaining reasonable F1-scores of 74.7% for English and 78% for Malay. Our source code, which allows reproducing the results and building a simple web-based demonstration application, is available on GitHub.
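As an illustration of the slot-filling formulation described in the abstract, the following minimal sketch shows how punctuated text could be converted into a sequence with one masked slot per word boundary, and how predicted slot labels could be written back into the text. It covers data preparation only and is not the authors' implementation: the label set (`COMMA`, `PERIOD`, `QUESTION`, `O`), the `<mask>` placeholder string, and the helper names `build_slots` and `fill_slots` are assumptions made for this example, and whitespace splitting stands in for the actual tokenization.

```python
# Hypothetical slot-filling data preparation (illustrative only).
MASK = "<mask>"  # assumed placeholder for a punctuation slot
PUNCT = {",": "COMMA", ".": "PERIOD", "?": "QUESTION"}  # assumed label set


def build_slots(punctuated_text):
    """Strip punctuation and insert one masked slot after every word.

    Returns (tokens, labels), where labels[i] is the target for the
    i-th slot and "O" means no punctuation at that boundary.
    """
    tokens, labels = [], []
    for raw in punctuated_text.split():
        word = raw.rstrip("".join(PUNCT))  # drop trailing punctuation
        mark = raw[len(word):]             # whatever was stripped
        if not word:
            continue                       # skip stand-alone punctuation
        tokens += [word.lower(), MASK]
        labels.append(PUNCT.get(mark[:1], "O"))
    return tokens, labels


def fill_slots(tokens, labels):
    """Rebuild text by replacing each slot with its labelled mark."""
    inverse = {name: mark for mark, name in PUNCT.items()}
    slot_labels = iter(labels)
    words = []
    for token in tokens:
        if token == MASK:
            label = next(slot_labels)
            if label != "O" and words:
                words[-1] += inverse[label]
        else:
            words.append(token)
    return " ".join(words)


tokens, labels = build_slots("Hello, how are you? Fine.")
restored = fill_slots(tokens, labels)
```

In an actual system a classifier over the slot positions would predict `labels` from the unpunctuated tokens; here the gold labels are simply round-tripped to show the input and output shapes.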

Streaming Punctuation for Long-form Dictation with Transformers

Streaming Punctuation: A Novel Punctuation Technique Leveraging Bidirectional Context for Continuous Speech Recognition

Improved Training for End-to-End Streaming Automatic Speech Recognition Model with Punctuation

Incorporating External POS Tagger for Punctuation Restoration

Fast and Accurate Capitalization and Punctuation for Automatic Speech Recognition Using Transformer and Chunk Merging

Automatic punctuation generation for speech

Semantic Segmentation with Bidirectional Language Models Improves Long-form ASR

A light-weight and efficient punctuation and word casing prediction model for on-device streaming ASR

Multimodal Punctuation Prediction with Contextual Dropout

Resolving Transcription Ambiguity in Spanish: A Hybrid Acoustic-Lexical System for Punctuation Restoration

Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments

Leveraging Timestamp Information for Serialized Joint Streaming Recognition and Translation

FF2: A Feature Fusion Two-Stream Framework for Punctuation Restoration

Punctuation Restoration for Singaporean Spoken Languages: English, Malay, and Mandarin

A CIF-Based Speech Segmentation Method for Streaming E2E ASR

Sentence Punctuation for Collaborative Commentary Generation in Esports Live-Streaming

Text-conditioned Transformer for Automatic Pronunciation Error Detection

Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study

Streaming Sequence Transduction through Dynamic Compression

Boosting Punctuation Restoration with Data Generation and Reinforcement Learning

LLaMA based Punctuation Restoration With Forward Pass Only Decoding