Zephyr: Zero-Shot Punctuation Restoration
Minghan Wang, Yinglu Li, Jiaxin Guo, Xiaosong Qiao, Chang Su, Min Zhang, Shimin Tao, Hao Yang
DOI: https://doi.org/10.1109/ICASSP49357.2023.10095799
2023-01-01
Abstract: Punctuation restoration can be crucial for cascaded speech translation systems. Traditional approaches typically treat it as a sequence tagging problem, predicting which punctuation mark should follow a given word. However, this often requires significant computational and storage resources for full training or fine-tuning. We argue that pre-trained language models (PLMs) can directly leverage their learned knowledge to generate punctuation, making additional training unnecessary. In this paper, we propose the Zephyr algorithm, which uses PLMs to perform zero-shot and few-shot punctuation restoration in both offline and streaming scenarios. Our experimental results show that, compared with fine-tuning-based baselines, Zephyr achieves competitive performance while requiring little to no training cost and generalizing better in zero-shot and few-shot settings.
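The abstract does not spell out how Zephyr queries the PLM, so the following is only a minimal sketch of the general zero-shot idea, not the authors' algorithm. It assumes a scoring function that rates how plausible a punctuated prefix is given the next word, and greedily picks the best mark after each word. The names `restore_punctuation`, `toy_score`, and `PUNCTUATION` are hypothetical, and `toy_score` is a hand-written stand-in for a real PLM likelihood so that the example runs without any model.

```python
from typing import Callable, List, Sequence

# Candidate marks to try after each word; "" means "no punctuation".
# This inventory is an assumption, not taken from the paper.
PUNCTUATION: Sequence[str] = ("", ",", ".", "?")


def restore_punctuation(
    words: List[str],
    score_fn: Callable[[str, str], float],
    candidates: Sequence[str] = PUNCTUATION,
) -> str:
    """Greedy left-to-right restoration with one word of lookahead.

    score_fn(prefix, next_word) should return a higher value when the
    punctuated prefix is more plausible before next_word. In a real
    system this would be a PLM score; here it is pluggable.
    """
    out: List[str] = []
    for i, word in enumerate(words):
        next_word = words[i + 1] if i + 1 < len(words) else ""
        # Try every candidate mark after the current word, keep the best.
        best = max(
            candidates,
            key=lambda mark: score_fn(" ".join(out + [word + mark]), next_word),
        )
        out.append(word + best)
    return " ".join(out)


def toy_score(prefix: str, next_word: str) -> float:
    """Hypothetical stand-in for a PLM: hard-coded preferences only."""
    last = prefix.split()[-1]
    mark = last[-1] if last[-1] in ",.?" else ""
    if next_word == "":       # end of input: prefer a full stop
        return 1.0 if mark == "." else 0.0
    if next_word == "but":    # prefer a comma before "but"
        return 1.0 if mark == "," else 0.0
    return 1.0 if mark == "" else 0.0  # otherwise prefer no mark


result = restore_punctuation(["it", "works", "but", "slowly"], toy_score)
print(result)  # it works, but slowly.
```

Because the loop only looks one word ahead, the same sketch could in principle be driven by incrementally arriving words, which is one way to picture the streaming scenario; how Zephyr actually handles streaming is not described in the abstract.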