Detect Turn-takings in Subtitle Streams with Semantic Recall Transformer Encoder

Yuhai Liang, Qiang Zhou
DOI: https://doi.org/10.1109/ialp51396.2020.9310512
2020-01-01
Abstract: Subtitles are valuable dialogue text data because they resemble human conversation, but their lack of turn structure limits their use in many NLP tasks. Previous work treats turn-taking detection (TTD) in subtitles as a sentence-pair classification problem, predicting whether a turn-taking occurs between two adjacent utterances, and its results are not good enough. To take dialogue context into account, we instead treat TTD as a sentence-level sequence labelling problem, predicting the turn-takings across the whole subtitle stream. First, we present a novel fine-tuning method that enables BERT to encode utterances into a set of embeddings with effective turn-taking features. Then, we propose the Semantic Recall Transformer (SRT) model, which detects turn-takings in the utterance embedding set by considering turn-taking features and context information at the same time. Compared with the baselines, it achieves state-of-the-art results on both English and Chinese subtitle corpora. Moreover, we explore how the length of the subtitle stream and the number of conversation participants affect model performance; the results suggest that performance can be improved further in future work.
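The difference between the two problem framings named in the abstract can be illustrated with a small data-preparation sketch. It is not the authors' code and does not implement the BERT fine-tuning or the SRT model; the function names, the speaker identifiers, and the labelling convention (1 when the speaker changes at a boundary, 0 otherwise) are assumptions made for illustration only.

```python
from typing import List, Tuple


def turn_taking_labels(speakers: List[str]) -> List[int]:
    """One label per boundary between adjacent utterances.

    Assumed convention: 1 if the speaker changes at the boundary, else 0.
    """
    return [int(prev != nxt) for prev, nxt in zip(speakers, speakers[1:])]


def as_sentence_pairs(
    utterances: List[str], labels: List[int]
) -> List[Tuple[Tuple[str, str], int]]:
    """Sentence-pair framing: every boundary is an independent example
    that sees only the two adjacent utterances."""
    return [
        ((utterances[i], utterances[i + 1]), labels[i])
        for i in range(len(labels))
    ]


def as_sequence_example(
    utterances: List[str], labels: List[int]
) -> Tuple[List[str], List[int]]:
    """Sequence-labelling framing: the whole subtitle stream is one example,
    so a model can use context beyond the adjacent pair."""
    return utterances, labels


# Hypothetical four-utterance stream with made-up speaker identifiers.
utterances = ["Where were you?", "I waited an hour.", "Sorry, traffic.", "Fine."]
speakers = ["A", "A", "B", "A"]

labels = turn_taking_labels(speakers)
pairs = as_sentence_pairs(utterances, labels)
stream, stream_labels = as_sequence_example(utterances, labels)
```

In the pair framing a classifier receives three unrelated two-utterance examples, whereas in the sequence framing it receives all four utterances at once together with the three boundary labels.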