Abstract: In multi-turn dialogue, intent recognition and generation of the next reply become increasingly difficult as the number of dialogue turns grows. This paper focuses on improving the context information extraction ability of the Seq2Seq encoder in multi-turn dialogue modeling. We fuse the historical dialogue information and the current input utterance in the encoder to better capture the dialogue context. To this end, we propose a BERT-based fusion encoder, ProBERT-To-GUR (PBTG), and an enhanced ELMO model, 3-ELMO-Attention-GRU (3EAG). Both models mainly enhance the extraction of contextual information in multi-turn dialogue. To verify their effectiveness, we run experiments on data combined from the LCCC-large multi-turn dialogue dataset and the Naturalconv multi-turn dataset. The experimental comparison shows that, in both open-domain and fixed-topic multi-turn dialogue experiments, the two proposed Seq2Seq encoding models improve significantly over the current state-of-the-art models. For fixed-topic multi-turn dialogue, the 3EAG model achieves the best language generation, with an average BLEU value of 32.4, and its BLEU value in the actual dialogue verification experiment also surpasses 31.8. For open-domain multi-turn dialogue, the PBTG model achieves the best language generation, with an average BLEU value of 31.8, and its BLEU value in the actual dialogue verification experiment surpasses 31.2. Thus, of the two tasks, the 3EAG model is better suited to fixed-topic multi-turn dialogue, while the PBTG model is stronger in open-domain multi-turn dialogue. Our models are therefore of significance for advancing multi-turn dialogue research.

Detect Turn-takings in Subtitle Streams with Semantic Recall Transformer Encoder

SBAT: Video Captioning with Sparse Boundary-Aware Transformer

Subtitles to Segmentation: Improving Low-Resource Speech-to-Text Translation Pipelines

RTQ: Rethinking Video-language Understanding Based on Image-text Model

Reading the Videos: Temporal Labeling for Crowdsourced Time-Sync Videos Based on Semantic Embedding

Learning to Jointly Transcribe and Subtitle for End-to-End Spontaneous Speech Recognition

SegRewardGraph: unsupervised teaching video story segmentation method based on subtitle length-rewarding strategy and semantic relatedness graphs

Automatically Annotate TV Series Subtitles for Dialogue Corpus Construction

Long Short-Term Relation Transformer With Global Gating for Video Captioning

End-to-End Subtitle Detection and Recognition for Videos in East Asian Languages via CNN Ensemble with Near-Human-Level Performance

MART: Memory-Augmented Recurrent Transformer For Coherent Video Paragraph Captioning

Unsupervised Abstractive Dialogue Summarization for Tete-a-Tetes

Gated Multimodal Fusion with Contrastive Learning for Turn-taking Prediction in Human-robot Dialogue

Reading and Thinking: Re-read LSTM Unit for Textual Entailment Recognition

A text-dependent speaker verification application framework based on Chinese numerical string corpus

Human–Machine Multi-Turn Language Dialogue Interaction Based on Deep Learning

Visual-Semantic Transformer for Scene Text Recognition

An Unsupervised Dialogue Topic Segmentation Model Based on Utterance Rewriting

End-to-End Video Text Spotting with Transformer

Character-aware audio-visual subtitling in context

Direct Speech Translation for Automatic Subtitling