Abstract: Automatic paraphrase generation is an essential task in natural language processing. However, because paraphrase corpora are scarce in many languages, such as Chinese, generating high-quality paraphrases in these languages remains challenging. Domain paraphrasing is harder still, because in-domain paraphrase sentence pairs are even more difficult to obtain. In this paper, we propose a novel approach for domain-specific paraphrase generation in a zero-shot fashion. Our approach is based on a sequence-to-sequence architecture. The encoder is a pre-trained multilingual autoencoder model, and the decoder is a pre-trained monolingual autoregressive model. Because these two models are pre-trained separately, they have different representations for the same token; we therefore call them unaligned pre-trained language models. We train the sequence-to-sequence model on an English-to-Chinese machine translation corpus. Given a Chinese sentence as input, the trained model, surprisingly, generates fluent and diverse Chinese paraphrases. Since the unaligned pre-trained language models have inconsistent understandings of the Chinese language, we believe that the Chinese paraphrasing is actually performed in a Chinese-to-Chinese translation manner. In addition, we collect a small-scale English-to-Chinese machine translation corpus in the domain of computer science. After fine-tuning on this domain-specific corpus, our model shows an excellent capability for domain paraphrasing. Experimental results show that our approach significantly outperforms previous baselines in terms of Relevance, Fluency, and Diversity.

Language-Independent Representations Improve Zero-Shot Summarization

Improving Zero and Few-Shot Abstractive Summarization with Intermediate Fine-tuning and Data Augmentation

Neural Label Search for Zero-Shot Multi-Lingual Extractive Summarization

Domain-Oriented Prefix-Tuning: Towards Efficient and Generalizable Fine-tuning for Zero-Shot Dialogue Summarization

Zero-shot Faithfulness Evaluation for Text Summarization with Foundation Language Model

Training Dynamics for Text Summarization Models

Exploring Neural Models for Query-Focused Summarization

Assessing LLMs for Zero-shot Abstractive Summarization Through the Lens of Relevance Paraphrasing

Balancing Lexical and Semantic Quality in Abstractive Summarization

Languages Transferred Within the Encoder: On Representation Transfer in Zero-Shot Multilingual Translation

Revisiting Zero-Shot Abstractive Summarization in the Era of Large Language Models from the Perspective of Position Bias

Finetuned Language Models Are Zero-Shot Learners

Searching for Effective Multilingual Fine-Tuning Methods: A Case Study in Summarization

Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

Unleashing the Power of Contrastive Learning for Zero-Shot Video Summarization

Zero-Shot Strategies for Length-Controllable Summarization

Zero-shot domain paraphrase with unaligned pre-trained language models

Pre-trained Language Model Representations for Language Generation

Text Summarization with Pretrained Encoders

Summarization is (Almost) Dead

Zero-Shot Cross-Lingual Abstractive Sentence Summarization Through Teaching Generation and Attention