Abstract: Paraphrasing, conveying the same meaning in different ways, is an intrinsic part of natural languages. The research field of Automatic Paraphrasing encompasses the tasks of collecting, identifying, and generating paraphrases in an automatic or computer-aided manner. In addition, researchers have investigated the contribution of automatic paraphrasing techniques to many natural language applications, such as question answering (QA), information extraction (IE), multi-document summarization (MDS), and machine translation (MT). For example, in machine translation, paraphrases have been used for rewriting and simplifying input sentences, enlarging translation phrase tables, expanding human references for automatic evaluation, and so forth.

This special section of ACM TIST is intended to cover state-of-the-art research in automatic paraphrasing. In particular, we highlight the applications of paraphrasing techniques in real-world systems, such as MT systems and search engines. Seven articles are included in the special section. One of them is about paraphrase extraction from monolingual corpora, while the other six discuss applications of paraphrases, including paraphrasing for machine translation, sentence compression, word meaning computing, and plagiarism detection.

Three articles focus on applying paraphrasing techniques to MT. These articles cover the three main research directions mentioned above, namely, source sentence rewriting, phrase table enlargement, and human reference expansion.

In “Using Targeted Paraphrasing and Monolingual Crowdsourcing to Improve Translation” by Philip Resnik, Olivia Buzek, Yakov Kronrod, Chang Hu, Alexander J. Quinn, and Benjamin B. Bederson, the authors propose using crowdsourcing to enhance the translation quality of an SMT system. A remarkable advantage of the proposed method is that it involves only monolingual workers, who identify target-side translation errors and supply source-side paraphrases, rather than relying on workers with bilingual expertise. The proposed solution has the potential to provide a more cost-effective approach to translation in scenarios where machine translation would be considered acceptable if only its quality were generally high enough. It also has the potential to vastly reduce the burden of human effort in cases where bilingual translators post-edit machine translation output.

In the article “Distributional Phrasal Paraphrase Generation for Statistical Machine Translation” by Yuval Marton, the author focuses on extracting paraphrases to improve the coverage of the translation model. The proposed method extracts paraphrases from large-scale monolingual corpora based on distributional similarity. The extracted paraphrases are then used to augment a translation phrase table with pairs not covered by the initial table. The novelty of the proposed method lies in its being language-independent, and hence it does not rely on bitexts for generating paraphrases or new phrase pairs.

In “Generating Targeted Paraphrases for Improved Translation” by Nitin Madnani and Bonnie Dorr, the authors adopt an approach that uses automatic paraphrase generation to tune parameters for an SMT system. Specifically, given a single reference translation, they build a paraphrase generation system that produces several semantically equivalent variants, which can then be used as additional reference translations. Experimental results on several language pairs demonstrate that the proposed approach can improve translation quality. Furthermore, this article presents
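
As a rough illustration of the phrase table enlargement direction described above, the following Python sketch finds, for a source word missing from a phrase table, the most distributionally similar covered word in a monolingual corpus and borrows its translations. It is a minimal word-level sketch under stated assumptions, not the method of any article in this section: the toy corpus, the toy phrase table, the bag-of-neighbours context vectors, and the names `context_vectors`, `cosine`, and `propose_new_entries` are all hypothetical, and a real system would work on multi-word phrases and much larger corpora.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(sentences, window=2):
    """Map each word to counts of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i, word in enumerate(tokens):
            start, end = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[key] for key, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def propose_new_entries(phrase_table, vectors, unknown_words, min_sim=0.5):
    """For each uncovered word, borrow translations from its closest covered word."""
    new_entries = {}
    for word in unknown_words:
        if word in phrase_table or word not in vectors:
            continue  # already covered, or no monolingual evidence available
        best, best_sim = None, 0.0
        for covered in phrase_table:
            sim = cosine(vectors[word], vectors.get(covered, Counter()))
            if sim > best_sim:
                best, best_sim = covered, sim
        if best is not None and best_sim >= min_sim:
            new_entries[word] = {
                "paraphrase_of": best,
                "similarity": best_sim,
                "translations": list(phrase_table[best]),
            }
    return new_entries


# Toy monolingual corpus and toy English-to-French phrase table (illustrative only).
corpus = [
    "the physician examined the patient",
    "the doctor examined the patient",
    "the physician treated the patient",
    "the doctor treated the patient",
    "the dog chased the ball",
]
phrase_table = {"doctor": ["docteur"], "dog": ["chien"]}

vectors = context_vectors(corpus)
entries = propose_new_entries(phrase_table, vectors, ["physician", "zebra"])
print(entries)
```

In this toy setting only monolingual text is consulted to find the paraphrase; the bilingual phrase table is used solely to look up the translations of the covered word. Keeping the similarity score with each proposed entry would let a downstream system weight borrowed translations below those observed directly.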
