Introduction to Special Section on Paraphrasing
Haifeng Wang,Bill Dolan,Idan Szpektor,Shiqi Zhao
DOI: https://doi.org/10.1145/2483669.2483670
IF: 5
2013-01-01
ACM Transactions on Intelligent Systems and Technology
Abstract: Paraphrasing, conveying the same meaning in different ways, is an intrinsic part of natural languages. The research field of Automatic Paraphrasing encompasses the tasks of collecting, identifying, and generating paraphrases in an automatic or computer-aided manner. In addition, researchers have investigated the contribution of automatic paraphrasing techniques to many natural language applications, such as question answering (QA), information extraction (IE), multi-document summarization (MDS), and machine translation (MT). For example, in machine translation, paraphrases have been used for rewriting and simplifying input sentences, enlarging translation phrase tables, expanding human references for automatic evaluation, and so forth. This special section of ACM TIST is intended to cover state-of-the-art research in automatic paraphrasing. In particular, we highlight the applications of paraphrasing techniques in real-world systems, such as MT systems and search engines. Seven articles are included in the special section. One of them is about paraphrase extraction from monolingual corpora, while the other six discuss the applications of paraphrases, including paraphrasing for machine translation, sentence compression, word meaning computing, and plagiarism detection. Three articles focus on applying paraphrasing techniques to MT. These articles cover the three main research directions mentioned above, namely, source sentence rewriting, phrase table enlargement, and human reference expansion. In “Using Targeted Paraphrasing and Monolingual Crowdsourcing to Improve Translation” by Philip Resnik, Olivia Buzek, Yakov Kronrod, Chang Hu, Alexander J. Quinn, and Benjamin B. Bederson, the authors propose using crowdsourcing to enhance the translation quality of a statistical machine translation (SMT) system.
A notable advantage of the proposed method is that it requires only monolingual workers to identify target-side translation errors and supply source-side paraphrases, rather than relying on workers with bilingual expertise. The proposed solution has the potential to provide a more cost-effective approach to translation in scenarios where machine translation would be considered acceptable if only it were generally of high enough quality. It also has the potential to vastly reduce the human effort required in cases where bilingual translators post-edit machine translation output. In the article “Distributional Phrasal Paraphrase Generation for Statistical Machine Translation” by Yuval Marton, the author focuses on extracting paraphrases to improve the coverage of the translation model. The proposed method extracts paraphrases from large-scale monolingual corpora based on distributional similarity. The extracted paraphrases are then used to augment a translation phrase table with pairs not covered by the initial table. The novelty of the proposed method lies in its being language-independent, and hence it does not rely on bitexts for generating paraphrases or new phrase pairs. In “Generating Targeted Paraphrases for Improved Translation” by Nitin Madnani and Bonnie Dorr, the authors adopt an approach that uses automatic paraphrase generation to tune the parameters of an SMT system. Specifically, given a single reference translation, they build a paraphrase generation system that produces several semantically equivalent variants, which can then be used as additional reference translations. Experimental results on several language pairs demonstrate that the proposed approach can improve translation quality. Furthermore, this article presents