Proceedings of the 23rd International Conference on Computational Linguistics: Tutorial Notes: Paraphrases and Applications
Shiqi Zhao, Haifeng Wang
2010-01-01
Abstract: Paraphrases are different expressions that convey the same meaning. Research on paraphrasing is critical to many related NLP areas, such as machine translation (MT), question answering (QA), information retrieval (IR), information extraction (IE), and natural language generation (NLG). This tutorial is intended to provide attendees with an in-depth look at the identification, generation, application, and evaluation of paraphrases. The tutorial first reviews studies on paraphrase identification (or extraction), which aims to acquire paraphrases from various data sources, such as large-scale web corpora, monolingual parallel corpora, monolingual comparable corpora, and bilingual parallel corpora, as well as some other resources. It then surveys methods for paraphrase generation, highlighting the MT-based method while also introducing other kinds of methods, including thesaurus-based, pattern-based, and NLG-based ones. We then discuss the applications of paraphrases in related research areas, especially MT. We will show how paraphrases can help to alleviate the data sparseness problem, simplify input sentences, tune parameters, and improve automatic evaluation in statistical MT systems. The last part of the tutorial covers the evaluation of paraphrases. To date, no approach to paraphrase evaluation has been widely accepted, so it remains an open issue. This tutorial will summarize existing approaches to paraphrase evaluation, which include human evaluation, automatic evaluation, and application-driven evaluation. The target audience is NLP researchers, practitioners, and students; participants do not need prior knowledge of paraphrasing.