Complement the comparable corpus obtained from websites

Zhou Youliang, Gong Zhengxian, Zhou Guodong
DOI: https://doi.org/10.1109/ICFCC.2010.5497762
2010-01-01
Abstract: This paper proposes a method for automatically extracting high-quality phrase translation tuples from web corpora and, for the first time, discusses an automatic way to complement the missing parts of a bilingual corpus. It analyzes the features of bilingual translation pairs in web pages and then uses a statistical discriminative model that combines multiple features to extract translation pairs. Experimental results show that the resulting corpus is aligned well enough to support related research.
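To make the idea of a discriminative model that combines multiple features more concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not name the features, the model form, or any thresholds, so everything here is an assumption for illustration: the three features (length ratio, dictionary overlap, adjacency on the page), the logistic scoring function, the hand-set weights, the whitespace tokenization, and the names `features`, `score`, and `extract_pairs`.

```python
import math

# Hypothetical weights; a real system would learn these from labeled pairs.
WEIGHTS = {"bias": -2.0, "length_ratio": 1.5, "dict_overlap": 3.0, "adjacent": 1.0}


def features(src, tgt, lexicon, adjacent):
    """Compute illustrative features for one candidate translation pair."""
    src_tokens = src.lower().split()
    tgt_tokens = tgt.lower().split()
    # Length ratio in [0, 1]: 1.0 when both sides have the same token count.
    shorter, longer = sorted((len(src_tokens), len(tgt_tokens)))
    length_ratio = shorter / longer if longer else 0.0
    # Fraction of source tokens with a lexicon translation found in the target.
    tgt_set = set(tgt_tokens)
    hits = sum(1 for tok in src_tokens if lexicon.get(tok, set()) & tgt_set)
    dict_overlap = hits / len(src_tokens) if src_tokens else 0.0
    return {
        "bias": 1.0,
        "length_ratio": length_ratio,
        "dict_overlap": dict_overlap,
        "adjacent": 1.0 if adjacent else 0.0,
    }


def score(feats, weights=WEIGHTS):
    """Logistic score: weighted feature sum squashed into (0, 1)."""
    z = sum(weights[name] * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-z))


def extract_pairs(candidates, lexicon, threshold=0.5):
    """Keep the candidates whose score reaches the threshold."""
    return [
        (src, tgt)
        for src, tgt, adjacent in candidates
        if score(features(src, tgt, lexicon, adjacent)) >= threshold
    ]


# Toy data: a two-entry seed lexicon and two candidate pairs from a page.
lexicon = {"machine": {"automatique"}, "translation": {"traduction"}}
candidates = [
    ("machine translation", "traduction automatique", True),
    ("machine translation", "bonjour", False),
]
kept = extract_pairs(candidates, lexicon)
```

With these toy weights, the first candidate scores well above the threshold and the second falls below it, so only the first pair is kept. In practice the weights would be estimated from annotated examples rather than set by hand.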