Design and Implementation of Bilingual Parallel Web Page Mining System

CHEN Wei, HUANG Lei, LIU Feng, ZHAO Zhi-hong
DOI: https://doi.org/10.3969/j.issn.1000-3428.2009.14.093
2009-01-01
Abstract: Bilingual corpora are a critical resource for developing statistical machine translation systems. This paper presents a method for automatically mining bilingual parallel Web pages from the Web. Unlike approaches that mine data from pre-specified Web sites, the system is designed to mine parallel Web pages from the entire Web, which makes it well suited to new content domains and language pairs. The paper also implements a parallel Web page mining system. Experimental results show that the system can provide large-scale, high-quality parallel Web pages for statistical machine translation.