Automatic English-Chinese parallel corpus acquisition and sentences extraction

Yi Lu, Xiong Zhang, Dequan Zheng
2013-01-01
Journal of Computational Information Systems
Abstract: The Internet contains many valuable resources from which parallel corpora spanning multiple languages and domains can be built. Several earlier methods have been developed for this mining task, but they often rely on a single feature. We instead use multiple well-motivated features of parallel pages to acquire a parallel corpus. Finally, we add an SVM classifier that combines all of these features for mining, which yields a significant improvement over earlier methods. Evaluated on a large number of manually annotated pairs, our method achieves a precision of 95% and a recall of 99%. Copyright © 2013 Binary Information Press.
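The abstract does not name the page features or say how the SVM is trained, so the following is only a hypothetical sketch of the general idea: score each candidate English-Chinese page pair with several similarity features, then let a linear SVM combine them into a parallel / not-parallel decision. The three features (URL similarity, HTML tag-sequence similarity, text length ratio), all function names, the dual coordinate descent trainer, and the setting C = 10 are assumptions chosen for illustration, not the authors' method. Only the Python standard library is used.

```python
# Hypothetical sketch: NOT the authors' implementation. Feature choices,
# names, and hyperparameters are illustrative assumptions.
import difflib
import re
from typing import List, Sequence

# Matches opening and closing tag names, e.g. "<p>" -> "p", "</div>" -> "div".
TAG_RE = re.compile(r"</?\s*([a-zA-Z][a-zA-Z0-9]*)")


def url_similarity(url_en: str, url_zh: str) -> float:
    """Hypothetical feature: parallel pages often share most of their URL."""
    return difflib.SequenceMatcher(None, url_en, url_zh).ratio()


def structure_similarity(html_en: str, html_zh: str) -> float:
    """Hypothetical feature: similarity of the pages' HTML tag sequences."""
    tags_en = [t.lower() for t in TAG_RE.findall(html_en)]
    tags_zh = [t.lower() for t in TAG_RE.findall(html_zh)]
    return difflib.SequenceMatcher(None, tags_en, tags_zh).ratio()


def length_ratio(text_en: str, text_zh: str) -> float:
    """Hypothetical feature: shorter/longer text length, in [0, 1]."""
    a, b = len(text_en), len(text_zh)
    return min(a, b) / max(a, b) if max(a, b) else 0.0


def train_linear_svm(X: List[List[float]], y: List[int],
                     C: float = 10.0, epochs: int = 500) -> List[float]:
    """Linear SVM (hinge loss, L2 penalty) trained by dual coordinate descent.

    A constant 1.0 is appended to every feature vector so the last weight
    acts as a (regularized) bias term. Labels must be +1 or -1.
    """
    data = [list(x) + [1.0] for x in X]
    w = [0.0] * len(data[0])
    alpha = [0.0] * len(data)               # dual variables, kept in [0, C]
    q = [sum(v * v for v in x) for x in data]  # diagonal of the Gram matrix
    for _ in range(epochs):
        for i, (x, label) in enumerate(zip(data, y)):
            # Gradient of the dual objective with respect to alpha[i].
            grad = label * sum(wj * xj for wj, xj in zip(w, x)) - 1.0
            new_alpha = min(max(alpha[i] - grad / q[i], 0.0), C)
            delta = new_alpha - alpha[i]
            if delta != 0.0:
                # Keep w = sum_j alpha[j] * y[j] * x[j] in sync with alpha.
                w = [wj + delta * label * xj for wj, xj in zip(w, x)]
                alpha[i] = new_alpha
    return w


def predict(w: Sequence[float], features: Sequence[float]) -> int:
    """Return +1 (parallel) or -1 (not parallel) for one candidate pair."""
    score = sum(wj * xj for wj, xj in zip(w, list(features) + [1.0]))
    return 1 if score >= 0 else -1


if __name__ == "__main__":
    # Toy, made-up feature vectors: [url_sim, structure_sim, length_ratio].
    X = [[0.90, 0.95, 0.80], [0.85, 0.90, 0.90], [0.95, 0.85, 0.75],
         [0.20, 0.30, 0.40], [0.10, 0.25, 0.50], [0.30, 0.20, 0.30]]
    y = [1, 1, 1, -1, -1, -1]
    w = train_linear_svm(X, y)
    print([predict(w, x) for x in X])
```

A real system would train on manually annotated page pairs and would more likely call an established SVM library; the small hand-written trainer only keeps the example dependency-free. Combining features in one classifier, rather than thresholding any single feature, is what lets weak signals such as a length ratio help when URLs alone are ambiguous.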