A Refinement Framework for Cross Language Text Categorization

Ke Wu, Bao-Liang Lu
DOI: https://doi.org/10.1007/978-3-540-68636-1_39
2008-01-01
Abstract: Cross-language text categorization is the task of exploiting labelled documents in a source language (e.g. English) to classify documents in a target language (e.g. Chinese). In this paper, we investigate the use of a bilingual lexicon for cross-language text categorization. To this end, we propose a novel refinement framework that consists of two stages. In the first stage, a cross-language model transfer is proposed to generate initial labels for the documents in the target language. In the second stage, an expectation-maximization (EM) algorithm based on the naive Bayes model is introduced to refine these initial labels into the final labels of the target-language documents. Preliminary experimental results on the collected corpora show that the proposed framework is effective.
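The abstract does not spell out how the model transfer or the EM step is implemented, so the following is only a minimal sketch of the two-stage idea under stated assumptions, not the authors' method. It assumes (1) a multinomial naive Bayes classifier with additive smoothing, (2) a hypothetical transfer rule that splits each source word's probability P(w|c) evenly over its bilingual-lexicon translations, and (3) soft-label EM run for a fixed number of iterations on the unlabelled target documents. The function names (`train_nb`, `posterior`, `transfer_model`, `refine`), the parameters `alpha`, `eps` and `iterations`, and the toy English/pinyin data are all illustrative inventions.

```python
import math
from collections import Counter


def train_nb(docs, posteriors, classes, vocab, alpha=0.1):
    """Multinomial naive Bayes with additive smoothing.

    posteriors holds one {class: weight} dict per document: one-hot dicts give
    ordinary supervised training, soft dicts give the EM M-step."""
    prior = {c: alpha for c in classes}
    counts = {c: Counter() for c in classes}
    for doc, post in zip(docs, posteriors):
        tf = Counter(w for w in doc if w in vocab)
        for c, p in post.items():
            prior[c] += p
            for w, n in tf.items():
                counts[c][w] += p * n  # fractional counts weighted by P(c|doc)
    z = sum(prior.values())
    log_prior = {c: math.log(prior[c] / z) for c in classes}
    log_cond = {}
    for c in classes:
        total = sum(counts[c].values()) + alpha * len(vocab)
        log_cond[c] = {w: math.log((counts[c][w] + alpha) / total) for w in vocab}
    return log_prior, log_cond


def posterior(doc, model, classes):
    """E-step: P(c | doc) under the model, normalised in log space."""
    log_prior, log_cond = model
    scores = {c: log_prior[c] + sum(log_cond[c][w] for w in doc if w in log_cond[c])
              for c in classes}
    top = max(scores.values())
    unnorm = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(unnorm.values())
    return {c: v / z for c, v in unnorm.items()}


def transfer_model(src_model, lexicon, classes, tgt_vocab, eps=1e-3):
    """Hypothetical stage-1 transfer: split each source word's P(w|c) evenly over
    its in-vocabulary translations, smooth by eps, renormalise over the target
    vocabulary. Source class priors are reused unchanged."""
    log_prior, log_cond = src_model
    tgt_cond = {}
    for c in classes:
        mass = {t: eps for t in tgt_vocab}
        for s, lp in log_cond[c].items():
            trans = [t for t in lexicon.get(s, []) if t in tgt_vocab]
            for t in trans:
                mass[t] += math.exp(lp) / len(trans)
        z = sum(mass.values())
        tgt_cond[c] = {t: math.log(m / z) for t, m in mass.items()}
    return dict(log_prior), tgt_cond


def refine(src_docs, src_labels, tgt_docs, lexicon, classes, iterations=10):
    src_vocab = {w for d in src_docs for w in d}
    tgt_vocab = {w for d in tgt_docs for w in d}
    # Stage 1: train on labelled source docs, transfer through the lexicon,
    # and let the transferred model assign initial (soft) target labels.
    src_model = train_nb(src_docs, [{y: 1.0} for y in src_labels], classes, src_vocab)
    model = transfer_model(src_model, lexicon, classes, tgt_vocab)
    post = [posterior(d, model, classes) for d in tgt_docs]
    # Stage 2: EM on the target docs alone (M-step retrains, E-step relabels).
    for _ in range(iterations):
        model = train_nb(tgt_docs, post, classes, tgt_vocab)
        post = [posterior(d, model, classes) for d in tgt_docs]
    return [max(p, key=p.get) for p in post]


# Toy data; the target "language" uses pinyin stand-ins for Chinese words.
src_docs = [["ball", "goal", "team"], ["goal", "match"],
            ["stock", "market"], ["bank", "stock", "price"]]
src_labels = ["sport", "sport", "finance", "finance"]
lexicon = {"ball": ["qiu"], "goal": ["jinqiu"], "team": ["dui"], "match": ["bisai"],
           "stock": ["gupiao"], "market": ["shichang"], "bank": ["yinhang"],
           "price": ["jiage"]}
tgt_docs = [["qiu", "dui", "bisai"], ["gupiao", "jiage"],
            ["jinqiu", "bisai"], ["yinhang", "shichang"]]
labels = refine(src_docs, src_labels, tgt_docs, lexicon, ["sport", "finance"])
print(labels)  # ['sport', 'finance', 'sport', 'finance']
```

The small smoothing constant (`alpha=0.1`) is a deliberate choice for this tiny example: with heavy smoothing such as `alpha=1.0`, EM on only four short documents tends to wash the posteriors back toward uniform rather than sharpening them.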