An efficient method for determining bilingual word classes

Franz Josef Och
DOI: https://doi.org/10.3115/977035.977046
1999-01-01
Abstract: In statistical natural language processing we always face the problem of sparse data. One way to reduce this problem is to group words into equivalence classes, which is a standard method in statistical language modeling. In this paper we describe a method to determine bilingual word classes suitable for statistical machine translation. We develop an optimization criterion based on a maximum-likelihood approach and describe a clustering algorithm. We show that using the resulting bilingual word classes can improve statistical machine translation.
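To illustrate why word classes reduce the sparse-data problem, the following is a minimal monolingual sketch of a class-based bigram model, where p(w2 | w1) is approximated as p(C(w2) | C(w1)) * p(w2 | C(w2)). It is not the paper's bilingual criterion or clustering algorithm: the toy corpus, the hand-assigned classes, and the function name `class_bigram_model` are all assumptions made for illustration.

```python
from collections import Counter


def class_bigram_model(tokens, word_class):
    """Return a function estimating p(w2 | w1) through word classes.

    Illustrative assumption: classes are given by hand in `word_class`;
    in practice they would be learned by a clustering algorithm.
    """
    classes = [word_class[w] for w in tokens]
    class_bigrams = Counter(zip(classes, classes[1:]))  # counts of (C1, C2)
    class_history = Counter(classes[:-1])               # counts of C1 as a history
    class_counts = Counter(classes)                     # counts of each class
    word_counts = Counter(tokens)                       # counts of each word

    def prob(w1, w2):
        c1, c2 = word_class[w1], word_class[w2]
        if class_history[c1] == 0:
            return 0.0
        transition = class_bigrams[(c1, c2)] / class_history[c1]
        emission = word_counts[w2] / class_counts[c2]
        return transition * emission

    return prob


# Hypothetical toy corpus and class assignment.
tokens = "we meet on monday they talk on tuesday".split()
word_class = {
    "we": "PRON", "they": "PRON",
    "meet": "VERB", "talk": "VERB",
    "on": "PREP",
    "monday": "DAY", "tuesday": "DAY",
}

prob = class_bigram_model(tokens, word_class)
word_bigrams = Counter(zip(tokens, tokens[1:]))

# The word bigram ("we", "talk") never occurs in the corpus, so a
# word-level relative-frequency estimate would be zero, while the
# class-based estimate shares statistics across the PRON and VERB classes.
print(word_bigrams[("we", "talk")], prob("we", "talk"))
```

In this sketch the unseen pair receives a nonzero probability because its classes were observed together; the quality of such estimates depends entirely on how the classes are chosen, which is the problem a clustering criterion addresses.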