K-vec: A New Approach for Aligning Parallel Texts

Pascale Fung, Kenneth Church
DOI: https://doi.org/10.48550/arXiv.cmp-lg/9407021
1994-07-25
Abstract: Various methods have been proposed for aligning texts in two or more languages, such as the Canadian Parliamentary Debates (Hansards). Some of these methods generate a bilingual lexicon as a by-product. We present an alternative alignment strategy, which we call K-vec, that starts by estimating the lexicon. For example, it discovers that the English word "fisheries" is similar to the French "pêches" by noting that the distribution of "fisheries" in the English text is similar to the distribution of "pêches" in the French. K-vec does not depend on sentence boundaries.
Computation and Language
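
The abstract describes the core idea only in outline: two words are likely translations if they are distributed similarly across their respective texts. A minimal sketch of that idea follows. It assumes each text is cut into K equal-sized pieces (no sentence boundaries needed), each word is represented by a binary vector marking the pieces it occurs in, and similarity is scored with pointwise mutual information. The abstract does not state the piece count or the scoring function, so both are assumptions here, and the function names `occurrence_vector` and `mutual_information` are hypothetical.

```python
import math


def occurrence_vector(tokens, word, k):
    """Cut `tokens` into k roughly equal pieces and mark pieces containing `word`.

    Returns a list of k zeros and ones. No sentence boundaries are used,
    only token positions.
    """
    n = len(tokens)
    vec = [0] * k
    for i, tok in enumerate(tokens):
        if tok == word:
            vec[i * k // n] = 1  # index of the piece that position i falls in
    return vec


def mutual_information(v1, v2):
    """Pointwise mutual information (base 2) between two binary piece vectors.

    Assumed scoring function: log2( P(x, y) / (P(x) * P(y)) ), where the
    probabilities are estimated as fractions of the k pieces.
    """
    k = len(v1)
    both = sum(1 for x, y in zip(v1, v2) if x and y)  # pieces containing both words
    n1, n2 = sum(v1), sum(v2)
    if both == 0:
        return float("-inf")  # the words never co-occur in corresponding pieces
    return math.log2(both * k / (n1 * n2))


# Toy data (invented for illustration, not taken from the Hansards).
en = ["fisheries", "x", "x", "x", "fisheries", "x", "x", "x"]
fr = ["y", "pêches", "y", "budget", "y", "pêches", "y", "y"]

v_en = occurrence_vector(en, "fisheries", 4)
v_fr = occurrence_vector(fr, "pêches", 4)
score = mutual_information(v_en, v_fr)
```

In this toy example "fisheries" and "pêches" fall in the same pieces and receive a positive score, while "fisheries" and "budget" never share a piece. On real text, low-frequency words would make such scores unreliable, so a practical version would need some form of significance filtering.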