A Stochastic Finite-State Word-Segmentation Algorithm for Chinese

Richard Sproat, Chilin Shih, William Gale, Nancy Chang
DOI: https://doi.org/10.48550/arXiv.cmp-lg/9405008
1994-05-06
Abstract: We present a stochastic finite-state model for segmenting Chinese text into dictionary entries and productively derived words, and providing pronunciations for these words; the method incorporates a class-based model in its treatment of personal names. We also evaluate the system's performance, taking into account the fact that people often do not agree on a single segmentation.
Computation and Language