Automatic Chinese name recognition based on web corpus analysis

Liyun Ru, Zijian Tong, Yiqun Liu, Shaoping Ma
2007-01-01
Abstract: In this paper, we propose a unified solution for Chinese name recognition based on analysis of large-scale Chinese Web corpora. In our approach, a Chinese name is identified according to its component, context, and structure features. The likelihood that a three-character string is a Chinese name is calculated from statistical analysis of a Web corpus containing over 100 million Web pages and 24 million Chinese names. Experimental results on a widely adopted annotated Chinese corpus show that our method is effective, achieving 93% precision and 89% recall.
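
The abstract does not give the scoring formula, so the following is only a minimal sketch of how a component-based score for a candidate string might be computed from corpus counts. The counts, the names `name_counts`, `total_counts`, `char_name_ratio`, and `name_score`, and the choice to multiply per-character ratios are all assumptions for illustration, not the authors' method; context and structure features are left out.

```python
from collections import Counter

# Hypothetical toy counts (assumption): how often each character appears
# inside a person name versus anywhere in the corpus. The paper derives
# its statistics from a large Web corpus instead.
name_counts = Counter({"王": 90, "李": 80, "明": 40, "华": 30})
total_counts = Counter({"王": 100, "李": 100, "明": 100, "华": 100, "的": 1000})


def char_name_ratio(ch: str) -> float:
    """Fraction of a character's occurrences that fall inside a name."""
    total = total_counts.get(ch, 0)
    return name_counts.get(ch, 0) / total if total else 0.0


def name_score(candidate: str) -> float:
    """Illustrative component score: product of per-character ratios."""
    score = 1.0
    for ch in candidate:
        score *= char_name_ratio(ch)
    return score


if __name__ == "__main__":
    for candidate in ["王明华", "王的华"]:
        print(candidate, name_score(candidate))
```

With these toy counts, a string made of characters that often occur in names scores higher than one containing a common function character such as "的". A real system would combine such a score with context and structure evidence and pick a decision threshold on annotated data.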