Statistical Analyses on Chinese Ancient Books for Information Retrieval

Zhang Min
2001-01-01
Abstract: Motivated by the needs of information retrieval over Chinese ancient books, we carried out a statistical analysis of ancient Chinese on a large-scale corpus. First, we propose a method for combining corpora from different fields. Using this method, we analyzed the statistics of ancient Chinese over more than 35,000,000 characters. The results show that commonly used characters are concentrated, while the usage of the remaining characters is diffuse and falls off at an exponential rate. We then give further analyses of bigrams and compare modern Chinese with ancient Chinese. From these results we draw conclusions and divide Chinese characters into four groups according to usage frequency. Finally, these statistics are applied in an information retrieval system for ancient Chinese books.
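The kind of character and bigram statistics described in the abstract can be illustrated with a minimal sketch. This is not the paper's method: the function names, the restriction to the basic CJK Unified Ideographs block, and the cumulative-coverage cutoffs used to form four frequency groups are all assumptions made for illustration, since the abstract does not state how the four groups are defined.

```python
import re
from collections import Counter

# Assumption: only the basic CJK Unified Ideographs block counts as a character.
CJK_RUN = re.compile(r"[\u4e00-\u9fff]+")

def count_ngrams(text):
    """Count single characters and adjacent-character bigrams.

    Bigrams are formed only inside unbroken runs of ideographs, so they
    never span punctuation or whitespace.
    """
    unigrams, bigrams = Counter(), Counter()
    for run in CJK_RUN.findall(text):
        unigrams.update(run)
        bigrams.update(run[i:i + 2] for i in range(len(run) - 1))
    return unigrams, bigrams

def coverage_curve(unigrams):
    """Return (character, cumulative share of corpus) in descending frequency order."""
    total = sum(unigrams.values())
    covered = 0
    curve = []
    for ch, n in unigrams.most_common():
        covered += n
        curve.append((ch, covered / total))
    return curve

def frequency_bands(unigrams, cutoffs=(0.5, 0.9, 0.99)):
    """Split characters into len(cutoffs) + 1 groups by cumulative coverage.

    The cutoffs are hypothetical; a character's group is the number of
    cutoffs already reached by the more frequent characters before it.
    """
    total = sum(unigrams.values())
    bands = [[] for _ in range(len(cutoffs) + 1)]
    covered = 0
    for ch, n in unigrams.most_common():
        band = sum(covered / total >= c for c in cutoffs)
        bands[band].append(ch)
        covered += n
    return bands
```

On a real corpus, the coverage curve would show how quickly the most frequent characters account for most of the text, and the cutoffs would need to be chosen from the observed distribution rather than fixed in advance.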