Digitizing On Chinese Ancient Books: Information Extraction And Retrieval

M Zhang, Sp Ma, Z Jiang
2001-01-01
Abstract: This paper describes the digitization of ancient Chinese books, an area that remains largely unexplored in the field of intelligent information processing. The digitization process consists of three parts. The first is pre-processing, which extracts information from ancient books preserved on paper. The second is retrieval, which searches the full text and the metadata. For full-text retrieval, the Vector Space Model (VSM) is used and improved, and fuzzy and exact matching algorithms are combined to improve system performance. The last is reorganization, which presents the information to the user through a friendly human-computer interface while preserving the original style of the books. All of this work is based on statistical analysis of more than 35,000,000 words of ancient Chinese books. The system is designed and implemented to run over a network.
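The abstract does not say how the VSM was improved or how the two matching modes are combined, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes single characters as index terms (a common choice for unsegmented classical Chinese), a smoothed TF-IDF weighting, cosine similarity as the fuzzy score, and a hypothetical `exact_bonus` added when the query occurs verbatim in a passage. All function names and parameters here are illustrative.

```python
import math
from collections import Counter


def char_terms(text):
    """Split text into single-character terms, skipping whitespace (assumption)."""
    return [ch for ch in text if not ch.isspace()]


def tfidf(text, idf):
    """Weight each known character by term frequency times IDF."""
    tf = Counter(char_terms(text))
    return {t: f * idf[t] for t, f in tf.items() if t in idf}


def build_index(docs):
    """Return one TF-IDF vector per document plus the shared IDF table."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(char_terms(doc)))
    # Smoothed IDF; the exact formula is an illustrative choice.
    idf = {t: math.log((n + 1) / (c + 1)) + 1.0 for t, c in df.items()}
    return [tfidf(doc, idf) for doc in docs], idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, docs, vectors, idf, exact_bonus=1.0):
    """Rank documents by VSM similarity, boosting verbatim (exact) matches."""
    query_vec = tfidf(query, idf)
    results = []
    for i, (doc, vec) in enumerate(zip(docs, vectors)):
        score = cosine(query_vec, vec)   # fuzzy component
        if query in doc:                 # exact component
            score += exact_bonus
        if score > 0.0:
            results.append((i, score))
    results.sort(key=lambda r: r[1], reverse=True)
    return results
```

Because cosine similarity never exceeds 1, a bonus of 1.0 places every verbatim match at or above any purely fuzzy match, while passages that only share some characters with the query are still returned further down the list.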