The Research and Application about the Information Extraction in Chinese Domain
Suxiang Zhang, Juan Wen, Ying Qin, Xiaojie Wang, Yixin Zhong
DOI: https://doi.org/10.1109/ICOSP.2006.345822
2007-01-01
Abstract: This paper proposes a prototype information service system that delivers information of interest to users by running database-style searches over content extracted from unstructured text. To achieve this goal, two fundamental problems, named entity recognition and relation extraction, are studied using the maximum entropy (ME) algorithm. Our named entity recognition approach differs from most previous ME-based models in several ways, one of which is that it uses probabilistic feature functions instead of binary feature functions. We also explore several new features in our model, including confidence functions and the positions of features. As in some previous work, we use sub-models to model Chinese person names and foreign names separately, but we introduce some new techniques in these sub-models. The experimental results are promising. Moreover, this is the first time the ME algorithm has been used to extract relations between entities from Chinese text. Twelve features are designed, covering morphological, grammatical, and semantic information. The experimental results are satisfactory. Both results have been integrated into our information extraction system, achieving the goal of providing an information service from unstructured text.
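To make the contrast between binary and probabilistic feature functions concrete, the following is a minimal sketch of a conditional maximum entropy model, p(y | x) proportional to exp(sum_i w_i * f_i(x, y)). It is not the authors' implementation: the function names, the toy `SURNAME_PROB` table, the weight of 2.0, and the `PER`/`O` label set are all hypothetical, chosen only to show how a real-valued feature can soften a decision that a binary feature would make with full strength.

```python
import math


def maxent_probs(weights, feature_fns, x, labels):
    """Return p(y | x) for each label under a conditional maximum entropy model."""
    # Linear score for each candidate label: sum_i w_i * f_i(x, y).
    scores = {
        y: sum(w * f(x, y) for w, f in zip(weights, feature_fns))
        for y in labels
    }
    # Subtract the maximum score before exponentiating for numerical stability.
    top = max(scores.values())
    exp_scores = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exp_scores.values())  # normalization constant Z(x)
    return {y: v / z for y, v in exp_scores.items()}


# Hypothetical toy statistics: how often a token (written in pinyin here)
# starts a person name in some training corpus. These numbers are invented.
SURNAME_PROB = {"zhang": 0.9, "de": 0.01}


def binary_surname(x, y):
    """Binary feature: fires with value 1 whenever the token was ever seen as a surname."""
    return 1.0 if y == "PER" and x in SURNAME_PROB else 0.0


def prob_surname(x, y):
    """Probabilistic feature: fires with the estimated surname probability itself."""
    return SURNAME_PROB.get(x, 0.0) if y == "PER" else 0.0


labels = ["PER", "O"]
weights = [2.0]  # assumed weight; in practice it would be learned from data

p_binary = maxent_probs(weights, [binary_surname], "de", labels)
p_prob = maxent_probs(weights, [prob_surname], "de", labels)
```

With the binary feature, the rarely-a-surname token "de" receives the same evidence for `PER` as "zhang" does, whereas the probabilistic feature scales the evidence by the estimated probability, so `p_prob["PER"]` comes out lower than `p_binary["PER"]`.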