Cross-document Personal Name Disambiguation Merging Sentential Semantic Analysis

Han ZHANG, Sen-lin LUO, Li-li ZOU, Xiu-min SHI
DOI: https://doi.org/10.3785/j.issn.1008-973x.2015.04.016
2015-01-01
Abstract: A multi-stage disambiguation algorithm was proposed based on the construction of a text feature space. Because query terms often also occur as common words, a heuristic rule was applied after document pre-processing to determine whether the query term is a personal name. Named entities and occupations were then extracted according to feature templates. A sentential semantic model was used to perform sentential semantic analysis and to extract sentential semantic features, and word frequencies were counted under the bag-of-words model. From these, a three-layer feature space was constructed. Rule-based classification and a two-stage hierarchical clustering algorithm were used to carry out the name disambiguation, and the overlap coefficient was introduced to compute the similarity of the sentential semantic features. Experiments on the datasets built for the CLP2012 Chinese personal name disambiguation task showed that the F-measure reached 88.79%, indicating that the proposed approach can improve the performance of cross-document personal name disambiguation.
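The abstract names two similarity signals, the overlap coefficient over sentential semantic features and word frequencies from a bag-of-words model, along with a hierarchical clustering stage. It gives no formulas or parameters for any of them. The Python sketch below shows how these pieces could fit together and rests on several assumptions that do not come from the paper:

- Sentential semantic features are represented as sets of hypothetical (subject, predicate, object)-style tuples.
- The bag-of-words comparison uses cosine similarity.
- The two signals are mixed with a hypothetical weight `alpha`.
- Clustering is a single-stage average-link agglomeration with a hypothetical `threshold`.

This is an illustration only. It does not reproduce the authors' rule-based classification or their two-stage procedure.

```python
from collections import Counter
from math import sqrt


def overlap_coefficient(a: set, b: set) -> float:
    """Overlap coefficient |A ∩ B| / min(|A|, |B|); returns 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def cosine_bow(tokens_a: list, tokens_b: list) -> float:
    """Cosine similarity between two bag-of-words term-frequency vectors."""
    ca, cb = Counter(tokens_a), Counter(tokens_b)
    dot = sum(ca[t] * cb[t] for t in ca.keys() & cb.keys())
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def doc_similarity(d1: dict, d2: dict, alpha: float = 0.5) -> float:
    # HYPOTHETICAL combination: alpha weights the sentential-semantic overlap
    # against the bag-of-words cosine; the paper does not specify this formula.
    return (alpha * overlap_coefficient(d1["sem"], d2["sem"])
            + (1 - alpha) * cosine_bow(d1["bow"], d2["bow"]))


def agglomerate(docs: list, threshold: float) -> list:
    """Average-link agglomerative clustering: repeatedly merge the most similar
    pair of clusters until no pair reaches the (hypothetical) threshold."""
    clusters = [[i] for i in range(len(docs))]
    while len(clusters) > 1:
        best, pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                # Average pairwise document similarity between clusters x and y.
                sims = [doc_similarity(docs[i], docs[j])
                        for i in clusters[x] for j in clusters[y]]
                s = sum(sims) / len(sims)
                if s > best:
                    best, pair = s, (x, y)
        if best < threshold:
            break
        x, y = pair
        clusters[x].extend(clusters[y])  # y > x, so deleting y leaves index x valid
        del clusters[y]
    return clusters


# Hypothetical toy documents that all mention the same ambiguous name.
example_docs = [
    {"sem": {("professor", "teach", "university")},
     "bow": ["physics", "professor", "lecture"]},
    {"sem": {("professor", "teach", "university"), ("author", "publish", "paper")},
     "bow": ["physics", "paper", "professor"]},
    {"sem": {("athlete", "win", "match")},
     "bow": ["football", "goal", "match"]},
]

print(agglomerate(example_docs, threshold=0.5))  # [[0, 1], [2]]
```

In this toy run, documents 0 and 1 share a semantic tuple, which gives an overlap of 1.0. Their bag-of-words cosine is 2/3, so the combined similarity is about 0.83, and the two are merged. Document 2 shares nothing with either of them and stays in its own cluster. The overlap coefficient is a reasonable fit for sparse sentential features because it normalizes by the smaller set. As a result, a short document whose few features all appear in a longer one still scores highly.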