Lexicalized Token Subcategory And Complex Context Based Shallow Parsing

Shui Liu, Zheng Zhang, Pengyuan Liu
DOI: https://doi.org/10.1007/978-3-319-27194-1_46
2015-01-01
Abstract: This paper proposes a chunking algorithm based on a second-order hidden Markov model (HMM) with Viterbi decoding, together with a novel chunking post-processing algorithm. HMM parameter estimation uses token subcategory and lexicalization information, balancing the disambiguation ability these features provide against the data sparseness they cause in maximum likelihood estimation (MLE). To compensate for the absence of complex context in HMM-based chunking, the paper proposes a post-processing algorithm that yields a stable improvement over the base chunking algorithm and avoids illegal token paths in chunking. Experiments show that the chunking system achieves an F-measure of 93% on the CoNLL 2000 standard test corpus.
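As an illustration of the decoding step the abstract names, the following is a minimal sketch of Viterbi decoding over a second-order (trigram) HMM. It is not the authors' implementation: the function name `viterbi_second_order`, the simplified chunk tags `B`, `I`, `O`, the part-of-speech observations, and all probabilities are assumptions invented for this example. The paper's token subcategories, lexicalization, parameter estimation, and post-processing are not reproduced here.

```python
import math

def viterbi_second_order(obs, tags, trans, emit, start="<s>"):
    """Return the most probable tag sequence under a second-order HMM.

    trans[(t2, t1, t)] is P(t | t2, t1) and emit[(t, o)] is P(o | t).
    Missing entries are treated as probability zero (no smoothing).
    """
    def logp(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[(u, v)]: best log-probability of a path whose last two tags are u, v.
    best = {(start, start): 0.0}
    back = []  # back[i][(v, t)] is the tag two positions before position i
    for o in obs:
        new_best, pointers = {}, {}
        for (u, v), score in best.items():
            for t in tags:
                s = score + logp(trans.get((u, v, t), 0.0)) + logp(emit.get((t, o), 0.0))
                # Zero-probability extensions (s == -inf) are never stored.
                if s > new_best.get((v, t), float("-inf")):
                    new_best[(v, t)] = s
                    pointers[(v, t)] = u
        if not new_best:
            raise ValueError("no tag sequence has non-zero probability")
        best = new_best
        back.append(pointers)

    # Follow the back-pointers from the best final tag pair.
    state = max(best, key=best.get)
    path = []
    for pointers in reversed(back):
        path.append(state[1])
        state = (pointers[state], state[0])
    path.reverse()
    return path

# Toy model with made-up probabilities, for illustration only.
TAGS = ["B", "I", "O"]
TRANS = {
    ("<s>", "<s>", "B"): 0.7, ("<s>", "<s>", "O"): 0.3,
    ("<s>", "B", "I"): 0.7, ("<s>", "B", "O"): 0.2, ("<s>", "B", "B"): 0.1,
    ("B", "I", "O"): 0.6, ("B", "I", "I"): 0.4,
    ("B", "B", "O"): 1.0,
}
EMIT = {
    ("B", "DT"): 0.6, ("B", "NN"): 0.4,
    ("I", "NN"): 1.0,
    ("O", "VBD"): 1.0,
}

print(viterbi_second_order(["DT", "NN", "VBD"], TAGS, TRANS, EMIT))
```

Keeping the dynamic-programming state as a pair of tags is what makes the model second-order: each transition is conditioned on the two preceding tags. Because unseen trigrams get probability zero in this sketch, a realistic system would need some form of smoothing or back-off, which is where the data sparseness trade-off mentioned in the abstract arises.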