Leveraging morphological information via employing word hashing for sequence labeling

Zonghui Peng, Ruifang Liu, Si Li
DOI: https://doi.org/10.1109/PIC.2017.8359510
2017-01-01
Abstract: State-of-the-art sequence labeling systems have traditionally relied on handcrafted n-gram features and data pre-processing, but have usually ignored character-level information. In this paper, we propose applying a word hashing method, which can capture the morphological information of words, to sequence labeling tasks. An auto-encoder is first employed to learn a latent morphological representation in a pre-training stage. Using a bidirectional LSTM structure, our model benefits from both the morphological and the semantic features of words. Experimental results show that our model achieves the best results on the Chunking task (94.93%) and the NP-Chunking task (95.70%) on the CoNLL2000 dataset, and obtains competitive performance on the NER task (89.29%) on the CoNLL2003 dataset.
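The abstract does not spell out the hashing scheme. The sketch below assumes the letter-trigram word hashing known from DSSM: each word is wrapped in `#` boundary marks, split into overlapping letter trigrams, and represented as a bag-of-trigrams count vector. The trigram size, the `#` symbol, and the helper names `letter_trigrams`, `build_vocab`, and `word_hash_vector` are illustrative assumptions, not the authors' implementation. The point is that morphologically related words share trigrams, which is the kind of signal the auto-encoder could then compress.

```python
from collections import Counter


def letter_trigrams(word: str) -> list[str]:
    """Split a word into letter trigrams after adding '#' boundary marks.

    Assumption: DSSM-style hashing; the paper's exact n-gram size and
    boundary symbol are not given in the abstract.
    """
    marked = f"#{word.lower()}#"
    return [marked[i:i + 3] for i in range(len(marked) - 2)]


def build_vocab(words: list[str]) -> dict[str, int]:
    """Assign an index to every trigram seen in `words`, in sorted order."""
    trigrams = sorted({t for w in words for t in letter_trigrams(w)})
    return {t: i for i, t in enumerate(trigrams)}


def word_hash_vector(word: str, vocab: dict[str, int]) -> list[int]:
    """Return a bag-of-trigrams count vector over a fixed trigram vocabulary.

    Trigrams missing from `vocab` are silently dropped (a hypothetical choice).
    """
    vec = [0] * len(vocab)
    for tri, count in Counter(letter_trigrams(word)).items():
        idx = vocab.get(tri)
        if idx is not None:
            vec[idx] += count
    return vec


if __name__ == "__main__":
    vocab = build_vocab(["playing", "played"])
    # Shared prefix trigrams hint at a common stem ("play").
    print(set(letter_trigrams("playing")) & set(letter_trigrams("played")))
    print(word_hash_vector("played", vocab))
```

For example, `letter_trigrams("good")` yields `['#go', 'goo', 'ood', 'od#']`, and "playing" and "played" share `#pl`, `pla`, and `lay`. In a full model these sparse vectors would presumably be fed to the pre-trained auto-encoder and combined with word embeddings in the BiLSTM; that part is omitted here.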