Hidden Markov Chains, Entropic Forward-Backward, and Part-Of-Speech Tagging

Elie Azeraf, Emmanuel Monfrini, Emmanuel Vignon, Wojciech Pieczynski
DOI: https://doi.org/10.48550/arXiv.2005.10629
2020-05-21
Abstract: The ability to take into account the characteristics, also called features, of observations is essential in Natural Language Processing (NLP) problems. The Hidden Markov Chain (HMC) model associated with classic Forward-Backward probabilities cannot handle arbitrary features, such as prefixes or suffixes of any size, except under an independence condition. For twenty years, this shortcoming has encouraged the development of other sequential models, starting with the Maximum Entropy Markov Model (MEMM), which elegantly integrates arbitrary features. More generally, it led to HMC being neglected in NLP. In this paper, we show that the problem is not due to HMC itself, but to the way its restoration algorithms are computed. We present a new way of computing HMC-based restorations using original Entropic Forward and Entropic Backward (EFB) probabilities. Our method allows features to be taken into account in the HMC framework in the same way as in the MEMM framework. We illustrate the efficiency of HMC using EFB in Part-Of-Speech Tagging, showing its superiority over MEMM-based restoration. We also specify, as a perspective, how HMCs with EFB might appear as an alternative to Recurrent Neural Networks to treat sequential data with a deep architecture.
Machine Learning, Computation and Language
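As a reference point for the abstract, the classic forward-backward recursions and the Bayes-rule identity that motivates working with p(x_t | y_t) instead of p(y_t | x_t) are written out below. This is our own reconstruction in standard notation (the symbols α, β, a_{ij}, and the features f_t^k are ours), not a transcription of the paper's EFB definitions, which may be normalized differently. The first line is the independence condition that classic FB needs when each observation y_t is described by K features.

```latex
\begin{align*}
p(y_t \mid x_t) &= \prod_{k=1}^{K} p(f_t^{k} \mid x_t)
  && \text{(features assumed independent given } x_t\text{)} \\
a_{ij} &= p(x_{t+1} = j \mid x_t = i) \\
\alpha_1(i) &= p(x_1 = i)\, p(y_1 \mid x_1 = i), \qquad
\alpha_{t+1}(j) = p(y_{t+1} \mid x_{t+1} = j) \sum_{i} \alpha_t(i)\, a_{ij} \\
\beta_T(i) &= 1, \qquad
\beta_t(i) = \sum_{j} a_{ij}\, p(y_{t+1} \mid x_{t+1} = j)\, \beta_{t+1}(j) \\
p(x_t = i \mid y_{1:T}) &\propto \alpha_t(i)\, \beta_t(i) \\
p(y_t \mid x_t = i) &= \frac{p(x_t = i \mid y_t)\, p(y_t)}{p(x_t = i)}
  \;\propto_i\; \frac{p(x_t = i \mid y_t)}{p(x_t = i)}
\end{align*}
```

Because p(y_t) does not depend on the state i, substituting the ratio p(x_t = i | y_t) / p(x_t = i) for the emission only rescales α and β by state-independent factors, which cancel once each vector is normalized. For a stationary chain, p(x_t = i) does not depend on t.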
What problem does this paper attempt to address?
The problem that this paper attempts to solve is: **how to enable the Hidden Markov Chain (HMC) model to handle arbitrary features effectively, so as to achieve better performance in Natural Language Processing (NLP) tasks**. Specifically, the paper points out that the traditional HMC model and its classic Forward-Backward (FB) algorithm cannot handle arbitrary features (such as affixes or word length) well unless these features are assumed to be independent of each other. In NLP tasks this independence assumption usually does not hold, so the HMC model performs poorly in these tasks. To solve this problem, the authors propose a new way of computing HMC restorations based on **Entropic Forward-Backward (EFB)** probabilities. This method lets the HMC use arbitrary features as flexibly as the Maximum Entropy Markov Model (MEMM) does, without relying on the independence assumption. The paper also shows experimentally that EFB-based HMC outperforms MEMM in Part-Of-Speech (POS) tagging, and discusses, as a perspective, its potential as an alternative to Recurrent Neural Networks (RNN).

### Key Point Summary:

1. **Problem Background**:
   - Traditional HMC models and their algorithms (such as Viterbi and FB) perform poorly in NLP tasks because they cannot flexibly handle arbitrary features.
   - This limitation has prompted researchers to develop other models (such as MEMM and RNN) to make up for the shortcomings of HMC.
2. **Solution**:
   - Compute HMC restorations with EFB probabilities instead of classic FB probabilities; the limitation lies in how restorations are computed, not in the HMC model itself.
   - The new method allows the HMC to use arbitrary features directly, without the independence assumption.
3. **Experimental Results**:
   - In POS tagging, EFB-based HMC outperforms MEMM, with a marked improvement on unknown words.
4. **Future Prospects**:
   - The paper discusses using HMC with EFB to treat sequential data with a deep architecture, which could make it an alternative to RNN.

Through this research, the paper demonstrates the potential of the HMC model in the NLP field and provides a new direction for future research.
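A minimal NumPy sketch of the idea, assuming a Bayes-rule reading of EFB rather than the paper's exact formulas: the same normalized forward-backward pass is driven either by emission probabilities p(y_t | x_t) (classic FB) or by the ratio p(x_t | y_t) / p(x_t) (an entropic-style variant). The function names `forward_backward_posteriors` and `entropic_posteriors`, the toy parameters, and the use of an exactly derived p(x_t | y_t) in place of a trained feature-based classifier are all illustrative assumptions. On this toy HMC the two variants should produce the same posterior marginals, since the ratio differs from the emission only by a state-independent factor.

```python
import numpy as np


def forward_backward_posteriors(pi, A, local_scores):
    """Normalized forward-backward pass for a hidden Markov chain.

    pi: (N,) initial distribution p(x_1).
    A: (N, N) transitions, A[i, j] = p(x_{t+1} = j | x_t = i).
    local_scores: (T, N); row t only needs to be proportional, over states,
        to p(y_t | x_t). Classic FB passes the emission probabilities.
    Returns a (T, N) array of posterior marginals p(x_t | y_{1:T}).
    """
    T, N = local_scores.shape
    alpha = np.zeros((T, N))
    beta = np.ones((T, N))
    # Forward pass, renormalized at each step (state-independent factors cancel).
    alpha[0] = pi * local_scores[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = local_scores[t] * (alpha[t - 1] @ A)
        alpha[t] /= alpha[t].sum()
    # Backward pass, also renormalized; beta[T - 1] stays all ones.
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (local_scores[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    post = alpha * beta
    return post / post.sum(axis=1, keepdims=True)


def entropic_posteriors(pi, A, p_state_given_obs, p_state):
    """Hypothetical EFB-style variant: drive the same pass with
    p(x_t | y_t) / p(x_t), which by Bayes' rule is proportional to p(y_t | x_t).

    p_state_given_obs: (T, N), row t = p(x_t | y_t), e.g. from a feature-based
        classifier (assumed here, not implemented).
    p_state: (N,) marginal state distribution p(x_t), assumed time-invariant.
    """
    return forward_backward_posteriors(pi, A, p_state_given_obs / p_state)


# Toy HMC with 2 hidden states and 3 observation symbols (illustrative numbers).
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3],
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],   # B[i, k] = p(y = k | x = i)
              [0.1, 0.3, 0.6]])
obs = np.array([0, 1, 2, 2])

classic = forward_backward_posteriors(pi, A, B[:, obs].T)

# Stand-in for a trained classifier: derive p(x | y) exactly from the HMC via
# Bayes' rule, using the stationary distribution of A as p(x).
p_state = np.array([4 / 7, 3 / 7])
joint = B * p_state[:, None]                          # p(y | x) p(x), shape (N, M)
p_state_given_symbol = (joint / joint.sum(axis=0)).T  # shape (M, N), rows sum to 1
entropic = entropic_posteriors(pi, A, p_state_given_symbol[obs], p_state)

print(np.round(classic, 3))
print(np.allclose(classic, entropic))  # prints True: the two passes agree
```

In the setting the paper targets, the rows of `p_state_given_obs` would come from a classifier over arbitrary word features (prefixes, suffixes, word length), which is where MEMM-like flexibility enters without factorizing p(y_t | x_t) over those features. The exact agreement in the toy example holds only because p(x | y) was derived from the same HMC; a trained classifier would give an approximation.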