Advanced LSTM: A Study about Better Time Dependency Modeling in Emotion Recognition

Fei Tao, Gang Liu
DOI: https://doi.org/10.48550/arXiv.1710.10197
2017-10-27
Abstract: Long short-term memory (LSTM) is normally used in recurrent neural networks (RNNs) as the basic recurrent unit. However, conventional LSTM assumes that the state at the current time step depends on the previous time step. This assumption constrains the time dependency modeling capability. In this study, we propose a new variation of LSTM, advanced LSTM (A-LSTM), for better temporal context modeling. We employ A-LSTM in a weighted pooling RNN for emotion recognition. The A-LSTM outperforms the conventional LSTM by 5.5% relatively. The A-LSTM based weighted pooling RNN can also complement the state-of-the-art emotion classification framework. This shows the advantage of A-LSTM.
Machine Learning, Sound, Audio and Speech Processing
What problem does this paper attempt to address?
This paper aims to improve how long short-term memory (LSTM) networks model temporal dependencies in emotion recognition. A conventional LSTM assumes that the current state depends only on the state at the previous time step, which limits its ability to capture longer-range structure in time-series data. To overcome this limitation, the authors propose a new LSTM variant, advanced LSTM (A-LSTM), which allows the current state to depend on states at multiple different time steps and therefore models temporal dependencies more flexibly. The authors apply A-LSTM in a weighted pooling recurrent neural network (RNN) for emotion recognition and compare it with a conventional LSTM. The experimental results indicate a relative improvement of 5.5% in macro-averaged F-measure (MAF) for A-LSTM over the conventional LSTM. In addition, the paper also explores the potential of A-LSTM in practical applications, especially voice-based emotion recognition scenarios such as voice assistants.
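The idea of letting the current state depend on several earlier states, rather than only the previous one, can be sketched in a few lines. This is a minimal illustration and not the authors' implementation: it assumes the earlier cell states are blended by a softmax-weighted sum, and the names `combine_states` and `score_vector` (a stand-in for a learned parameter) are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_states(past_states, score_vector):
    """Blend several earlier cell states into a single vector.

    past_states:  list of equal-length vectors, one per selected earlier
                  time step (which steps to select is an assumption here).
    score_vector: hypothetical learned vector; its dot product with a
                  state gives that state's unnormalized score.
    Returns (combined_state, weights).
    """
    # One scalar score per earlier state.
    scores = [
        sum(w * c for w, c in zip(score_vector, state))
        for state in past_states
    ]
    # Normalize the scores so the weights are positive and sum to 1.
    weights = softmax(scores)
    dim = len(past_states[0])
    # Weighted sum of the earlier states, one dimension at a time.
    combined = [
        sum(weights[k] * past_states[k][d] for k in range(len(past_states)))
        for d in range(dim)
    ]
    return combined, weights


# Three earlier cell states of dimension 2 (made-up values).
states = [[0.2, -0.1], [0.5, 0.3], [0.1, 0.4]]
combined, weights = combine_states(states, [1.0, 0.5])
```

A conventional LSTM corresponds to the special case where `past_states` contains only the state from the previous time step, so the single weight is 1 and the combined state is that state unchanged.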