Speech Emotion Recognition (SER) with the Bidirectional LSTM Method

Maryamah Maryamah, Nicholas Juan Kalvin Pradiptamurty, Hafiyyah Khayyiroh Shafro, Mohammad Sihabudin Al Qurtubi, Giovanny Alberta Tambahjong, Qothrotunnidha' Almaulidiyah
DOI: https://doi.org/10.33005/senada.v3i1.105
2023-11-07
Abstract: Emotions are part of being human and arise as responses to the events a person experiences. Emotion analysis from speech, known as speech emotion recognition (SER), attracts many researchers because voice recognition systems can assist criminal investigations, support monitoring and the detection of potentially dangerous events, and aid the health care system. This study therefore proposes an SER approach based on a Bidirectional Long Short-Term Memory (Bi-LSTM) model. The dataset was scraped from YouTube and labeled manually, and features were then extracted using Mel Frequency Cepstral Coefficients (MFCC). In experiments, the Bi-LSTM model achieved an AUC ROC of 0.97 and an F1-score of 0.878. Based on these results, the proposed method predicted speech emotions better than the comparison methods and classified human voices more precisely into four emotions: happy, sad, angry, and neutral.
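The abstract names the pipeline (MFCC features fed to a Bi-LSTM that classifies four emotions) but does not show how a Bi-LSTM reads a clip. The following minimal NumPy sketch is not the authors' implementation. It assumes 13 MFCC coefficients per frame, 32 hidden units per direction, and randomly initialized weights, and the function names (`init_lstm`, `lstm_last_state`, `bilstm_classify`) are hypothetical. One LSTM reads the frames in time order and a second reads them in reverse. The two final hidden states are concatenated and passed through a softmax over the four classes.

```python
import numpy as np

# Hypothetical sizes: 13 MFCC coefficients per frame, 32 hidden units per
# direction, 4 emotion classes. Weights are random (untrained), so the output
# only illustrates shapes and data flow, not a real prediction.
N_MFCC, HIDDEN, N_CLASSES = 13, 32, 4
EMOTIONS = ["happy", "sad", "angry", "neutral"]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_lstm(rng, n_in, n_hidden):
    """Random weights for one LSTM direction; the four gates are stacked row-wise."""
    scale = 1.0 / np.sqrt(n_hidden)
    W = rng.uniform(-scale, scale, size=(4 * n_hidden, n_in))      # input weights
    U = rng.uniform(-scale, scale, size=(4 * n_hidden, n_hidden))  # recurrent weights
    b = np.zeros(4 * n_hidden)
    return W, U, b


def lstm_last_state(x_seq, W, U, b):
    """Run an LSTM over x_seq of shape (T, n_in) and return the final hidden state."""
    n_hidden = U.shape[1]
    h = np.zeros(n_hidden)
    c = np.zeros(n_hidden)
    for x_t in x_seq:  # one MFCC frame per time step
        z = W @ x_t + U @ h + b
        i = sigmoid(z[:n_hidden])                  # input gate
        f = sigmoid(z[n_hidden:2 * n_hidden])      # forget gate
        o = sigmoid(z[2 * n_hidden:3 * n_hidden])  # output gate
        g = np.tanh(z[3 * n_hidden:])              # candidate cell state
        c = f * c + i * g
        h = o * np.tanh(c)
    return h


def bilstm_classify(mfcc, params):
    """Read the frames forwards and backwards, concatenate both final
    hidden states, and map them to class probabilities with a softmax."""
    h_fwd = lstm_last_state(mfcc, *params["fwd"])
    h_bwd = lstm_last_state(mfcc[::-1], *params["bwd"])  # reversed time order
    h = np.concatenate([h_fwd, h_bwd])
    logits = params["W_out"] @ h + params["b_out"]
    exp = np.exp(logits - logits.max())  # numerically stable softmax
    return exp / exp.sum()


rng = np.random.default_rng(0)
params = {
    "fwd": init_lstm(rng, N_MFCC, HIDDEN),
    "bwd": init_lstm(rng, N_MFCC, HIDDEN),
    "W_out": rng.normal(0.0, 0.1, size=(N_CLASSES, 2 * HIDDEN)),
    "b_out": np.zeros(N_CLASSES),
}

# Synthetic stand-in for the MFCC matrix of one clip: 100 frames x 13 coefficients.
mfcc = rng.normal(size=(100, N_MFCC))
probs = bilstm_classify(mfcc, params)
print({e: round(float(p), 3) for e, p in zip(EMOTIONS, probs)})
```

Because the weights are untrained, the printed probabilities carry no meaning. In practice, the MFCC matrix would come from a feature extractor such as `librosa.feature.mfcc`, which returns coefficients × frames and would need transposing. The weights would be learned from the labeled clips.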