WeaveNet: End-to-End Audiovisual Sentiment Analysis

Yinfeng Yu, Zhenhong Jia, Fei Shi, Meiling Zhu, Wenjun Wang, Xiuhong Li
DOI: https://doi.org/10.1007/978-981-16-9247-5_1
2022-01-01
Abstract: The model proposed in this paper analyzes sentiment in a way that is strikingly similar to how one person perceives another's sentiment. We propose a novel neural architecture named WeaveNet to "listen" to and "watch" a person's sentiment. The main strength of the model comes from capturing, stage by stage, both the intra-interactions within a single modality and the inter-interactions between different modalities. Intra-interactions are modeled by convolution operations applied to each modality separately in the first few stages, and by a bidirectional LSTM over both audio clips and video clips in the final stage. Inter-interactions are captured at each stage by applying various fusion methods. The model relies on the careful design of the neural network rather than on handcrafted features: the inputs of the network are raw audio and natural images. In addition, audio clips and video frames are aligned by keyframe rather than by time order. We performed extensive comparisons on three publicly available datasets for both sentiment analysis and emotion recognition. WeaveNet outperformed the state-of-the-art results on all three datasets.
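The abstract does not say how the keyframe alignment is implemented, so the following is only a minimal sketch of one plausible reading: each keyframe is paired with a fixed-length audio window centred on that keyframe's timestamp. The function name `align_audio_to_keyframes`, the `window_seconds` parameter, and the assumption that keyframe timestamps are already known are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def align_audio_to_keyframes(
    audio: Sequence[float],
    sample_rate: int,
    keyframe_times: Sequence[float],
    window_seconds: float = 1.0,
) -> List[Tuple[float, List[float]]]:
    """Pair each keyframe timestamp with the audio window centred on it.

    Hypothetical helper: the window length and the centring rule are
    assumptions made for illustration, not the paper's procedure.
    """
    # Half of the window, expressed in samples.
    half = int(round(window_seconds * sample_rate / 2))
    pairs = []
    for t in keyframe_times:
        # Sample index closest to the keyframe timestamp.
        centre = int(round(t * sample_rate))
        # Clip the window to the bounds of the recording.
        start = max(0, centre - half)
        end = min(len(audio), centre + half)
        pairs.append((t, list(audio[start:end])))
    return pairs


# Toy usage: a 10-second "recording" sampled at 10 Hz, with two keyframes.
audio = [float(i) for i in range(100)]
pairs = align_audio_to_keyframes(audio, sample_rate=10, keyframe_times=[0.2, 5.0])
```

In this sketch the number of audio clips follows the number of keyframes rather than the duration of the recording, which is the main difference from slicing the audio at fixed time steps. Windows near the start or end of the recording are simply shortened.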