Reproducibility Companion Paper of "MMSF: A Multimodal Sentiment-Fused Method to Recognize Video Speaking Style"

Fan Yu, Beibei Zhang, Yaqun Fang, Jia Bei, Tongwei Ren, Jiyi Li, Luca Rossetto
DOI: https://doi.org/10.1145/3652583.3658373
2024-01-01
Abstract: To support the replication of "MMSF: A Multimodal Sentiment-Fused Method to Recognize Video Speaking Style", which was presented at ICMR'23, this companion paper provides the details of the artifacts. Speaking style recognition aims to recognize the styles of conversations, providing a fine-grained description of how people talk. In the original paper, we proposed a novel multimodal sentiment-fused method, MMSF, which extracts and integrates the visual, audio, and textual features of videos and introduces sentiment through a cross-attention mechanism to enhance the video features used to recognize speaking styles. In this paper, we explain the details of the implementation code and the dataset used in the experiments.