Social Perception Prediction for MuSe 2024: Joint Learning of Multiple Perceptions

Zhuofan Wen, Hailiang Yao, Shun Chen, Haiyang Sun, Mingyu Xu, Licai Sun, Zheng Lian, Bin Liu, Fengyu Zhang, Siyuan Zhang, Jianhua Tao
DOI: https://doi.org/10.1145/3689062.3689087
2024-01-01
Abstract: In this paper, we present our method for the MuSe 2024 Perception sub-challenge. The sub-challenge provides labels for 21 social perceptions, of which 16 must be predicted. Joint learning is central to our approach, as it integrates multiple perceptions to improve prediction accuracy. We make full use of the LMU-ELP dataset by jointly predicting the 16 target perceptions, together with their Pearson correlation coefficient (PCC) distribution, and the 5 additional perceptions that are not required for prediction. Visual, audio, and text features serve as the multimodal input to MLP encoders, with 21 encoders representing the 21 perceptions provided in the LMU-ELP dataset. All embeddings are stacked and multiplied by a learnable PCC matrix, initialized as the PCC matrix of the 21 perceptions. This is followed by an attention block for further joint learning. Our method achieves a mean Pearson's correlation coefficient of 0.4098 and ranks first in this challenge.
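The pipeline described in the abstract (21 per-perception encoders, mixing by a PCC-initialized matrix, an attention block, and one output per perception) can be illustrated with a minimal NumPy forward pass. This is only a sketch under stated assumptions, not the authors' implementation: the single-layer encoders, the single-head attention with a residual connection, the concatenation of modality features into one vector, the layer sizes, the convention that the 16 target perceptions come first, and all function and parameter names (`init_params`, `forward`, `mean_pearson`) are hypothetical. Training, in which the matrix `P` would be updated along with the other weights, is omitted.

```python
import numpy as np

N_PERCEPTIONS = 21  # all labeled perceptions in the dataset
N_TARGETS = 16      # perceptions that must be predicted (assumed to come first)


def init_params(d_in, d_hid, labels, rng):
    """Random weights; the mixing matrix P starts as the label PCC matrix."""
    s = 0.1
    return {
        "W_enc": rng.normal(0, s, (N_PERCEPTIONS, d_in, d_hid)),
        "b_enc": np.zeros((N_PERCEPTIONS, d_hid)),
        # (21, 21) Pearson correlations between label columns
        "P": np.corrcoef(labels, rowvar=False),
        "W_q": rng.normal(0, s, (d_hid, d_hid)),
        "W_k": rng.normal(0, s, (d_hid, d_hid)),
        "W_v": rng.normal(0, s, (d_hid, d_hid)),
        "w_out": rng.normal(0, s, (N_PERCEPTIONS, d_hid)),
        "b_out": np.zeros(N_PERCEPTIONS),
    }


def forward(params, x):
    """x: (batch, d_in) fused multimodal features -> (batch, 21) predictions."""
    # One encoder per perception, stacked to shape (batch, 21, d_hid).
    h = np.einsum("bi,pih->bph", x, params["W_enc"]) + params["b_enc"]
    h = np.maximum(h, 0.0)  # ReLU
    # Mix the perception embeddings with the PCC-initialized matrix.
    h = np.einsum("pq,bqh->bph", params["P"], h)
    # Single-head self-attention across the 21 perception tokens.
    q, k, v = h @ params["W_q"], h @ params["W_k"], h @ params["W_v"]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(q.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    attn = np.exp(scores)
    attn = attn / attn.sum(axis=-1, keepdims=True)
    h = h + attn @ v  # residual connection (an assumption)
    # One scalar regression head per perception.
    return np.einsum("bph,ph->bp", h, params["w_out"]) + params["b_out"]


def mean_pearson(pred, target):
    """Pearson correlation per column, averaged over columns."""
    p = pred - pred.mean(axis=0)
    t = target - target.mean(axis=0)
    r = (p * t).sum(axis=0) / np.sqrt((p ** 2).sum(axis=0) * (t ** 2).sum(axis=0))
    return float(r.mean())
```

In use, one would call `forward(params, x)` and score only the first `N_TARGETS` columns with `mean_pearson`, while a training loss could still cover all 21 columns so that the 5 auxiliary perceptions contribute to joint learning.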