Abstract: Driven by the surge in students sharing insights on online platforms, evaluating discussion forum posts has emerged as a significant research focus. This assessment matters to both educators and students: it lets teachers efficiently gauge student comprehension and helps students tailor their learning approaches to their individual needs, fostering personalized growth. However, existing research on post-quality assessment has three limitations: 1) It overlooks the multimodal information interaction between post and topic, leading to inaccurate evaluations. 2) It usually frames assessment as a classification task, which fails to provide feedback on the minor, relative differences between posts on the same topic. 3) It usually evaluates posts from a single perspective, which is insufficient for capturing the complexity of the relationship between post and topic. To address these limitations, we propose a new assessment task, Multimodal Topic-Post Relevance Score Prediction (MTRSP), which analyzes whether a student's post comprehensively answers the question posed by the discussion topic by combining text and images to predict a topic-post relevance score, i.e., the degree of correctness of the answer. To solve the MTRSP task, we develop an end-to-end Multi-perspective Topic-Post Relevance Score Reasoning (MTRSR) method, which leverages images and text from both the post and the topic to infer topic-post relevance scores based on semantic similarity and logical coherence. Specifically, the topic-post content relevance reasoning module uses multimodal fusion to learn the semantic similarity between posts and topics, and the logical coherence inference module examines the logical connections between them. Finally, we use three newly collected multimodal topic-post datasets and the public dataset Lazada-Home as an evaluation benchmark for the MTRSP task. Experimental results show that our MTRSR method improves NDCG@3 (Normalized Discounted Cumulative Gain) by up to 9.02% compared to the best-performing text-only model. The source code and dataset will be made public.
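
For readers unfamiliar with the metric, the following is a minimal sketch of how NDCG@k can be computed for the posts under a single topic. It is not taken from the paper's code: the function names, the example scores, and the exponential gain `2^rel - 1` are assumptions made for illustration, and the authors' implementation may use a different gain or tie-handling rule.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain over the first k items of a ranked list.

    Assumption: exponential gain (2^rel - 1) with a log2 position discount.
    """
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances[:k])
    )


def ndcg_at_k(true_scores, predicted_scores, k=3):
    """NDCG@k for the posts of one topic.

    true_scores: ground-truth relevance score of each post (hypothetical labels).
    predicted_scores: model-predicted score of each post, in the same order.
    """
    if len(true_scores) != len(predicted_scores):
        raise ValueError("true_scores and predicted_scores must have equal length")

    # Rank posts by predicted score, highest first, and read off their true scores.
    order = sorted(
        range(len(predicted_scores)),
        key=lambda i: predicted_scores[i],
        reverse=True,
    )
    ranked_true = [true_scores[i] for i in order]

    # The ideal ranking sorts posts by their true scores.
    ideal_dcg = dcg_at_k(sorted(true_scores, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant post under this topic
    return dcg_at_k(ranked_true, k) / ideal_dcg


if __name__ == "__main__":
    # Four hypothetical posts under one topic.
    truth = [3, 2, 1, 0]
    print(ndcg_at_k(truth, [0.9, 0.8, 0.3, 0.1], k=3))  # ranking matches the truth
    print(ndcg_at_k(truth, [0.1, 0.2, 0.3, 0.4], k=3))  # ranking is reversed
```

Because the score depends only on the order of the top-k posts, a ranking metric of this kind rewards a model for separating posts on the same topic by small relative differences, which a classification accuracy figure would not capture.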