Robust multimedia spam filtering based on visual, textual, and audio deep features and random forest
Marouane Kihal, Lamia Hamza
DOI: https://doi.org/10.1007/s11042-023-15170-x
IF: 2.577
2023-04-04
Multimedia Tools and Applications
Abstract: Internet and social media users increasingly demand better protection against spam. Despite numerous studies on spam detection, no prior work has addressed filtering the text, image, audio, and video modalities of multimedia content simultaneously. To fill this gap, we present a new deep multimodal decision-level fusion system that effectively detects multimedia spam. The proposed system employs Convolutional Neural Networks (CNN) for feature extraction and selection. The extracted features are organized into three independent vectors, namely visual, textual, and audio (VTA) vectors, to obtain a strong content representation. Each vector is then fed individually into a Random Forest (RF) model for further analysis and classification. We therefore call our model VTA-CNN-RF. We show that our model outperforms seven Machine Learning (ML) algorithms on each of the three types of VTA information. Our experiments also demonstrate the benefit of fusion for the system's overall performance. Our results show a precision of 99.08% on a publicly available hybrid dataset that includes text and image content and 98.20% on a composite multimedia dataset. The proposed VTA-CNN-RF model provides superior spam identification compared to previous methods.
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering
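
The decision-level fusion described in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so this example assumes a simple average of per-modality spam probabilities followed by a threshold; the function name `fuse_decisions`, the `MODALITIES` tuple, the 0.5 threshold, and the handling of missing modalities are all hypothetical choices for illustration, not the authors' method. The input scores stand in for the outputs of the three per-modality Random Forest classifiers.

```python
from statistics import mean
from typing import Mapping, Optional, Tuple

# Modality names follow the VTA naming in the abstract.
MODALITIES = ("visual", "textual", "audio")


def fuse_decisions(
    scores: Mapping[str, Optional[float]],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> Tuple[bool, float]:
    """Fuse per-modality spam probabilities into one decision.

    `scores` maps a modality name to the spam probability produced by
    that modality's classifier, or to None (or is missing the key) when
    the item has no content of that type, e.g. a text+image post.
    Assumption: fusion is the unweighted mean of the available scores.
    """
    available = [scores[m] for m in MODALITIES if scores.get(m) is not None]
    if not available:
        raise ValueError("at least one modality score is required")
    for p in available:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"probability out of range: {p}")
    fused = mean(available)
    return fused >= threshold, fused


if __name__ == "__main__":
    # Hypothetical scores for one item with all three modalities.
    is_spam, fused = fuse_decisions(
        {"visual": 0.9, "textual": 0.8, "audio": 0.1}
    )
    print(is_spam, round(fused, 3))
```

Because each modality is scored independently before fusion, an item that lacks one modality can still be classified from the remaining ones, which is one practical reason to fuse at the decision level rather than concatenating features.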