Clickbait Detection in YouTube Videos

Ruchira Gothankar, Fabio Di Troia, Mark Stamp
DOI: https://doi.org/10.48550/arXiv.2107.12791
2021-07-26
Abstract: YouTube videos often include captivating descriptions and intriguing thumbnails designed to increase the number of views, and thereby increase the revenue for the person who posted the video. This creates an incentive for people to post clickbait videos, in which the content might deviate significantly from the title, description, or thumbnail. In effect, users are tricked into clicking on clickbait videos. In this research, we consider the challenging problem of detecting clickbait YouTube videos. We experiment with multiple state-of-the-art machine learning techniques using a variety of textual features.
Machine Learning, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
This paper addresses the detection of clickbait in YouTube videos. Specifically, the authors study how machine learning and deep learning techniques can identify videos whose titles, descriptions, or thumbnails do not match the actual content. Such videos are designed to attract clicks, which increases view counts and creator income.

### Problem Background

As the Internet has grown, more and more people rely on it for information. Many platforms allow anyone to publish content, but the authenticity of that content cannot be guaranteed. On video-sharing platforms such as YouTube in particular, some creators use misleading titles, descriptions, or thumbnails to attract clicks and thereby increase views and income. This practice is called "clickbait". It wastes users' time and may also erode trust in the platform.

### Research Objectives

The authors aim to detect clickbait videos on YouTube by experimenting with a variety of machine learning techniques applied to text features (such as titles and descriptions). The specific research objectives are:

1. **Identify clickbait videos**: Analyze video attributes such as titles, descriptions, thumbnails, and comments to determine whether a video is clickbait.
2. **Improve detection accuracy**: Experiment with different machine learning and deep learning models to find the most effective detection method.
3. **Explore multimodal features**: In addition to text features, consider statistical features (such as the number of likes and the number of comments) to improve detection.

### Method Overview

The authors used a variety of machine learning and deep learning models and combined them with different feature extraction methods.
The main configurations are:

- **Logistic Regression with Word2Vec**: Word2Vec generates word vectors, which are combined with video metadata features for classification.
- **Random Forest with Word2Vec**: A random forest classifier built on Word2Vec embeddings and trained with more features.
- **MLP with Word2Vec**: A multilayer perceptron (MLP) that processes Word2Vec-embedded text features together with metadata features.
- **MLP with BERT**: BERT generates contextual word vectors, which are combined with metadata features for classification.
- **MLP with DistilBERT**: DistilBERT (a lightweight version of BERT) is used for the same task.

### Results and Conclusions

The experiments showed that:

- Using more features (such as titles, descriptions, the number of likes, and the number of comments) significantly improves detection accuracy.
- The random forest model performs best when all features are combined, reaching an accuracy of 92.5%.
- Pretrained language models such as BERT and DistilBERT perform well on natural language tasks and capture richer semantic information.

In conclusion, this research provides an effective method for automatically detecting clickbait videos on YouTube, which can help improve user experience and platform trustworthiness.
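
To make the "word embeddings plus metadata" idea from the method overview more concrete, the following is a minimal sketch of how a title embedding and numeric metadata might be combined into one feature vector. It is an illustration under stated assumptions, not the authors' implementation: the embedding table `TOY_EMBEDDINGS`, the helper names `embed_text` and `build_features`, the averaging strategy, and the choice of like and comment counts as metadata are all hypothetical. A trained Word2Vec model would supply the word vectors in practice.

```python
from statistics import fmean

# Hypothetical 3-dimensional embedding table standing in for a trained
# Word2Vec model; the words and values are invented for illustration.
TOY_EMBEDDINGS = {
    "you": [0.1, 0.3, 0.0],
    "wont": [0.9, 0.1, 0.2],
    "believe": [0.8, 0.2, 0.1],
    "tutorial": [0.0, 0.7, 0.9],
}
EMBEDDING_DIM = 3


def embed_text(text):
    """Average the vectors of known words; unknown words are skipped."""
    vectors = [
        TOY_EMBEDDINGS[word]
        for word in text.lower().split()
        if word in TOY_EMBEDDINGS
    ]
    if not vectors:
        # No known words: fall back to a zero vector of the right size.
        return [0.0] * EMBEDDING_DIM
    # zip(*vectors) groups the values of each dimension across all words.
    return [fmean(column) for column in zip(*vectors)]


def build_features(title, like_count, comment_count):
    """Concatenate the averaged title embedding with numeric metadata."""
    return embed_text(title) + [float(like_count), float(comment_count)]


features = build_features("You wont believe this", like_count=120, comment_count=45)
print(len(features))  # 5: three embedding dimensions plus two metadata values
```

A vector built this way could then be passed to any of the classifiers listed above (logistic regression, random forest, or an MLP). The exact feature set and preprocessing used in the paper are not specified in this summary, so the sketch only shows the general shape of the input.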