Abstract: The growing popularity of social media has given rise to the social problem of spam proliferating through this medium. New spam types that evade existing spam detection systems are developed continually, necessitating corresponding countermeasures. This study proposes an anomaly detection-based framework to detect new Twitter spam, which works by modeling the characteristics of non-spam tweets and classifying tweets that deviate from this model as anomalies. However, because modeling varied non-spam tweets is challenging, anomaly detection on its own yields a low spam detection rate and a high false positive (FP) rate. To overcome this shortcoming, a trained decision tree first pre-detects known spam tweets, and anomaly detection is then performed on the remaining tweets while modeling normal tweets. A one-class support vector machine and an autoencoder with high detection rates are used for anomaly detection. The proposed framework exhibits superior detection rates for unknown spam compared to conventional techniques, while maintaining equivalent or improved detection and FP rates for known spam. Furthermore, the framework can be adapted to changes in spam conditions by adjusting the costs of detection errors.

What problem does this paper attempt to address?

The problem this paper attempts to address is: With the proliferation of social media, spam on these platforms has rapidly increased, especially with the emergence of new types of spam that can bypass existing detection systems. Therefore, new methods need to be developed to effectively detect these new types of spam. Specifically, this study proposes an anomaly detection-based framework for detecting new types of spam on Twitter.

### Background of the Paper

- **Widespread use of social media**: With the popularization of the internet and smartphones, the number of social media users has significantly increased, making these platforms important tools for daily communication.
- **Threat of spam**: Spammers use social media to spread illegal advertisements, fake news, political propaganda, etc., causing negative impacts on society.
- **Limitations of existing detection methods**: Traditional spam detection methods (such as blacklists and honeypot techniques) have issues with timeliness and adaptability, making it difficult to cope with the rapid changes in new types of spam.

### Research Objectives

- **Propose a new detection framework**: This study proposes a hybrid method combining anomaly detection and misuse detection, aiming to improve the detection rate of new types of spam while maintaining a low false positive rate.
- **Address new types of spam**: By modeling the characteristics of normal tweets and using anomaly detection techniques to identify tweets that deviate from the normal pattern, new types of spam can be detected.
- **Improve detection performance**: By using Decision Trees (DT) to pre-detect known spam and then performing anomaly detection on unknown data, the overall detection performance is improved.

### Main Contributions

1. **Proposed a spam detection framework based on autoencoders and one-class Support Vector Machines (SVM)** to address new types of spam.
2. **Used Decision Trees (DT) to detect known spam**, enhancing the low detection rate of anomaly detection.
3. **Improved the detection rate of known spam and normal tweets** by performing customized anomaly detection on subsets of data not classified as spam.
4. **Proposed a scalable spam detection framework** that can focus on detecting known or unknown spam based on the current situation.

### Method Overview

- **Known spam detection module**: Uses Decision Trees (DT) to detect known spam and groups undetected data into multiple subsets.
- **Unknown spam detection module**: Applies anomaly detection algorithms trained only on normal models to each subset to distinguish between normal tweets and unknown spam.
- **Scalable spam detection system**: By adjusting the misclassification cost in the Decision Tree, the system can focus on detecting either known or unknown spam.

### Conclusion

The method proposed in this study performs excellently in detecting new types of spam while maintaining a low false positive rate, demonstrating high practicality and reliability.
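The two-stage flow in the method overview (misuse detection for known spam, then anomaly detection on whatever remains) can be illustrated with a minimal sketch. It is not the paper's implementation: a one-split decision stump stands in for the Decision Tree, a centroid-and-radius model stands in for the one-class SVM or autoencoder, and all undetected tweets form a single subset rather than several. The two features (URL count, mention count), the toy data, the `slack` factor, and every function name are hypothetical. The `fn_cost` and `fp_cost` parameters only mimic the idea of adjusting misclassification costs.

```python
from math import dist


def fit_stump(X, y, fn_cost=1.0, fp_cost=1.0):
    """Stage 1 stand-in for the decision tree: a single cost-weighted split.

    Returns (feature, threshold); a tweet is flagged as known spam when
    x[feature] > threshold. fn_cost weights missed spam, fp_cost weights
    normal tweets wrongly flagged as spam.
    """
    best = None
    for f in range(len(X[0])):
        for t in sorted({x[f] for x in X}):
            cost = 0.0
            for x, label in zip(X, y):
                pred = 1 if x[f] > t else 0
                if pred == 0 and label == 1:
                    cost += fn_cost
                elif pred == 1 and label == 0:
                    cost += fp_cost
            if best is None or cost < best[0]:
                best = (cost, f, t)
    return best[1], best[2]


def fit_normal_model(normal_X, slack=1.5):
    """Stage 2 stand-in for the anomaly detector, fitted on normal tweets only."""
    centroid = [sum(col) / len(col) for col in zip(*normal_X)]
    radius = slack * max(dist(x, centroid) for x in normal_X)
    return centroid, radius


def classify(x, stump, normal_model):
    """Run the cascade: known-spam rule first, anomaly check second."""
    f, t = stump
    if x[f] > t:
        return "known spam"
    centroid, radius = normal_model
    return "unknown spam" if dist(x, centroid) > radius else "normal"


# Hypothetical features per tweet: [url_count, mention_count].
normal = [[0, 1], [1, 0], [0, 0], [1, 1]]
known_spam = [[5, 0], [6, 1], [5, 1]]
X = normal + known_spam
y = [0] * len(normal) + [1] * len(known_spam)

stump = fit_stump(X, y)
# Train the anomaly model only on normal tweets that stage 1 lets through.
passed = [x for x in normal if not x[stump[0]] > stump[1]]
normal_model = fit_normal_model(passed)

print(classify([6, 0], stump, normal_model))  # known spam
print(classify([0, 9], stump, normal_model))  # unknown spam
print(classify([1, 1], stump, normal_model))  # normal
```

The second call is the interesting case: a mention-heavy tweet resembles none of the known spam, so the stage 1 rule passes it, and only its distance from the normal-tweet model marks it as spam. Raising `fp_cost` makes the stage 1 rule more conservative and hands more tweets to the anomaly stage, while raising `fn_cost` does the opposite.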

Scalable Learning Framework for Detecting New Types of Twitter Spam with Misuse and Anomaly Detection

Bio-Inspired Algorithm Based Undersampling Approach and Ensemble Learning for Twitter Spam Detection

Online learning for Social Spammer Detection on Twitter

Near Real-Time Twitter Spam Detection with Machine Learning Techniques

Follow Spam Detection based on Cascaded Social Information

Analyzing and Detecting Adversarial Spam on a Large-scale Online APP Review System

Online Social Spammer Detection

An Unsupervised Framework for Anomaly Detection in a Water Treatment System

Multi-Objective Genetic Algorithm and CNN-Based Deep Learning Architectural Scheme for effective spam detection

A cascading framework for uncovering spammers in social networks

[SKF 93479, a newly developed histamine H2-receptor antagonist. / Effect on gastric potential difference in the presence and absence of acetylsalicylic acid].

Classification of spammer and nonspammer content in online social network using genetic algorithm-based feature selection

ENWalk: Learning Network Features for Spam Detection in Twitter

Markov-Driven Graph Convolutional Networks for Social Spammer Detection

Graph Convolutional Networks with Markov Random Field Reasoning for Social Spammer Detection

Robust Spammer Detection in Microblogs

A hybrid classification method for Twitter spam detection based on differential evolution and random forest

Opinion Spam Detection: A New Approach Using Machine Learning and Network-Based Algorithms

Deep Learning Framework for Cyber Threat Situational Awareness Based on Email and URL Data Analysis

Fake social media news and distorted campaign detection framework using sentiment analysis & machine learning

Social Spammer and Spam Message Co-Detection in Microblogging with Social Context Regularization