Scalable Learning Framework for Detecting New Types of Twitter Spam with Misuse and Anomaly Detection

Jaeun Choi, Byunghwan Jeon, Chunmi Jeon
DOI: https://doi.org/10.3390/s24072263
IF: 3.9
2024-04-03
Sensors
Abstract: The growing popularity of social media has engendered the social problem of spam proliferation through this medium. New spam types that evade existing spam detection systems are being developed continually, necessitating corresponding countermeasures. This study proposes an anomaly detection-based framework to detect new Twitter spam, which works by modeling the characteristics of non-spam tweets and using anomaly detection to classify tweets deviating from this model as anomalies. However, because modeling varied non-spam tweets is challenging, the technique on its own has a low spam detection rate and a high false positive (FP) rate. To overcome this shortcoming, anomaly detection is performed on known spam tweets pre-detected using a trained decision tree while modeling normal tweets. A one-class support vector machine and an autoencoder with high detection rates are used for anomaly detection. The proposed framework exhibits superior detection rates for unknown spam compared to conventional techniques, while maintaining equivalent or improved detection and FP rates for known spam. Furthermore, the framework can be adapted to changes in spam conditions by adjusting the costs of detection errors.
engineering, electrical & electronic; chemistry, analytical; instruments & instrumentation
What problem does this paper attempt to address?
The problem this paper attempts to address is: as social media has proliferated, spam on these platforms has increased rapidly, and new types of spam keep emerging that bypass existing detection systems. Detecting them effectively requires new methods. Specifically, this study proposes an anomaly detection-based framework for detecting new types of spam on Twitter.

### Background of the Paper

- **Widespread use of social media**: With the popularization of the internet and smartphones, the number of social media users has increased significantly, making these platforms important tools for daily communication.
- **Threat of spam**: Spammers use social media to spread illegal advertisements, fake news, political propaganda, and similar content, with negative effects on society.
- **Limitations of existing detection methods**: Traditional spam detection methods (such as blacklists and honeypot techniques) are slow to update and adapt poorly, so they struggle to keep up with rapidly changing new types of spam.

### Research Objectives

- **Propose a new detection framework**: This study proposes a hybrid method combining anomaly detection and misuse detection, aiming to improve the detection rate of new types of spam while maintaining a low false positive rate.
- **Address new types of spam**: The characteristics of normal tweets are modeled, and anomaly detection identifies tweets that deviate from the normal pattern, so that new types of spam can be detected.
- **Improve detection performance**: Decision Trees (DT) pre-detect known spam, and anomaly detection is then performed on the remaining data, which improves overall detection performance.

### Main Contributions

1. **Proposed a spam detection framework based on autoencoders and one-class Support Vector Machines (SVM)** to address new types of spam.
2. **Used Decision Trees (DT) to detect known spam**, compensating for the low detection rate of anomaly detection used alone.
3. **Improved the detection rate of known spam and normal tweets** by performing customized anomaly detection on subsets of data not classified as spam.
4. **Proposed a scalable spam detection framework** that can focus on detecting known or unknown spam based on the current situation.

### Method Overview

- **Known spam detection module**: Uses Decision Trees (DT) to detect known spam and groups the data not flagged as spam into multiple subsets.
- **Unknown spam detection module**: Applies to each subset an anomaly detection algorithm trained only on normal tweets, to distinguish between normal tweets and unknown spam.
- **Scalable spam detection system**: By adjusting the misclassification cost in the Decision Tree, the system can focus on detecting either known or unknown spam.

### Conclusion

The proposed method achieves higher detection rates for unknown spam than conventional techniques while keeping detection and false positive rates for known spam equivalent or better, and it can be adapted to changing spam conditions by adjusting the costs of detection errors.
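
As a rough illustration of the two-stage idea in the Method Overview, the sketch below uses scikit-learn on synthetic two-dimensional data. It is not the authors' implementation. The class name `TwoStageSpamDetector`, the helper `run_demo`, the choice to group unflagged data by decision-tree leaf, the use of `class_weight` as the misclassification cost, and all parameter values are assumptions made for illustration. The autoencoder variant mentioned in the paper is not shown.

```python
import numpy as np
from sklearn.svm import OneClassSVM
from sklearn.tree import DecisionTreeClassifier


class TwoStageSpamDetector:
    """Hypothetical sketch: a decision tree flags known spam, then one
    one-class SVM per tree leaf checks the remaining tweets for anomalies."""

    def __init__(self, spam_cost=1.0, nu=0.05, max_depth=3):
        # Assumption: class_weight stands in for the misclassification cost.
        # spam_cost > 1 makes missing known spam costlier than a false alarm.
        self.tree = DecisionTreeClassifier(
            max_depth=max_depth,
            class_weight={0: 1.0, 1: spam_cost},
            random_state=0,
        )
        self.nu = nu
        self.detectors = {}  # leaf id -> OneClassSVM fitted on normal samples

    def fit(self, X, y):
        self.tree.fit(X, y)
        leaves = self.tree.apply(X)  # leaf index reached by each sample
        # Normal samples that the tree also lets through as non-spam.
        passed = (y == 0) & (self.tree.predict(X) == 0)
        for leaf in np.unique(leaves[passed]):
            subset = X[passed & (leaves == leaf)]
            detector = OneClassSVM(kernel="rbf", gamma="scale", nu=self.nu)
            self.detectors[leaf] = detector.fit(subset)
        return self

    def predict(self, X):
        tree_pred = self.tree.predict(X).astype(int)
        leaves = self.tree.apply(X)
        result = tree_pred.copy()  # 1 = spam, 0 = normal
        for leaf, detector in self.detectors.items():
            idx = np.where((tree_pred == 0) & (leaves == leaf))[0]
            if idx.size:
                # OneClassSVM.predict returns -1 for outliers, +1 for inliers.
                result[idx] = (detector.predict(X[idx]) == -1).astype(int)
        return result


def run_demo(seed=0):
    """Train on normal + known spam, then test on a spam cluster never seen."""
    rng = np.random.default_rng(seed)
    normal_train = rng.normal(0.0, 1.0, size=(400, 2))
    known_spam_train = rng.normal(6.0, 0.5, size=(100, 2))
    X_train = np.vstack([normal_train, known_spam_train])
    y_train = np.array([0] * 400 + [1] * 100)

    model = TwoStageSpamDetector(spam_cost=2.0).fit(X_train, y_train)

    normal_test = rng.normal(0.0, 1.0, size=(200, 2))
    known_spam_test = rng.normal(6.0, 0.5, size=(50, 2))
    unknown_spam_test = rng.normal([-8.0, 8.0], 0.5, size=(50, 2))

    return {
        "known_spam_detected": float(model.predict(known_spam_test).mean()),
        "unknown_spam_detected": float(model.predict(unknown_spam_test).mean()),
        "normal_false_positive": float(model.predict(normal_test).mean()),
    }


if __name__ == "__main__":
    for name, rate in run_demo().items():
        print(f"{name}: {rate:.2f}")
```

In this sketch, raising `spam_cost` pushes the tree toward catching known spam at the expense of more false alarms, and lowering it leaves more of the decision to the anomaly detectors. The synthetic clusters are deliberately well separated, so the printed rates say nothing about performance on real tweets.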