An Image is Worth a Thousand Toxic Words: A Metamorphic Testing Framework for Content Moderation Software

Wenxuan Wang, Jingyuan Huang, Jen-tse Huang, Chang Chen, Jiazhen Gu, Pinjia He, Michael R. Lyu
2023-08-19
Abstract: The exponential growth of social media platforms has brought about a revolution in communication and content dissemination in human society. Nevertheless, these platforms are being increasingly misused to spread toxic content, including hate speech, malicious advertising, and pornography, leading to severe negative consequences such as harm to teenagers' mental health. Despite tremendous efforts in developing and deploying textual and image content moderation methods, malicious users can evade moderation by embedding texts into images, such as screenshots of the text, usually with some interference. We find that modern content moderation software's performance against such malicious inputs remains underexplored. In this work, we propose OASIS, a metamorphic testing framework for content moderation software. OASIS employs 21 transform rules summarized from our pilot study on 5,000 real-world toxic contents collected from 4 popular social media applications, including Twitter, Instagram, Sina Weibo, and Baidu Tieba. Given toxic textual contents, OASIS can generate image test cases, which preserve the toxicity yet are likely to bypass moderation. In the evaluation, we employ OASIS to test five commercial textual content moderation software from famous companies (i.e., Google Cloud, Microsoft Azure, Baidu Cloud, Alibaba Cloud and Tencent Cloud), as well as a state-of-the-art moderation research model. The results show that OASIS achieves up to 100% error finding rates. Moreover, through retraining the models with the test cases generated by OASIS, the robustness of the moderation model can be improved without performance degradation.
Software Engineering, Artificial Intelligence, Computation and Language, Computer Vision and Pattern Recognition
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve

This paper addresses the spread of toxic content on social media platforms. As these platforms have grown, they have been increasingly misused to spread hate speech, malicious advertising, and pornography, with severe negative consequences such as harm to teenagers' mental health. Although many textual and image content moderation methods have been developed and deployed, malicious users can still evade moderation by embedding text into images (for example, screenshots of the text, usually with some interference). The paper finds that the performance of modern content moderation software against such inputs remains underexplored.

To this end, the authors propose OASIS, a metamorphic testing framework for content moderation software. OASIS employs 21 transform rules summarized from a pilot study of 5,000 real-world toxic contents collected from four social media applications (Twitter, Instagram, Sina Weibo, and Baidu Tieba). Given toxic text, it generates image test cases that preserve the toxicity yet are likely to bypass moderation.

### Main Contributions

1. **Preliminary Study**: Conducted an empirical study of 5,000 real-world toxic contents and summarized 21 transform rules from it.
2. **OASIS Framework**: Proposed OASIS, the first comprehensive testing framework for toxic text content disseminated through images, including 21 metamorphic relations and supporting both English and Chinese.
3. **Evaluation and Improvement**: Evaluated five commercial content moderation products (Google Cloud, Microsoft Azure, Baidu Cloud, Alibaba Cloud, and Tencent Cloud) and a state-of-the-art research model using OASIS, which achieves error finding rates of up to 100%. Retraining the models with test cases generated by OASIS improves their robustness without degrading performance.
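
The metamorphic relation behind this kind of testing can be illustrated with a minimal sketch. It is not the OASIS implementation: OASIS renders text into images, whereas this example stays text-only so that it runs with the standard library alone. The moderator (`toy_moderator`), the blocklist, and the three transform rules are all hypothetical stand-ins, and the error finding rate is assumed here to be the fraction of generated test cases that evade the moderator.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical blocklist standing in for a real moderation model.
BLOCKLIST = {"scam", "idiot"}


def toy_moderator(text: str) -> bool:
    """Return True if the text is flagged as toxic (toy keyword filter)."""
    tokens = text.lower().split()
    return any(tok.strip(".,!?") in BLOCKLIST for tok in tokens)


# Hypothetical transform rules. Each one is assumed to preserve toxicity
# for a human reader while changing the surface form of the input.
def upper_case(text: str) -> str:
    return text.upper()


def insert_separators(text: str) -> str:
    """Put a dot between the characters of every word."""
    return " ".join(".".join(word) for word in text.split())


def swap_lookalikes(text: str) -> str:
    """Replace some letters with visually similar digits."""
    return text.translate(str.maketrans({"a": "4", "i": "1", "o": "0"}))


@dataclass
class Violation:
    seed: str
    rule: str
    test_case: str


def run_metamorphic_test(
    seeds: List[str],
    rules: List[Callable[[str], str]],
    moderator: Callable[[str], bool],
) -> Tuple[List[Violation], float]:
    """Check the relation: a toxic seed stays toxic after any transform.

    Seeds are assumed to be toxic, so every generated test case that the
    moderator fails to flag counts as a violation (an error found).
    """
    violations: List[Violation] = []
    generated = 0
    for seed in seeds:
        for rule in rules:
            test_case = rule(seed)
            generated += 1
            if not moderator(test_case):
                violations.append(Violation(seed, rule.__name__, test_case))
    rate = len(violations) / generated if generated else 0.0
    return violations, rate


if __name__ == "__main__":
    seeds = ["this is a scam", "you idiot!"]
    rules = [upper_case, insert_separators, swap_lookalikes]
    found, rate = run_metamorphic_test(seeds, rules, toy_moderator)
    for v in found:
        print(f"{v.rule}: {v.seed!r} -> {v.test_case!r} was not flagged")
    print(f"error finding rate: {rate:.2f}")
```

No labeled expected output is needed for the generated cases: the seed's known toxicity plus the assumption that each rule preserves it serves as the test oracle.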
### Background and Methods

- **Metamorphic Testing**: A testing technique that detects failures by checking metamorphic relations (MRs), which are properties expected to hold between the outputs of related software runs. A violated relation signals an error. Metamorphic testing is widely used for AI software because it can identify and report errors automatically.
- **Content Moderation Software**: Commercial content moderation software typically employs hybrid classification algorithms, combining neural network models and predefined rules to leverage the strengths of both.

### Experimental Setup and Results

- **Datasets**: Several datasets covering hate speech, malicious advertising, and pornography were used as seed data to evaluate OASIS.
- **Experimental Results**:
  - **RQ1**: Test cases generated by OASIS are toxic and realistic.
  - **RQ2**: OASIS can detect erroneous outputs returned by content moderation software.
  - **RQ3**: Retraining with test cases generated by OASIS can improve the robustness of content moderation models.

### Conclusion

OASIS provides an effective tool for testing and improving content moderation software, particularly for toxic content disseminated through images. Retraining on its test cases can improve the robustness of moderation models without degrading their performance, which helps protect users from harmful content.