SpotSpam: Intention Analysis–driven SMS Spam Detection Using BERT Embeddings

C. Oswald, Sona Elza Simon, Arnab Bhattacharya
DOI: https://doi.org/10.1145/3538491
IF: 3.35
2022-09-19
ACM Transactions on the Web
Abstract: Short Message Service (SMS) is one of the most widely used mobile applications for global communication, for both personal and business purposes. Its widespread use for customer interaction, business updates, and reminders has made it a billion-dollar industry in “Text Marketing.” Along with valid SMS, a flood of spam messages also arrives, serving various purposes for the sender, and the majority of them are fraudulent. Filtering spam SMS accurately is a crucial and challenging task that will benefit people both mentally and economically. Some of the challenges in filtering spam SMS include the small number of characters, texts in informal language, the lack of a public SMS spam corpus, and so on. Focusing solely on the textual features of the SMS is a major handicap of existing methods, because such methods cannot adapt dynamically to the increasing number of new keywords and jargon. In this article, we develop an intention-based approach to SMS spam filtering that efficiently handles dynamic keywords by focusing on the semantics of the words. We capture both semantic and textual features of the short-text messages based on 13 pre-defined intention labels. Moreover, the contextual embeddings of the texts are generated using various pre-trained NLP (Natural Language Processing) models. Finally, intention scores are computed for the pre-defined labels, and a set of supervised learning classifiers is employed to filter messages as spam or ham. Our approaches are evaluated on the SMS Spam Collection [
computer science, information systems, software engineering
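The pipeline outlined in the abstract (embed each message, score it against pre-defined intention labels, then classify on those scores) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the three intention labels and their seed phrases are hypothetical placeholders rather than the paper's 13 labels, `embed` is a toy bag-of-words vector standing in for a pre-trained contextual embedding such as BERT, cosine similarity is one plausible way to compute an intention score, and a nearest-centroid rule stands in for the supervised classifiers.

```python
import math
from collections import Counter

# Hypothetical intention labels with seed phrases (placeholders, NOT the paper's 13 labels).
INTENTION_SEEDS = {
    "offer": "free prize win cash claim offer",
    "urgency": "urgent now today expires call immediately",
    "casual": "see you later lunch home tonight thanks",
}


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a contextual BERT embedding."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def intention_scores(text):
    """One score per intention label: similarity of the message to the label's seed phrase."""
    msg = embed(text)
    return [cosine(msg, embed(seed)) for seed in INTENTION_SEEDS.values()]


def fit_centroids(messages, labels):
    """Average the intention-score vectors of each class (nearest-centroid training)."""
    sums, counts = {}, Counter(labels)
    for text, label in zip(messages, labels):
        scores = intention_scores(text)
        acc = sums.setdefault(label, [0.0] * len(scores))
        for i, s in enumerate(scores):
            acc[i] += s
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, text):
    """Assign the class whose centroid is closest to the message's score vector."""
    scores = intention_scores(text)
    return min(centroids, key=lambda label: math.dist(scores, centroids[label]))


# Tiny made-up training set, for illustration only.
train_messages = [
    "win a free prize claim your cash now",
    "urgent call now offer expires today",
    "see you at lunch thanks",
    "home later tonight",
]
train_labels = ["spam", "spam", "ham", "ham"]
centroids = fit_centroids(train_messages, train_labels)
```

Calling `predict(centroids, message)` returns a `"spam"` or `"ham"` label for a new message. The classifier only sees the intention-score vector, not the raw words, which mirrors the idea of filtering on intentions instead of keywords; with a real contextual embedding in place of `embed`, unseen vocabulary could still land near the right intention.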