Using Arabic Tweets to Understand Drug Selling Behaviors

Wesam Alruwaili, Bradley Protano, Tejasvi Sirigiriraju, Hamed Alhoori
DOI: https://doi.org/10.48550/arXiv.1911.01275
2019-10-26
Abstract: Twitter is a popular platform for e-commerce in the Arab region, including the sale of illegal goods and services. Social media platforms present multiple opportunities to mine information about behaviors pertaining to both illicit and pharmaceutical drugs and likewise to legal prescription drugs sold without a prescription, i.e., illegally. Recognized as a public health risk, the sale and use of illegal drugs, counterfeit versions of legal drugs, and legal drugs sold without a prescription constitute a widespread problem that is reflected in and facilitated by social media. Twitter provides a crucial resource for monitoring legal and illegal drug sales in order to support the larger goal of finding ways to protect patient safety. We collected our dataset using Arabic keywords. We then categorized the data using four machine learning classifiers. Based on a comparison of the respective results, we assessed the accuracy of each classifier in predicting two important considerations in analysing the extent to which drugs are available on social media: references to drugs for sale and the legality/illegality of the drugs thus advertised. For predicting tweets selling drugs, the Support Vector Machine classifier yielded the highest accuracy rate (96%), whereas for predicting the legality of the advertised drugs, the Naive Bayes classifier yielded the highest accuracy rate (85%).
Computers and Society, Machine Learning, Social and Information Networks
What problem does this paper attempt to address?
This paper aims to understand drug-selling behavior by analyzing Arabic tweets. Specifically, the researchers use machine-learning techniques to identify, from data collected on Twitter, which tweets advertise drugs for sale and whether those drugs are being sold legally. This involves two key research questions:

1. **How to determine whether a tweet is related to drug selling**: The researchers need to distinguish whether the drugs mentioned in a tweet are offered for sale, mentioned in health advice, or mentioned as a joke.
2. **How to judge whether the drug selling in a tweet is legal**: The researchers need to determine whether the drugs mentioned are legal, and whether legal drugs are being sold illegally without a prescription.

To answer these questions, the researchers classified the data with four machine-learning classifiers (Support Vector Machine, Decision Tree, Naive Bayes, and Random Forest) and evaluated the accuracy of each. Support Vector Machine performed best at predicting whether a tweet involves drug selling, with an accuracy of 96%, while Naive Bayes performed best at predicting the legality of the advertised drugs, with an accuracy of 85%.
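The classify-then-score workflow described above can be illustrated with a minimal sketch. The summary does not state how the tweets were tokenized, which features were used, or which library the authors relied on, so everything here is an assumption: a from-scratch multinomial Naive Bayes with add-one smoothing over whitespace tokens, hypothetical function names (`train_naive_bayes`, `predict`, `accuracy`), and a handful of invented English placeholder texts standing in for the Arabic tweets. It is not the authors' pipeline and does not reproduce their reported accuracies.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Fit a multinomial Naive Bayes model on whitespace-tokenized texts."""
    class_counts = Counter(labels)        # label -> number of documents
    token_counts = defaultdict(Counter)   # label -> token -> count
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.split()
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict(model, text):
    """Return the label with the highest log posterior for one text."""
    class_counts, token_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        # Add-one (Laplace) smoothing denominator for this class.
        denom = sum(token_counts[label].values()) + len(vocab)
        for token in text.split():
            if token in vocab:  # skip tokens never seen in training
                score += math.log((token_counts[label][token] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def accuracy(model, docs, labels):
    """Fraction of texts whose predicted label matches the true label."""
    correct = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Invented placeholder data (assumption): not drawn from the paper's dataset.
train_docs = [
    "pills for sale contact now",
    "selling tablets cheap delivery",
    "doctor says take this medicine with food",
    "lol i need a pill for mondays",
]
train_labels = ["selling", "selling", "not_selling", "not_selling"]

test_docs = ["tablets for sale", "take medicine with food"]
test_labels = ["selling", "not_selling"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, "tablets for sale"))
print(accuracy(model, test_docs, test_labels))
```

The second question (legality) would reuse the same functions with a different label set, for example `"legal"` versus `"illegal"`, trained only on tweets already identified as selling drugs. Comparing several classifiers then amounts to computing `accuracy` for each model on the same held-out tweets.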