Abstract:

Background: Social media are now widely used by the general public to create and share messages about their health. As social media usage grows worldwide, users increasingly post information related to adverse drug reactions (ADRs). Mining social media data for this type of information can support pharmacological post-marketing surveillance and monitoring. Although the idea of using social media to facilitate pharmacovigilance is convincing, building automatic ADR detection systems remains a challenge because corpora compiled from social media tend to be highly imbalanced, which is a major obstacle to developing classifiers with reliable performance.

Methods: Several methods have been proposed to address the challenge of imbalanced corpora. However, we are not aware of any study that has investigated the effectiveness of strategies for handling imbalanced data in the context of ADR detection from social media. We therefore evaluated a variety of imbalanced-data techniques and proposed a novel word embedding-based synthetic minority over-sampling technique (WESMOTE), which synthesizes new training examples from sentence representations based on word embeddings. We compared the performance of all methods on two large imbalanced datasets released for the purpose of detecting ADR posts.

Results: Compared with state-of-the-art approaches, the classifiers that incorporated imbalanced classification techniques achieved comparable or better F-scores. All of our best-performing configurations combined random under-sampling with other techniques, including the proposed WESMOTE, boosting, and ensemble methods, which implies that integrating these approaches with under-sampling provides a reliable solution for large imbalanced social media datasets. Furthermore, ensemble-based methods such as vote-based under-sampling (VUE) and random under-sampling boosting can serve as alternatives to the hybrid synthetic methods, because both increase the diversity of the weak classifiers they create, leading to better recall and overall F-scores for the minority classes.

Conclusions: Data collected from social media are usually very large and highly imbalanced. To maximize the performance of a classifier trained on such data, strategies for handling class imbalance are required. We considered several practical methods for handling imbalanced Twitter data and examined their performance on the binary classification task of detecting ADRs. The following practical insights were gained: 1) For text classification, the proposed word embedding-based synthetic minority over-sampling technique is more effective than traditional synthetic over-sampling methods. 2) When large amounts of training data are available, imbalanced-data strategies combined with under-sampling techniques are preferred. 3) Advanced methods do not guarantee better performance than simpler ones such as VUE, which achieved high performance while being faster to build and easier to develop.
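To make the idea of combining embedding-based over-sampling with random under-sampling concrete, the following is a minimal sketch and not the authors' WESMOTE implementation. It assumes that a sentence is represented by the average of its word embeddings and that synthetic minority examples are produced by SMOTE-style interpolation between a minority sentence vector and one of its nearest minority neighbours. The function names, the parameter `k`, and the toy embedding dictionary are all hypothetical and chosen only for illustration.

```python
import random


def sentence_vector(tokens, embeddings, dim):
    """Average the embeddings of known tokens (assumed sentence representation).

    Returns a zero vector of length `dim` if no token has an embedding.
    """
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return [0.0] * dim
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote_like_oversample(minority, n_new, k=2, rng=None):
    """Create `n_new` synthetic vectors by SMOTE-style interpolation.

    Each synthetic vector lies on the segment between a randomly chosen
    minority vector and one of its `k` nearest minority neighbours.
    """
    rng = rng or random.Random(0)
    if len(minority) < 2:
        raise ValueError("need at least two minority examples")
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = [v for j, v in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda v: squared_distance(base, v))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def random_undersample(majority, n_keep, rng=None):
    """Keep a random subset of at most `n_keep` majority examples."""
    rng = rng or random.Random(0)
    return rng.sample(majority, min(n_keep, len(majority)))


if __name__ == "__main__":
    # Hypothetical 2-dimensional embeddings, for illustration only.
    toy_embeddings = {"drug": [1.0, 0.0], "headache": [0.0, 1.0]}
    adr_posts = [["drug", "headache"], ["headache"], ["drug"]]
    minority = [sentence_vector(p, toy_embeddings, 2) for p in adr_posts]
    print(smote_like_oversample(minority, n_new=3))
```

In a hybrid setup of the kind the abstract describes, the synthetic minority vectors would be added to the training set while `random_undersample` reduces the majority class, and the resulting balanced set would be passed to a classifier of choice.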

Explainable detection of adverse drug reaction with imbalanced data distribution

Classifying adverse drug reactions from imbalanced twitter data

Study of Serious Adverse Drug Reactions Using FDA-approved Drug Labeling and MedDRA

An Effective Emotional Expression and Knowledge-Enhanced Method for Detecting Adverse Drug Reactions

An Attentive Neural Sequence Labeling Model for Adverse Drug Reactions Mentions Extraction.

Developing A Deep Learning Natural Language Processing Algorithm For Automated Reporting Of Adverse Drug Reactions

Adversarial Transfer Network with Bilinear Attention for the Detection of Adverse Drug Reactions from Social Media.

Predicting Adverse Drug Reactions from Social Media Posts: Data Balance, Feature Selection and Deep Learning

Identifying Adverse Drug Reaction-Related Text from Social Media: A Multi-View Active Learning Approach with Various Document Representations.

Adverse drug reaction detection via a multihop self-attention mechanism

Automatic assessment of adverse drug reaction reports with interactive visual exploration

Filtering Big Data from Social Media - Building an Early Warning System for Adverse Drug Reactions

Recognizing Continuous and Discontinuous Adverse Drug Reaction Mentions from Social Media Using LSTM-CRF

Adverse Drug Event Detection Using A Weakly Supervised Convolutional Neural Network And Recurrent Neural Network Model

BiMPADR: A Deep Learning Framework for Predicting Adverse Drug Reactions in New Drugs

Identifying adverse drug reaction entities from social media with adversarial transfer learning model

Adversarial Neural Network with Sentiment-Aware Attention for Detecting Adverse Drug Reactions

ADRNet: A Generalized Collaborative Filtering Framework Combining Clinical and Non-Clinical Data for Adverse Drug Reaction Prediction

Developing large language models to detect adverse drug events in posts on x

Pharmacovigilance from social media: mining adverse drug reaction mentions using sequence labeling with word embedding cluster features

Detecting Potential Adverse Drug Reactions from Health-Related Social Networks