MDDC: An R and Python Package for Adverse Event Identification in Pharmacovigilance Data

Anran Liu, Raktim Mukhopadhyay, Marianthi Markatou
DOI: https://doi.org/10.48550/arXiv.2410.01168
2024-10-02
Abstract: The safety of medical products continues to be a significant health concern worldwide. Spontaneous reporting systems (SRS) and pharmacovigilance databases are essential tools for postmarketing surveillance of medical products. Various SRS are employed globally, such as the Food and Drug Administration Adverse Event Reporting System (FAERS), EudraVigilance, and VigiBase. In the pharmacovigilance literature, numerous methods have been proposed to assess product-adverse event pairs for potential signals. In this paper, we introduce an R and Python package that implements a novel pattern discovery method for postmarketing adverse event identification, named Modified Detecting Deviating Cells (MDDC). The package also includes a data generation function that considers adverse events as groups, as well as additional utility functions. We illustrate the usage of the package through the analysis of real datasets derived from the FAERS database.
Computation, Methodology
What problem does this paper attempt to address?
The paper aims to develop a new method for effectively identifying adverse events (AEs) of medical products. Specifically, it introduces an algorithm named Modified Detecting Deviating Cells (MDDC), implemented as software packages in R and Python, for identifying potential adverse event signals in postmarketing safety surveillance.

### Background and Problems of the Paper

1. **Safety of Medical Products**: Adverse events of medical products (such as drugs, therapeutic biological products, or medical devices) are a serious health problem and may lead to hospitalization or even death.
2. **Limitations of Clinical Trials**: Clinical trials provide important safety information, but their small sample sizes and short durations make it difficult to detect rare but serious adverse events.
3. **Importance of Postmarketing Monitoring**: Continuous monitoring after a medical product reaches the market is therefore crucial for long-term safety.
4. **Existing Adverse Event Identification Methods**: Spontaneous reporting systems (SRS) and pharmacovigilance databases are currently the main postmarketing monitoring tools. Several methods already exist for analyzing SRS data, such as proportional reporting ratios (PRR) and reporting odds ratios (ROR).
5. **Deficiencies of Pattern Discovery Methods**: Despite the many available methods, pattern discovery methods with statistical performance guarantees have not been fully discussed in pharmacovigilance.

### Characteristics of the MDDC Algorithm

1. **Simple Computation**: The MDDC algorithm is easy to compute.
2. **Consideration of the Relationship between Adverse Events**: The algorithm takes into account the correlation between adverse events.
3. **Data-driven Threshold**: The algorithm uses a data-driven threshold to identify outliers.
4. **Independence from Ontology**: The algorithm does not depend on a specific ontology.

### Algorithm Steps

1. **Calculation of Standardized Pearson Residuals**: For each cell of the contingency table, compute

   \[ e_{ij}=\frac{n_{ij}-\left(\frac{n_{i\cdot}n_{\cdot j}}{n_{\cdot\cdot}}\right)}{\sqrt{\left(\frac{n_{i\cdot}n_{\cdot j}}{n_{\cdot\cdot}}\right)\left(1 - \frac{n_{i\cdot}}{n_{\cdot\cdot}}\right)\left(1 - \frac{n_{\cdot j}}{n_{\cdot\cdot}}\right)}} \]

   where \(n_{ij}\) is the count in cell \((i, j)\), and \(n_{i\cdot}\), \(n_{\cdot j}\), and \(n_{\cdot\cdot}\) are the row total, column total, and grand total, respectively.
2. **Separation of Cells**: Divide the cells of the table into zero and non-zero cells, and compute a threshold for each group separately.
3. **Calculation of Correlation**: Compute the Pearson correlation coefficient for each pair of adverse event rows.
4. **Calculation of Predicted Values**: Compute the predicted value for each cell based on its connected adverse events.
5. **Calculation of Residuals and p-values**: Compute the difference between the standardized residuals and the predicted values, obtain p-values from the standard normal distribution, and apply the Benjamini-Hochberg adjustment to control the false discovery rate.

### Data Generation Function

The paper also introduces a data generation function for simulating pharmacovigilance data sets that contain correlated adverse events. The function is based on standardized Pearson residuals and allows adverse events to be treated as clusters, which embeds the correlation between adverse events in the simulated data.

### Package Structure

- **R Package**: Provides the `mddc_boxplot` and `mddc_mc` functions, which determine the thresholds with the boxplot method and the Monte Carlo method, respectively.
- **Python Package**: Provides similar functions and supports multi-threaded computing to improve efficiency.

### Usage Example

The paper shows how to use the MDDC package for adverse event identification on a data set of β-blockers (extracted from the FAERS database).
The steps include loading the data, checking the input table, running the MDDC algorithm, and interpreting the results.
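
To make the residual formula in step 1 and the Benjamini-Hochberg adjustment in step 5 concrete, here is a minimal NumPy sketch. It is not the MDDC package's API: the function names `standardized_pearson_residuals` and `benjamini_hochberg` and the example counts are hypothetical, and the thresholding, correlation, and prediction steps (2 to 4) are deliberately left out.

```python
import numpy as np


def standardized_pearson_residuals(table):
    """Standardized Pearson residuals e_ij of a contingency table.

    Hypothetical helper, not part of the MDDC package. Assumes every row
    and column total is positive and smaller than the grand total;
    otherwise the variance term is zero and the division is undefined.
    """
    n = np.asarray(table, dtype=float)
    total = n.sum()                          # n..
    row = n.sum(axis=1, keepdims=True)       # n_i.
    col = n.sum(axis=0, keepdims=True)       # n_.j
    expected = row * col / total             # n_i. * n_.j / n..
    variance = expected * (1 - row / total) * (1 - col / total)
    return (n - expected) / np.sqrt(variance)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    # Scale the k-th smallest p-value by m / k.
    scaled = p[order] * m / np.arange(1, m + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(m)
    adjusted[order] = np.clip(scaled, 0.0, 1.0)
    return adjusted


# Made-up counts for illustration: rows are adverse events, columns are drugs.
counts = np.array([[10, 20],
                   [30, 40]])
residuals = standardized_pearson_residuals(counts)

# Made-up p-values, used only to show the adjustment.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

In MDDC the p-values passed to the adjustment come from the differences between residuals and predicted values, not from the raw residuals, so the two functions above are only the first and last building blocks of the procedure.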