Optimizing in silico drug discovery: simulation of connected differential expression signatures and applications to benchmarking

Catalina Gonzalez Gomez,Manuel Rosa-Calatrava,Julien Fouret
DOI: https://doi.org/10.1093/bib/bbae299
2024-05-23
Abstract:Background: We present a novel simulation method for generating connected differential expression signatures. Traditional methods have struggled with the lack of reliable benchmarking data and biases in drug-disease pair labeling, limiting the rigorous benchmarking of connectivity-based approaches. Objective: Our aim is to develop a simulation method based on a statistical framework that allows for adjustable levels of parametrization, especially the connectivity, to generate a pair of interconnected differential signatures. This could help to address the issue of benchmarking data availability for connectivity-based drug repurposing approaches. Methods: We first detailed the simulation process and how it reflected real biological variability and the interconnectedness of gene expression signatures. Then, we generated several datasets to enable the evaluation of different existing algorithms that compare differential expression signatures, providing insights into their performance and limitations. Results: Our findings demonstrate the ability of our simulation to produce realistic data, as evidenced by correlation analyses and the log2 fold-change distribution of deregulated genes. Benchmarking reveals that methods like extreme cosine similarity and Pearson correlation outperform others in identifying connected signatures. Conclusion: Overall, our method provides a reliable tool for simulating differential expression signatures. The data simulated by our tool encompass a wide spectrum of possibilities to challenge and evaluate existing methods to estimate connectivity scores. This may represent a critical gap in connectivity-based drug repurposing research because reliable benchmarking data are essential for assessing and advancing in the development of new algorithms. The simulation tool is available as a R package (General Public License (GPL) license) at https://github.com/cgonzalez-gomez/cosimu.
What problem does this paper attempt to address?