Similarity bias from consensus perturbational signatures from the L1000 Connectivity Map

Ian Smith, Katy Scott, Benjamin Haibe-Kains
DOI: https://doi.org/10.1101/2022.01.24.477615
2024-01-07
Abstract: In recent years, high-throughput perturbational datasets have become an important tool for rapidly characterizing the function of large collections of chemical compounds. To overcome the biological and technical noise in these experiments, researchers have used consensus signatures (averages of multiple experiments) to summarize the effects of perturbations. In this work, we demonstrate that consensus signatures on the L1000 Connectivity Map show a pervasive similarity bias: as more signatures are averaged, the resulting consensus signatures become increasingly similar to each other, regardless of whether the underlying signatures are related. We show that the distribution of Pearson's correlation between consensus signatures changes as a function of the number of signatures averaged. This artifactual similarity bias is caused by skewness in the data and is a consequence of applying median normalization to non-normal distributions. Furthermore, we show that mean normalization can partly remedy the similarity bias and improve the power to identify associations. The similarity bias introduced by consensus signatures is an important potential confounder in the analysis of perturbational datasets, and our practical solution can easily be applied by practitioners in the field to improve analysis of the L1000 Connectivity Map.
Bioinformatics
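
The mechanism described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' analysis and does not use L1000 data: it assumes each gene follows a scaled exponential distribution (a hypothetical stand-in for skewed expression changes), centers each gene by its median or its mean, averages disjoint groups of unrelated signatures, and reports the mean Pearson correlation between pairs of the resulting consensus signatures. The function name and all parameters are illustrative assumptions.

```python
import numpy as np


def mean_consensus_correlation(n_avg, center, n_genes=300, n_pairs=100, seed=0):
    """Mean Pearson correlation between unrelated consensus signatures.

    Toy model (an assumption, not L1000 data): every gene is drawn from an
    exponential distribution with a random signed scale, so genes are skewed
    in either direction. All simulated signatures are mutually independent.
    """
    rng = np.random.default_rng(seed)
    scale = rng.normal(size=n_genes)  # hypothetical per-gene signed scale
    n_signatures = 2 * n_pairs * n_avg
    raw = scale * rng.exponential(size=(n_signatures, n_genes))

    # Per-gene centering: median (leaves a nonzero mean on skewed genes)
    # or mean (removes it).
    if center == "median":
        data = raw - np.median(raw, axis=0)
    elif center == "mean":
        data = raw - raw.mean(axis=0)
    else:
        raise ValueError("center must be 'median' or 'mean'")

    # Consensus signature = average of n_avg unrelated signatures.
    consensus = data.reshape(2 * n_pairs, n_avg, n_genes).mean(axis=1)
    a, b = consensus[:n_pairs], consensus[n_pairs:]

    # Row-wise Pearson correlation across genes for each (a_i, b_i) pair.
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    r = (a * b).sum(axis=1) / np.sqrt((a ** 2).sum(axis=1) * (b ** 2).sum(axis=1))
    return float(r.mean())


if __name__ == "__main__":
    for n in (1, 10, 50):
        r_median = mean_consensus_correlation(n, "median")
        r_mean = mean_consensus_correlation(n, "mean")
        print(f"n_avg={n:3d}  median-centered r={r_median:.3f}  mean-centered r={r_mean:.3f}")
```

In this idealized setting, median centering leaves each skewed gene with a nonzero mean, so averaging more signatures pulls every consensus toward the same per-gene offset vector and the correlation between unrelated consensus signatures rises with the number averaged. Mean centering removes that offset here; on real data the abstract reports only a partial remedy.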