StatsPro: Systematic Integration and Evaluation of Statistical Approaches for Detecting Differential Expression in Label-Free Quantitative Proteomics.

Yin Yang, Jingqiu Cheng, Shisheng Wang, Hao Yang
DOI: https://doi.org/10.1016/j.jprot.2021.104386
IF: 3.855
2022-01-01
Journal of Proteomics
Abstract: Quantitative label-free mass spectrometry (MS) is an increasingly powerful technology for profiling thousands of proteins from complex biological samples. One of the primary goals of analyses performed on such proteomics data is to detect differentially expressed proteins (DEPs) under different experimental conditions. Many statistical methods have been developed and assessed for DEP detection in various proteomics studies. However, it remains a challenge for many proteomics scientists to choose an appropriate statistical procedure. Therefore, in this study, we organized 12 common testing algorithms and 6 P-value combination methods, and further provided Cohen's d effect size for every protein and three evaluation criteria, to help proteomics scientists investigate the influence of these methods on DEP detection in a systematic manner. To promote the widespread use of these methods, we developed a user-friendly web tool, StatsPro, and presented two case studies involving label-free quantitative proteomics data obtained using data-dependent acquisition and data-independent acquisition to illustrate its practicability. This tool is freely available in our GitHub repository (https://github.com/YanglabWCH/StatsPro/). SIGNIFICANCE: One of the primary goals of analyses performed on liquid chromatography-mass spectrometry (LC-MS) based proteomics data is to detect differentially expressed proteins (DEPs) under different experimental conditions. Although many methods have been proposed to detect DEPs, there is still a scarcity of efficient, systematic, and easy-to-use tools that help proteomics scientists choose an appropriate statistical procedure. Herein, we present a new tool, StatsPro, that enables proteomics scientists to implement and evaluate different statistical methods.
This tool offers two significant advances over existing software: a) It integrates up to 18 common statistical approaches (12 statistical tests and 6 P-value combination strategies) and systematically computes Cohen's d effect size for users; moreover, it provides a web-based interface that is convenient to operate, even for users with a limited computational background. b) It supports three performance evaluation criteria (number of DEPs, correlation coefficient between P-values and effect sizes, and area under the ROC curve) that let users review the final statistical results and that may guide method selection for DEP detection.
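To make two of the quantities named in the abstract concrete, the following is a minimal Python sketch, not StatsPro's own code. It assumes the common pooled-standard-deviation form of Cohen's d and uses Fisher's method as one well-known example of P-value combination; the abstract does not say which six combination methods StatsPro implements, so that choice is an assumption. The function names `cohens_d` and `fisher_combine` are hypothetical.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d using the pooled standard deviation (assumed variant)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


def fisher_combine(p_values):
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the survival function has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!, so no external library is needed.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total


# Illustrative intensities for one protein in two conditions (made-up numbers).
condition_a = [2.0, 4.0, 6.0]
condition_b = [1.0, 3.0, 5.0]
effect_size = cohens_d(condition_a, condition_b)

# Illustrative P-values for the same protein from two different tests.
combined_p = fisher_combine([0.5, 0.5])
```

Fisher's method assumes the combined P-values are independent. P-values from several tests applied to the same data are generally correlated, so this sketch illustrates only the arithmetic, not a recommended analysis.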