Phenotype Classification using Proteome Data in a Data-Independent Acquisition Tensor Format
Fangfei Zhang, Shaoyang Yu, Lirong Wu, Zelin Zang, Xiao Yi, Jiang Zhu, Cong Lu, Ping Sun, Yaoting Sun, Sathiyamoorthy Selvarajan, Lirong Chen, Xiaodong Teng, Yongfu Zhao, Guangzhi Wang, Junhong Xiao, Shiang Huang, Oi Lian Kon, N Gopalakrishna Iyer, Stan Z Li, Zhongzhi Luan, Tiannan Guo
DOI: https://doi.org/10.1021/jasms.0c00254
2020-11-04
Abstract: We developed a novel approach for phenotype prediction from data-independent acquisition (DIA) mass spectrometric (MS) data that does not require peptide precursor identification with existing DIA software tools. The first step converts the DIA-MS data file into a new file format called DIA tensor (DIAT), which allows convenient visualization of all the ions from peptide precursors and fragments. DIAT files can be fed directly into a deep neural network to predict phenotypes, much as such networks classify images of cats, dogs, or microscopic specimens. As a proof of principle, we applied this approach to 102 hepatocellular carcinoma samples and achieved an accuracy of 96.8% in distinguishing malignant from benign samples. We further applied a refined model to classify thyroid nodules. Deep learning based on 492 training samples achieved an accuracy of 91.7% in an independent cohort of 216 test samples. This approach outperformed a deep-learning model based on peptide and protein matrices generated by OpenSWATH. In summary, we present a new strategy for DIA data analysis based on a novel data format called DIAT, which enables facile two-dimensional visualization of DIA proteomics data. DIAT files can be used directly for deep learning in biological and clinical phenotype classification. Future research will interpret the deep-learning models that emerge from DIAT analysis.
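The abstract does not specify the DIAT layout, so the following is only a minimal sketch of the general idea of turning DIA peaks into an image-like array. It assumes each peak is a tuple of (isolation window index, retention time, m/z, intensity) and bins peaks onto a fixed grid per window. The function name `peaks_to_tensor`, the grid sizes, and the toy peak values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def peaks_to_tensor(peaks, n_windows, rt_range, mz_range, rt_bins, mz_bins):
    """Bin (window, rt, mz, intensity) peaks into a dense array.

    Returns an array of shape (n_windows, rt_bins, mz_bins). This layout is an
    assumption for illustration, not the actual DIAT specification.
    """
    tensor = np.zeros((n_windows, rt_bins, mz_bins), dtype=np.float32)
    rt_lo, rt_hi = rt_range
    mz_lo, mz_hi = mz_range
    for window, rt, mz, intensity in peaks:
        # Skip peaks that fall outside the chosen grid.
        if not (rt_lo <= rt < rt_hi and mz_lo <= mz < mz_hi):
            continue
        i = int((rt - rt_lo) / (rt_hi - rt_lo) * rt_bins)
        j = int((mz - mz_lo) / (mz_hi - mz_lo) * mz_bins)
        # Peaks landing in the same cell are summed.
        tensor[window, i, j] += intensity
    return tensor

# Toy input: two isolation windows, four peaks (the last lies outside the RT range).
peaks = [
    (0, 12.0, 450.0, 100.0),
    (0, 13.0, 451.0, 50.0),
    (1, 55.0, 900.0, 20.0),
    (1, 200.0, 900.0, 5.0),
]
tensor = peaks_to_tensor(
    peaks, n_windows=2, rt_range=(0.0, 60.0), mz_range=(400.0, 1200.0),
    rt_bins=6, mz_bins=8,
)
```

Each slice `tensor[w]` can be viewed as a two-dimensional image (retention time by m/z), which is the kind of input an image-classification network accepts. A real pipeline would read peaks from the raw DIA-MS file and use a far finer grid.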