Bi-clustering interpretation and prediction of correlation between gene expression and protein abundance

Xiaojun Wang, Lin Teng, Haicang Zhang, Qian Zhou, Xiaoquan Su, Xinping Cui, Dongbu Bu, Xinqi Gong, Ansgar Poetsch, Kang Ning
DOI: https://doi.org/10.1101/270397
2018-01-01
bioRxiv
Abstract: In most organisms, transcript and protein levels correlate only moderately, for reasons such as regulation of transcription and protein degradation. Better prediction and understanding of the correlation between gene expression and protein abundance have become possible by harnessing the matching RNA/protein datasets produced by modern high-throughput RNA-Seq and mass spectrometry methods. In this work, we used several well-studied matching RNA/protein datasets and explored, for the first time, a bi-clustering method to cluster genes that have consistent correlation patterns between gene expression and protein abundance. The clustering results were interpreted from the perspective of both transcriptomic and proteomic features, and they show that mRNA half-life, protein half-life and protein structure in concert significantly affect the correlation between gene expression and protein abundance. With these and other carefully selected features, a prediction model based on individual clusters, called the Cluster-based Linear prediction Model (CLM), was built and tested on mouse liver mitochondrial, mouse brainstem mitochondrial, Saccharomyces cerevisiae and Danio rerio datasets. CLM could find genes for which protein abundance can be predicted from mRNA data. In summary, based on bi-clustering, feature selection and CLM, we have established a new and valuable cluster-based protein abundance prediction method.
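The idea of a cluster-based linear predictor can be illustrated with a minimal sketch. This is not the authors' CLM implementation: it assumes cluster labels have already been assigned (for example by a bi-clustering step that is not shown), it uses mRNA level as the only predictor rather than the selected features the abstract mentions, and all function names and the toy data are hypothetical.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to predicting the mean.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_cluster_models(records):
    """Fit one linear model per cluster.

    records: iterable of (cluster_label, mrna_level, protein_abundance).
    The cluster labels are assumed to come from an earlier clustering step.
    """
    grouped = defaultdict(lambda: ([], []))
    for label, mrna, protein in records:
        grouped[label][0].append(mrna)
        grouped[label][1].append(protein)
    return {label: fit_line(xs, ys) for label, (xs, ys) in grouped.items()}


def predict(models, label, mrna):
    """Predict protein abundance for a gene from its cluster's model."""
    slope, intercept = models[label]
    return slope * mrna + intercept


# Hypothetical toy data: two clusters with opposite mRNA/protein trends.
records = [
    ("A", 1.0, 2.0), ("A", 2.0, 4.0), ("A", 3.0, 6.0),
    ("B", 1.0, 5.0), ("B", 2.0, 4.0), ("B", 3.0, 3.0),
]
models = fit_cluster_models(records)
```

Fitting a separate model per cluster lets genes with different mRNA-to-protein relationships get different slopes, which a single global regression over all genes would average away.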