A Comprehensive Platelet Expression Atlas Resource and Platelet Transcriptome Landscape
Xie Gui-Yan,Liu Chun-Jie,Miao Ya-Ru,Xia Mengxuan,Zhang Qiong,An-Yuan Guo
DOI: https://doi.org/10.1002/ajh.26393
IF: 13.265
2022-01-01
American Journal of Hematology
Abstract: Platelets are small circulating anucleate cells that play key roles in hemostasis, wound healing, and disease progression [1]. RNA profiles in platelets can be altered when platelets communicate with other cell types or circulate in blood [2]. Platelets may be involved in tumor development and metastasis [3], and RNA expression signatures in tumor-educated platelets are capable of detecting tumors [4, 5]. Only a few databases provide platelet data, such as PlateletWeb for platelet protein signaling networks, the human platelet antigen database (HPA, https://www.versiti.org/hpa) for human platelet antigens, and PLATELETOMICS for platelet RNA and miRNA expression in healthy subjects. However, a systematic study of platelet expression profiles across large datasets is still lacking; such a study requires high-quality platelet transcriptome datasets under different conditions. A comprehensive platelet expression database for diseases is not yet available and would be very helpful for the research community. To systematically explore RNA expression profiles in platelets, we curated platelet expression datasets, including 1260 RNA-seq, 358 RNA microarray, 21 miRNA-seq, and 430 miRNA microarray datasets from 27 disease types and healthy controls, from the Gene Expression Omnibus of the National Center for Biotechnology Information (NCBI GEO) and the Sequence Read Archive (SRA) databases. We obtained RNA-seq datasets of peripheral blood mononuclear cells (PBMC) and whole blood from the National Genomics Data Center (ID: CRA001839) and GTEx (V7), respectively. All data were processed as described in the online document (http://bioinfo.life.hust.edu.cn/PEA/#!/document).
Then, we removed the batch effect from the combined RNA-seq expression data and classified the samples into four groups: (1) solid tumors: colorectal cancer (CRC), breast cancer (BRCA), pancreatic cancer (PC), hepatobiliary carcinoma (HCC), glioblastoma (GBM), nonsmall cell lung cancer (NSCLC), and low-grade glioma (LGG); (2) cardiovascular diseases (CAD): unstable angina pectoris (UA), ST-segment elevation myocardial infarction, pulmonary hypertension (PH), nonsignificant atherosclerosis (NSAth), and stable angina pectoris; (3) infections (Virus): human immunodeficiency virus (HIV), dengue, and influenza (H1N1); (4) others: chronic pancreatitis (CP), epilepsy, multiple sclerosis, and diabetes mellitus (DM). Read distribution analysis for platelets revealed that, on average, about 80% of reads were mapped to exon (39.89%) and intron (39.18%) regions (Figure 1A). We found that 12.07% of reads were mapped to mitochondrial DNA, consistent with the key role of intact mitochondria in platelet function and survival [6]. Only 8.86% of reads were mapped to intergenic regions. Next, we explored platelet expression in healthy samples and found that the number of expressed genes varied dramatically among studies (Figure 1B), which may be due to different treatments or platelet processing methods. The distributions of expressed genes with fragments per kilobase of transcript per million mapped reads (FPKM) > 3 in healthy platelets were similar (Figure 1B); thus, we regarded genes with FPKM > 3 as high-confidence genes. As a result, about 4994 protein-coding and 2168 noncoding genes on average were expressed in platelets from healthy controls, while the total number of protein-coding and noncoding genes in platelets from diseases varied dramatically, from 3069 to 13 678 (Figure 1C). As illustrated in Figure 1C, fewer genes were expressed in platelets from solid tumors than in healthy controls, except for LGG.
However, compared with healthy control platelets, three CAD diseases (UA, NSAth, and PH) had a higher number of protein-coding genes expressed in platelets (p < .05). Since platelets circulate in blood vessels, we explored the differences in gene expression profiles among platelets, PBMC, and whole blood. We found that the numbers of genes (FPKM > 3) expressed in platelets, PBMC, and whole blood were 7162, 10 521, and 9336 on average, respectively. Among these genes, noncoding genes accounted for about 30.27%, 12.54%, and 12.18% in the respective sources (Figure 1D). Furthermore, we retained genes with FPKM > 3 in more than 40% of samples and identified 4326 common genes (4168 coding and 158 noncoding) expressed in all three sources. When exploring the most highly expressed genes in the three sources (Figure 1E), we found that they shared 27 genes among the top 100 and 142 genes among the top 500. Intriguingly, 35 of the 37 mitochondrial genes were among the top 100 platelet genes, and six of the top 10 most highly expressed genes in platelets were mitochondrial (MT-RNR2, MT-RNR1, MT-ND1, MT-CO2, MT-ATP6, and MT-CO3) (Figure 1F). The other four genes were B2M, TMSB4X, FTH1, and PPBP, which are closely related to platelet formation and function [5, 7]. Meanwhile, the top 10 genes expressed in PBMC and whole blood included both mitochondria-related genes and hemoglobin genes (e.g., HBA2 and HBA1) (Figure 1F). To investigate the function of genes in different diseases, we excluded platelet-related genes by filtering on the keyword “platelet” in Gene Ontology (GO) terms and performed GO enrichment analysis for the remaining top 500 genes. As illustrated in Figure 1G, these genes in healthy samples and diseases were all enriched in protein targeting to membrane, protein localization to endoplasmic reticulum, and co-translational protein targeting to membrane.
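The expression filter described above (FPKM > 3 in more than 40% of samples, followed by intersecting the three sources) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the dictionary layout, and the toy FPKM values are assumptions made purely for illustration.

```python
from typing import Dict, List, Set

FPKM_CUTOFF = 3.0          # expression threshold stated in the text
MIN_SAMPLE_FRACTION = 0.4  # "more than 40% of samples"


def expressed_genes(
    fpkm: Dict[str, List[float]],
    cutoff: float = FPKM_CUTOFF,
    min_fraction: float = MIN_SAMPLE_FRACTION,
) -> Set[str]:
    """Return genes with FPKM > cutoff in more than min_fraction of samples.

    `fpkm` maps a gene symbol to its per-sample FPKM values
    (a hypothetical layout chosen for this sketch).
    """
    kept = set()
    for gene, values in fpkm.items():
        if not values:
            continue  # no samples, nothing to decide
        n_expressed = sum(1 for v in values if v > cutoff)
        if n_expressed / len(values) > min_fraction:
            kept.add(gene)
    return kept


def common_genes(sources: Dict[str, Dict[str, List[float]]]) -> Set[str]:
    """Intersect the expressed-gene sets of every source."""
    gene_sets = [expressed_genes(table) for table in sources.values()]
    return set.intersection(*gene_sets) if gene_sets else set()


# Toy values, invented for illustration only (not taken from PEA).
toy = {
    "platelet": {
        "B2M": [900.0, 850.0, 910.0],
        "HBA1": [1.0, 0.5, 4.0],
        "PPBP": [500.0, 450.0, 2.0],
    },
    "pbmc": {
        "B2M": [700.0, 650.0, 720.0],
        "HBA1": [50.0, 60.0, 40.0],
        "PPBP": [1.0, 0.0, 5.0],
    },
    "whole_blood": {
        "B2M": [600.0, 640.0, 610.0],
        "HBA1": [9000.0, 8500.0, 8800.0],
        "PPBP": [20.0, 2.0, 1.0],
    },
}

shared = common_genes(toy)
print(sorted(shared))  # ['B2M']
```

In this toy input only B2M passes the filter in all three sources; PPBP passes in platelets alone, and HBA1 passes in PBMC and whole blood but not in platelets.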
Intriguingly, in this GO analysis (Figure 1G), HCC, BRCA, PC, CRC, DM, CP, and the virus infection group lacked ribosome biogenesis-related GO terms when compared with the healthy samples and other diseases. These results suggest that RNA profiles differ in platelets under various conditions or diseases [8]. Next, we identified differentially expressed genes (DEGs) between disease and healthy samples with the following cutoffs: fold-change > 1.5, false discovery rate < 0.05, and FPKM > 10 in the higher group. The numbers of DEGs varied greatly (from 8 to 3298) among CAD diseases, and no DEGs were found in infectious diseases. Compared with healthy platelets, there were 1434, 1414, 1316, 1191, 1070, and 225 DEGs in CRC, BRCA, PC, HCC, GBM, and NSCLC, respectively, which demonstrated that NSCLC differs from the other cancers in platelets. To further explore this difference, we intersected the DEG sets and found 716 DEGs shared by five tumors (BRCA, CRC, GBM, PC, and HCC) and 143 DEGs unique to NSCLC (Figure 1H). Meanwhile, we found four commonly up-regulated genes (ITGA2B, DEFA1, DEFA3, and TLN1) in the six tumors, which have been reported to be associated with the development of multiple cancers [9], and 78 commonly down-regulated genes, which mainly encode ribosomal proteins (Figure 1I). To explore potential biomarkers in platelets for cancer diagnosis, we identified 11 specifically expressed genes (SEGs) in solid tumors with SEGtool [10] (Figure 1J). We found that FKBP1A is expressed at specifically low levels in NSCLC and LGG, and we also observed that FKBP1A is down-regulated in lung and brain tumors compared with healthy controls in The Cancer Genome Atlas (TCGA) data. In addition, the remaining seven platelet SEGs (GP1BB, PRR7, CYBA, NOP53, TYMP, STUB1, and ATP5D) from NSCLC and LGG, and three SEGs (DUSP1, DUSP2, and DDIT4) unique to LGG, have been reported to be involved in tumor-related processes.
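The DEG cutoffs stated above (fold-change > 1.5, false discovery rate < 0.05, and FPKM > 10 in the higher group) and the subsequent intersection of DEG sets can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation. It assumes that fold-change is the ratio of group mean FPKM values, that "the higher group" means the group with the larger mean, and that FDR values were computed upstream. The `GeneStat` record, the function names, the small division floor, and all toy numbers are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class GeneStat:
    gene: str
    mean_disease: float  # mean FPKM in disease samples
    mean_healthy: float  # mean FPKM in healthy samples
    fdr: float           # adjusted p-value, assumed to be computed upstream


def classify(
    stat: GeneStat,
    min_fold: float = 1.5,
    max_fdr: float = 0.05,
    min_fpkm: float = 10.0,
    floor: float = 1e-9,  # assumption: avoids division by zero
) -> Optional[str]:
    """Return 'up' or 'down' if the gene passes all three cutoffs, else None."""
    higher = max(stat.mean_disease, stat.mean_healthy)
    lower = min(stat.mean_disease, stat.mean_healthy)
    if higher <= min_fpkm or stat.fdr >= max_fdr:
        return None
    if higher / max(lower, floor) <= min_fold:
        return None
    return "up" if stat.mean_disease > stat.mean_healthy else "down"


def call_degs(stats: List[GeneStat]) -> Dict[str, str]:
    """Map each gene that passes the cutoffs to its direction of change."""
    calls = {}
    for stat in stats:
        direction = classify(stat)
        if direction is not None:
            calls[stat.gene] = direction
    return calls


# Toy statistics, invented for illustration only.
tumor_a = [
    GeneStat("G1", 20.0, 10.0, 0.01),  # passes: fold 2.0, up
    GeneStat("G2", 5.0, 9.0, 0.01),    # fails: higher group FPKM <= 10
    GeneStat("G3", 30.0, 12.0, 0.20),  # fails: FDR >= 0.05
]
tumor_b = [
    GeneStat("G1", 10.0, 30.0, 0.001),  # passes: fold 3.0, down
    GeneStat("G4", 40.0, 20.0, 0.04),   # passes: fold 2.0, up
]

degs_a = call_degs(tumor_a)
degs_b = call_degs(tumor_b)

# Set operations on gene symbols; direction of change is ignored here.
shared = set(degs_a) & set(degs_b)
unique_b = set(degs_b) - set(degs_a)
print(sorted(shared), sorted(unique_b))  # ['G1'] ['G4']
```

Note that the shared set here ignores the direction of change (G1 is up in one toy tumor and down in the other); finding commonly up- or down-regulated genes would require intersecting the "up" and "down" calls separately.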
To facilitate novel discoveries using these high-quality platelet datasets, we organized the platelet expression data and analysis results into a platelet expression database named Platelet Expression Atlas (PEA, http://bioinfo.life.hust.edu.cn/PEA) (Figure S1). PEA provides a comprehensive repertoire of gene and miRNA expression profiles and advanced analysis results for each dataset: (1) expression profiles of platelets in different diseases; (2) average expression of a specified gene (miRNA) in health and various diseases; (3) differential expression analysis, including functional enrichment of DEGs and protein–protein interaction networks; and (4) platelet SEG analysis. In conclusion, we systematically investigated, for the first time, RNA expression profiles in platelets from different diseases and presented the expression landscape in platelets, PBMC, and whole blood. We also provided a comprehensive platelet expression database, which could serve as an infrastructure for the platelet research community. Several questions remain to be addressed in the future, including the long-term maintenance of PEA and the consistency of platelet expression as more data accumulate.
The authors thank all participants contributing to the platelet expression data. This work was supported by the National Natural Science Foundation of China (nos. 31822030, 31801113, and 31771458) and the China Postdoctoral Science Foundation (nos. 2019M652623 and 2018M632830).
The authors declare no competing financial interests.
Gui-Yan Xie assembled all transcriptome data; Gui-Yan Xie and Chun-Jie Liu designed the database and data processing; Gui-Yan Xie and Ya-Ru Miao performed miRNA microarray analysis; Mengxuan Xia and Qiong Zhang provided bioinformatics support; An-Yuan Guo and Chun-Jie Liu led and guided the study; and all authors reviewed the manuscript.
The data are publicly available at http://bioinfo.life.hust.edu.cn/PEA.
Figure S1. An overview of the data processing and the PEA database architecture. Data were carefully obtained, processed, and integrated into PEA. Four functional modules were offered in PEA database.