TCR Repertoire and Transcriptional Signatures of Circulating Tumour‐associated T Cells Facilitate Effective Non‐invasive Cancer Detection
Fansen Ji,Lin Chen,Zhizhuo Chen,Bin Luo,Yongwang Wang,Xun Lan
DOI: https://doi.org/10.1002/ctm2.853
IF: 8.554
2022-01-01
Clinical and Translational Medicine
Clinical and Translational Medicine, Volume 12, Issue 9, e853. Letter to the Editor. Open Access.

Fansen Ji (orcid.org/0000-0002-4220-0786): Tsinghua-Peking Center for Life Sciences, MOE Key Laboratory of Tsinghua University, Beijing, China; School of Medicine, Tsinghua University, Beijing, China
Lin Chen: School of Medicine, Tsinghua University, Beijing, China; General Surgery Department, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, Beijing, China
Zhizhuo Chen: School of Life Science, Tsinghua University, Beijing, China
Bin Luo (corresponding author, luobin@mail.tsinghua.edu.cn): General Surgery Department, Beijing Tsinghua Changgung Hospital, School of Clinical Medicine, Tsinghua University, Beijing, China
Yongwang Wang (corresponding author, wangyongwang81@126.com): Department of Anesthesiology, Affiliated Hospital of Guilin Medical University, Guilin, China
Xun Lan (corresponding author, xlan@tsinghua.edu.cn): Tsinghua-Peking Center for Life Sciences, MOE Key Laboratory of Tsinghua University, Beijing, China; School of Medicine, Tsinghua University, Beijing, China

Correspondence: Xun Lan, School of Medicine, Tsinghua University, Haidian District, Beijing 100084, China. Email: xlan@tsinghua.edu.cn. Bin Luo, Beijing Tsinghua Changgung Hospital, Room 431, Building 3, 168 Litang Road, Changping District, Beijing 102218, China. Email: luobin@mail.tsinghua.edu.cn. Yongwang Wang, Department of Anesthesiology, Affiliated Hospital of Guilin Medical University, No.15 Lequn Road, Xiufeng District, Guilin 541001, China. Email: wangyongwang81@126.com.

First published: 22 September 2022. https://doi.org/10.1002/ctm2.853. Citations: 1

Fansen Ji and Lin Chen contributed equally to this study.
The concept of precision medicine in oncology has dramatically transformed the clinical application of tumour screening, because malignancies are more curable when diagnosed early. Traditional serological tumour biomarkers such as α-fetoprotein, prostate-specific antigen, carcinoembryonic antigen, CA19–9 and CA125 have been widely investigated in the clinic, but their specificity is not yet satisfactory for population-level screening.1-3 Over the past decades, novel technologies that non-invasively exploit tumour-derived signals in blood have provided a new diagnostic strategy called liquid biopsy.3-5 Several peripheral biomarkers, such as cell-free DNA (cfDNA),6-8 especially circulating tumour DNA (ctDNA),9 circulating tumour cells (CTCs),10, 11 circulating micro-RNAs,12, 13 tumour-derived exosomes14 and cancer cell metabolites,15 have achieved great progress and show considerable promise in tumour screening. However, these methods all rely on signals derived from the tumour itself and often need predefined panels or biomarkers for diagnosis, which may be non-specific and subjective because of tumour heterogeneity.
The feasibility of using the tumour-associated T cell response, which is involved in tumour initiation and development, as a supplementary diagnostic option has not been widely explored.16, 17 Until recently, tumour-infiltrated T lymphocytes (TILs) were considered to be beneficial tumour-specific T cells.18 However, because of the complex interaction of different immune components mediated by chemokines or cytokines within the tumour microenvironment (TME), the majority of passively expanding TILs cannot recognize tumour-specific antigens (TSAs) and are thus believed to be bystander T cells.19-21 These bystander T cells may dilute tumour-specific signals and make the identification of TSA-specific T cells challenging.22-24 Programmed cell death protein 1 (PD-1) has been suggested as a biomarker for tumour-specific CD8+ T cells both in TILs and in peripheral blood mononuclear cells (PBMCs),22, 25-27 but its efficacy needs to be further validated in practical applications.21, 28, 29 Tracking the general immunophenotype of T lymphocytes when they encounter antigens, and the enrichment of tumour-associated T cells over a pool of irrelevant signals during tumour development, reflects the overall immune status of patients and offers opportunities for cancer prevention and therapy.30

Next generation sequencing (NGS)-based T cell receptor (TCR) repertoire quantification has provided methods for TSA recognition and is now extensively used in the identification of tumour-reactive T lymphocytes.19, 31-34 The past few years have witnessed a series of studies utilizing the T or B-cell repertoire to pinpoint disease-associated signatures, and there is evidence for the diagnostic potential of the TCR repertoire in autoimmune diseases,35 infectious diseases36, 37 and even cancer.38-42 Sustained neoantigen stimulation during tumour development drives the TCR repertoire to shift towards a tumour-specific distribution and to exhibit amino acid motifs different from those in healthy individuals.38, 43 Under physiological conditions, naïve T cells maturing in the thymus flow through peripheral blood or lymphatic vessels and migrate through high endothelial venules into secondary lymphoid organs, where they encounter potential tumour antigens.44, 45 T cell trafficking and circulation theoretically make these tumour-specific T cells detectable both at tumour sites and in peripheral blood. T lymphocytes circulating among PBMCs that are paired with TILs residing in tumour tissues have been suggested to be highly correlated with T cell-induced cytotoxicity and to indicate enrichment of tumour-reactive signals.22, 46-51 Elucidating the connection between anti-tumour T cells in the periphery and those in the TME22, 25, 44, 45, 52 may provide clues for designing novel approaches to non-invasive tumour screening. Assessing the TCRs that overlap between PBMCs and TILs and treating them as tumour-specific predictors will not only help us comprehensively study T cell circulation and migration but will also alleviate the deficiencies caused by using the TIL population alone, which is enriched with bystander T cells.

In this study, we defined a group of circulating T lymphocytes in PBMCs that shared TCRs with TILs as tumour-associated T cells (TATs). Using the CDR3 sequences of TATs together with those of healthy TCRs as input data, we trained a binary model to distinguish TATs from healthy clones. Applying this model to several independent clinical datasets, we obtained the number of TAT sequences in the PBMCs of each individual. We then designed a TCR repertoire risk score (TRRS), defined as the number of TATs our model predicted in a PBMC sample divided by the number of TCRs in that sample that were also detected in healthy individuals. We demonstrated that the TRRS separated tumour patients from healthy donors effectively.
Next, we characterized the transcriptional signatures of TATs in the PBMC populations using multiple single cell RNA sequencing (scRNA-seq) datasets coupled with TCR sequencing and found that the T cell activation pathway was significantly up-regulated in TATs. Combining the TCR repertoire and transcriptional signatures of TATs,53 we developed an integrated framework for non-invasive tumour screening using only PBMC samples. Furthermore, we performed bulk TCR and RNA sequencing of PBMC samples from 11 tumour patients and six healthy donors and validated the performance of this tumour screening strategy with these data and another independent cohort. Our study provides proof of principle for using TATs as an alternative non-invasive tumour screening biomarker and broadens the application of liquid biopsy from the perspective of the immune landscape.

T cells with identical TCR sequences are thought to be derived from a single naïve T cell, which migrates and circulates among different tissue types and may undergo a functional state transition upon antigen stimulation.54 Based on the TCR-sharing relationship of TILs and PBMCs, we first divided TCR clonotypes in PBMCs and TILs into four different compartments (Figure 1A). TCR clonotypes of PBMCs that are identical to those of TILs form the PBMCs_Shared compartment, while clonotypes of TILs that are identical to those of PBMCs form the TILs_Shared compartment. These two compartments contain the same TCR clonotypes but are distinguished by their tissue source. Because the tissue environments differ, the frequency of each clonotype and the degree of clonality may differ between the two compartments. Therefore, we specifically named the T cells among PBMCs that share TCRs with TILs circulating tumour-associated T cells (cTATs). In contrast, the TCR clonotypes that are unique to TILs are in the TILs_Only compartment, and the TCR clonotypes observed only in PBMCs are in the PBMCs_Only compartment.
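The four-compartment definition can be illustrated with plain set operations. The sketch below is only an illustration under stated assumptions: a clonotype is identified by its CDR3 amino acid sequence alone, each repertoire is a dictionary from sequence to count, and the function name and toy sequences are hypothetical rather than taken from the study.

```python
def assign_compartments(pbmc_clones, til_clones):
    """Split clonotypes into four compartments by CDR3 identity.

    pbmc_clones / til_clones: dict mapping CDR3 sequence -> clone count.
    Assumption: identical CDR3 sequences denote the same clonotype.
    """
    shared = set(pbmc_clones) & set(til_clones)
    return {
        # Same clonotypes, but counts come from the respective tissue.
        "PBMCs_Shared": {s: pbmc_clones[s] for s in shared},
        "TILs_Shared": {s: til_clones[s] for s in shared},
        "PBMCs_Only": {s: c for s, c in pbmc_clones.items() if s not in shared},
        "TILs_Only": {s: c for s, c in til_clones.items() if s not in shared},
    }


# Toy, made-up repertoires for one patient.
pbmc = {"CASSLGQGAEAFF": 5, "CASSPDRGYEQYF": 1, "CASSQETQYF": 2}
til = {"CASSLGQGAEAFF": 40, "CASRTGGNTEAFF": 3}
compartments = assign_compartments(pbmc, til)
for name, clones in compartments.items():
    print(name, clones)
```

The shared compartments hold the same keys with tissue-specific counts, which is why their clonality can differ even though their clonotypes are identical.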
We believe that the PBMCs_Only compartment most likely represents naïve, effector or memory T cells in the periphery that are not related to the tumour immune response. T cells in the TILs_Only compartment may largely represent tissue-resident T cells, which are not among the T lymphocytes prevalent in circulation. It should be noted that, owing to the limited sensitivity of TCR repertoire sequencing, both the TILs_Only and PBMCs_Only compartments contain a proportion of shared clonotypes that currently escape detection.

FIGURE 1 T cell receptor (TCR) repertoire sharing relationship between the tumour microenvironment and the periphery defines four different compartments. (A) Schematic overview of TCR sharing relationship between PBMCs and tumour-infiltrated T lymphocytes (TILs) and the definition of the four TCR compartments. (B) Length distribution of the CDR3 beta chains within the four different TCR compartments. (C) V gene usage of the CDR3 beta chains in different TCR compartments. (D) Proportion of shared TCRs among different tissue types (TILs or PBMCs) at both sequence level and clone level. (E–F) The indices of clonality and Gini coefficient among different TCR compartments and that from healthy donor PBMCs samples. (G–H) Correlation between the overall clonality/Gini coefficient and the proportion of shared TCRs in PBMCs and TILs.

This framework allowed us to use TCR sequences as molecular barcodes to track and analyse the function of TATs among TILs and PBMCs. We collected TCR CDR3 β chain sequencing data of paired PBMCs and tumour tissues from the same patients (Table S1) and assigned each T cell clonotype to one of the four compartments using the aforementioned definitions. After removing non-functional TCRs, we found that, among the different compartments, most (91%) of the TCR CDR3 sequences were 12∼17 amino acids in length (Figure 1B).
In addition, the CDR3 sequence length distribution in the TILs resembles that in the PBMCs, indicating that there is no CDR3 length difference between these cells in the two different tissue types. Next, because TCR beta chain variable (TCRBV) genes contributed the most diversity to CDR3 sequences, we analysed the TCRBV gene usage in the different compartments. We found that TRBV genes, such as TRBV06/14/15/20/25, are expressed more frequently in the shared compartments than in the non-shared compartments43 (Figure 1C), which suggests that the antigen specificities of TCRs may differ between these compartments. In the following analysis, subsets with CDR3 sequences of 12∼17 amino acids were analysed, and TCRs with excessively long or short CDR3 regions were removed. To estimate the degree of TCR sequence overlap in PBMCs and TILs, we calculated the relative proportion of TCRs shared by both TILs and PBMCs in each sample. We found that the proportion of TCRs in the TILs_Shared compartment (approximately 22.56%, 95% confidence interval (CI): 12.82%–37.51%) was significantly higher (p < 1e−4) than the proportion in the cTATs of the PBMC population (approximately 3.08%, 95% CI: .64%–5.15%, Figure 1D), which indicated that TILs show higher shared TCR enrichment than PBMCs possibly due to the close interaction of T lymphocytes with TSAs in the TME. This result is consistent both at the TCR clone and TCR sequence level (Figure 1D). The same analysis based on single-cell TCR sequencing (TCR-seq) data showed no significant differences (p > .05) in the proportion of shared TCRs at either the clone or sequence level, possibly due to the limited number of cells captured in single cell TCR-seq datasets, and many shared clones might be labelled as not shared (Figure S1A). TCR sequences with a high degree of similarity and clonal expansion are more likely to recognize TSAs effectively. 
We found the indices of clonality and Gini coefficient in the PBMCs_Shared compartment were higher (p < 1e−4) than those in PBMCs_Only compartment (Figure 1E,F). The same trend was also observed in the PBMC population when single cell TCR-seq data were analysed (Figure S1B). These results indicate that TATs are more likely to undergo clonal expansion and to represent functional tumour-reactive T cells. Adding a healthy donor cohort PBMC dataset36 as the control (see Methods), we found that the clonality and Gini coefficient of the healthy samples were lower (p < 1e−3) than those of the shared compartments and higher than those of the tissue-only compartments (Figure 1E,F), possibly due to the baseline immune activity that developed against common antigens in the surrounding environments, such as influenza virus or human cytomegalovirus (HCMV). Our results suggest T cell clones in the shared compartment are more likely to be tumour reactive and are different from those induced by non-tumour antigens commonly present in healthy individuals. Moreover, we found that the proportion of TATs in blood, which is observable only when tumour tissue is sequenced, was highly correlated with the overall T cell clonality and Gini coefficient of PBMCs, which were obtained non-invasively; however, such correlation was not observed in TILs (Figure 1G,H). These results highlight the potential of using the T cell clonality and Gini coefficient of PBMCs as indicators of cancer development. TATs among PBMCs are more likely to reflect the clonal expansion of T lymphocytes in periphery, and a higher degree of shared TCR clones among PBMCs may indicate that more neoantigen-specific T cells pre-exist in the PBMCs. 
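For readers unfamiliar with the two repertoire indices, the following sketch computes them from clone counts. It assumes that clonality means one minus the normalized Shannon entropy and that the Gini coefficient is computed on clone counts; both are common conventions in repertoire analysis, but the exact definitions used in the study's Methods are not reproduced here, and the function names are illustrative.

```python
import math


def clonality(counts):
    """1 - normalized Shannon entropy: 0 for an even repertoire, near 1 when one clone dominates."""
    counts = [c for c in counts if c > 0]
    n = len(counts)
    if n <= 1:
        return 1.0 if n == 1 else 0.0
    total = sum(counts)
    entropy = -sum((c / total) * math.log(c / total) for c in counts)
    return 1.0 - entropy / math.log(n)


def gini(counts):
    """Gini coefficient of clone sizes: 0 when all clones are equal, larger when sizes are unequal."""
    xs = sorted(counts)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Standard rank-weighted formula on ascending-sorted values (ranks start at 1).
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


even = [1, 1, 1, 1]        # no clonal expansion
expanded = [97, 1, 1, 1]   # one dominant clone
print(clonality(even), gini(even))
print(clonality(expanded), gini(expanded))
```

Both indices increase as a few clones take up more of the repertoire, which is the behaviour the comparison between shared and non-shared compartments relies on.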
It has been reported that a greater degree of PBMC-TIL TCR repertoire overlap indicates an improved immune response and is associated with better clinical outcome of immunotherapy.42, 47, 49, 55 We believe that this compartment largely represents tumour-reactive T cells and may serve as a biomarker to distinguish blood samples of cancer patients from those of healthy individuals.

In this study, we sought to build a deep learning binary classifier to predict tumour-reactive TCR sequences. To construct a training dataset for the model, we first downloaded a publicly available TCR sequencing dataset obtained from PBMC samples of healthy individuals36 as the control dataset and used only data from HCMV-negative individuals to exclude potential tumour-irrelevant immune signals. Two healthy cohorts were included in the analysis, and we named them Healthy351 and Healthy69 according to the number of samples after filtering. Since Healthy351 included more healthy donors and TCRs (more than 30 million), we considered the TCRs in this cohort to be a healthy TCR pool and used these data to identify the TCR sequences that overlapped with those in the PBMCs from cancer patients. Then, we extracted TAT TCRs in PBMC samples from the TIL-PBMC-paired TCR sequencing datasets described above and removed TCRs that were also detected in Healthy351. We labelled the remaining TAT TCRs as positive samples and the TCRs in Healthy351 as negative samples. The schematic workflow and experimental design are summarized in Figure S2A. Deep convolutional neural networks (CNNs) have generally performed well in TCR pattern recognition studies;56-58 therefore, we encoded the CDR3 beta chain using the one-hot encoding method and built a three-layer CNN to distinguish the TCRs of TATs from those of healthy individuals. The output of the CNN is the probability of each input TCR sequence being the TCR of a TAT.
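One-hot encoding of a CDR3 sequence can be sketched as follows. This is a minimal illustration, not the study's implementation: the alphabet order, the right zero-padding to 17 residues (the upper bound of the 12∼17 length range analysed here) and the function name are assumptions made for the example.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard residues; order is arbitrary
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
MAX_LEN = 17  # assumed padding length


def one_hot_cdr3(seq, max_len=MAX_LEN):
    """Return a max_len x 20 matrix (list of lists) with one 1 per residue.

    Positions beyond the end of the sequence stay all-zero (right padding).
    Raises KeyError for non-standard residues and ValueError if too long.
    """
    if len(seq) > max_len:
        raise ValueError("CDR3 longer than max_len")
    matrix = [[0] * len(AMINO_ACIDS) for _ in range(max_len)]
    for pos, aa in enumerate(seq):
        matrix[pos][AA_INDEX[aa]] = 1
    return matrix


encoded = one_hot_cdr3("CASSLGQGAEAFF")  # made-up 13-residue example
print(len(encoded), len(encoded[0]))
```

A matrix of this shape is the kind of fixed-size input a convolutional network can scan along the sequence axis; other padding or alignment schemes are equally plausible.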
Next, we generated a TRRS for each PBMC sample, summarizing the number of TATs our model had predicted in the PBMCs relative to the number of healthy TCRs that had been detected in the Healthy351 dataset. We evaluated the performance of the TRRS for non-invasive cancer detection with several independent PBMC datasets obtained from cancer patients, using Healthy69 as negative samples. The detailed illustration of model construction is presented in the Methods.

We first selected the same number of negative TCRs as that of TATs and used five-fold cross validation to test the generalization ability of our CNN model. Considering the heterogeneity of cancer patients, the data were split at the patient level rather than at the TCR sequence level to ensure that the model did not learn sample-specific confounding effects. Both the receiver operating characteristic (ROC) curve and the precision-recall curve (PRC) (Figure 2A,B) showed that the model performed modestly well in differentiating TCRs of TATs from TCRs of healthy samples (area under the curve, ROC: 0.699–0.706, PRC: 0.446–0.787) and that its performance was not influenced by human leukocyte antigen (HLA) haplotypes (Methods, Figure S2B,C). Because we randomly split patients into the training and test datasets and the number of TATs varies widely between patients, the PRC shows high variability across the different iterations of random splits. The final model was trained and validated using the entire dataset, and the loss and accuracy reached a plateau as the number of training epochs increased to about 60 (Figure 2C). Further inspection of the prediction probability distribution for different CDR3 lengths indicated a significant difference between the TCRs of TATs and those of healthy samples (Figure 2D).
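Splitting at the patient level means that all TCRs from one individual fall on the same side of each fold. The sketch below shows one simple way to do this with the standard library; the round-robin assignment, the seed and the function name are assumptions for illustration, and the study's actual splitting procedure may differ.

```python
import random


def patient_level_folds(patient_ids, k=5, seed=0):
    """Yield k (train_indices, test_indices) pairs with no patient in both.

    patient_ids: one entry per TCR sequence, naming the donor it came from.
    """
    unique = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique)
    # Assign whole patients to folds round-robin after shuffling.
    fold_of = {p: i % k for i, p in enumerate(unique)}
    folds = []
    for f in range(k):
        test_idx = [i for i, p in enumerate(patient_ids) if fold_of[p] == f]
        train_idx = [i for i, p in enumerate(patient_ids) if fold_of[p] != f]
        folds.append((train_idx, test_idx))
    return folds


# Toy example: ten TCRs from six hypothetical donors.
ids = ["P1", "P1", "P2", "P3", "P3", "P3", "P4", "P5", "P5", "P6"]
folds = patient_level_folds(ids, k=5, seed=0)
for train_idx, test_idx in folds:
    print(sorted({ids[i] for i in test_idx}))
```

Because patients contribute very different numbers of TATs, folds built this way can be quite unbalanced in size, which is consistent with the variability across splits noted for the precision-recall results.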
FIGURE 2 Development of a binary model to predict tumour-associated T cells (TATs) in the peripheral blood and use of the T cell receptor (TCR) repertoire risk score (TRRS) to distinguish tumour patients from healthy individuals. (A–B) Model ROC and precision-recall curve (PRC) by five-fold cross validation. (C) The model loss/accuracy changes at different epochs. (D) The model prediction probability score for different CDR3 lengths among tumoural and non-tumoural TCRs. (E) The TCR repertoire risk score (TRRS) in PBMC samples among tumour patients from different cancer types, healthy individuals from the Healthy69 cohort and human cytomegalovirus (HCMV) positive samples. (F) The ROC plot of TRRS in differentiating tumour patients from healthy donors at a given threshold of 0.66. (G–H) The indices of clonality and Gini coefficient between TRRS-high group and TRRS-low group. (I–J) The amino acid motif between TAT probability high group (top panel) and low group (bottom panel).

Then, to test whether the TAT prediction probability of our model can be used as a biomarker for differentiating tumour patients from healthy donors, we obtained seven independent datasets55, 59-64 containing PBMC TCR sequencing data of patients with different cancer types (Figure S2A). The Healthy69 cohort was used as the negative control set. We used a TRRS (Methods) to estimate the degree of TAT enrichment in the PBMC population of each sample. Briefly, we counted the number of TATs our model predicted and then divided it by the number of healthy TCRs in PBMCs that overlapped with the Healthy351 pool. At a threshold above 0.66, the TRRS effectively differentiated PBMC samples of normal individuals from those of patients with various cancer types (Figure 2E,F), indicating the feasibility of using the TRRS of TATs for non-invasive tumour screening. This result was robust to setting the threshold at different levels (Figure S3A–F).
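The ratio behind the TRRS can be sketched as below. Several details are assumptions made for the example and are not confirmed by the text: a sequence is counted as a predicted TAT when its model probability reaches a caller-supplied cutoff, the two counts are taken independently of each other, and the function name, the toy sequences and the handling of an empty denominator are illustrative.

```python
def trrs(sample_probs, healthy_pool, prob_cutoff):
    """TCR repertoire risk score for one PBMC sample (illustrative sketch).

    sample_probs: dict mapping CDR3 sequence -> predicted TAT probability.
    healthy_pool: set of CDR3 sequences seen in the healthy reference pool.
    prob_cutoff: probability at or above which a sequence counts as a TAT
                 (hypothetical parameter; choose it from validation data).
    """
    n_tat = sum(1 for p in sample_probs.values() if p >= prob_cutoff)
    n_healthy = sum(1 for s in sample_probs if s in healthy_pool)
    if n_healthy == 0:
        # No overlap with the healthy pool: the ratio is undefined.
        return float("inf") if n_tat else 0.0
    return n_tat / n_healthy


# Made-up sample: two sequences score high, two overlap the healthy pool.
probs = {
    "CSARDRGYEQYF": 0.9,
    "CSVEGTGNTIYF": 0.8,
    "CASSLGQGAEAFF": 0.2,
    "CASSPDRGYEQYF": 0.1,
}
pool = {"CASSLGQGAEAFF", "CASSPDRGYEQYF", "CASSQETQYF"}
score = trrs(probs, pool, prob_cutoff=0.5)
print(score)
```

Normalizing by the overlap with a healthy pool makes the score less dependent on sequencing depth than a raw count of predicted TATs would be.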
To provide evidence that the prediction of our model did not simply reflect a generic active cell-mediated immune response, we introduced an experimentally validated HCMV-positive cohort with TCR-seq data from their PBMC samples (Table S1). We calculated the TRRS for these individuals and compared it with that of the cancer patients. The TRRS of the HCMV-positive cohort was lower than that of the cancer patients, indicating that our model is able to distinguish individuals with cancer from those with infectious diseases. Based on the TRRS, we divided the samples in the independent validation cohort into high-risk and low-risk groups separated at the 50% quantile. We found that the clonality and Gini coefficient in the high-risk group were significantly higher (p < 1e−3) than those in the low-risk group (Figure 2G,H), implying that the TCRs in the high-risk group were associated with more clonal expansion and active immune functions. In addition, we analysed sequence motif enrichment in TCRs with the top 25% and bottom 25% probabilities of being a TAT (Figure 2I,J). We found that CDR3 sequences in the high probability group showed an enrichment of serine in the second position, while those in the low probability group tended to have an alanine in this position. When TCR sequences from publicly available virus/bacteria TCR databases were fed into the prediction model, CDR3 sequences of different lengths tended to have an alanine in the second position, which indicates that features specifically associated with the TCRs of clonally expanded TATs are not enriched in virus/bacteria TCRs (Figure S3L). In summary, we used TCR sequences from TATs and healthy donors to build a binary predictive CNN model and designed a TRRS based on the model prediction for effective non-invasive cancer screening with PBMCs.
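The comparison of residues between the top and bottom quartiles of predicted probability can be sketched with simple counting. The example assumes that the second position is index 1 counted from the start of the CDR3 sequence; the function names, sequences and probabilities are made up for illustration.

```python
from collections import Counter


def split_by_quartile(scored):
    """Return (top 25%, bottom 25%) sequences from (sequence, probability) pairs."""
    ranked = sorted(scored, key=lambda item: item[1])
    q = max(1, len(ranked) // 4)
    high = [seq for seq, _ in ranked[-q:]]
    low = [seq for seq, _ in ranked[:q]]
    return high, low


def residue_frequencies(seqs, position):
    """Relative frequency of each residue at a 0-based position."""
    residues = [s[position] for s in seqs if len(s) > position]
    if not residues:
        return {}
    counts = Counter(residues)
    return {aa: n / len(residues) for aa, n in counts.items()}


# Made-up scored sequences.
scored = [
    ("CSARDRGYEQYF", 0.90), ("CSVEGTGNTIYF", 0.80),
    ("CASSFGGNTEAFF", 0.60), ("CASSYSGNTIYF", 0.50),
    ("CASSQETQYF", 0.45), ("CASRTGGNTEAFF", 0.40),
    ("CASSLGQGAEAFF", 0.20), ("CASSPDRGYEQYF", 0.10),
]
high, low = split_by_quartile(scored)
high_freq = residue_frequencies(high, position=1)
low_freq = residue_frequencies(low, position=1)
print(high_freq, low_freq)
```

Repeating the frequency calculation for every position gives the per-position profile that a sequence logo visualizes.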
Single-cell-RNA-multiplexed-TCR-sequencing (scRNA-TCR-seq) technology makes it possible not only to trace the TCR clone sharing relationship between tumour and paired PBMC samples, but also to quantify transcriptomic patterns at the single cell level.43, 54, 65-69 We performed a comprehensive literature review and obtained 14 high-quality scRNA-TCR-seq datasets that met our criteria (Table S2). We found no significant difference (p > .05) in the CD4+/CD8+ ratio between PBMCs from cancer patients and healthy donors (Figure 3A), implying that the relative ratio of CD4+ and CD8+ T cells remains unchanged after tumour initiation, in contrast to the ratio in patients with acute infection, which is usually lower than that in uninfected healthy samples.70-72 However, the proportion of clonal T cells (clone frequency > 2) was higher (p < 1e−3) in tumour patients than in healthy donors (Figure S4A), indicating a higher clonal expansion of T cells in cancer patients. Moreover, we found that the CD4+/CD8+ ratio was significantly lower in TATs than in non-clonal T cells (clone frequency = 1) in patient PBMCs (Figure 3B), suggesting that CD8+ T cells expand more than CD4+ T cells upon tumour antigen stimulation.

FIGURE 3 Transcriptional signature gene analysis using single cell data shows T cell activation is involved in the tumour-associated T cells (TATs). (A) CD4+/CD8+ T cell relative ratio in PBMCs between tumour and healthy samples calculated by single cell data. (B) CD4+/CD8+ T cell relative ratio between TATs and non-clonal T cells in PBMCs. (C) The proportion of TATs in tumour and healthy-specific clusters (in which tumour patients or healthy donors derived T cells occupy more than 70%, see Methods) by TCR functional landscape estimation supervised with scRNA-seq analysis (TESSA) clustering analysis. (D) T cell subtype distribution in different T cell receptor (TCR) compartments. (E) Heatmap shows differentially expressed genes between TATs and non-clonal T cells. (F) Gene Ontology (GO) analysis of TAT differentially expressed genes.

Next, to provide further evidence that TATs represent tumour-specific T cells among PBMCs, we performed clustering analysis utilizing both single-cell RNA and TCR information of T cells among tumour and healthy donor PBMCs with the TCR functional landscape estimation supervised with scRNA-seq analysis (TESSA)73 algorithm. We found that tumour-specific clusters had a higher proportion of TATs than healthy-specific clusters (p < 1e−4) in nearly all of the 14 datasets (Figure 3C, Methods), indicating that TATs in the blood of patients with various types of cancer are tumour specific and dissimilar to T cells in the blood of healthy donors. To integrate datasets from various sources, we used a label transfer method,74 taking one clear cell renal cell carcinoma (ccRCC) dataset75 as the reference because of its detailed cell type annotation. We then projected the cells from the other datasets onto the reference map to transfer the cell type annotation. We found that most T cells in the non-clonal group were CD4+ naïve/proliferating/effector T cells, while most TATs were CD8+ NK-like/effector T cells (Figure 3D). These results demonstrated that TATs are mostly activated CD8+ T cells and may exert cytotoxic functions upon tumour stimulation. To further explore the transcriptional signatures of TATs among PBMCs, we performed differential gene expression analysis between CD8+ TATs and non-clonal T cells in each dataset. To prevent potential batch effects caused by using different data sources and guarantee a robust analysis, genes that were