Large-scale GPU-based network analysis of the human T-cell receptor repertoire

Paul Richter
DOI: https://doi.org/10.48550/arXiv.2112.06613
2021-12-13
Abstract: Understanding the structure of the human T-cell receptor repertoire is a crucial precondition for understanding the ability of the immune system to recognize and respond to antigens. T-cells are often compared via the complementarity determining region 3 (CDR3) of their respective T-cell receptor beta chains. Nevertheless, previous studies often simply checked whether CDR3beta sequences were equal, while network theory studies were usually limited to several thousand sequences due to the high computational effort of constructing the network. To overcome that hurdle, we introduce the GPU-based algorithm TCR-NET to construct, for the first time, large-scale CDR3beta similarity networks from model-generated and empirical sequence data with up to 800,000 CDR3beta sequences on a normal computer. Using network analysis methods, we study the structural properties of these networks and conclude that (i) the fraction of public TCRs depends on the size of the TCR repertoire, along with the exact (not unified) definition of "public" sequences, (ii) the TCR network is assortative, with the average neighbor degree being proportional to the square root of the degree of a node, and (iii) the repertoire is robust against losses of TCRs. Moreover, we analyze the networks of antigen-specific TCRs for different antigen families and find differing clustering coefficients and assortativities. TCR-NET offers better access to large-scale TCR repertoire networks, opening the possibility to quantify their structure and quantitatively distinguish their ability to react to antigens, which we anticipate will become a useful tool as increasingly large amounts of repertoire sequencing data become available.
Molecular Networks, Biological Physics
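To make the network construction and finding (ii) concrete, here is a minimal CPU sketch in Python. It is not TCR-NET. The abstract does not state the similarity criterion, so the sketch assumes two sequences are linked when they differ by at most one edit (substitution, insertion, or deletion). The function names and the toy sequences are hypothetical. The brute-force all-pairs loop is quadratic in the number of sequences, which illustrates the kind of cost that a GPU implementation would aim to reduce.

```python
from itertools import combinations


def within_one_edit(a: str, b: str) -> bool:
    """Return True if a and b differ by at most one substitution,
    insertion, or deletion (assumed similarity criterion)."""
    if a == b:
        return True
    if abs(len(a) - len(b)) > 1:
        return False
    if len(a) > len(b):
        a, b = b, a  # make a the shorter (or equally long) string
    # Skip the common prefix.
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    if len(a) == len(b):
        # One substitution: everything after the mismatch must agree.
        return a[i + 1:] == b[i + 1:]
    # One insertion into a: the rest of a must match b after skipping b[i].
    return a[i:] == b[i + 1:]


def build_network(sequences):
    """Build an undirected similarity network as an adjacency dict.
    Brute force over all pairs, O(n^2) comparisons."""
    adjacency = {s: set() for s in sequences}
    for s, t in combinations(adjacency, 2):
        if within_one_edit(s, t):
            adjacency[s].add(t)
            adjacency[t].add(s)
    return adjacency


def average_neighbor_degree(adjacency):
    """For every node with at least one neighbor, return the mean
    degree of its neighbors."""
    degree = {s: len(nbrs) for s, nbrs in adjacency.items()}
    return {
        s: sum(degree[t] for t in nbrs) / len(nbrs)
        for s, nbrs in adjacency.items()
        if nbrs
    }


# Toy, made-up sequences for illustration only.
toy_sequences = ["CASSLGQ", "CASSLGE", "CASSLG", "CASSPGQ", "CAWSVGQ"]
network = build_network(toy_sequences)
for seq, knn in average_neighbor_degree(network).items():
    print(seq, len(network[seq]), round(knn, 3))
```

On a real repertoire, one would bin the printed pairs by degree and compare the mean neighbor degree per bin against a square-root curve to check the scaling reported in (ii). The five toy sequences here are far too few to show that trend.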
What problem does this paper attempt to address?