Lightning-fast adaptive immune receptor similarity search by symmetric deletion lookup

Touchchai Chotisorayuth, Andreas Tiffeau-Mayer
2024-03-14
Abstract: An individual's adaptive immune receptor (AIR) repertoire records immune history due to the exquisite antigen specificity of AIRs. Reading this record requires computational approaches for inferring receptor function from sequence, as the diversity of possible receptor-antigen pairs vastly outstrips experimental knowledge. Identification of AIRs with similar sequence and thus putatively similar function is a common performance bottleneck in these approaches. Here, we benchmark the time complexity of five different algorithmic approaches to radius-based search for Levenshtein neighbors. We show that a symmetric deletion lookup approach, originally proposed for spell-checking, is particularly scalable. We then introduce XTNeighbor, a variant of this algorithm that can be massively parallelized on GPUs. For one million input sequences, XTNeighbor identifies all sequence neighbors that differ by up to two edits in seconds on commodity hardware, orders of magnitude faster than existing approaches. We also demonstrate how symmetric deletion lookup can speed up search with more complex sequence-similarity metrics such as TCRdist. Our contribution is poised to greatly speed up existing analysis pipelines and enable processing of large-scale immunosequencing data without downsampling.
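To make the idea of symmetric deletion lookup concrete, the following is a minimal single-threaded sketch in Python. It is not the XTNeighbor implementation described in the abstract, and it makes no attempt at GPU parallelism. The function names (`deletion_variants`, `levenshtein`, `neighbor_pairs`) and the example CDR3-like strings are hypothetical and chosen for illustration only. The sketch relies on the property that two sequences within Levenshtein distance d always share at least one variant obtainable by deleting at most d characters from each, so a hash lookup on deletion variants yields candidate pairs that are then verified with an exact distance computation.

```python
from collections import defaultdict
from itertools import combinations


def deletion_variants(seq, max_edits):
    """Return seq plus every string obtained by deleting up to max_edits characters."""
    variants = {seq}
    frontier = {seq}
    for _ in range(max_edits):
        # Delete one more character from every string produced in the previous round.
        frontier = {s[:i] + s[i + 1:] for s in frontier for i in range(len(s))}
        variants |= frontier
    return variants


def levenshtein(a, b):
    """Exact Levenshtein distance via the standard row-by-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def neighbor_pairs(seqs, max_edits=1):
    """Return sorted index pairs (i, j), i < j, with Levenshtein distance <= max_edits."""
    # Index every sequence under each of its deletion variants.
    index = defaultdict(set)
    for idx, seq in enumerate(seqs):
        for variant in deletion_variants(seq, max_edits):
            index[variant].add(idx)

    # Sequences that share a variant are candidate neighbors.
    candidates = set()
    for bucket in index.values():
        candidates.update(combinations(sorted(bucket), 2))

    # Sharing a variant is necessary but not sufficient, so verify each candidate.
    return sorted((i, j) for i, j in candidates
                  if levenshtein(seqs[i], seqs[j]) <= max_edits)


# Made-up example sequences (hypothetical, for illustration only).
seqs = ["CASSLGTEAFF", "CASSLGTEAF", "CASSLGQEAFF", "CASRPGTEAFF"]
print(neighbor_pairs(seqs, max_edits=1))
print(neighbor_pairs(seqs, max_edits=2))
```

The verification step matters because two sequences can share a deletion variant while being up to 2d edits apart. In this sketch, the number of variants per sequence grows roughly as the sequence length raised to the power of the edit radius, so the approach is best suited to short sequences and small radii such as the one or two edits considered in the abstract.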
Quantitative Methods, Genomics