TCR-TRANSLATE: Conditional Generation of Real Antigen Specific T-cell Receptor Sequences

Dhuvarakesh Karthikeyan, Colin Raffel, Benjamin Vincent, Alexander Rubinsteyn
DOI: https://doi.org/10.1101/2024.11.11.623124
2024-11-12
Abstract: The paradoxical nature of T-cell receptor (TCR) specificity, which requires both precise recognition and adequate coverage of antigenic peptide-MHCs (pMHCs), poses a fundamental challenge in immunology. Efforts at modeling this complex many-to-many mapping have been greatly impeded by a severe lack of experimental data. To address this, we present TCR-TRANSLATE, a novel framework that adapts low-resource machine translation techniques to the TCR:pMHC specificity domain. Here, we explore sequence-to-sequence (seq2seq) modeling with various training strategies, including semi-synthetic data augmentation and multi-task objectives, to generate antigen-specific TCR sequences for a given target of interest. We benchmark twelve model variants derived from the BART and T5 model architectures on a target-rich validation set of well-studied pMHCs, finding an optimal model, TCRT5, that generated validated antigen-specific CDR3β sequences for previously unseen antigens. While current limitations include a narrow validation set and a focus on the CDR3β loop, our approach demonstrates the potential of seq2seq models in rapidly generating antigen-specific TCR repertoires, offering a promising avenue for increasing throughput in precision immunotherapies. Our findings highlight both the capabilities and limitations of sequence-based conditional TCR design, emphasizing the need for experimental validation to bridge the gaps between predictions, metrics, and functional capacity.
Synthetic Biology