Abstract: High-throughput sequencing of B- and T-cell receptors makes it possible to track immune repertoires across time, in different tissues, and in acute and chronic diseases or in healthy individuals. However, quantitative comparison between repertoires is confounded by variability in the read count of each receptor clonotype due to sampling, library preparation, and expression noise. Here, we present a general Bayesian approach to disentangle repertoire variations from these stochastic effects. Using replicate experiments, we first show how to learn the natural variability of read counts by inferring the distributions of clone sizes as well as an explicit noise model relating true frequencies of clones to their read count. We then use that null model as a baseline to infer a model of clonal expansion from two repertoire time points taken before and after an immune challenge. Applying our approach to yellow fever vaccination as a model of acute infection in humans, we identify candidate clones participating in the response.

High-throughput immune repertoire sequencing (RepSeq) experiments are becoming a common way to study the diversity, structure and composition of lymphocyte repertoires, promising to yield unique insight into individuals' past infection history. However, the analysis of these sequences remains challenging, especially when comparing two different temporal or tissue samples. Here we develop a new theoretical approach and methodology to extract the characteristics of the lymphocyte repertoire response from different samples. The method is specifically tailored to RepSeq experiments and accounts for the multiple sources of noise present in these experiments. Its output provides expansion parameters, as well as a list of potentially responding clonotypes. We apply the method to describe the response to yellow fever vaccine obtained from samples taken at different time points. We also use our results to estimate the diversity and clone size statistics from data.
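To make the idea of inferring clonal expansion through a noise model more concrete, here is a minimal sketch of a per-clone posterior over the log fold-change between two time points. It is not the paper's model. It assumes plain Poisson sampling noise in place of the noise model learned from replicates, and it puts uniform weights on the grid points for both the clone frequency and the fold-change in place of the inferred clone-size distribution. The function names, grids, and read depths are all hypothetical and chosen for illustration only.

```python
import math


def poisson_logpmf(n, mean):
    """Log-probability of observing n reads when the expected count is `mean`."""
    if mean <= 0:
        return 0.0 if n == 0 else float("-inf")
    return n * math.log(mean) - mean - math.lgamma(n + 1)


def expansion_posterior(n_before, n_after, reads_before, reads_after,
                        log_fold_grid, freq_grid):
    """Posterior over the log fold-change s of one clone.

    Assumptions (not from the source): Poisson read-count noise, and
    uniform prior weights on the points of both grids. The unknown
    initial frequency f is marginalized by summing over freq_grid.
    """
    weights = []
    for s in log_fold_grid:
        total = 0.0
        for f in freq_grid:
            log_like = (
                poisson_logpmf(n_before, f * reads_before)
                + poisson_logpmf(n_after, f * math.exp(s) * reads_after)
            )
            total += math.exp(log_like)  # underflows harmlessly to 0.0
        weights.append(total)
    norm = sum(weights)
    return [w / norm for w in weights]


# Hypothetical clone: 5 reads before and 50 reads after the challenge,
# with 100,000 total reads in each sample.
log_fold_grid = [0.25 * i for i in range(-8, 21)]              # s from -2 to 5
freq_grid = [10 ** (-6 + 4 * k / 199) for k in range(200)]     # 1e-6 to 1e-2
posterior = expansion_posterior(5, 50, 100_000, 100_000,
                                log_fold_grid, freq_grid)
best_s = log_fold_grid[posterior.index(max(posterior))]
```

In this toy setting the posterior mass concentrates around a roughly tenfold expansion, with a spread that reflects how few reads the clone had at the first time point. A learned noise model with extra dispersion would widen that spread further, which is why the null model from replicates matters before calling a clone expanded.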

Anchor Clustering for million-scale immune repertoire sequencing data

Hierarchical Clustering Can Identify B Cell Clones with High Confidence in Ig Repertoire Sequencing Data

Comparison of sequence- and structure-based antibody clustering approaches on simulated repertoire sequencing data

Computational Strategies for Dissecting the High-Dimensional Complexity of Adaptive Immune Repertoires

Manifold distance based artificial immune incremental data clustering algorithm

Inferring the immune response from repertoire sequencing

Scaling Monte-Carlo-Based Inference on Antibody and TCR Repertoires

Interpolative Multidimensional Scaling Techniques for the Identification of Clusters in Very Large Sequence Sets

Evaluating methods for B-cell clonal family assignment

Lightning-fast adaptive immune receptor similarity search by symmetric deletion lookup

Combining mutation and recombination statistics to infer clonal families in antibody repertoires

Current Status and Recent Advances of Next Generation Sequencing Techniques in Immunological Repertoire

Somatic hypermutation analysis for improved identification of B cell clonal families from next-generation sequencing data

ImmCluster: an ensemble resource for immunology cell type clustering and annotations in normal and cancerous tissues

Benchmarking antibody clustering methods using sequence, structural, and machine learning similarity measures for antibody discovery applications

Towards a Mathematical Foundation of Immunology and Amino Acid Chains

Dace: A Scalable Dp-Means Algorithm for Clustering Extremely Large Sequence Data

Comparison of Methods for Biological Sequence Clustering

SAFE-clustering: Single-cell Aggregated (from Ensemble) Clustering for Single-Cell RNA-seq Data

Exploring the impact of clonal definition on B-cell diversity: implications for the analysis of immune repertoires

scAGCI: an anchor graph-based method for cell clustering from integrated scRNA-seq and scATAC-seq data