TCRklass: A New K-String–Based Algorithm for Human and Mouse TCR Repertoire Characterization

Xi Yang, Di Liu, Na Lv, Fangqing Zhao, Fei Liu, Jing Zou, Xue Xiao, Jun Wu, Peipei Liu, Jing Gao, Yongfei Hu, Yi Shi, Jun Liu, Ruifen Zhang, Chen, Juncai Ma, George F. Gao, Baoli Zhu
DOI: https://doi.org/10.4049/jimmunol.1400711
2015-01-01
Abstract: Next-generation sequencing technology has advanced the study of the human TCR repertoire, which is essential for adaptive immunity. To decipher the complexity of the TCR repertoire, we developed an integrated pipeline, TCRklass, using a K-string-based algorithm that has significantly improved accuracy and performance over existing tools. We tested TCRklass using manually curated short read datasets in comparison with in silico datasets; it showed higher precision and recall rates on CDR3 identification. We applied TCRklass to large datasets of two human and three mouse TCR repertoires; it demonstrated higher reliability on CDR3 identification and much less biased V/J profiling, which are the two components contributing to the diversity of the repertoire. Because of sequencing cost, short paired-end reads generated by next-generation sequencing technology are and will remain the main source of data, and we believe that TCRklass is a useful and reliable toolkit for TCR repertoire analysis.