A rigorous benchmarking of alignment-based HLA typing algorithms for RNA-seq data
Dottie Yu, Ram Ayyala, Sarah Hany Sadek, Likhitha Chittampalli, Hafsa Farooq, Junghyun Jung, Abdullah Al Nahid, Grigore Boldirev, Mina Jung, Sungmin Park, Austin Nguyen, Alex Zelikovsky, Nicholas Mancuso, Jong Wha J. Joo, Reid F. Thompson, Houda Alachkar, Serghei Mangul
DOI: https://doi.org/10.1101/2023.05.22.541750
2024-01-16
Abstract: Accurate identification of human leukocyte antigen (HLA) alleles is essential for various clinical and research applications, such as transplant matching and drug sensitivities. Recent advances in RNA-seq technology have made it possible to impute HLA types from sequencing data, spurring the development of a large number of computational HLA typing tools. However, the relative performance of these tools is unknown, limiting the ability of clinicians and biomedical researchers to make informed choices about which tools to use. Here we report the study design of a comprehensive benchmarking of the performance of 12 HLA callers across 682 RNA-seq samples from 8 datasets with a molecularly defined gold standard at 5 loci: HLA-A, -B, -C, -DRB1, and -DQB1. For each HLA typing tool, we will comprehensively assess its accuracy, compare default with optimized parameters, and examine discrepancies in accuracy at the allele and locus levels. We will also evaluate the computational expense of each HLA caller, measured in terms of CPU time and RAM usage, and the influence of read length over the HLA region on the accuracy of each tool. Most notably, we will examine the performance of HLA callers across European and African ancestry groups to determine discrepancies in accuracy associated with ancestry. We hypothesize that RNA-seq HLA callers are capable of returning high-quality results, but that tools offering a good balance between accuracy and computational expense for all ancestry groups have yet to be developed. We believe that our study will provide clinicians and researchers with clear guidance to inform their selection of an appropriate HLA caller.
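As an illustration of what a locus-level accuracy comparison against a gold standard could look like, the following is a minimal sketch. It is not the study's scoring procedure. The function names, the nested-dictionary input layout, the choice of two-field resolution, and the definition of accuracy as the fraction of gold-standard alleles recovered are all assumptions made for this example.

```python
from collections import Counter


def normalize(allele, fields=2):
    """Trim an allele name such as 'HLA-A*02:01:01' to `fields` fields ('A*02:01')."""
    name = allele[4:] if allele.startswith("HLA-") else allele
    gene, _, rest = name.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])


def matched_alleles(predicted, truth, fields=2):
    """Count predicted alleles that match the gold standard, ignoring allele order."""
    pred = Counter(normalize(a, fields) for a in predicted)
    gold = Counter(normalize(a, fields) for a in truth)
    # Multiset intersection handles homozygous calls correctly.
    return sum((pred & gold).values())


def locus_accuracy(calls, gold, fields=2):
    """Per-locus accuracy: matched alleles divided by gold-standard alleles.

    Assumed (hypothetical) input layout for both arguments:
    {sample_id: {locus: [allele_1, allele_2]}}
    """
    correct, total = Counter(), Counter()
    for sample, loci in gold.items():
        for locus, truth in loci.items():
            # A missing call counts as zero matched alleles.
            predicted = calls.get(sample, {}).get(locus, [])
            correct[locus] += matched_alleles(predicted, truth, fields)
            total[locus] += len(truth)
    return {locus: correct[locus] / total[locus] for locus in total}
```

Calling `locus_accuracy` once per tool, and once per ancestry group, would yield the kind of per-locus comparison the abstract describes. Changing `fields` to 1 or 3 would score the same calls at a coarser or finer resolution.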
Bioinformatics