StarPhase: Comprehensive Phase-Aware Pharmacogenomic Diplotyper for Long-Read Sequencing Data

James M. Holt, John Harting, Xiao Chen, Daniel Baker, Christopher T. Saunders, Zev Kronenberg, Nina Gonzaludo, Byunggil Yoo, Georgi Hudjashov, Maarja Joeloo, James M.J. Lawlor, Weng Khong Lim, Estonian Biobank Research Team, Saumya S. Jamuar, Gregory M. Cooper, Lili Milani, Tomi Pastinen, Michael A. Eberle
DOI: https://doi.org/10.1101/2024.12.10.627527
2024-12-11
Abstract: Pharmacogenomics is central to precision medicine, informing medication safety and efficacy. Pharmacogenomic diplotyping of complex genes requires full-length DNA sequences and the detection of structural rearrangements. We introduce StarPhase, a tool that leverages PacBio HiFi sequence data to diplotype 21 CPIC Level A pharmacogenes and provides detailed haplotypes and supporting visualizations for HLA-A, HLA-B, and CYP2D6. StarPhase diplotypes show high concordance with benchmarks: 99.5% are either exact matches or minor discrepancies. Manual inspection of the remaining 0.5% of mismatches indicates that StarPhase called them correctly. With StarPhase, we update or correct 26.2% of GeT-RM pharmacogenomic diplotypes. Population distributions from StarPhase largely reflect those of the All of Us cohort, while also highlighting gaps in existing pharmacogenomic databases that long-read sequencing can fill. With a single HiFi whole-genome sequencing assay, StarPhase enables robust PGx diplotyping even as additional pharmacogenes and haplotypes are discovered.
Bioinformatics