Abstract 5301: SubHap: an Efficient Algorithm for Reconstructing Clonal Haplotypes of Tumor Sample from NGS Data
Rong Zhang,Yu Geng,Jianye Liu,Zhongmeng Zhao,Xuanping Zhang,Jiayin Wang
DOI: https://doi.org/10.1158/1538-7445.am2018-5301
IF: 11.2
2018-01-01
Cancer Research
Abstract: Clonal architecture is an important characteristic of tumor heterogeneity and the tumor microenvironment. It often reflects the selective advantage that accompanies the evolution and inheritance of subclones. Identifying and inferring subclonal heterogeneity may therefore facilitate a comprehensive understanding of tumor progression and of interactions within the microenvironment. Several state-of-the-art approaches estimate clonal architecture from paired tumor-normal sequencing data. However, existing approaches often lose accuracy when loci with allelic imbalance distort the read-depth distribution. Some methods attempt to overcome this weakness by reconstructing clonal genotypes, but they still have difficulty efficiently reaching haplotype resolution, which is considered to have greater value for both research and clinical applications. Here we propose a novel approach, implemented as SubHap, to reconstruct the clonal haplotypes of a tumor sample. The input is two sets of mapped reads, sequenced from the tumor and normal samples, respectively. The outputs include the number of subclones and the possible haplotypes of each subclone. The proposed approach establishes a probabilistic model. It first clusters the somatic mutations according to their variant allele frequencies, which are roughly proportional to the prior distribution of the proportion of each subclone. Then, an improved maximum spanning tree algorithm is applied. For any mutation site, this algorithm extracts the reads covering the site and iteratively strips the reads belonging to each subclone, guided by both the prior distribution and the local read depth. Each group of stripped reads is used to assemble the corresponding clonal haplotypes. 
During the assembly process, SubHap calculates and corrects the posterior distribution of the proportion of each subclone using an inverse convolution algorithm, which resolves the conflicts that arise at some sites. We conduct a series of simulation experiments to test the performance of SubHap under different configurations. The test data include both second- and third-generation sequencing data. Compared with popular approaches such as HapCompass, SubHap significantly improves accuracy across varying coverages, numbers of preset subclones, subclone proportions, etc. When the coverage is greater than 50X, the proposed algorithm achieves 85% accuracy. Moreover, it requires fewer computational resources than the existing approaches. The software package SubHap is freely available for academic use at https://github.com/xjtu712-lab/SubHap.
Citation Format: Rong Zhang, Yu Geng, Jianye Liu, Zhongmeng Zhao, Xuanping Zhang, Jiayin Wang. SubHap: An efficient algorithm for reconstructing clonal haplotypes of tumor sample from NGS data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2018; 2018 Apr 14-18; Chicago, IL. Philadelphia (PA): AACR; Cancer Res 2018;78(13 Suppl):Abstract nr 5301.
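
The first step named in the abstract, grouping somatic mutations by variant allele frequency, can be illustrated with a small sketch. This is not SubHap's implementation: the abstract does not say which clustering method is used, so plain one-dimensional k-means is substituted here as an assumption, the number of clusters is supplied by the caller rather than inferred, and the function name `cluster_vafs` is hypothetical.

```python
from typing import List, Tuple


def cluster_vafs(vafs: List[float], k: int, iters: int = 100) -> Tuple[List[float], List[int]]:
    """Group 1-D variant allele frequencies into k clusters.

    Assumption: plain k-means (Lloyd's algorithm) stands in for whatever
    clustering SubHap actually uses; k is given, not estimated.
    Returns (cluster centers, cluster label for each input value).
    """
    if not 1 <= k <= len(vafs):
        raise ValueError("k must be between 1 and len(vafs)")

    ordered = sorted(vafs)
    n = len(ordered)
    # Deterministic start: take k evenly spaced values from the sorted list.
    centers = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * len(vafs)

    for _ in range(iters):
        # Assignment step: each mutation joins the nearest center.
        new_labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in vafs]
        # Update step: each center moves to the mean of its members;
        # an empty cluster keeps its previous center.
        new_centers = []
        for c in range(k):
            members = [v for v, lab in zip(vafs, new_labels) if lab == c]
            new_centers.append(sum(members) / len(members) if members else centers[c])
        if new_labels == labels and new_centers == centers:
            break  # nothing changed, so the clustering has converged
        labels, centers = new_labels, new_centers

    return centers, labels


if __name__ == "__main__":
    # Made-up VAF values for illustration only.
    example = [0.09, 0.10, 0.11, 0.39, 0.40, 0.41]
    centers, labels = cluster_vafs(example, k=2)
    print(centers, labels)
```

In this sketch each cluster center plays the role of a rough prior on a subclone's proportion, in the sense the abstract describes; the read-stripping and haplotype-assembly steps are not covered.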