Abstract LB543: tmHRD: Accurately Estimating the Homologous Recombination Deficiency Status from Clinical Panel Sequencing Data
Xuwen Wang,Ying Xu,Yanfang Guan,Xin Yi,Jiayin Wang
DOI: https://doi.org/10.1158/1538-7445.am2022-lb543
IF: 11.2
2022-01-01
Cancer Research
Abstract: Estimating homologous recombination deficiency (HRD) status is of clinical interest because homologous recombination-deficient cells are reported to be sensitive to poly(ADP-ribose) polymerase (PARP) inhibitors. As HRD status has become a critical indicator, many approaches have been proposed to measure it. Estimating HRD status from cancer sequencing data is a popular strategy, with representative methods including HRDetect, SigMA, FoundationFocusCDx, MyChoiceHRD, etc. Although these bioinformatics tools share sequencing data as input, their algorithms emphasize different aspects. Some methods focus on germline variants in the BRCA1/2 genes and somatic mutational signatures, while others capture genomic scars, including loss of heterozygosity (LOH), telomeric allelic imbalance (TAI) and large-scale state transitions (LST). However, two major points of controversy arise when applying these tools to clinical sequencing data. First, panel sequencing is the most popular sequencing strategy in the clinic, but gene panels usually cover only hundreds to a thousand genes. Thus, panel sequencing can neither capture enough mutations to calculate mutational signatures nor detect large genomic scars such as TAI and LST. Second, the associations between mutations and HRD status are still unclear, so scoring mutations and genomic scars is not well justified. To address these issues, we propose a machine learning-based method, called tmHRD, which estimates HRD status from the limited mutations and LOH events captured by clinical panel sequencing data. It consists of a semi-supervised learning (TRI) model and a multi-instance learning (MIL) model. The TRI model filters false positives from the mutation calls. Then, the filtered mutation calls from one patient are defined as a package, where each mutation call is an instance in the package.
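The package-and-instance representation described above can be illustrated with a minimal Python sketch. It is not taken from the tmHRD software: the `MutationCall` fields, the `passed_filter` flag (a stand-in for the TRI model's decision), the `build_packages` helper, and the toy records are all hypothetical and serve only to show how filtered calls might be grouped into one package per patient.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class MutationCall:
    """One instance: a single mutation call (hypothetical fields)."""
    patient_id: str
    gene: str
    variant: str
    passed_filter: bool  # assumption: stands in for the TRI filter's output


def build_packages(calls):
    """Group filter-passing calls by patient: one package per patient."""
    packages = defaultdict(list)
    for call in calls:
        if call.passed_filter:
            packages[call.patient_id].append(call)
    return dict(packages)


# Made-up toy records, for illustration only.
calls = [
    MutationCall("P1", "BRCA1", "variant_1", True),
    MutationCall("P1", "TP53", "variant_2", False),
    MutationCall("P2", "BRCA2", "variant_3", True),
    MutationCall("P1", "ATM", "variant_4", True),
]

packages = build_packages(calls)
for patient_id, instances in packages.items():
    print(patient_id, [c.gene for c in instances])
```

In this sketch, a patient whose calls are all filtered out has no package at all; whether the actual tool keeps an empty package for such a patient is not stated in the abstract.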
When training the MIL model, a package with known HRD status (label) is given to the model in each iteration. The MIL model calculates the correlations among instances and assigns an adjusted weight to each. Thus, only those mutations associated with HRD contribute to the estimated HRD status. We tested tmHRD on several datasets obtained from the clinical trials in which we participated. Compared with existing methods, tmHRD achieved better performance: its average recognition rate approached 90%, while the comparison methods achieved around 80%. The software package, tmHRD, is freely available at https://github.com/Sherwin-xjtu/TriMILhrd/ for academic use only.
Citation Format: Xuwen Wang, Ying Xu, Yanfang Guan, Xin Yi, Jiayin Wang. tmHRD: Accurately estimating the homologous recombination deficiency status from clinical panel sequencing data [abstract]. In: Proceedings of the American Association for Cancer Research Annual Meeting 2022; 2022 Apr 8-13. Philadelphia (PA): AACR; Cancer Res 2022;82(12_Suppl):Abstract nr LB543.