An Accurate Algorithm for Identifying Mutually Exclusive Patterns on Multiple Sets of Genomic Mutations

Siyu He, Jiayin Wang, Zhongmeng Zhao, Xuanping Zhang
DOI: https://doi.org/10.1007/978-3-031-34960-7_11
2023-01-01
Abstract: In cancer genomics, mutually exclusive patterns of somatic mutations are important biomarkers that are suggested to be valuable in cancer diagnosis and treatment. However, detecting these patterns in mutation data is an NP-hard problem, which poses a great challenge for computational approaches. Existing approaches either limit themselves to pairwise mutually exclusive patterns or rely heavily on prior knowledge and complicated computational processes. Furthermore, existing algorithms are often designed for genotype datasets, which may lose information about tumor clonality, a property that is emphasized in tumor progression. In this paper, an algorithm is proposed that identifies multiple sets with mutually exclusive patterns, using a fuzzy strategy to deal with real-valued datasets. Unlike existing approaches, the algorithm focuses on both similarity within subsets and mutual exclusion among subsets, taking the degree of mutual exclusion as the optimization objective rather than as a constraint. The mutations are fuzzy-clustered by membership degree, and a fuzzy strategy is used to iteratively update the cluster centers and membership degrees. Finally, the target subsets are obtained; they are characterized by high similarity within subsets with the largest number of mutations, and high mutual exclusion among subsets with the largest number of subsets. A series of experiments was conducted to verify the performance of the algorithm, on both simulation datasets and real datasets from TCGA. According to the results, the algorithm performs well under different simulation configurations, and some of the mutually exclusive patterns detected in the TCGA datasets are supported by published literature. The algorithm was also compared with MEGSA, which is the best and most widely used method at present. On simulation datasets, it outperformed MEGSA in both purity and computational efficiency.
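The iteration over cluster centers and membership degrees mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the paper's update rules or its mutual-exclusion objective, so the code below uses textbook fuzzy c-means as a stand-in, not the authors' algorithm. The function name `fuzzy_c_means`, the fuzzifier `m`, and the toy `profiles` matrix (rows are mutations, columns are samples, values are real-valued mutation signals) are all assumptions made for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Textbook fuzzy c-means (a stand-in, not the paper's method).

    Returns (centers, memberships); memberships[i][k] is the degree to
    which point i belongs to cluster k, and each row sums to 1.
    """
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])

    # Random initial membership degrees, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([v / total for v in row])

    centers = []
    for _ in range(n_iter):
        # Center update: mean of the points weighted by membership ** m.
        centers = []
        for k in range(n_clusters):
            w = [u[i][k] ** m for i in range(n)]
            w_sum = sum(w)
            centers.append(
                [sum(w[i] * points[i][d] for i in range(n)) / w_sum
                 for d in range(dim)]
            )

        # Membership update from relative distances to every center.
        new_u = []
        for p in points:
            dist = [max(math.dist(p, c), 1e-12) for c in centers]
            new_u.append(
                [1.0 / sum((dist[k] / dist[j]) ** (2.0 / (m - 1.0))
                           for j in range(n_clusters))
                 for k in range(n_clusters)]
            )

        # Stop once the membership degrees have stabilized.
        shift = max(abs(new_u[i][k] - u[i][k])
                    for i in range(n) for k in range(n_clusters))
        u = new_u
        if shift < tol:
            break
    return centers, u


# Hypothetical toy data: mutations 0-1 occur in samples 0-2 and
# mutations 2-3 occur in samples 3-5, i.e. two groups that are
# mutually exclusive across samples.
profiles = [
    [0.9, 0.8, 0.7, 0.0, 0.0, 0.0],
    [0.8, 0.9, 0.6, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.1, 0.7, 0.9, 0.8],
    [0.0, 0.1, 0.0, 0.8, 0.7, 0.9],
]
centers, memberships = fuzzy_c_means(profiles, n_clusters=2)
# Hard assignment: the cluster with the highest membership degree.
labels = [row.index(max(row)) for row in memberships]
```

In this sketch, mutations with similar profiles across samples end up with high membership in the same cluster. Scoring mutual exclusion among the resulting subsets, which the abstract treats as the optimization objective, is not modeled here.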