De novo and somatic structural variant discovery with SVision-pro

Songbo Wang, Jiadong Lin, Peng Jia, Tun Xu, Xiujuan Li, Yuezhuangnan Liu, Dan Xu, Stephen J. Bush, Deyu Meng, Kai Ye
DOI: https://doi.org/10.1038/s41587-024-02190-7
IF: 46.9
2024-03-23
Nature Biotechnology
Abstract: Long-read-based de novo and somatic structural variant (SV) discovery remains challenging, necessitating genomic comparison between samples. We developed SVision-pro, a neural-network-based instance segmentation framework that represents genome-to-genome-level sequencing differences visually and discovers SV comparatively between genomes without any prerequisite for inference models. SVision-pro outperforms state-of-the-art approaches, in particular, the resolving of complex SVs is improved, with low Mendelian error rates, high sensitivity of low-frequency SVs and reduced false-positive rates compared with SV merging approaches.
biotechnology & applied microbiology
What problem does this paper attempt to address?
The paper primarily addresses the issue of discovering de novo and somatic structural variants (SVs) from long-read sequencing data in genomics research. Specifically:

- **Challenges in detecting de novo and somatic structural variants**: De novo variants are those that appear for the first time in the offspring's genome and are not present in the parents; somatic variants are those that appear in specific tissues such as tumors. These variants are crucial for understanding the development of Mendelian diseases and cancer, but their detection remains challenging.
- **Limitations of existing methods**: Current methods are mainly divided into two categories: callset-merge strategies and read-inference strategies. The former is prone to introducing false positives, while the latter can effectively detect simple structural variants (SSVs) but is inadequate in detecting complex structural variants (CSVs).
- **Introduction of SVision-pro**: To address the above issues, the researchers developed a new method called SVision-pro. It is based on deep learning and uses an instance segmentation framework to compare genomic differences between samples, thereby achieving accurate detection of de novo and somatic structural variants.
- **Key features of SVision-pro**:
  - It employs a sequence-to-image representation module that encodes genomic features into image form, making it easier for neural networks to process.
  - It uses a neural network recognition module that directly identifies structural variants and their differences between samples through image instance segmentation.
  - It can effectively detect complex structural variants and, compared with the best existing methods, excels in detection accuracy, reduces false-positive rates, and improves sensitivity to low-frequency variants.
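To make the callset-merge limitation concrete, here is a minimal Python sketch of how such a strategy might flag candidate de novo SVs in a trio: call SVs per sample, then keep child calls with no matching call in either parent. This is not SVision-pro's method and not any specific tool's algorithm. The `SV` record, the `same_event` matching rule, and the thresholds `max_dist` and `min_len_ratio` are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SV:
    """Hypothetical minimal SV record (not a real tool's data model)."""
    chrom: str
    pos: int       # start breakpoint
    svtype: str    # e.g. "DEL", "INS", "DUP", "INV"
    length: int


def same_event(a, b, max_dist=500, min_len_ratio=0.7):
    """Assumed matching rule: same chromosome and type, nearby breakpoints, similar size."""
    if a.chrom != b.chrom or a.svtype != b.svtype:
        return False
    if abs(a.pos - b.pos) > max_dist:
        return False
    shorter, longer = sorted((a.length, b.length))
    return longer > 0 and shorter / longer >= min_len_ratio


def candidate_de_novo(child, father, mother, **match_kwargs):
    """Keep child SVs that match no call in either parent's callset."""
    parents = father + mother
    return [sv for sv in child
            if not any(same_event(sv, p, **match_kwargs) for p in parents)]


# Toy callsets: the mother carries the same insertion as the child,
# but her caller reported the breakpoint 800 bp away.
child = [SV("chr1", 10_000, "DEL", 300), SV("chr2", 5_000, "INS", 120)]
father = [SV("chr1", 10_050, "DEL", 310)]
mother = [SV("chr2", 5_800, "INS", 120)]

print(candidate_de_novo(child, father, mother))
print(candidate_de_novo(child, father, mother, max_dist=1000))
```

With the default 500 bp tolerance, the inherited insertion is reported as a de novo candidate only because the two independent calls disagree on the breakpoint; widening the tolerance removes it. This sensitivity to per-sample calling and merging thresholds is one way merging can introduce false positives, which is the weakness the summary attributes to callset-merge strategies.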
In summary, this paper presents SVision-pro as a new solution to the current limitations in de novo and somatic structural variant detection, improving the accuracy and reliability of detection.