CSV-Filter: a deep learning-based comprehensive structural variant filtering method for both short and long reads

Zeyu Xia, Weiming Xiang, Qingzhe Wang, Xingze Li, Yilin Li, Junyu Gao, Tao Tang, Canqun Yang, Yingbo Cui
DOI: https://doi.org/10.1093/bioinformatics/btae539
IF: 5.8
2024-09-06
Bioinformatics
Abstract: Motivation: Structural variants (SVs) play an important role in genetic research and precision medicine. Because existing SV detection methods usually produce a substantial number of false positive calls, approaches to filter the detection results are needed. Results: We developed a novel deep learning-based SV filtering tool, CSV-Filter, for both short and long reads. CSV-Filter uses a novel multi-level grayscale image encoding method based on the CIGAR strings of the alignment results and employs image augmentation techniques to improve SV feature extraction. CSV-Filter also uses self-supervised learning networks, applied through transfer learning, as classification models, and employs mixed-precision operations to accelerate training. The experiments showed that integrating CSV-Filter with popular SV detection tools could considerably reduce false positive SVs for short and long reads while leaving true positive SVs almost unchanged. Compared with DeepSVFilter, an SV filtering tool for short reads, CSV-Filter could recognize more false positive calls and additionally supports long reads. Availability and implementation: https://github.com/xzyschumacher/CSV-Filter.
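As an illustration of the general idea of encoding CIGAR strings as a grayscale image, the following is a minimal sketch and not CSV-Filter's actual encoding, which the abstract does not specify. The gray levels, the function names `encode_read` and `encode_region`, and the one-row-per-read layout are all assumptions made for this example. Only the CIGAR operation letters and the rule for which operations advance along the reference follow the SAM format.

```python
import re

# Hypothetical gray levels per operation class (assumed, not taken from the paper).
GRAY_LEVELS = {"M": 85, "I": 170, "D": 255}

# One CIGAR token is a length followed by an operation letter (SAM format).
CIGAR_TOKEN = re.compile(r"(\d+)([MIDNSHP=X])")


def encode_read(ref_start, cigar, window_start, window_len):
    """Encode one aligned read as a row of gray values over a reference window."""
    row = [0] * window_len
    pos = ref_start  # current reference coordinate
    for length_text, op in CIGAR_TOKEN.findall(cigar):
        length = int(length_text)
        if op in "M=X" or op == "D":
            # Matches and deletions consume reference bases: paint each base.
            level = GRAY_LEVELS["D"] if op == "D" else GRAY_LEVELS["M"]
            for p in range(pos, pos + length):
                idx = p - window_start
                if 0 <= idx < window_len:
                    row[idx] = level
            pos += length
        elif op == "N":
            # Skipped region: advance along the reference without painting.
            pos += length
        elif op == "I":
            # Insertions consume no reference bases: mark the preceding base.
            idx = pos - 1 - window_start
            if 0 <= idx < window_len:
                row[idx] = GRAY_LEVELS["I"]
        # Soft clips, hard clips, and padding (S, H, P) are ignored in this sketch.
    return row


def encode_region(reads, window_start, window_len):
    """Stack one row per (ref_start, cigar) pair into a 2D grayscale image."""
    return [encode_read(start, cigar, window_start, window_len)
            for start, cigar in reads]


image = encode_region([(100, "3M2D2M1I2M"), (102, "2S3M")], 100, 10)
for row in image:
    print(row)
```

In this sketch a deletion appears as a bright run and an insertion as a single mid-gray pixel, so a downstream image classifier could in principle distinguish the two patterns; a real encoder would also need to handle read depth, mapping quality, and image resizing.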
What problem does this paper attempt to address?