SAVMD: an Adaptive Signal Processing Method for Identifying Protein Coding Regions
Qian Zheng, Tao Chen, Wenxiang Zhou, Sajid A. Marhon, Lei Xie, Hongye Su
DOI: https://doi.org/10.1016/j.bspc.2021.102998
IF: 5.1
2021-01-01
Biomedical Signal Processing and Control
Abstract: The identification of protein coding regions is a major research topic in gene prediction. A number of digital signal processing (DSP) based approaches that exploit 3-base periodicity to detect coding regions have been proposed. From these previously published approaches, we conclude that an effective method or filter for identifying protein coding regions should have three properties: independence from the window length, an effective and adaptive frequency response, and a fixed basic frequency of 1/3f. However, most published approaches cannot satisfy all three simultaneously, which limits their identification accuracy. In this paper, we propose an adaptive signal processing method, called sinusoidal-assisted variational mode decomposition (SAVMD), for identifying coding regions. The adaptability of SAVMD is reflected in two aspects: (i) the proposed method analyzes numerical sequences without needing any window information; (ii) the spectrum of the period-3 component is fitted automatically by SAVMD in the Fourier domain. As a result, the proposed method outperforms other DSP-based methods in identification accuracy, as verified by experimental results on five benchmark datasets. On the dataset in which most sequences contain undetermined nucleotides (UDT), SAVMD outperforms the model-dependent method AUGUSTUS as well as other model-independent methods. In addition, we conduct a comparative analysis of different numerical conversions of DNA sequences using SAVMD. Several conversion methods found to be applicable to SAVMD in this experiment can serve as a reference for applying other time-frequency decomposition methods to gene prediction.
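
The 3-base periodicity that the abstract refers to can be illustrated with a minimal sketch. This is not SAVMD itself; it is the classical spectral measure, assuming a binary indicator (Voss-style) numerical conversion and a plain DFT evaluated at frequency 1/3. The function names `voss_indicators` and `period3_power` are hypothetical and chosen only for this illustration.

```python
import cmath


def voss_indicators(seq):
    """Map a DNA string to four binary indicator sequences (one per base).

    Any other symbol, such as an undetermined nucleotide 'N', is 0 in all four.
    """
    seq = seq.upper()
    return {b: [1.0 if c == b else 0.0 for c in seq] for b in "ACGT"}


def period3_power(seq):
    """Summed spectral power at frequency 1/3 over the four indicator sequences.

    The sequence is trimmed so its length N is a multiple of 3, which makes
    frequency 1/3 fall exactly on DFT bin k = N/3. The result is divided by N.
    """
    n = len(seq) - len(seq) % 3
    if n == 0:
        return 0.0
    # exp(-2*pi*j*i/3) only takes three distinct values, indexed by i mod 3.
    twiddle = [cmath.exp(-2j * cmath.pi * r / 3) for r in range(3)]
    total = 0.0
    for x in voss_indicators(seq[:n]).values():
        coeff = sum(x[i] * twiddle[i % 3] for i in range(n))
        total += abs(coeff) ** 2
    return total / n


if __name__ == "__main__":
    codon_like = "ATG" * 30   # toy sequence with perfect period 3
    period_four = "ACGT" * 21  # toy sequence with period 4, no period-3 component
    print(period3_power(codon_like), period3_power(period_four))
```

In window-based DSP methods this quantity is typically computed over a sliding window, so the result depends on the chosen window length; that dependence is the first limitation the abstract says SAVMD avoids.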