Quantitative Sequence-Activity Model Analysis of Oligopeptides Coupling an Improved High-Dimension Feature Selection Method with Support Vector Regression.

Lifeng Wang, Zhijun Dai, Hongyan Zhang, Lianyang Bai, Zheming Yuan
DOI: https://doi.org/10.1111/cbdd.12242
2014-01-01
Chemical Biology & Drug Design
Abstract: Five hundred and thirty-one physicochemical property parameters of amino acids were used directly as descriptors to characterize the structure of oligopeptides. A novel rapid selection method based on support vector regression (SVR), called the binary matrix resetting filter (BMRF), was proposed to select high-dimensional features nonlinearly, after which multiround last-elimination (MRLE) was used for fine screening. The retained descriptors were used to build a regression model with SVR, which was then applied to quantitative sequence–activity model (QSAM) analysis of two oligopeptide systems. Compared with 16 widely used kinds of amino acid descriptors, four QSAM modeling methods, and four feature selection methods, our work shows a significant improvement in modeling performance, especially in external prediction. Furthermore, the real biochemical significance of the retained descriptors can be given directly, which significantly improves the interpretability of the resulting QSAM model. This novel method has high potential to become a practical tool for regression analysis of high-dimensional data, such as QSAM modeling of peptides or even proteins.
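The descriptor encoding and the elimination-style screening described above can be sketched roughly in Python. This is a minimal illustration under stated assumptions, not the authors' BMRF or MRLE: the property table `PROPS` is hypothetical with made-up values (not from any real property scale), each peptide is assumed to be encoded by concatenating its residues' property vectors in sequence order, and the fine-screening step is replaced by a generic greedy backward elimination scored with leave-one-out 1-nearest-neighbour error instead of SVR. The helper names `encode`, `loo_1nn_error` and `backward_eliminate` are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

# Hypothetical, made-up property values (NOT real data): 2 properties per residue.
PROPS: Dict[str, List[float]] = {
    "A": [0.5, 1.0],
    "G": [0.0, 0.8],
    "L": [1.0, 1.2],
    "V": [0.9, 1.1],
    "K": [-1.0, 1.3],
}


def encode(peptide: str, props: Dict[str, List[float]] = PROPS) -> List[float]:
    """Assumed encoding: concatenate per-residue property vectors in sequence order."""
    features: List[float] = []
    for residue in peptide:
        features.extend(props[residue])
    return features


def loo_1nn_error(X: Sequence[Sequence[float]], y: Sequence[float],
                  cols: Sequence[int]) -> float:
    """Leave-one-out mean squared error of 1-nearest-neighbour regression
    restricted to the given columns (a cheap stand-in for SVR error)."""
    n = len(X)
    total = 0.0
    for i in range(n):
        best_j, best_d = -1, math.inf
        for j in range(n):
            if j == i:
                continue
            d = sum((X[i][c] - X[j][c]) ** 2 for c in cols)
            if d < best_d:  # strict '<': ties go to the lowest index j
                best_j, best_d = j, d
        total += (y[i] - y[best_j]) ** 2
    return total / n


def backward_eliminate(
    X: Sequence[Sequence[float]],
    y: Sequence[float],
    score: Callable[[Sequence[Sequence[float]], Sequence[float], Sequence[int]], float] = loo_1nn_error,
    min_features: int = 1,
) -> List[int]:
    """Generic greedy backward elimination: each round drops the column whose
    removal gives the lowest error, stopping once every removal makes it worse."""
    cols = list(range(len(X[0])))
    current = score(X, y, cols)
    while len(cols) > min_features:
        trials = [(score(X, y, [c for c in cols if c != drop]), drop) for drop in cols]
        best_err, best_drop = min(trials)
        if best_err > current:
            break
        cols.remove(best_drop)
        current = best_err
    return cols


if __name__ == "__main__":
    print(encode("AG"))  # → [0.5, 1.0, 0.0, 0.8]
    # Toy data: column 0 tracks the activity, column 1 is noise.
    X = [[0, 5], [1, 0], [2, 5], [3, 0]]
    y = [0, 1, 2, 3]
    print(backward_eliminate(X, y))  # → [0]
```

On the toy data, dropping the noisy column lowers the leave-one-out error from 4.0 to 1.0, so only column 0 is kept. Passing a cross-validated SVR error as `score` would bring the sketch closer in spirit to the SVR-based screening the abstract describes.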