Abstract 473: Identification of Apolipoproteins Using Feature Selection Technique

Hua Tang, Hao Lin
DOI: https://doi.org/10.1161/atvb.36.suppl_1.473
2016-01-01
Abstract:

Objective: Apolipoproteins are of great physiological importance and are associated with diseases such as dyslipidemia, thrombogenesis and angiocardiopathy. They have therefore emerged as key risk markers and important research targets, yet the types of apolipoproteins have not been fully elucidated. Accurate identification of apolipoproteins is crucial to understanding cardiovascular diseases and to drug design. The aim of this study is to develop a powerful model to precisely identify apolipoproteins.

Approach and Results: We manually collected from UniProt a non-redundant dataset of 53 apolipoproteins and 136 non-apolipoproteins with sequence identity of less than 40%. After formulating the protein sequence samples with g-gap dipeptide composition (here g = 1 to 10), analysis of variance (ANOVA) was used to select the feature subset that achieves the highest accuracy. A Support Vector Machine (SVM) was then used to perform classification. The predictive model was evaluated using five-fold cross-validation, which yielded a sensitivity of 96.2%, a specificity of 99.3%, and an accuracy of 98.4%. The study indicated that the proposed method could be a feasible means of conducting preliminary analyses of apolipoproteins.

Conclusion: We demonstrated that apolipoproteins can be predicted from their primary sequences. We also identified a distinctive dipeptide distribution in apolipoproteins. These findings open new perspectives for improving apolipoprotein prediction by considering the specific dipeptides. We expect that these findings will help improve the development of drugs against angiocardiopathy.

Key words: Apolipoproteins, Angiocardiopathy, Support Vector Machine
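The abstract does not spell out how the g-gap dipeptide composition is computed. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes that g counts the residues lying between the two members of a pair (so positions i and i + g + 1 are paired), that only the 20 standard amino acids are counted, and that frequencies are normalized by the number of counted pairs. The function name `g_gap_dipeptide_composition` and the constant `AMINO_ACIDS` are hypothetical names introduced for illustration.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes); an assumption of this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def g_gap_dipeptide_composition(sequence, g):
    """Return a dict of 400 g-gap dipeptide frequencies for one sequence.

    Assumption: a g-gap dipeptide pairs residue i with residue i + g + 1,
    i.e. g residues lie between the two members of the pair.
    """
    if g < 0:
        raise ValueError("g must be non-negative")

    # One counter per ordered residue pair: 20 x 20 = 400 features.
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    total = 0
    for i in range(len(sequence) - g - 1):
        pair = sequence[i] + sequence[i + g + 1]
        if pair in counts:  # skip pairs containing non-standard residues
            counts[pair] += 1
            total += 1

    if total == 0:  # sequence too short for this gap
        return {pair: 0.0 for pair in counts}
    return {pair: n / total for pair, n in counts.items()}


# Usage: each value of g gives one 400-dimensional feature vector.
features = g_gap_dipeptide_composition("ACAC", 1)
print(features["AA"], features["CC"])
```

Each value of g yields a separate 400-dimensional vector per protein; a feature-ranking step such as ANOVA would then operate on these vectors before a classifier is trained.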