Abstract: With the avalanche of genomic and proteomic data generated in the postgenomic age, it is highly desirable to develop automated methods for rapidly and effectively analyzing and predicting the structure, function, and other properties of DNA and proteins. Machine learning methods have become an important strategy for discovering potential knowledge in genomics and proteomics. Research in recent years has shown tremendous advances in predicting the properties of DNA fragments and protein sequences with various pattern recognition methods. These techniques provide economical and timesaving solutions for identifying the properties of DNA and proteins. This special issue was organized to present recent developments in the application of machine learning methods to genomics and proteomics.

In this special issue, five works focused on protein classification. Extracting key features from a protein is a key step in discriminating protein classes. B. Liu et al. proposed using the Position-Specific Score Matrix (PSSM) and Accessible Surface Area (ASA) to formulate protein samples. The hidden Markov support vector machine (HM-SVM) was employed to predict protein binding sites. Fivefold cross-validation on a benchmark dataset of 1124 protein chains showed that their method is more accurate for protein binding site prediction than some state-of-the-art methods. The method can also be applied to DNA binding sites, vitamin binding sites, and posttranslational modifications of proteins.

Based on chemical shift (CS) information derived from nuclear magnetic resonance (NMR), F. Yonge proposed a novel feature to predict protein supersecondary structures. Quadratic discriminant (QD) analysis was selected as the prediction algorithm. The overall accuracy in threefold cross-validation is 77.3% for predicting four types of supersecondary structures.

Based on the concept of pseudo amino acids, G.-L. Fan et al. proposed the average chemical shift (ACS) composition, which is calculated from average chemical shift information and protein secondary structure, and established an online web server called acACS. With SVM as the classification algorithm, acACS was used to discriminate between acidic and alkaline enzymes and between bioluminescent and nonbioluminescent proteins. Encouraging results were achieved. Protein secondary structure, structural class, and disordered regions can also be predicted using the AC-based method.

L. Nanni et al. proposed combining different features to improve protein prediction. These features include amino acid composition, PSSM, and substitution matrix representation (SMR). Each feature is used to train a separate SVM. A total of 15 benchmark datasets were used to evaluate the performance of their proposed method. Comparative results show that the PSSM always produces good accuracies. However, no single descriptor is superior to all others across all test datasets. The major contribution of this paper is an ensemble of classifiers for sequence-based protein classification.

H. Lin et al. briefly reviewed the development of ion channel prediction using machine learning methods. They first described how to construct a valid and objective benchmark dataset to train and test a predictor. Subsequently, the mathematical descriptors used to formulate ion channel sequences were presented. Moreover, two feature selection techniques for optimizing the feature set were described. Finally, the support vector machine was suggested for performing classification. The methods introduced in that review can be generalized to other protein prediction fields as well.

The paper from P. Feng et al. was the only work focused on DNA prediction using machine learning methods. They proposed a novel descriptor called pseudo K-tuple nucleotide composition (PseKNC) to formulate DNA sequences.
The feature is calculated from the K-tuple nucleotide composition and the structural correlation of DNA dinucleotides. Subsequently, an SVM was used to predict DNase I hypersensitive sites. The jackknife cross-validated accuracy is 83%, which is competitive with that of the existing method. This new descriptor can also be widely used in the prediction of DNA regulatory elements.

Hao Lin, Wei Chen, Ramu Anandakrishnan, Dariusz Plewczynski
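
As an illustration of the K-tuple nucleotide composition that PseKNC builds on, the following minimal Python sketch computes normalized K-tuple frequencies for a DNA sequence. It covers only the composition part; the structural-correlation terms of PseKNC are not reproduced, because this editorial does not define them. The function name `ktuple_composition`, the default `k=2`, and the handling of ambiguous bases are illustrative assumptions, not details taken from the paper by P. Feng et al.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return the normalized frequency of every length-k nucleotide tuple.

    Hypothetical helper for illustration only: this is the plain K-tuple
    composition, not the full PseKNC descriptor.
    """
    seq = seq.upper()
    # Enumerate all 4**k possible tuples in a fixed (lexicographic) order.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    # Slide a window of width k over the sequence.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # assumption: skip windows with ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    if windows == 0:
        return {kmer: 0.0 for kmer in kmers}
    return {kmer: counts[kmer] / windows for kmer in kmers}


if __name__ == "__main__":
    features = ktuple_composition("ACGTACGT", k=2)
    # Print only the dinucleotides that actually occur in the sequence.
    print({kmer: freq for kmer, freq in features.items() if freq > 0})
```

The resulting fixed-length vector (16 values for `k=2`) is the kind of input a classifier such as an SVM could be trained on; a full PseKNC vector would append further correlation terms to it.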

Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features

Identification of DNA-binding proteins by incorporating evolutionary information into pseudo amino acid composition via the top-n-gram approach

An improved sequence based prediction protocol for DNA-binding proteins using SVM and comprehensive feature analysis

Identification of DNA-binding proteins by auto-cross covariance transformation

Use Chou's 5-Step Rule to Predict DNA-Binding Proteins with Evolutionary Information

Improved DNA-Binding Protein Identification by Incorporating Evolutionary Information into the Chou's PseAAC

Newdna-Prot: Prediction of DNA-binding Proteins by Employing Support Vector Machine and a Comprehensive Sequence Representation.

gDNA-Prot: Predict DNA-binding proteins by employing support vector machine and a novel numerical characterization of protein sequence.

Prediction of nucleic acid-binding proteins using support vector machines

PseDNA‐Pro: DNA‐Binding Protein Identification by Combining Chou’s PseAAC and Physicochemical Distance Transformation

DNA binding protein identification by combining pseudo amino acid composition and profile-based protein representation

SVM-Based Approach for Predicting DNA-Binding Residues in Proteins from Amino Acid Sequences

Using Pseudo-Amino Acid Composition and Support Vector Machine to Predict Protein Structural Class.

Analysis and Prediction of Single-Stranded and Double-Stranded DNA Binding Proteins Based on Protein Sequences

Application of machine learning method in genomics and proteomics.

Computational Methods for Predicting DNA Binding Proteins

Predicting the Classification of Transcription Factors by Incorporating Their Binding Site Properties into a Novel Mode of Chou's Pseudo Amino Acid Composition

A Novel Sequence-Based Method of Predicting Protein DNA-Binding Residues, Using a Machine Learning Approach

Identifying DNA-binding proteins by combining support vector machine and PSSM distance transformation

Identification of DNA-binding proteins by Kernel Sparse Representation via $L_{2,1}$-matrix norm