Abstract:

BACKGROUND: Gene expression profiling has become a useful biological resource in recent years and plays an important role in a broad range of areas in biology. Raw gene expression data, usually in the form of a large matrix, may contain missing values, so downstream analysis methods that require a complete matrix as input cannot be applied directly. Several methods have been developed to solve this problem, such as K-nearest-neighbor imputation and Bayesian principal component analysis imputation. In this paper, we introduce a novel imputation approach based on Support Vector Regression (SVR). The proposed approach uses an orthogonal input coding scheme, which exploits the multiple missing values in one row of a gene expression profile and represents the input in a much higher dimensional space, to obtain better performance.

RESULTS: We present a comparative study of our method and previously developed methods for estimating missing values on six gene expression data sets. Among the three input-vector coding schemes we tried, the orthogonal input coding scheme gives the best estimates, with the minimum Normalized Root Mean Squared Error (NRMSE). The results also show that the SVR method estimates well on different kinds of data sets, with relatively small NRMSE.

CONCLUSION: The SVR imputation method performs better than, or at least comparably to, the previously developed methods considered in this study. Its estimation ability is partly due to the orthogonal input coding scheme, which makes the most use of the missing value information. In addition, the solid theoretical foundation of SVR, together with the orthogonal input coding scheme, contributes to its estimation performance. The estimation ability demonstrated in the results suggests that the proposed approach is a suitable solution to the missing value estimation problem. The source code of the SVR method is available from http://202.38.78.189/downloads/svrimpute.html for non-commercial use.
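The two technical ingredients named in the abstract, the orthogonal input coding and the NRMSE score, can be illustrated with a minimal sketch. The abstract does not spell out either one, so both definitions below are assumptions: the coding is taken to mean that each entry of a row becomes a pair (value, missing flag), which doubles the input dimension, and NRMSE is taken in its common form, the root mean squared error divided by the standard deviation of the true values. The function names `orthogonal_encode` and `nrmse` are hypothetical and are not taken from the authors' released code.

```python
import math


def orthogonal_encode(row):
    """Encode a row of expression values for use as a regression input.

    Assumed scheme (not confirmed by the abstract): an observed value v
    becomes the pair (v, 0.0) and a missing value (None) becomes (0.0, 1.0),
    so a row of length n maps to a vector of length 2n.
    """
    encoded = []
    for value in row:
        if value is None:
            encoded.extend([0.0, 1.0])
        else:
            encoded.extend([float(value), 0.0])
    return encoded


def nrmse(estimated, actual):
    """Root mean squared error normalized by the std. dev. of the true values.

    Assumed normalization: population standard deviation of `actual`.
    """
    n = len(actual)
    mean_actual = sum(actual) / n
    mse = sum((e - a) ** 2 for e, a in zip(estimated, actual)) / n
    variance = sum((a - mean_actual) ** 2 for a in actual) / n
    return math.sqrt(mse / variance)


# A row with one missing entry keeps the positions of its other values.
features = orthogonal_encode([1.5, None, -0.5])

# Guessing the mean of the true values for every entry scores exactly 1.0
# under this definition, so useful imputation should score below 1.
baseline = nrmse([2.0, 2.0, 2.0], [1.0, 2.0, 3.0])
```

Under these assumptions, the encoded vectors would serve as inputs to a regression model such as SVR, and NRMSE would be computed on entries that were deliberately hidden so that the true values are known.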
