Abstract: Variable or feature selection is one of the most important steps in model specification. Especially in medical decision-making, using a medical database directly, without prior analysis and preprocessing, is often counterproductive. Variable selection is the process of choosing the most relevant attributes from the database in order to build robust learning models and thus improve the performance of the models used in the decision process. In biomedical research, the purpose of variable selection is to select clinically important and statistically significant variables while excluding unrelated or noise variables. A variety of methods exist for variable selection, but none of them is without limitations. For example, the widely used stepwise approach adds the best variable in each cycle and generally produces an acceptable set of variables. Nevertheless, it is commonly trapped in local optima. The best subset approach can systematically search the entire covariate pattern space, but the solution pool can be extremely large with tens to hundreds of variables, as is common in today's clinical data. Genetic algorithms (GA) are heuristic optimization approaches that can be used for variable selection in multivariable regression models. This tutorial paper aims to provide a step-by-step approach to the use of GA in variable selection. The R code provided in the text can be extended and adapted to other data analysis needs.
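
To make the idea concrete, the following is a minimal sketch of GA-based variable selection, written in Python rather than the R code the paper itself provides. Each candidate subset is encoded as a bit string (one bit per variable), and the population evolves through tournament selection, one-point crossover, bit-flip mutation, and elitism. The function names `ga_select` and `toy_fitness`, the default parameter values, and the fitness function are all assumptions made for illustration: `toy_fitness` is a hypothetical stand-in that rewards three arbitrarily chosen "relevant" variables and charges a cost per selected variable, whereas a real analysis would more likely score each subset with a criterion such as the AIC of the fitted regression model.

```python
import random


def toy_fitness(mask):
    """Hypothetical stand-in for a model criterion such as negative AIC.

    Variables 0, 3 and 5 are assumed relevant: each selected relevant
    variable earns +2, and every selected variable costs 1.
    """
    relevant = {0, 3, 5}
    hits = sum(1 for i, bit in enumerate(mask) if bit and i in relevant)
    return 2 * hits - sum(mask)


def ga_select(n_vars, fitness, pop_size=30, generations=40,
              crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Return (best_mask, best_score); mask[i] == 1 keeps variable i."""
    rng = random.Random(seed)
    # Initial population: random bit strings, one bit per candidate variable.
    pop = [[rng.randint(0, 1) for _ in range(n_vars)] for _ in range(pop_size)]
    best_mask = max(pop, key=fitness)[:]
    best_score = fitness(best_mask)

    for _ in range(generations):
        scores = [fitness(c) for c in pop]

        def tournament():
            # Binary tournament: the fitter of two random individuals wins.
            a, b = rng.randrange(pop_size), rng.randrange(pop_size)
            return pop[a] if scores[a] >= scores[b] else pop[b]

        children = [best_mask[:]]  # elitism: carry the best subset forward
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if n_vars > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_vars)  # one-point crossover
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Bit-flip mutation: occasionally add or drop a variable.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        pop = children

        candidate = max(pop, key=fitness)
        if fitness(candidate) > best_score:
            best_mask, best_score = candidate[:], fitness(candidate)

    return best_mask, best_score


n_vars = 8
print("candidate subsets for best-subset search:", 2 ** n_vars)
mask, score = ga_select(n_vars, toy_fitness)
print("selected variables:", [i for i, bit in enumerate(mask) if bit])
print("fitness:", score)
```

The first printed line shows why exhaustive best-subset search scales poorly: the number of candidate subsets doubles with every added variable, while the GA evaluates only `pop_size` subsets per generation. Because the search is heuristic, the returned subset is not guaranteed to be the global optimum.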

Variable Selection in Credit Risk Models for Chinese Listed Companies

Relationship Between Credit Risk and Market Value of Listed Companies Based on Copula Method

Gradient Learning Approach For Variable Selection In Credit Scoring

PENALIZED VARIABLE SELECTION PROCEDURE FOR COX MODELS WITH SEMIPARAMETRIC RELATIVE RISK

A Unified Variable Selection Approach for Varying Coefficient Models

Variable Selection in Logistic Regression Model with Genetic Algorithm.

Variable Selection in High-Dimensional Quantile Varying Coefficient Models

Variable Selection in Robust Joint Mean and Covariance Model for Longitudinal Data Analysis

Huber Loss Meets Spatial Autoregressive Model: A Robust Variable Selection Method with Prior Information

Variable Selection for High Dimensional Gaussian Copula Regression Model: an Adaptive Hypothesis Testing Procedure.

Principal Component Analysis and Factor Analysis for Feature Selection in Credit Rating

Bayesian Variable Selection for Single Index Logistic Model

Variable Selection with Copula Entropy

Variable selection in censored quantile regression with high dimensional data

A Transparent and Nonlinear Method for Variable Selection

Variable selection and model prediction based on Lasso, adaptive lasso and elastic net

Variable Selection of Spatial Logistic Autoregressive Model with Linear Constraints

Robust exponential squared loss-based variable selection for high-dimensional single-index varying-coefficient model

A Logit Model Of Credit Risk Management And Its Application In A China's Bank

Variable Importance Assessments and Backward Variable Selection for High-Dimensional Data

Variable selection for competing risk regression models: recommendations for analyzing data from epidemiological studies