Two Screening Methods for Genetic Association Study with Application to Psoriasis Microarray Data Sets

Maggie Haitian Wang, Kelvin K. F. Tsoi, Xin Lai, Marc Chong, Benny Zee, Tian Zheng, Shaw-Hwa Lo, Inchi Hu
DOI: https://doi.org/10.1109/BigDataCongress.2015.55
2015-01-01
Abstract: Feature selection in genome data faces the challenge of high dimensionality. When the goal of the analysis is to identify susceptibility loci for complex disease, interaction effects need to be considered, and the actual number of variables to be screened is even larger than the original number of variables because of variable combinations. Previous feature selection methods for interactions either exhaustively calculate pairwise combinations across the genome or adopt a pre-screening step based on the marginal effects of genetic markers. However, these methods might still yield a considerably large number of candidate markers that demand further selection, and some genes that have a moderate main effect but are important for forming subsets with strong interactions might be filtered out. In this article, we introduce two alternative screening methods: one uses the variable appearance frequency (VAF) to select features, and the other uses a non-overlapping criterion to reduce the candidate pool. The methods are applied to two real gene-expression datasets for psoriasis, and their advantages are discussed.
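As a rough illustration of why considering interactions enlarges the screening problem, the sketch below counts the unordered pairs among p markers, which is p(p-1)/2. The function name and the marker counts are hypothetical and chosen only for illustration; they are not figures from the paper, and the sketch does not implement either of the proposed screening methods.

```python
from math import comb


def pairwise_candidates(num_markers: int) -> int:
    """Return the number of unordered marker pairs, p * (p - 1) / 2."""
    return comb(num_markers, 2)


# Hypothetical marker counts, used only to show how quickly the
# number of candidate pairs grows relative to the number of markers.
for p in (1_000, 10_000, 50_000):
    print(f"{p} markers -> {pairwise_candidates(p)} candidate pairs")
```

An exhaustive pairwise search has to evaluate every one of these pairs, which is the cost that pre-screening approaches try to avoid.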