Nonconvex Fusion Penalties for High-dimensional Hierarchical Categorical Variables

Zixuan Zhao, Yuehan Yang
DOI: https://doi.org/10.1016/j.ins.2024.121143
IF: 8.1
2024-01-01
Information Sciences
Abstract: Hierarchical categorical data is commonly encountered in social science, genetics, and other fields. The interactions between variables in hierarchical structures introduce complexity in modeling and prediction. We focus on high-dimensional linear models with hierarchical categorical variables and introduce an efficient method that offers computational advantages when dealing with high-dimensional categorical data. On the theoretical side, we demonstrate the uniqueness of the solution and show that the proposed estimator converges to the least squares solution with high probability. Additionally, we showcase the effectiveness of our method on simulated datasets and on two real-world datasets, a cancer-reg dataset and an adult dataset, where our method outperforms competing approaches in terms of predictive accuracy, variable selection, and model complexity.
What problem does this paper attempt to address?