Fuzzy neuron modeling of incomplete data for missing value imputation

Zheng Zhang, Xiaoming Yan, Liyong Zhang, Xiaochen Lai, Wei Lu
DOI: https://doi.org/10.1016/j.ins.2023.120065
IF: 8.1
2024-01-05
Information Sciences
Abstract: Missing values are a common problem in real-world datasets and cannot be avoided. Modeling incomplete data and imputing missing values reasonably is a challenging task. This paper focuses on regression imputation and uses a tracking-removed autoencoder (TRAE) to capture the mutual fitting correlations in incomplete data. Because regression relationships differ across sample categories, we introduce the Takagi-Sugeno (TS) fuzzy architecture and propose a category-based tracking-removed autoencoder (TS-TRAE) to model incomplete data for missing value imputation. The TS-TRAE model partitions the incomplete dataset into several subclusters using membership information obtained from fuzzy clustering, then builds a TRAE-based submodel for each subcluster to mine the relationships within it, so that the incomplete data are modeled precisely. To make full use of all existing values during training, we treat missing values as variables and propose an iterative learning method that jointly optimizes the missing-value variables and the network parameters. This method lets incomplete samples participate in model training while also imputing their missing values. The TS-TRAE model effectively integrates the inner category structure of incomplete data with its attribute association features. Experimental results verify the effectiveness of the proposed method.
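The iterative learning idea in the abstract, treating missing values as variables and optimizing them together with the model parameters, can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a single linear model with a zero-diagonal weight matrix as a stand-in for one TRAE submodel, so that each attribute is predicted only from the other attributes. It omits the fuzzy clustering and TS combination steps entirely. The function name `impute_jointly`, the learning rate, the iteration count, and the choice to take the reconstruction error over all entries (with missing entries replaced by their current estimates) are all hypothetical choices made for illustration.

```python
import numpy as np

def impute_jointly(X, n_iters=500, lr=0.05, seed=0):
    """Jointly fit a linear zero-diagonal model and the missing entries.

    X is a 2-D float array in which np.nan marks missing values.
    Returns the completed matrix Z, the weights W, and the bias b.
    """
    rng = np.random.default_rng(seed)
    missing = np.isnan(X)

    # Start the missing-value variables at the observed column means.
    col_means = np.nanmean(X, axis=0)
    Z = np.where(missing, col_means, X)

    n, d = Z.shape
    W = rng.normal(scale=0.01, size=(d, d))
    np.fill_diagonal(W, 0.0)        # output j never sees input j (assumed simplification)
    b = np.zeros(d)
    off_diag = 1.0 - np.eye(d)      # keeps the diagonal of W at zero during updates

    for _ in range(n_iters):
        R = Z @ W + b - Z           # reconstruction residual on all entries

        # Gradients of 0.5 * mean squared residual w.r.t. W and b.
        grad_W = (Z.T @ R) / n * off_diag
        grad_b = R.mean(axis=0)

        # Per-sample gradient w.r.t. Z: Z appears both as input and as target.
        grad_Z = R @ W.T - R

        W -= lr * grad_W
        b -= lr * grad_b
        Z[missing] -= lr * grad_Z[missing]   # only missing entries are variables

    return Z, W, b
```

Observed entries are never modified, so incomplete rows still contribute to fitting `W` and `b` through their known values, while their unknown values are refined at each step. In a fuller version, one such submodel would be fitted per fuzzy subcluster and the outputs combined using the membership degrees.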
computer science, information systems