An attribute-weighted isometric embedding method for categorical encoding on mixed data

Zupeng Liang, Shengfen Ji, Qiude Li, Sigui Hu, Yang Yu
DOI: https://doi.org/10.1007/s10489-023-04899-5
IF: 5.3
2023-08-25
Applied Intelligence
Abstract: Mixed data containing both categorical and numerical attributes are common in the real world. Before such data can be analysed, they typically need to be processed (transformed, embedded, or represented) into high-quality numerical data. The conditional probability transformation (CPT) method performs acceptably in the majority of cases, but it is not satisfactory for datasets with strong attribute associations. Inspired by the one-dependence value difference metric method, the idea of relaxing the attribute conditional independence assumption has been applied to CPT, but this approach has the drawback of dramatically expanding the attribute dimensionality. We employ an isometric embedding method to tackle this dimensionality expansion. In addition, an attribute weighting method based on must-link and cannot-link constraints is designed to optimize the quality of the data transformation. Combining these methods, we propose an attribute-weighted isometric embedding (AWIE) method for categorical encoding on mixed data. Extensive experimental results obtained on 16 datasets demonstrate that AWIE significantly improves classification performance compared with 28 competitors (increasing the F1-score by 2.54%, attaining 6/16 best results, and reaching an average rank of 1.94/8).
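As a rough illustration of the kind of encoding the abstract builds on, the sketch below replaces each categorical value with estimated class-conditional probabilities P(class | value). This is an assumption about the general form of a conditional probability transformation, not the authors' implementation, and it covers neither the isometric embedding nor the attribute weighting steps. The function names `fit_cpt` and `transform_cpt` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def fit_cpt(values, labels):
    """Estimate P(class | category) for one categorical attribute.

    Assumed form of a conditional probability transformation;
    hypothetical helper, not taken from the paper.
    """
    classes = sorted(set(labels))
    counts = defaultdict(Counter)
    for value, label in zip(values, labels):
        counts[value][label] += 1

    table = {}
    for value, class_counts in counts.items():
        total = sum(class_counts.values())
        # One probability per class, in the order given by `classes`.
        table[value] = [class_counts[c] / total for c in classes]
    return classes, table


def transform_cpt(values, table):
    """Map each categorical value to its vector of class probabilities."""
    return [table[v] for v in values]


# Toy data, invented for illustration only.
colors = ["red", "red", "blue", "blue", "blue", "green"]
labels = [1, 0, 1, 1, 0, 0]

classes, table = fit_cpt(colors, labels)
encoded = transform_cpt(colors, table)
```

Each categorical attribute becomes one numeric column per class under this sketch. Conditioning on pairs of attributes instead, as in a one-dependence relaxation, would multiply the number of columns, which is the dimensionality expansion the abstract mentions.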
computer science, artificial intelligence
What problem does this paper attempt to address?