Automated Feature Interaction and Feature Representation Learning of Multi-field Categorical Data

Jinxiao Du, Donghua Yang, Yun Liu, Mengmeng Li, Haifeng Guo, Bo Zheng, Hongzhi Wang
DOI: https://doi.org/10.1109/BigDIA60676.2023.10429200
2023-01-01
Abstract: Categorical data is used extensively across diverse domains, including online advertising, recommendation systems, and internet search. Conventional approaches represent it as binary features in a high-dimensional space using one-hot encoding and therefore encounter significant data sparsity challenges, so a feature embedding technique is required. The Factorization Machine (FM), although recognized for its efficacy in feature embedding, struggles to uncover intricate high-order patterns. This study introduces a novel Cat2vec-based Factorization Machine (CFM) designed to learn distributed representations of multi-field categorical data. The model employs an Embedding Layer to extract hidden vectors from the initial features, an Interaction Layer followed by a K-Max Pooling Layer to perform automatic feature interaction and capture significant high-order interactions, and finally an FM Layer to determine the model’s loss function. Empirical findings on an extensive public CTR prediction dataset show that CFM outperforms several robust benchmarks.
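The layer sequence described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the interaction function or the pooling criterion, so the element-wise product in `pairwise_interactions` and the L2-norm ranking in `k_max_pooling` are assumptions, and all function names are hypothetical. Only `fm_second_order` follows a standard, well-known formula, the second-order Factorization Machine term computed in linear time.

```python
import numpy as np


def embed(field_ids, tables):
    """Embedding layer: look up one vector per field.

    field_ids: one category index per field.
    tables: one (vocab_size, dim) array per field (hypothetical layout).
    """
    return np.stack([tables[f][i] for f, i in enumerate(field_ids)])


def pairwise_interactions(v):
    """Interaction layer (assumed form): element-wise product of every
    pair of rows (i < j) of v, giving n*(n-1)/2 interaction vectors."""
    rows, cols = np.triu_indices(v.shape[0], k=1)
    return v[rows] * v[cols]


def k_max_pooling(x, k):
    """K-max pooling (assumed criterion): keep the k rows of x with the
    largest L2 norm, preserving their original order."""
    scores = np.linalg.norm(x, axis=1)
    top = np.argsort(-scores, kind="stable")[:k]
    return x[np.sort(top)]


def fm_second_order(v):
    """Second-order FM term: sum over i < j of <v_i, v_j>, computed as
    0.5 * sum_d [(sum_i v_id)^2 - sum_i v_id^2]."""
    s = v.sum(axis=0)
    return 0.5 * float(np.sum(s * s - (v * v).sum(axis=0)))


def forward(field_ids, tables, k):
    """Chain the layers: embed, interact, pool, then score with FM."""
    v = embed(field_ids, tables)
    pooled = k_max_pooling(pairwise_interactions(v), k)
    return fm_second_order(pooled)
```

For a sample with three fields, `forward([0, 2, 1], tables, k=2)` would embed the three category indices, form the three pairwise interaction vectors, keep the two with the largest norm, and return a scalar score that a CTR model would typically pass through a sigmoid and a log loss during training.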