DANets: Deep Abstract Networks for Tabular Data Classification and Regression

Jintai Chen, Kuanlun Liao, Yao Wan, Danny Z. Chen, Jian Wu
DOI: https://doi.org/10.48550/arXiv.2112.02962
2022-09-07
Abstract: Tabular data are ubiquitous in real-world applications. Although many commonly used neural components (e.g., convolution) and extensible neural networks (e.g., ResNet) have been developed by the machine learning community, few of them are effective for tabular data and few designs are adequately tailored to tabular data structures. In this paper, we propose a novel and flexible neural component for tabular data, called Abstract Layer (AbstLay), which learns to explicitly group correlative input features and generate higher-level features for semantics abstraction. We also design a structure re-parameterization method to compress the learned AbstLay, reducing the computational complexity by a clear margin in the inference phase. A special basic block is built using AbstLays, and we construct a family of Deep Abstract Networks (DANets) for tabular data classification and regression by stacking such blocks. In DANets, a special shortcut path is introduced to fetch information from raw tabular features, assisting feature interactions across different levels. Comprehensive experiments on seven real-world tabular datasets show that our AbstLay and DANets are effective for tabular data classification and regression, and that their computational complexity is lower than that of competitive methods. In addition, we evaluate the performance gains of DANet as it goes deeper, verifying the extendibility of our method. Our code is available at https://github.com/WhatAShot/DANet.
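As a rough illustration of the AbstLay idea described in the abstract (a sparse mask selects a feature group, a gated linear map abstracts it, and re-parameterization merges the steps for inference), the following NumPy sketch may help. It is not the authors' implementation: sparsemax (entmax with α = 2) stands in for the learnable sparse mask, batch normalization is applied with fixed, made-up inference statistics, and all function names, shapes, and random weights are hypothetical.

```python
import numpy as np

def sparsemax(z):
    """Sparsemax (entmax with alpha = 2): a sparse probability vector."""
    z_sorted = np.sort(z)[::-1]
    cumsum = np.cumsum(z_sorted)
    k = np.arange(1, z.size + 1)
    k_max = k[1 + k * z_sorted > cumsum][-1]   # size of the support
    tau = (cumsum[k_max - 1] - 1) / k_max      # threshold
    return np.maximum(z - tau, 0.0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def batch_norm_eval(x, gamma, beta, mean, var, eps=1e-5):
    """Batch normalization with frozen (inference-time) statistics."""
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

def abstlay_group(f, w_mask, W1, bn1, W2, bn2):
    """One feature group: select with a sparse mask, then abstract."""
    f_sel = sparsemax(w_mask) * f                        # f' = M * f
    q = sigmoid(batch_norm_eval(W1 @ f_sel, *bn1))       # gate
    return np.maximum(q * batch_norm_eval(W2 @ f_sel, *bn2), 0.0)

def fold(w_mask, W, bn, eps=1e-5):
    """Merge mask, linear map, and frozen BN into one weight and bias."""
    gamma, beta, mean, var = bn
    scale = gamma / np.sqrt(var + eps)
    W_star = scale[:, None] * W * sparsemax(w_mask)[None, :]
    b_star = beta - scale * mean
    return W_star, b_star

def abstlay_group_reparam(f, W1s, b1s, W2s, b2s):
    """Same group after folding: a single gated affine step."""
    return np.maximum(sigmoid(W1s @ f + b1s) * (W2s @ f + b2s), 0.0)

# Hypothetical sizes: m input features, d output features, K groups.
rng = np.random.default_rng(0)
m, d, K = 6, 4, 3
f = rng.normal(size=m)

def random_bn(size):
    # (gamma, beta, running mean, running variance), all made up
    return (rng.normal(size=size), rng.normal(size=size),
            rng.normal(size=size), rng.uniform(0.5, 2.0, size=size))

groups = [(rng.normal(size=m),
           rng.normal(size=(d, m)), random_bn(d),
           rng.normal(size=(d, m)), random_bn(d)) for _ in range(K)]

# Sum the K group outputs, before and after folding.
f_o = sum(abstlay_group(f, *g) for g in groups)
f_o_reparam = sum(
    abstlay_group_reparam(f, *fold(g[0], g[1], g[2]), *fold(g[0], g[3], g[4]))
    for g in groups)

print(np.allclose(f_o, f_o_reparam))
```

In this sketch the two outputs agree up to floating-point error, which is the sense in which folded weights can replace the mask, linear map, and batch normalization at inference time. The folding is only valid once the batch-normalization statistics are frozen; during training they depend on the current batch.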
Machine Learning, Artificial Intelligence
What problem does this paper attempt to address?
The main problem this paper attempts to solve is **improving the performance of neural networks on tabular data classification and regression tasks**. Specifically, the authors point out that although deep learning has achieved great success in computer vision and natural language processing, few effective neural network architectures have been designed specifically for tabular data. Existing methods are either based on ensemble learning (such as XGBoost) or use combinations of shallow neural networks. These methods fail to fully exploit the potential of deep models and have high computational costs.

To address these problems, the authors propose a new neural network component, the **Abstract Layer (AbstLay)**, and the **Deep Abstract Networks (DANets)** built on it. The main contributions of the paper are:

1. **Propose AbstLay**:
   - AbstLay is a new neural network component designed to perform high-level feature abstraction on tabular data. It learns to group related input features and generates higher-level semantic features.
   - AbstLay uses a learnable sparse weight mask to select feature groups and extracts high-level features from these groups through a simple attention mechanism.
   - To reduce the computational complexity at inference time, the authors also develop a structure re-parameterization method that merges the two-step operation of AbstLay into a single step.
2. **Construct DANets**:
   - DANets are deep neural networks for tabular data classification and regression, constructed by stacking multiple AbstLays.
   - A special shortcut path is introduced in DANets. It fetches information from the raw tabular features, supports feature interactions across different levels, and increases feature diversity.
3. **Experimental verification**:
   - The authors conducted extensive experiments on seven real-world tabular datasets. The results show that AbstLay and DANets are effective for classification and regression, and that their computational complexity is lower than that of competing methods.
   - The experiments also show that the performance of DANets improves further as network depth increases, demonstrating the extendibility of the method.

### Formula Summary

- **Feature Selection Function**:
  \[ M=\text{entmax}_\alpha(W_{\text{mask}}),\quad f' = M\odot f \]
  where \(W_{\text{mask}}\in\mathbb{R}^m\) is a learnable parameter vector, \(\odot\) denotes element-wise multiplication, and \(f'\in\mathbb{R}^m\) is the selected feature vector.
- **Feature Abstraction Function**:
  \[ q = \text{sigmoid}(\text{BN}(W_1 f')),\quad f^*=\text{ReLU}(q\odot\text{BN}(W_2 f')) \]
  where \(W_c\in\mathbb{R}^{d\times m}\) (\(c = 1,2\)) are two learnable parameter matrices, and \(\text{BN}\) denotes batch normalization.
- **Parallel Processing and Output Fusion**:
  \[ f_o=\sum_{k = 1}^K p_k\circ s_k(f) \]
  where \(p_k\circ s_k\) is the composition of the \(k\)-th feature selection function and feature abstraction function, and \(K\) is the number of feature groups.
- **Re-parameterized AbstLay Operation**:
  \[ f_o=\sum_{k = 1}^K\text{ReLU}(\text{sigmoid}(W^*_{k,1}f + b^*_{k,1})\odot(W^*_{k,2}f + b^*_{k,2})) \]
  where \(W^*_{k,c}\) and \(b^*_{k,c}\)