Abstract: Deep learning for tabular data has garnered increasing attention in recent years, yet employing deep models for structured data remains challenging. While these models excel with unstructured data, their efficacy with structured data has been limited. Recent research has introduced retrieval-augmented models to address this gap, demonstrating promising results in supervised tasks such as classification and regression. In this work, we investigate using retrieval-augmented models for anomaly detection on tabular data. We propose a reconstruction-based approach in which a transformer model learns to reconstruct masked features of *normal* samples. We test the effectiveness of KNN-based and attention-based modules to select relevant samples to help in the reconstruction process of the target sample. Our experiments on a benchmark of 31 tabular datasets reveal that augmenting this reconstruction-based anomaly detection (AD) method with sample-sample dependencies via retrieval modules significantly boosts performance. The present work supports the idea that retrieval modules are useful to augment any deep AD method to enhance anomaly detection on tabular data.

What problem does this paper attempt to address?

### What problem does this paper attempt to solve?

This paper addresses **deep anomaly detection (AD) on tabular data**. Specifically, the authors study how retrieval-augmented methods can improve the performance of deep learning models for anomaly detection on tabular data.

#### Background and Challenges

1. **Limitations of deep learning on tabular data**:
   - Although deep learning performs well on unstructured data (such as images and text), its effectiveness is limited on structured data (such as tabular data).
   - The characteristics of tabular data (such as dependencies between features and dependencies between samples) make it difficult for standard deep models to process it effectively.
2. **Deficiencies of existing methods**:
   - Existing anomaly detection methods perform poorly on tabular data, especially when sample-sample dependencies need to be considered.
   - Traditional deep anomaly detection methods rely mainly on feature-feature dependencies and ignore sample-sample dependencies.

#### Solutions

The authors propose a **retrieval-augmented deep anomaly detection method**, which includes the following points:

1. **Introducing an external retrieval module**:
   - With an external retrieval module, the model can use sample-sample dependencies to improve anomaly detection performance.
   - The retrieval module selects samples similar to the target sample from the training set to help the model reconstruct the target and detect anomalies.
2. **Transformer-based reconstruction framework**:
   - A transformer model learns to reconstruct masked features, and the reconstruction error is used to construct anomaly scores.
   - Different types of retrieval modules (KNN-based and attention-based) are compared to evaluate their impact on anomaly detection performance.
3. **Combining feature-feature and sample-sample dependencies**:
   - Experiments show that combining these two types of dependencies can significantly improve anomaly detection, especially across different types of anomalies (such as global, local, and cluster anomalies).

#### Main Contributions

1. **Comprehensive evaluation of retrieval-augmented methods**: Experiments on a benchmark of 31 tabular datasets support the effectiveness of the retrieval-augmented method.
2. **Improving anomaly detection performance**: Experiments show that adding a retrieval module can significantly improve the performance of the reconstruction-based deep anomaly detection method.
3. **Explaining the role of dependencies**: The paper analyzes why combining feature-feature and sample-sample dependencies can better identify anomalies in tabular data.

With these contributions, the paper presents an effective method for anomaly detection on tabular data and offers a useful reference for future research.
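The idea of reconstructing masked features with the help of retrieved samples can be illustrated with a minimal NumPy sketch. This is not the authors' model: the transformer is replaced here by a plain average over the retrieved neighbours, and the function names `retrieve_knn` and `anomaly_score`, the value `k=5`, and the synthetic data are all assumptions made for illustration. Each feature is masked in turn, the `k` nearest normal training samples are retrieved using only the unmasked features, and the squared reconstruction errors are summed into an anomaly score.

```python
import numpy as np


def retrieve_knn(x, train, keep, k):
    """Return indices of the k training rows closest to x on the unmasked columns."""
    distances = np.linalg.norm(train[:, keep] - x[keep], axis=1)
    return np.argsort(distances)[:k]


def anomaly_score(x, train, k=5):
    """Sum of squared errors when each feature is masked and reconstructed in turn."""
    n_features = x.shape[0]
    score = 0.0
    for j in range(n_features):
        keep = np.arange(n_features) != j   # boolean mask hiding feature j
        idx = retrieve_knn(x, train, keep, k)
        x_hat_j = train[idx, j].mean()      # neighbour mean: stand-in for the transformer
        score += (x[j] - x_hat_j) ** 2
    return score


# Synthetic "normal" training data (assumption): points close to the line x1 = x0.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
train = np.column_stack([x0, x0 + 0.05 * rng.normal(size=500)])

normal_point = np.array([0.5, 0.5])
# Each coordinate is plausible on its own, but the pair violates the dependency.
anomalous_point = np.array([0.5, -0.5])

print("normal score:   ", anomaly_score(normal_point, train))
print("anomalous score:", anomaly_score(anomalous_point, train))
```

In this toy setting the anomalous point should receive a much larger score than the normal one, because its masked feature cannot be recovered from neighbours that follow the normal feature-feature relationship. An attention-based retrieval module would replace the hard top-`k` selection with learned similarity weights over the training samples.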

Retrieval Augmented Deep Anomaly Detection for Tabular Data

Feature Interaction-Based Reinforcement Learning for Tabular Anomaly Detection

Data-Efficient and Interpretable Tabular Anomaly Detection

Disentangling Tabular Data towards Better One-Class Anomaly Detection

SemanticMask: A Contrastive View Design for Anomaly Detection in Tabular Data

Boosting Anomaly Detection Using Unsupervised Diverse Test-Time Augmentation

DTAAD: Dual TCN-Attention Networks for Anomaly Detection in Multivariate Time Series Data

Retrieval-Based Transformer for Table Augmentation

Deep Anomaly Detection Via Active Anomaly Search

Anomaly Detection of Tabular Data Using LLMs

Rethinking Data Augmentation for Tabular Data in Deep Learning

AnoGAN for Tabular Data: A Novel Approach to Anomaly Detection

TracInAD: Measuring Influence for Anomaly Detection

BTAD: A binary transformer deep neural network model for anomaly detection in multivariate time series data

SORTAD: Self-Supervised Optimized Random Transformations for Anomaly Detection in Tabular Data

Deep Anomaly Detection and Search via Reinforcement Learning

Learning to Detect Interesting Anomalies

TabADM: Unsupervised Tabular Anomaly Detection with Diffusion Models

TUT: Template-Augmented U-Net Transformer for Unsupervised Anomaly Detection

A Comprehensive Augmentation Framework for Anomaly Detection