Abstract: Storing tabular data to balance storage and query efficiency is a long-standing research question in the database community. In this work, we argue and show that a novel DeepMapping abstraction, which relies on the impressive memorization capabilities of deep neural networks, can provide better storage cost, better latency, and better run-time memory footprint, all at the same time. Such unique properties may benefit a broad class of use cases in capacity-limited devices. Our proposed DeepMapping abstraction transforms a dataset into multiple key-value mappings and constructs a multi-tasking neural network model that outputs the corresponding values for a given input key. To deal with memorization errors, DeepMapping couples the learned neural network with a lightweight auxiliary data structure capable of correcting mistakes. The auxiliary structure design further enables DeepMapping to efficiently deal with insertions, deletions, and updates even without retraining the mapping. We propose a multi-task search strategy for selecting the hybrid DeepMapping structures (including model architecture and auxiliary structure) with a desirable trade-off among memorization capacity, size, and efficiency. Extensive experiments with a real-world dataset, synthetic and benchmark datasets, including TPC-H and TPC-DS, demonstrated that the DeepMapping approach can better balance the retrieving speed and compression ratio against several cutting-edge competitors.
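The lookup path described in the abstract (a learned model answers a key, and an auxiliary structure corrects its mistakes) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: `HybridMapping`, `parity_model`, the Python `set` used for existence tracking, and the plain `dict` of corrections are hypothetical stand-ins, and the "model" is a hand-written rule rather than a trained multi-task neural network.

```python
from typing import Callable, Dict, Optional, Set


class HybridMapping:
    """Toy stand-in: an approximate model plus an exact correction table."""

    def __init__(self, data: Dict[int, str], model: Callable[[int], str]):
        self.model = model                 # stand-in for the trained network
        self.exists: Set[int] = set(data)  # which keys are actually stored
        # Store only the entries the model gets wrong (assumed layout).
        self.corrections: Dict[int, str] = {
            k: v for k, v in data.items() if model(k) != v
        }

    def lookup(self, key: int) -> Optional[str]:
        if key not in self.exists:       # key was never stored
            return None
        if key in self.corrections:      # model is known to be wrong here
            return self.corrections[key]
        return self.model(key)           # model output is trusted


def parity_model(key: int) -> str:
    """Hypothetical 'learned' rule that is right for most keys."""
    return "even" if key % 2 == 0 else "odd"


data = {k: ("even" if k % 2 == 0 else "odd") for k in range(10)}
data[3] = "prime"  # two entries the rule cannot reproduce
data[7] = "prime"

mapping = HybridMapping(data, parity_model)
assert all(mapping.lookup(k) == v for k, v in data.items())  # lossless
print(len(mapping.corrections))  # 2
print(mapping.lookup(42))        # None
```

The point of the sketch is that the correction table only grows with the number of model errors, so a model that memorizes most of the data leaves very little to store exactly.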

What problem does this paper attempt to address?

This paper addresses how to achieve efficient compression and fast queries when storing and querying tabular data on edge devices with limited computing and storage resources. Specifically, it proposes a novel data abstraction named DeepMapping, which uses the memorization ability of deep neural networks to integrate compression and indexing, in order to better balance storage cost, query latency, and runtime memory footprint.

### Core Problems of the Paper

1. **Real-time and Resource Constraints**: As real-time computing is increasingly pushed to edge servers with limited computing and storage capabilities, balancing storage costs (such as disk and memory) against computing costs (such as query execution latency) on such platforms to achieve real-time response has become a key issue.
2. **Limitations of Existing Methods**:
   - **Regression Approximation**: For example, ModelarDB uses regression to approximate piecewise numerical data, but it needs to scan each segment, resulting in high query latency.
   - **Ordered Compression**: For example, separation coding compresses by forcing ordering, but it requires binary search, also resulting in high query latency.
3. **Importance of Queries**: For many emerging edge applications (such as self-service retail, quality control in large-scale manufacturing, and autonomous robots), random queries and updates are essential functions. However, existing solutions do not integrate compression and indexing techniques effectively enough to achieve low storage cost and low query latency at the same time.

### DeepMapping's Solution

DeepMapping uses the memorization ability of deep neural networks to convert the dataset into multiple key-value mappings and constructs a multi-task neural network model that outputs the value corresponding to a given input key. To handle memorization errors, DeepMapping combines the learned neural network with a lightweight auxiliary data structure that corrects them. In addition, the auxiliary structure is designed so that DeepMapping can efficiently handle insertion, deletion, and update operations without retraining the mapping.

### Key Contributions

1. **Novel Hybrid Data Representation**:
   - **Compact Multi-task Neural Network Model**: Captures the correlation between keys (input features) and values (labels).
   - **Auxiliary Precision-guaranteeing Structure**: Compresses the data misclassified by the model and records which keys exist, ensuring query accuracy.
2. **Multi-task Hybrid Architecture Search (MHAS)**:
   - Adaptively adjusts the number and size of shared and private layers through deep reinforcement learning to minimize the overall size of the hybrid architecture.
3. **Workflow Supporting Insertion, Deletion, and Update**:
   - Proposes a lazy update process. Modifications are applied in the auxiliary structure, and retraining of the neural network model is triggered only when the size of the auxiliary structure exceeds a threshold.

### Experimental Results

The experimental results show that DeepMapping outperforms existing baseline methods on TPC-H, TPC-DS, synthetic datasets, and real-world datasets, achieving a speedup of up to 15 times in scenarios with limited memory capacity and significantly reducing I/O and decompression costs.

### Summary

DeepMapping combines deep learning with auxiliary data structures to achieve efficient data compression and fast queries on edge devices, addressing the deficiencies of existing methods in terms of accuracy and efficiency.
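The lazy update workflow described above (modifications go into the auxiliary structure, and retraining is triggered only when that structure exceeds a threshold) might look roughly like the following sketch. All names are hypothetical: `LazyStore`, `fit_parity_majority`, `max_aux_entries`, and the tombstone marker are illustration choices, and the "training" step is a simple majority vote standing in for neural network training, not the paper's procedure.

```python
from collections import Counter

_DELETED = object()  # tombstone marking a key removed since the last training


def fit_parity_majority(data):
    """Hypothetical 'training': most common value per key parity."""
    votes = {0: Counter(), 1: Counter()}
    for k, v in data.items():
        votes[k % 2][v] += 1
    best = {p: (c.most_common(1)[0][0] if c else None) for p, c in votes.items()}
    return lambda key: best[key % 2]


class LazyStore:
    def __init__(self, data, fit, max_aux_entries):
        self.fit = fit
        self.max_aux_entries = max_aux_entries  # assumed retraining threshold
        self.retrain_count = 0
        self._rebuild(dict(data))

    def _rebuild(self, data):
        self.base_keys = set(data)       # keys known at training time
        self.model = self.fit(data)
        # Auxiliary table starts with only the model's mistakes.
        self.aux = {k: v for k, v in data.items() if self.model(k) != v}

    def get(self, key):
        if key in self.aux:
            value = self.aux[key]
            return None if value is _DELETED else value
        return self.model(key) if key in self.base_keys else None

    def put(self, key, value):           # covers both insert and update
        if key in self.base_keys and self.model(key) == value:
            self.aux.pop(key, None)      # model already gives this answer
        else:
            self.aux[key] = value
        self._maybe_retrain()

    def delete(self, key):
        if key in self.base_keys:
            self.aux[key] = _DELETED
        else:
            self.aux.pop(key, None)
        self._maybe_retrain()

    def _maybe_retrain(self):
        if len(self.aux) <= self.max_aux_entries:
            return                       # lazy: keep serving from aux
        snapshot = {}
        for k in self.base_keys | set(self.aux):
            value = self.get(k)
            if value is not None:
                snapshot[k] = value
        self._rebuild(snapshot)
        self.retrain_count += 1


store = LazyStore({k: "A" if k % 2 == 0 else "B" for k in range(8)},
                  fit_parity_majority, max_aux_entries=3)
store.put(1, "A")      # update absorbed by the auxiliary table
store.put(100, "A")    # insert absorbed by the auxiliary table
store.delete(2)        # delete recorded as a tombstone
retrains_before = store.retrain_count
store.put(101, "B")    # fourth auxiliary entry exceeds the threshold
```

In this run the first three modifications leave the model untouched, and only the fourth triggers a rebuild, after which the auxiliary table shrinks back to the entries the refreshed model still gets wrong.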

DeepMapping: Learned Data Mapping for Lossless Compression and Efficient Lookup
