In-Machine-Learning Database: Reimagining Deep Learning with Old-School SQL

Len Du
DOI: https://doi.org/10.48550/arXiv.2004.05366
2020-04-15
Abstract:In-database machine learning has been very popular, almost being a cliche. However, can we do it the other way around? In this work, we say "yes" by applying plain old SQL to deep learning, in a sense implementing deep learning algorithms with SQL. Most deep learning frameworks, as well as generic machine learning ones, share a de facto standard of multidimensional array operations, underneath fancier infrastructure such as automatic differentiation. As SQL tables can be regarded as generalisations of (multi-dimensional) arrays, we have found a way to express common deep learning operations in SQL, encouraging a different way of thinking and thus potentially novel models. In particular, one of the latest trend in deep learning was the introduction of sparsity in the name of graph convolutional networks, whereas we take sparsity almost for granted in the database world. As both databases and machine learning involve transformation of datasets, we hope this work can inspire further works utilizing the large body of existing wisdom, algorithms and technologies in the database field to advance the state of the art in machine learning, rather than merely integerating machine learning into databases.
Machine Learning,Databases
What problem does this paper attempt to address?
The problem that this paper attempts to solve is to combine deep learning with database technology. However, contrary to the traditional "In - Database Machine Learning", this paper explores the possibility of using the traditional SQL language to implement deep - learning algorithms, namely the so - called "In - Machine - Learning Database". Specifically, the author attempts to express common deep - learning operations such as convolution, pooling, and fully - connected layers through SQL, and shows how to implement a simple Convolutional Neural Network (CNN) and Graph Convolutional Network (GCN) with SQL. ### Core Problems of the Paper 1. **Combination of Deep Learning and SQL**: The paper discusses how to use SQL, a relational database query language, to implement deep - learning algorithms, thereby providing a new perspective and method for designing and implementing deep - learning models. 2. **Handling of Sparsity**: In modern deep learning, especially in Graph Convolutional Networks (GCN), sparsity is an important characteristic. And database systems have rich experience and optimization techniques in handling sparse data. The paper attempts to use this experience to improve the efficiency and performance of deep - learning models. 3. **Implementation of Automatic Differentiation**: Automatic differentiation is a key technique in deep learning. The paper discusses how to implement automatic differentiation in SQL to support the training process of deep - learning models. ### Main Contributions of the Paper 1. **Proof of Concept**: Through specific examples (such as CNN and GCN), it shows how to implement deep - learning models with SQL, proving the feasibility of this idea. 2. **Sparse Data Processing**: It emphasizes the advantages of databases in handling sparse data and proposes the potential of applying these advantages to deep learning. 3. **Future Directions**: It proposes future research directions for further combining database technology with deep learning, such as implementing automatic differentiation in databases. ### Innovation Points of the Paper - **New Perspective**: Re - examine deep learning from the perspective of the database and propose a new implementation method. - **Technology Integration**: Combine the techniques and advantages of the two fields of databases and deep learning, providing new ideas for the design and implementation of deep - learning models. - **Sparsity Handling**: Utilize the rich experience of databases in handling sparse data to provide a new solution to the sparsity problem in deep learning. In short, this paper aims to implement deep - learning algorithms through SQL, explore the application potential of database technology in deep learning, and provide new directions and ideas for future deep - learning research.