Towards Multi-Modal DBMSs for Seamless Querying of Texts and Tables

Matthias Urban,Carsten Binnig
2023-04-28
Abstract:In this paper, we propose Multi-Modal Databases (MMDBs), which is a new class of database systems that can seamlessly query text and tables using SQL. To enable seamless querying of textual data using SQL in an MMDB, we propose to extend relational databases with so-called multi-modal operators (MMOps) which are based on the advances of recent large language models such as GPT-3. The main idea of MMOps is that they allow text collections to be treated as tables without the need to manually transform the data. As we show in our evaluation, our MMDB prototype can not only outperform state-of-the-art approaches such as text-to-table in terms of accuracy and performance but it also requires significantly less training data to fine-tune the model for an unseen text collection.
Databases,Computation and Language
What problem does this paper attempt to address?
### Problems the Paper Aims to Solve This paper aims to propose a new database system—Multimodal Database (MMDB), which can seamlessly query both text and tabular data. Specifically: 1. **Addressing the Insufficient Support for Multimodal Data in Traditional Databases**: - Current database systems are primarily optimized for tabular data and provide poor support for other data types (such as text, images, etc.). - Modern data applications often need to handle various data types, and existing relational database systems are not adept at managing these multimodal scenarios. 2. **Achieving Seamless Querying of Multimodal Data**: - Proposes the introduction of multimodal operators (MMOps) by extending relational database systems, allowing seamless handling of text and other unstructured data directly within SQL queries. - By using large-scale pre-trained models (such as GPT-3), text data can be processed as tabular data without the need for manual data format conversion. 3. **Improving Query Efficiency and Accuracy**: - Proposes a new method based on large-scale pre-trained models to implement multimodal operators (such as multimodal joins). - This method not only improves the accuracy and performance of queries but also significantly reduces the amount of training data required, performing well even with unseen text collections. With these improvements, MMDB can better support the querying needs of multimodal data without sacrificing efficiency.