Executing SQL queries over encrypted character strings in the Database-As-Service model

Zongda Wu, Guandong Xu, Zong Yu, Xun Yi, Enhong Chen, Yanchun Zhang
DOI: https://doi.org/10.1016/j.knosys.2012.05.009
IF: 8.139
2012-01-01
Knowledge-Based Systems
Abstract: Rapid advances in networking technologies have prompted the emergence of the "software as service" model for enterprise computing, which is quickly becoming a key industry. The "database as service" model lets users store, modify and retrieve data from anywhere in the world, as long as they have access to the Internet, and is therefore increasingly popular in current enterprise data management systems. However, this model introduces several challenges, an essential one being how to execute SQL queries over encrypted data efficiently. To ensure data security, this model generally encrypts sensitive data at the trusted client's site before storing it at the untrusted database service provider's site. Unfortunately, this means that SQL queries cannot be executed directly over the encrypted data at the database service provider. In this paper we focus only on how to query encrypted character strings efficiently. Our strategy is as follows. When storing character strings at the database service provider, we store not only the encrypted character strings themselves but also characteristic index values generated for them, which are kept in an additional field. When querying the encrypted character strings, we first execute a coarse query over the characteristic index fields at the database service provider to filter out most of the tuples unrelated to the query conditions; we then decrypt the remaining tuples and execute a refined query over them at the client site. We define an n-phase reachability matrix for a character string and use it as the characteristic index values. Based on this definition, we present theorems for splitting a SQL query into a server-side representation and a client-side representation, which partitions the computation of a query across the client and the server and thus improves query performance. Finally, experimental results validate the functionality and effectiveness of our strategy.
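The coarse-then-refined query flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not define the n-phase reachability matrix, so the sketch swaps in a much simpler stand-in index, a keyed 64-bit bitmap of character bigrams. The XOR "cipher" is a placeholder that is not secure, and every name here (`toy_encrypt`, `bigram_mask`, `store`, `server_coarse_query`, `client_query`, `KEY`) is hypothetical. The only property the sketch aims to show is that the server-side filter never discards a true match, so the client-side refinement yields exact results for a substring (`LIKE '%...%'`) condition.

```python
import hashlib
from itertools import cycle

KEY = b"demo-key"   # hypothetical client-side secret, never sent to the server
BITS = 64           # width of the stand-in index bitmap


def toy_encrypt(plain: str, key: bytes = KEY) -> bytes:
    """Placeholder XOR cipher (NOT secure); stands in for real client-side encryption."""
    return bytes(b ^ k for b, k in zip(plain.encode("utf-8"), cycle(key)))


def toy_decrypt(cipher: bytes, key: bytes = KEY) -> str:
    """Inverse of toy_encrypt: XOR with the same key stream."""
    return bytes(b ^ k for b, k in zip(cipher, cycle(key))).decode("utf-8")


def bigram_mask(text: str) -> int:
    """Stand-in characteristic index: one bit per keyed hash of each character bigram.

    Assumption: this replaces the paper's n-phase reachability matrix,
    which the abstract does not specify.
    """
    mask = 0
    for i in range(len(text) - 1):
        digest = hashlib.sha256(KEY + text[i:i + 2].encode("utf-8")).digest()
        mask |= 1 << (digest[0] % BITS)
    return mask


def store(server_table: list, plain: str) -> None:
    """Client side: encrypt the string and store it with its index value."""
    server_table.append((toy_encrypt(plain), bigram_mask(plain)))


def server_coarse_query(server_table: list, query_mask: int) -> list:
    """Server side: keep tuples whose index contains every bit of the query mask."""
    return [cipher for cipher, index in server_table
            if (index & query_mask) == query_mask]


def client_query(server_table: list, substring: str) -> list:
    """Client side: run the coarse query, decrypt candidates, then filter exactly."""
    candidates = server_coarse_query(server_table, bigram_mask(substring))
    return [plain for plain in map(toy_decrypt, candidates) if substring in plain]


if __name__ == "__main__":
    table = []
    for name in ["alice smith", "bob jones", "alicia keys"]:
        store(table, name)
    print(client_query(table, "ali"))
```

Because every bigram of a substring also occurs in any string containing it, the coarse filter can return false positives (from hash collisions or scattered bigrams) but no false negatives; the client-side check removes the false positives after decryption.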