Query Recovery from Easy to Hard: Jigsaw Attack against SSE

Hao Nie, Wei Wang, Peng Xu, Xianglong Zhang, Laurence T. Yang, Kaitai Liang
2024-03-02
Abstract: Searchable symmetric encryption schemes often unintentionally disclose certain sensitive information, such as access, volume, and search patterns. Attackers can exploit such leakages and other available knowledge related to the user's database to recover queries. We find that the effectiveness of query recovery attacks depends on the volume/frequency distribution of keywords. Queries containing keywords with high volumes/frequencies are more susceptible to recovery, even when countermeasures are implemented. Attackers can also effectively leverage these "special" queries to recover all others.
Cryptography and Security
What problem does this paper attempt to address?
The problem that this paper attempts to solve is how to recover users' queries in Searchable Symmetric Encryption (SSE) schemes by using leaked information together with knowledge of the user's database. Specifically, the paper recovers queries progressively, from easy to hard, and proposes an attack named Jigsaw.

### Problem Background

SSE schemes allow users to securely search encrypted databases on remote servers, but in the process some sensitive information may be inadvertently leaked, such as access patterns, volume patterns, and search patterns. Attackers can exploit this leaked information to recover users' queries. In particular, the paper points out that the effectiveness of query recovery attacks depends on the volume/frequency distribution of keywords: queries containing high-volume or high-frequency keywords are more likely to be recovered, even when some defensive measures are in place.

### Main Challenges

1. **Ease of Recovery of High-Frequency Queries**: Queries containing high-frequency keywords are more likely to be recovered.
2. **Gradual Query Recovery**: Recover queries gradually from easy to hard, rather than attempting to recover all queries at the start.
3. **Effectiveness of Existing Defensive Measures**: Existing defensive measures (such as padding and obfuscation) may not completely prevent query recovery attacks.

### Solutions

To address these challenges, the paper proposes the Jigsaw attack, which recovers queries gradually through the following three core modules:

1. **Locate and Recover the Most Distinctive Queries**:
   - Use the volume and frequency information of keywords to identify and recover the most distinctive queries.
   - Calculate the difference distance \( d_{td_i} \) for each query:
     \[ d_{td_i}=\min_{td_j\in T_{dr}\wedge j\neq i}\left(\alpha\cdot|v_{td_i} - v_{td_j}|+(1 - \alpha)\cdot|f_{td_i}-f_{td_j}|\right) \]
     where \( v_{td_i} \) and \( f_{td_i} \) are the volume and frequency of query \( td_i \), \( \alpha \) is the weight of volume, and \( 1-\alpha \) is the weight of frequency.
   - Sort the queries by difference distance and select the top \( \text{BaseRec} \) queries as the most distinctive ones for recovery.
2. **Accurate Verification Based on the Co-occurrence Matrix**:
   - Use the co-occurrence matrix to refine the recovered queries, so that they match the keywords in the actual data.
   - Construct co-occurrence matrices for queries and for keywords, and improve the matching accuracy through normalization.
3. **Recover the Remaining Queries**:
   - Use the results of the previous two modules, combined with co-occurrence, volume, and frequency information, to generate scores for query-keyword pairs, and maximize the scores to obtain the final query matching.
   - Iteratively use the recovered queries to help recover subsequent queries.

### Experimental Results

Experiments show that the Jigsaw attack achieves an accuracy of about 90% on three test datasets, and still maintains high accuracy against existing defensive measures: about 60% against padding and about 85% against obfuscation. In addition, when the keyword universe is large (≥3k), the Jigsaw attack achieves more than 20% higher accuracy than current state-of-the-art attacks.

### Summary

This paper addresses the problem of recovering SSE queries gradually, from easy to hard, by proposing the Jigsaw attack, and demonstrates its superior performance in a range of scenarios.
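
To make the difference-distance step of the first module concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes that each query's volume and frequency have already been extracted from the leakage and scaled to comparable ranges, that "top" means the largest distances, and that the set \( T_{dr} \) is simply the list of observed queries passed in. The function names `difference_distances` and `most_distinctive` and the parameter `base_rec` are hypothetical names chosen for illustration.

```python
from typing import List, Sequence


def difference_distances(volumes: Sequence[float],
                         frequencies: Sequence[float],
                         alpha: float = 0.5) -> List[float]:
    """For each query i, return
    min over j != i of alpha*|v_i - v_j| + (1 - alpha)*|f_i - f_j|.

    Assumption: volumes and frequencies are on comparable scales.
    """
    n = len(volumes)
    if n != len(frequencies):
        raise ValueError("volumes and frequencies must have equal length")
    if n < 2:
        raise ValueError("at least two queries are needed")
    distances = []
    for i in range(n):
        # Distance to the nearest other query in (volume, frequency) space.
        nearest = min(
            alpha * abs(volumes[i] - volumes[j])
            + (1 - alpha) * abs(frequencies[i] - frequencies[j])
            for j in range(n) if j != i
        )
        distances.append(nearest)
    return distances


def most_distinctive(volumes: Sequence[float],
                     frequencies: Sequence[float],
                     base_rec: int,
                     alpha: float = 0.5) -> List[int]:
    """Return indices of the base_rec queries with the largest distance.

    Assumption: a larger difference distance means a more distinctive query.
    """
    d = difference_distances(volumes, frequencies, alpha)
    order = sorted(range(len(d)), key=lambda i: d[i], reverse=True)
    return order[:base_rec]


if __name__ == "__main__":
    # Toy data: queries 1 and 2 are nearly identical, 0 and 3 stand apart.
    vols = [0.9, 0.5, 0.52, 0.1]
    freqs = [0.8, 0.3, 0.31, 0.05]
    print(most_distinctive(vols, freqs, base_rec=2))
```

In the toy data, queries 1 and 2 have almost the same volume and frequency, so their difference distance is small and they are left for the later modules, while queries 0 and 3 are selected first. The pairwise loop is quadratic in the number of queries, which is fine for a sketch but would need a sorted or indexed search for large query sets.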