Abstract: As the data produced by individuals and enterprises that need to be stored and utilized grow rapidly, data owners are motivated to outsource their complex local data management systems to the cloud for its flexibility and economic savings. However, sensitive cloud data may have to be encrypted before outsourcing, which renders obsolete the traditional data utilization services based on plaintext keyword search. Enabling privacy-assured utilization mechanisms for outsourced cloud data is therefore of paramount importance. Given the large number of on-demand data users and the huge amount of outsourced data files in the cloud, the problem is particularly challenging, as it is extremely difficult to also meet the practical requirements of performance, system usability, and a high-quality user search experience.

In this paper, we investigate the problem of secure and efficient similarity search over outsourced cloud data. Similarity search is a fundamental and powerful tool widely used in plaintext information retrieval, but it has not been well explored in the encrypted data domain. Our mechanism design first exploits a suppressing technique to build storage-efficient similarity keyword sets from a given document collection, with edit distance as the similarity metric. Based on that, we then build a private trie-traverse searching index and show that it correctly achieves the defined similarity search functionality with constant search time complexity. We formally prove the privacy-preserving guarantee of the proposed mechanism under rigorous security treatment. To demonstrate the generality of our mechanism and further enrich its application spectrum, we also show that our new construction naturally supports fuzzy search, a previously studied notion that aims only to tolerate typos and representation inconsistencies in the user's search input. Extensive experiments on the Amazon cloud platform with a real data set further demonstrate the validity and practicality of the proposed mechanism.
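
The abstract names two building blocks: edit distance as the similarity metric, and a trie-based index whose lookup cost does not depend on the number of indexed keywords. The following is a minimal plaintext sketch of those two ideas only. It is not the paper's construction: it omits encryption, the suppressing technique, and the privacy guarantees, and the names `edit_distance`, `TrieNode`, and `Trie` are hypothetical, chosen for illustration.

```python
from typing import Dict, Set


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions that turn `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


class TrieNode:
    """One node of a plaintext trie (illustrative, unencrypted)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.file_ids: Set[str] = set()  # non-empty only where a keyword ends


class Trie:
    """Maps keywords to the identifiers of the files that contain them."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, keyword: str, file_id: str) -> None:
        node = self.root
        for ch in keyword:
            node = node.children.setdefault(ch, TrieNode())
        node.file_ids.add(file_id)

    def lookup(self, keyword: str) -> Set[str]:
        # One step per character of the query: the cost depends on the
        # keyword length, not on how many keywords the trie stores.
        node = self.root
        for ch in keyword:
            child = node.children.get(ch)
            if child is None:
                return set()
            node = child
        return set(node.file_ids)
```

In this sketch a similarity query would be answered by looking up each keyword within the chosen edit-distance threshold of the query. How the paper generates that keyword set compactly and hides the index contents from the server is not described in the abstract and is not modeled here.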

Large-scale document similarity computation based on cloud computing platform

SCAN: A Smart Application Platform for Empowering Parallelizations of Big Genomic Data Analysis in Clouds

Visual Analysis of Cloud Computing Performance Using Behavioral Lines

Examination Data Analysis and Evaluation Platform Based on Cloud Computing

The performance of MapReduce: an in-depth study

Vhadoop: A Scalable Hadoop Virtual Cluster Platform for MapReduce-Based Parallel Machine Learning with Performance Consideration

A Cloud Computing Application in Land Resources Information Management

The Performance of MapReduce

CLUSTER-BASED OCEAN REMOTE SENSING IMAGE FUSION PARALLEL COMPUTING STRATEGY

Medical Cloud Computing Data Processing to Optimize the Effect of Drugs

Query similarity computing based on system similarity measurement

Secure and Efficient Similarity Retrieval in Cloud Computing Based on Homomorphic Encryption

Parallel architectures for fuzzy triadic similarity learning

Using Link-Based Content Analysis to Measure Document Similarity Effectively

Implementation Issues of A Cloud Computing Platform.

Achieving Usable and Privacy-Assured Similarity Search over Outsourced Cloud Data

Towards Efficient Subgraph Search In Cloud Computing Environments

Research on method for extracting large-scale social network based on Mapreduce

An Efficient Organization Method for Large-Scale and Long Time-Series Remote Sensing Data in a Cloud Computing Environment

Diving Into Cloud-Based File Synchronization With User Collaboration

Studies on the Large Scale Data Processing Technologies Used in Servers for Cloud Computing