Fuzzy-Folded Bloom Filter-as-a-Service for Big Data Storage in the Cloud
Amritpal Singh, Sahil Garg, Kuljeet Kaur, Shalini Batra, Neeraj Kumar, Kim-Kwang Raymond Choo
DOI: https://doi.org/10.1109/tii.2018.2850053
IF: 12.3
2019-04-01
IEEE Transactions on Industrial Informatics
Abstract: With the ongoing trend of smart and Internet-connected objects being deployed across a broad range of applications, there is a corresponding increase in the amount of data moving across different geographical regions. This, in turn, poses a number of challenges for big data storage across multiple locations, including cloud computing platforms. For example, the underlying distributed file system holds a large number of directories and files in the form of gigantic trees, which are difficult to parse in polynomial time. Moreover, with the exponential growth of big data streams (i.e., unbounded sets of continuous data flows), the challenges associated with indexing and membership queries are compounded. The capability to process such large amounts of data with high accuracy can significantly affect decision-making and the formulation of business and risk-related strategies, particularly in the current Industrial Internet of Things (IIoT) environment. However, existing storage solutions are deterministic in nature; in other words, they tend to consume considerable memory and CPU time to yield accurate results. This necessitates the design of efficient, quality-of-service-aware IIoT applications that can deal with the challenges of data storage and retrieval in the cloud computing environment. In this paper, we present an effective, space-efficient strategy for massive data storage using a Bloom filter (BF). Specifically, in the proposed scheme, the standard BF is extended to incorporate a fuzzy-enabled folding approach, hereafter referred to as the fuzzy folded BF (FFBF). In FFBF, fuzzy operations are used to accommodate the hashed data of one BF into another to reduce storage requirements. Evaluations on the UCI ML AReM and Facebook datasets demonstrate the efficacy of FFBF: it handles approximately 1.9 times more data than the standard BF, without affecting the false positive rate or query time.
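For readers unfamiliar with the baseline structure that FFBF extends, the following is a minimal sketch of a standard Bloom filter, not the FFBF scheme itself. The abstract does not specify FFBF's fuzzy folding operations, so they are not shown. The class name `BloomFilter`, the parameters `m` (bits) and `k` (hash functions), and the double-hashing scheme over a SHA-256 digest are illustrative assumptions, not details from the paper. The false positive estimate uses the standard approximation $(1 - e^{-kn/m})^k$.

```python
import hashlib
import math


class BloomFilter:
    """Illustrative standard Bloom filter (hypothetical; not the paper's FFBF)."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m                            # number of bits in the filter
        self.k = k                            # number of hash functions
        self.bits = bytearray((m + 7) // 8)   # packed bit array, all zeros
        self.n = 0                            # number of inserted items

    def _indexes(self, item: str):
        # Assumption: derive k indexes from one SHA-256 digest by double hashing,
        # g_i(x) = (h1(x) + i * h2(x)) mod m.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        # Set the k bits selected for this item.
        for idx in self._indexes(item):
            self.bits[idx // 8] |= 1 << (idx % 8)
        self.n += 1

    def __contains__(self, item: str) -> bool:
        # True may be a false positive; False is always correct (no false negatives).
        return all(self.bits[idx // 8] & (1 << (idx % 8)) for idx in self._indexes(item))

    def estimated_fpr(self) -> float:
        # Standard approximation of the false positive rate: (1 - e^{-kn/m})^k.
        return (1.0 - math.exp(-self.k * self.n / self.m)) ** self.k


if __name__ == "__main__":
    bf = BloomFilter(m=1024, k=4)
    for word in ["alpha", "beta", "gamma"]:
        bf.add(word)
    print("beta" in bf)  # True: inserted items are always reported as present
    print(f"estimated FPR: {bf.estimated_fpr():.2e}")
```

A plain bit array keeps the memory cost fixed at `m` bits regardless of item size. This fixed footprint is the property that space-saving extensions such as FFBF aim to exploit further.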
Automation & Control Systems; Computer Science, Interdisciplinary Applications; Engineering, Industrial