Indexing on Healthcare Big Data

Aneri Mehta, Vahishta Vandriwala, Jigna Patel, Jitali Patel
DOI: https://doi.org/10.1007/978-981-16-2712-5_23
2021-01-01
Abstract: Extensive use of tools and technology in healthcare generates big data. Indexing speeds up the retrieval of intended data from very large datasets. Indexing of big data becomes a major concern when the queried field is present in only a few of the input files but the number of input files is large. In this paper, the authors build an index structure over the datasets to speed up querying by reducing the number of files that must be scanned. A focused literature survey presents different indexing methodologies along with their suitable application areas. The inverted index is identified as the most suitable indexing method for the COVID-19 dataset, which contains details of patients in India. The inverted index is created on the column of the dataset that holds state names. The implementation uses the MapReduce programming framework on the Apache Hadoop platform. Algorithms for grouping the data, creating the index, and querying are presented with explanations. Test cases are designed on the COVID-19 dataset with and without indexing. The time taken to create the index and to fetch records is recorded for input files of various volumes. The results and discussion indicate that using an index enhances query performance on big data: querying after index construction performs much better than querying without an index.
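The idea described in the abstract, an inverted index on the state-name column that limits which files a query has to scan, can be illustrated with a minimal in-memory sketch. This is not the authors' Hadoop implementation: the file names, the row layout, and the functions `map_phase`, `reduce_phase`, `build_index`, and `query` are all hypothetical, and plain Python stands in for the MapReduce framework.

```python
from collections import defaultdict

# Hypothetical sample data: file name -> rows of (patient_id, state).
# The real dataset layout is not given in the abstract.
FILES = {
    "part-0.csv": [("p1", "Gujarat"), ("p2", "Kerala")],
    "part-1.csv": [("p3", "Gujarat")],
    "part-2.csv": [("p4", "Delhi")],
}


def map_phase(file_name, rows):
    """Emit (state, file_name) pairs, the way a mapper would."""
    for _patient_id, state in rows:
        yield state, file_name


def reduce_phase(pairs):
    """Group file names by state, the way a reducer would."""
    index = defaultdict(set)
    for state, file_name in pairs:
        index[state].add(file_name)
    # Sort the file names so the index is deterministic.
    return {state: sorted(names) for state, names in index.items()}


def build_index(files):
    """Build the inverted index: state -> files containing that state."""
    pairs = (
        pair
        for name, rows in files.items()
        for pair in map_phase(name, rows)
    )
    return reduce_phase(pairs)


def query(files, index, state):
    """Scan only the files the index lists for the requested state."""
    results = []
    for name in index.get(state, []):
        results.extend(row for row in files[name] if row[1] == state)
    return results


index = build_index(FILES)
gujarat_rows = query(FILES, index, "Gujarat")
```

In this sketch a query for "Gujarat" reads two of the three files and a query for a state that is absent from the index reads none, which is the file-scan reduction the abstract attributes to the index.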