Abstract: Secondary use of medical big data is increasingly popular in healthcare services and clinical research. Understanding the logic behind medical big data reveals trends in hospital information technology and is of great significance for hospital information systems that are designing and expanding services. Big data has four characteristics (Volume, Variety, Velocity and Value, the 4 Vs) that make traditional standalone systems incapable of processing these data. Apache Hadoop MapReduce is a promising software framework for developing applications that process vast amounts of data in parallel on large clusters of commodity hardware in a reliable, fault-tolerant manner. With the Hadoop framework and the MapReduce application programming interface (API), we can more easily develop our own MapReduce applications that scale from a single node to thousands of machines. This paper investigates a practical case of a Hadoop-based medical big data processing system. We developed this system to intelligently process medical big data and uncover features of hospital information system user behavior, studying the varied data that different hospital information systems produce in daily work. We also built a five-node Hadoop cluster to execute distributed MapReduce algorithms. Compared with single-node processing, our distributed algorithms show promise for efficient processing of medical big data in healthcare services and clinical research. Additionally, with medical big data analytics, hospital information systems can be designed to be more intelligent and easier to use by making personalized recommendations.
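To make the MapReduce idea in the abstract concrete, the following is a minimal single-process sketch of the map, shuffle and reduce phases applied to user-behavior counting. It is an illustration only, not the authors' implementation and not the Hadoop API: the log format (`timestamp,user_id,module`), the sample records, and all function names are hypothetical assumptions introduced here.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical access-log lines in an assumed "timestamp,user_id,module" format.
SAMPLE_LOG = [
    "2015-01-05T08:01,u01,orders",
    "2015-01-05T08:03,u02,lab_results",
    "2015-01-05T08:07,u01,orders",
    "2015-01-05T08:09,u01,imaging",
    "2015-01-05T08:12,u02,lab_results",
]


def map_record(line):
    """Map phase: emit ((user_id, module), 1) for one log line."""
    _timestamp, user_id, module = line.split(",")
    yield (user_id, module), 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(key, values):
    """Reduce phase: sum the counts collected for one key."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle and reduce in-process over an iterable of log lines."""
    mapped = chain.from_iterable(map_record(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


def top_module_per_user(counts):
    """Pick each user's most-used module, a toy stand-in for a recommendation."""
    best = {}
    for (user, module), n in counts.items():
        if user not in best or n > best[user][1]:
            best[user] = (module, n)
    return {user: module for user, (module, _n) in best.items()}


if __name__ == "__main__":
    counts = run_job(SAMPLE_LOG)
    print(counts)
    print(top_module_per_user(counts))
```

On a real cluster the map and reduce functions would run as distributed tasks over data blocks stored in HDFS, and the framework would perform the shuffle; the sketch only shows how the per-key aggregation is structured.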
