Parallel Versus Distributed Data Access for Gigapixel-Resolution Histology Images: Challenges and Opportunities

Esma Yildirim, David J Foran
DOI: https://doi.org/10.1109/JBHI.2016.2580145
Abstract: Recent advances in digital pathology technology have led to significant improvements in both the quality and resolution of the resulting images, which now often exceed several gigabytes each. Today, several leading institutions across the country utilize whole-slide imaging (WSI) as part of their routine workflow. WSIs have utility in a wide range of diagnostic and investigative pathology applications. The fact that these images are both large (about 30 GB when uncompressed) and generated in nonstandard proprietary formats has limited wider adoption of these technologies and makes the task of accessing, processing, and analyzing them in high-throughput fashion extremely challenging. The common approach for such data analytic applications is to preprocess the large whole-slide images into smaller files and store them in a generic format. However, this approach limits the advantages that might be realized if different scalability levels and data unit sizes could be dynamically changed based on the specifications of the task at hand and the architectural limits of the infrastructure (e.g., node memory size). Such strategies also introduce extra processing time to the workflow. To address these challenges, in this paper we present novel scalable access methods for parallel file systems and distributed file/object storage systems. Experimental results gathered during the course of our studies show that these methods provide opportunities not realizable using traditional approaches. We demonstrate tangible scalability and high-throughput advantages using a Lustre parallel file system and AWS S3 distributed storage system.
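As a rough illustration of what "dynamically changing the data unit size based on node memory" could mean in practice, the sketch below picks the largest square tile whose uncompressed pixels fit a per-worker memory budget and then enumerates the tile regions of a slide. It is not the authors' method: the function names `choose_tile_side` and `tile_regions`, the 3 bytes-per-pixel RGB assumption, the 512 MiB budget, and the 100,000 x 100,000 pixel slide (about 30 GB uncompressed at 3 bytes per pixel) are all hypothetical values chosen for the example.

```python
import math
from typing import Iterator, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


def choose_tile_side(mem_budget_bytes: int, bytes_per_pixel: int = 3) -> int:
    """Return the largest square tile side whose uncompressed size fits the budget.

    Assumes tightly packed pixels (hypothetical default: 3 bytes for 8-bit RGB).
    """
    if mem_budget_bytes < bytes_per_pixel:
        raise ValueError("memory budget is smaller than a single pixel")
    return math.isqrt(mem_budget_bytes // bytes_per_pixel)


def tile_regions(width: int, height: int, side: int) -> Iterator[Region]:
    """Yield non-overlapping regions that cover the image; edge tiles are clipped."""
    if side <= 0:
        raise ValueError("tile side must be positive")
    for y in range(0, height, side):
        for x in range(0, width, side):
            yield x, y, min(side, width - x), min(side, height - y)


if __name__ == "__main__":
    # Hypothetical slide: 100,000 x 100,000 RGB pixels, about 30 GB uncompressed.
    width, height = 100_000, 100_000
    budget = 512 * 1024**2  # assumed per-worker memory budget of 512 MiB
    side = choose_tile_side(budget)
    regions = list(tile_regions(width, height, side))
    print(f"tile side: {side} px, tiles: {len(regions)}")
```

Because the tile size is computed at request time rather than fixed during preprocessing, the same slide could be split coarsely for large-memory nodes and finely for small ones. How each region is then read (for example, from a striped file on Lustre or an object in S3) depends on the image format and is outside this sketch.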