Abstract: Extending virtualization technology to high-performance cluster platforms opens up new possibilities, including dynamic allocation of resources to jobs, easier sharing of resources between jobs, easy checkpointing of jobs, and deployment of job-specific work environments. However, an I/O scalability problem in the virtualization layer may keep virtualization from being widely adopted in high-performance computing: performance degrades sharply when a virtual machine uses a multiqueue, high-performance non-volatile storage device as secondary storage. The cause is the current virtual block I/O layer, which uses only one I/O thread to handle all I/O operations to a virtualized storage device. As the number of I/O-intensive workloads increases, mutex contention on the I/O thread grows, because only one of them can proceed at any given instant. Solving this problem is therefore key to improving block I/O performance under virtualization. In this paper, we propose a novel high-performance block I/O stack design that addresses it. The proposed method uses multiple I/O threads to handle all I/O operations to one storage device in parallel, which frees workloads from I/O contention inside the hypervisor. We also use switch-less mechanisms to reduce the overhead of sending notifications between a VM and its hypervisor, and we improve I/O affinity by assigning a distinct dedicated core to each I/O thread, eliminating unnecessary scheduling. The prototype is implemented on the Linux 3.19 kernel and Quick Emulator (QEMU) 2.3.1 and deployed on a POWER8 server for detailed evaluation. The experimental results show that the proposed architecture scales gracefully in a multi-core environment. For example, a test with 10-way parallel I/O-intensive workloads achieves an 800% improvement over the single-core implementation, indicating that block I/O performance in a virtual machine is close to that of a bare-metal system.
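
The structure the abstract describes (several I/O threads serving one device, each with its own queue and its own core) can be illustrated with a minimal sketch. This is not the paper's QEMU implementation: the names `io_worker` and `run_multiqueue` are hypothetical, the "device submission" is a stand-in that only records the request, and Python threads do not execute in parallel, so the sketch shows the dispatch structure only, not the performance effect. Core pinning is attempted only where the operating system exposes `os.sched_setaffinity` (Linux).

```python
import os
import queue
import threading


def io_worker(worker_id, requests, completed, core=None):
    """Drain one private request queue; no lock is shared with other workers."""
    # Optional affinity: pin the calling thread to one core (Linux only).
    if core is not None and hasattr(os, "sched_setaffinity"):
        try:
            os.sched_setaffinity(0, {core})
        except OSError:
            pass  # core not available to this process; run unpinned
    while True:
        req = requests.get()
        if req is None:  # sentinel: no more requests for this worker
            break
        # Stand-in for submitting the request to the storage device.
        completed.append((worker_id, req))


def run_multiqueue(requests, num_threads):
    """Spread requests round-robin over per-thread queues (hypothetical helper)."""
    queues = [queue.Queue() for _ in range(num_threads)]
    results = [[] for _ in range(num_threads)]
    # Cores this process may use; empty list means pinning is skipped.
    cores = sorted(os.sched_getaffinity(0)) if hasattr(os, "sched_getaffinity") else []
    threads = []
    for i in range(num_threads):
        core = cores[i % len(cores)] if cores else None
        t = threading.Thread(target=io_worker, args=(i, queues[i], results[i], core))
        t.start()
        threads.append(t)
    for n, req in enumerate(requests):
        queues[n % num_threads].put(req)  # each request goes to exactly one queue
    for q in queues:
        q.put(None)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for worker_id, done in enumerate(run_multiqueue(list(range(10)), 3)):
        print(worker_id, [req for _, req in done])
```

With three workers and ten requests, worker 0 receives requests 0, 3, 6 and 9. Because each worker owns its queue and its result list, no mutex is contended between workers, which is the property the single-I/O-thread design lacks.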

UrsaX: Integrating Block I/O and Message Transfer for Ultrafast Block Storage on Supercomputers

HadaFS: A File System Bridging the Local and Shared Burst Buffer for Exascale Supercomputers.

A New Approach to Double I/O Performance for Ceph Distributed File System in Cloud Computing

BM-Store: A Transparent and High-performance Local Storage Architecture for Bare-metal Clouds Enabling Large-scale Deployment

A Practical Cross-Datacenter Fault-Tolerance Algorithm in the Cloud Storage System.

LightPool: A NVMe-oF-based High-performance and Lightweight Storage Pool Architecture for Cloud-Native Distributed Database

High Performance and Scalable Virtual Machine Storage I/O Stack for Multicore Systems

OCStore: Accelerating Distributed Object Storage with Open-Channel SSDs

A distributed file system for a wide-area high performance computing infrastructure

Exploring Scientific Application Performance Using Large Scale Object Storage

Optimizing NVMe Storage for Large-scale Deployment: Key Technologies and Strategies in Alibaba Cloud

UStore: A Low Cost Cold and Archival Data Storage System for Data Centers.

Exploring the Future of Out-of-core Computing with Compute-Local Non-Volatile Memory

A Survey on User-Space Storage and Its Implementations

Understanding the Performance of Ceph Block Storage for Hyper-Converged Cloud with All Flash Storage

A novel non-volatile memory storage system for I/O-intensive applications

Survey the storage systems used in HPC and BDA ecosystems

Performance Measurements of Supercomputing and Cloud Storage Solutions

xNVMe: Unleashing Storage Hardware-Software Co-design

XOS: An Application-Defined Operating System for Data Center Servers

Demystifying CXL Memory with Genuine CXL-Ready Systems and Devices