Abstract: Since its inception, supercomputing technology has supported the solution of cutting-edge scientific and complex engineering problems, and at any given time it represents the most advanced computer hardware and software technologies available. Over nearly 80 years of development, supercomputing has progressed from a focus on computationally intensive tasks to a focus on a hybrid of computationally intensive and data-intensive tasks. Driven by the continuous development of high performance data analytics (HPDA) applications, such as big data, deep learning, and other intelligent tasks, supercomputing storage systems face several challenges: a sudden increase in the volume of data to be processed, greater and more diversified computing power in supercomputing systems, and higher reliability and availability requirements. Against this background, data-intensive supercomputing, which is deeply integrated with data centers and smart computing centers, aims to solve the storage-system problems of complex data type optimization, mixed-load optimization, multi-protocol support, and interoperability, and has thereby become the main focus of research and development today and for some time to come. This paper first introduces key concepts in HPDA and data-intensive computing, and then illustrates the extent to which existing platforms support data-intensive applications by analyzing the most representative supercomputing platforms today (Fugaku, Summit, Sunway TaihuLight, and Tianhe-2A). It then illustrates the actual demand for data-intensive applications in today's mainstream scientific and industrial communities from the perspectives of both scientific and commercial applications. Next, we provide an outlook on the future trends and potential challenges facing data-intensive supercomputing.
In short, this paper gives researchers and practitioners a quick overview of the key concepts and developments in supercomputing, and captures the current and future research hotspots in data-intensive supercomputing and the key issues that need to be addressed.
