Introduction: Special Section on Many-Task Computing

Ioan Raicu, Ian T. Foster, Yong Zhao
2011-01-01
Abstract: It is our honor to serve as guest editors of this special section of the IEEE Transactions on Parallel and Distributed Systems (TPDS) on many-task computing (MTC). This section focuses on the methods required to manage and execute large multiple program multiple data (MPMD) computations on large clusters, grids, clouds, and supercomputers. We are pleased to present 10 high-quality contributions chosen from 42 submissions, on resource management, data-intensive computing, applications, and MTC on supercomputers, grids, and clouds. We introduce the term many-task computing (MTC) [2] for computations that bridge the gap between high-performance computing (HPC) and high-throughput computing (HTC) [1]. MTC differs from HTC in its emphasis on using many computing resources over short periods of time to accomplish many computational tasks (both dependent and independent), for which primary metrics are measured in seconds (e.g., FLOPS, tasks/sec., MB/s I/O rates), as opposed to jobs per month. MTC computations comprise multiple distinct activities, coupled via files, shared memory, or message passing. Tasks may be small or large, uniprocessor or multiprocessor, and compute-intensive or data-intensive. The set of tasks may be static or dynamic, homogeneous or heterogeneous, and loosely coupled or tightly coupled. The number of tasks, quantity of computing, and volumes of data may be large. Today’s HPC systems are a viable platform for MTC [3], but large MTC applications can stress HPC hardware and software. Challenges include local resource manager scalability and granularity, efficient utilization of raw hardware, parallel file system contention and scalability, data management, I/O management, reliability at scale, application scalability, and understanding the limitations of HPC systems in order to identify good candidate MTC applications [4].
MTC applications can also be executed on cloud systems, but face other challenges there, for example, relating to internode communication performance. Three recent MTC workshops (MTAGS, http://dsl.cs.uchicago.edu/MTAGS10/) and this special section attracted 142 abstracts and 110 paper submissions, from which 41 papers were accepted. Papers covered resource management, data-intensive computing, applications, and MTC on supercomputers, grids, and clouds. More than 1,000 people have participated as coauthors, program committee members, reviewers, and attendees in these venues. We are well beyond a critical mass for a new, thriving community, which is quickly expanding.