Study on fault tolerance for virtualization-based computer simulation systems

Lei Ren, Yongliang Luo, Yabin Zhang
DOI: https://doi.org/10.4028/www.scientific.net/AMR.201-203.677
2011-01-01
Abstract: Modern computer simulation systems have developed toward large-scale, distributed computing. Large-scale simulation applications are typically deployed over heterogeneous networks across geographically dispersed locations, and a simulation run often lasts a long time without interruption. The challenge is that errors of various kinds cannot be avoided during long continuous runs in such a broad network environment with a huge number of simulation resources, so simulation fault tolerance has become a pressing issue. This paper introduces a live migration method for virtualization-based computer simulation systems to address reliability problems, especially fault tolerance. It presents a framework for simulation fault tolerance and then discusses the detailed mechanism for live migration of a running simulation. The method offers an approach to improving simulation reliability in distributed, long-running simulation applications.