A new tool for the performance analysis of massively parallel computer systems

Anton Stefanek, Richard Hayden, Jeremy Bradley
DOI: https://doi.org/10.4204/EPTCS.28.11
2010-06-26
Abstract: We present a new tool, GPA, that can generate key performance measures for very large systems. Based on solving systems of ordinary differential equations (ODEs), this method of performance analysis is far more scalable than stochastic simulation. The GPA tool is the first to produce higher moment analysis from differential equation approximation, which is essential, in many cases, to obtain an accurate performance prediction. We identify so-called switch points as the source of error in the ODE approximation. We investigate the switch point behaviour in several large models and observe that as the scale of the model is increased, in general the ODE performance prediction improves in accuracy. In the case of the variance measure, we are able to justify theoretically that in the limit of model scale, the ODE approximation can be expected to tend to the actual variance of the model.
Performance; Distributed, Parallel, and Cluster Computing; Numerical Analysis
What problem does this paper attempt to address?
This paper addresses the performance analysis of large-scale parallel computer systems, and in particular how to compute key performance measures efficiently. The authors introduce GPA (Grouped PEPA Analyser), a tool that derives performance measures for very large systems by solving systems of ordinary differential equations (ODEs). This approach scales far better than stochastic simulation, and GPA is the first tool to produce higher moment analysis from a differential equation approximation, which in many cases is essential for an accurate performance prediction.

### Main contributions of the paper:

1. **Higher moment analysis**:
   - GPA produces higher moments (for example, variances) from the ODE approximation, which no earlier ODE-based tool provided. Higher moments describe the variability of system behaviour and are needed whenever the mean alone does not give an accurate performance prediction.
2. **Identification of error sources**:
   - The authors identify so-called "switch points" as the source of error in the ODE approximation. A switch point occurs when the total rates offered by two cooperating component groups for a shared action are equal, so that the minimum function defining the cooperation rate switches from one argument to the other.
3. **Accuracy as scale increases**:
   - Across several large models, the authors observe that the ODE prediction generally becomes more accurate as the model scale increases. For the variance measure, they justify theoretically that, in the limit of model scale, the ODE approximation can be expected to tend to the actual variance of the model.
4. **Tool implementation**:
   - GPA implements higher moment approximation for large-scale PEPA models and can visualize the distance between the model and its switch points, helping users identify where the approximation may be inaccurate.
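
The switch-point idea in contribution 2 can be illustrated with a minimal Python sketch. It is not taken from GPA: the function names, and the relative-gap measure used as a "distance" to the switch point, are assumptions made for illustration only.

```python
def cooperation_rate(rate_a: float, count_a: float,
                     rate_b: float, count_b: float) -> float:
    """Rate of a shared action under minimum semantics.

    Each side offers (per-component rate) * (number of enabled components);
    the slower side limits the overall rate.
    """
    return min(rate_a * count_a, rate_b * count_b)


def switch_distance(rate_a: float, count_a: float,
                    rate_b: float, count_b: float) -> float:
    """Hypothetical distance measure: relative gap between the two arguments
    of the minimum. 0.0 means the model sits exactly on a switch point;
    values close to 1.0 mean one side clearly dominates.
    """
    a = rate_a * count_a
    b = rate_b * count_b
    total = a + b
    if total == 0.0:
        return 0.0  # both sides offer nothing, so the arguments are equal
    return abs(a - b) / total


if __name__ == "__main__":
    # 100 components at rate 1.0 against 80 components at rate 2.0
    print(cooperation_rate(1.0, 100, 2.0, 80))
    print(switch_distance(1.0, 100, 2.0, 80))
    # Equal total rates: exactly on a switch point
    print(switch_distance(1.0, 100, 2.0, 50))
```

A small distance signals that the two arguments of the minimum are nearly equal, which is where the summary above locates the error in the ODE approximation.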
### Specific problem examples:

- **Processor/resource model**:
  - The authors demonstrate GPA on a simple processor/resource model in which many processors share a limited pool of resources and each processor must periodically acquire a resource to perform a task. Comparing the ODE approximation with stochastic simulation shows that the error appears near certain time points (such as \( t = 0.2 \)), which coincide with the location of the switch points.
- **Client/server model**:
  - The authors also discuss a more complex two-stage client/server model, in which a client first requests a service and then waits for the server to respond. Using GPA, they analyze the switch point behaviour of this model and check the accuracy of the higher moment approximation.

### Theoretical support:

- **Convergence of mean approximation**:
  - Building on Kurtz's results, the authors argue that as the number of components tends to infinity, the mean of the underlying model converges to the ODE solution.
- **Convergence of variance approximation**:
  - The authors further argue that, for large populations and after appropriate scaling, the variance of the model converges to the solution given by the ODEs. The argument decomposes the underlying CTMC into the sum of a deterministic process and a Gaussian process, and derives the ODEs that describe how the covariance of the Gaussian process evolves.

### Conclusion:

By developing and applying GPA, the paper addresses key problems in the performance analysis of large-scale parallel computer systems, most notably higher moment analysis and the identification of approximation error. The results improve the accuracy of ODE-based performance prediction and give practitioners a tool for analyzing the behaviour of complex systems.
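
As a rough illustration of the processor/resource example and its switch point, the sketch below integrates a mean-field ODE system with forward Euler and records when the minimum in the cooperation rate changes argument. It does not reproduce the paper's model: the two-state structure, the function name, and all rates and population sizes are assumptions, so it is not expected to reproduce the \( t = 0.2 \) behaviour mentioned above.

```python
def simulate_processor_resource(p_total=100.0, r_total=80.0,
                                r_acq=2.0, r_task=0.5, r_reset=4.0,
                                dt=1e-3, t_end=10.0):
    """Forward-Euler integration of an assumed two-state processor/resource model.

    Idle processors and idle resources cooperate on an 'acquire' action at
    rate min(r_acq * p_idle, r_acq * r_idle). Busy processors finish their
    task at rate r_task; busy resources reset at rate r_reset.
    Returns (p_idle, r_idle, switch_times).
    """
    # Start with every processor and every resource idle.
    p_idle, r_idle = p_total, r_total
    switch_times = []
    # True while the resource side is the smaller argument of the minimum.
    prev_side = r_acq * p_idle > r_acq * r_idle
    steps = int(round(t_end / dt))
    for step in range(1, steps + 1):
        p_busy = p_total - p_idle
        r_busy = r_total - r_idle
        acquire = min(r_acq * p_idle, r_acq * r_idle)
        p_idle += dt * (-acquire + r_task * p_busy)
        r_idle += dt * (-acquire + r_reset * r_busy)
        side = r_acq * p_idle > r_acq * r_idle
        if side != prev_side:
            # The minimum changed argument: the trajectory crossed a switch point.
            switch_times.append(step * dt)
            prev_side = side
    return p_idle, r_idle, switch_times


if __name__ == "__main__":
    p_idle, r_idle, switches = simulate_processor_resource()
    print("idle processors:", round(p_idle, 3))
    print("idle resources:", round(r_idle, 3))
    print("switch times:", switches)
```

With these assumed parameters the system starts resource-limited and ends processor-limited, so the trajectory has to cross a switch point along the way. Comparing the recorded switch times against a stochastic simulation of the same model would show whether the ODE error concentrates there, as the summary describes.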