On the Importance of End-to-End Application Performance Monitoring and Workload Analysis at the Exascale

Abstract

This paper examines the future of performance monitoring on exascale HPC systems. In particular, we argue that such machines will be complex enough that monitoring the performance of individual applications, and of the workload as a whole, will change from a beneficial option to a necessity. This complexity arises from the number of components and the degree of concurrency expected in such systems. We therefore see a need for performance monitoring to shift from a useful add-on to a core requirement for basic operation, and we suggest some first steps toward meeting that need.

Keywords

exascale, parallel computing, performance monitoring, workload characterization, applications

