Abstract
Columbia, NASA's 10,240-processor supercluster, has been ranked as one of the fastest computers in the world since November 2004. In this paper we examine the performance characteristics of its production subclusters, which typically range in size from 512 to 2048 processors. We evaluate floating-point performance, memory bandwidth, and message-passing communication speeds using a subset of the HPC Challenge benchmarks, the NAS Parallel Benchmarks, and a computational fluid dynamics application. Our experimental results quantify the performance improvements resulting from changes in interconnect bandwidth, processor speed, and cache size across the different types of SGI Altix 3700s that constitute Columbia. We also report on experiments investigating the performance impact of processors sharing a path to memory. Finally, our tests of the available interconnect fabrics indicate substantial promise for scaling applications to run on configurations of more than 512 CPUs.
