Abstract
Columbia, NASA's 10,240-processor supercluster, has been ranked as one of the fastest computers in the world since November 2004. In this paper we examine the performance characteristics of its production subclusters, which typically range in size from 512 to 2048 processors. We evaluate floating-point performance, memory bandwidth, and message-passing communication speeds using a subset of the HPC Challenge benchmarks, the NAS Parallel Benchmarks, and a computational fluid dynamics application. Our experimental results quantify the performance improvements resulting from changes in interconnect bandwidth, processor speed, and cache size across the different types of SGI Altix 3700s that constitute Columbia. We also report on experiments investigating the performance impact of processors sharing a path to memory. Finally, our tests of the available interconnect fabrics indicate substantial promise for scaling applications to run on configurations of more than 512 CPUs.
