Abstract
Supercomputers accelerated by graphics processing units have proved very effective, especially in terms of power efficiency, at running compute-intensive applications such as the high-performance Linpack benchmark used for the TOP500 list. This paper presents the details of a CUDA implementation of the high-performance conjugate gradient benchmark, a newly proposed benchmark that better represents modern application workloads, which rely more heavily on memory system and network performance than high-performance Linpack does. Results obtained at full scale on the largest graphics processing unit supercomputers in the world (Titan, the Cray XK7 at ORNL, and Piz-Daint, the Cray XC30 at CSCS) indicate that graphics processing unit accelerated supercomputers are also very effective for this type of workload. A comparison with other architectures is also presented, showing that graphics processing units, with their high memory bandwidth, are the highest performing devices for this new benchmark.
