Abstract
A detailed study of the parallel performance of the interpolation-supplemented lattice Boltzmann (ISLB) method using SHMEM and MPI on the Cray T3E-900 and Cray X1 architectures is presented. A noteworthy feature of the present implementation of the ISLB method is that it sustains 4.2 Tflop/s on 504 processors of a Cray X1. The code also achieves super-linear speedups on the Cray T3E-900. Detailed profiling shows that both computation and communication scale well on the Cray X1, although the overall speedup is adversely affected by the cost of barrier synchronization.
