Abstract
With processor speeds no longer doubling every 18 to 24 months, owing to the exponential increase in power consumption and heat dissipation, modern high-end computing systems rely less on the performance of individual processing units. Instead, they achieve high performance through the parallelism of a massive number of low-frequency, low-power processing cores. Using such low-frequency cores, however, puts a premium on the end-host pre- and post-communication processing required within communication stacks, such as implementations of the Message Passing Interface (MPI). Similarly, small amounts of serialization within the communication stack that were
