Abstract
The graphics processing unit (GPU) is becoming increasingly popular as a performance accelerator in applications that require high-performance parallel computing. In a hybrid central processing unit (CPU)–GPU system, software pipelining plays a major role in delivering accelerated performance, and the key challenge is hiding CPU–GPU communication overhead by splitting a large task into small units. In this paper, we systematically investigate task partitioning in order to maximize performance gain. We first validate the advantage of an even-partition strategy and then propose an optimal scheduling scheme, including a detailed study, within an analytical framework, of how to choose the optimal unit size (data granularity). Experiments on AMD and NVIDIA GPU platforms show that our approaches achieve performance improvements of approximately 31–59% through software pipelining.
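The abstract does not give the paper's cost model, so the following is only a minimal sketch of why unit size matters, under an assumed linear cost model that is not the authors' framework. A task of n elements is split evenly into k chunks; each chunk passes through a transfer stage and a compute stage, each costing a fixed per-chunk overhead plus a per-element cost, and the two stages are assumed to run on independent engines so they overlap across chunks. The names `Stage`, `pipelined_time`, and `best_chunk_count`, and all cost values, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stage:
    """Hypothetical linear cost model for one pipeline stage (seconds)."""
    fixed: float     # per-chunk overhead, e.g. transfer latency or kernel launch cost
    per_elem: float  # cost per data element

    def time(self, chunk: float) -> float:
        return self.fixed + self.per_elem * chunk


def pipelined_time(n: int, k: int, stages: list[Stage]) -> float:
    """Makespan for n elements split evenly into k chunks flowing through the stages.

    Assumes every stage has its own engine and chunks are identical, so the
    makespan is the sum of one chunk's stage times plus (k - 1) times the
    slowest stage time. With k == 1 this is the non-pipelined (serial) time.
    """
    chunk = n / k
    times = [s.time(chunk) for s in stages]
    return sum(times) + (k - 1) * max(times)


def best_chunk_count(n: int, stages: list[Stage], k_max: int) -> tuple[int, float]:
    """Brute-force search over k = 1..k_max for the smallest makespan."""
    return min(
        ((k, pipelined_time(n, k, stages)) for k in range(1, k_max + 1)),
        key=lambda kt: kt[1],
    )


if __name__ == "__main__":
    # Made-up costs: 10 us latency + 1 ns/element transfer; 5 us launch + 1 ns/element compute.
    transfer = Stage(fixed=10e-6, per_elem=1e-9)
    compute = Stage(fixed=5e-6, per_elem=1e-9)
    n = 1 << 24
    serial = pipelined_time(n, 1, [transfer, compute])
    k, t = best_chunk_count(n, [transfer, compute], k_max=1024)
    print(f"k=1: {serial * 1e3:.2f} ms; best k={k}: {t * 1e3:.2f} ms")
    # prints "k=1: 33.57 ms; best k=41: 17.60 ms"
```

In this toy case transfer is the slower stage for every k, so the makespan reduces to k·a + b·n + c + d·n/k (a, b for transfer; c, d for compute), which is minimized near k* = sqrt(d·n/a) ≈ 41. Too few chunks leave little overlap, while too many pay the fixed per-chunk overhead repeatedly; this is the granularity trade-off the abstract refers to, though the paper's own model may differ.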
