Abstract
The effectiveness of Instruction Reuse (IR) – a technique to eliminate redundant computations at run time – is limited by the fact that its performance gain seldom exceeds 3% and depends on the criticality of the instructions being “reused”. In this paper, we focus on the power aspect of IR and propose a “resultbus optimization” that exploits communication reuse to reduce the power dissipated over a high-capacitance resultbus. The effectiveness of this optimization depends on the number of result-producing instructions that are reused; it improves overall power and Energy-Delay Product (EDP) by 3% over a base IR policy for a 1024-entry “Reuse Buffer” (RB).
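To make the idea concrete, the following is a minimal sketch of a reuse buffer, not the design evaluated in this paper. It assumes entries are keyed by program counter and operand values, that replacement is LRU, and that a reused result does not need to be driven over the result bus; the names `ReuseBuffer` and `run` are hypothetical.

```python
from collections import OrderedDict


class ReuseBuffer:
    """Toy reuse buffer mapping (pc, operand values) to an earlier result."""

    def __init__(self, capacity=1024):
        self.capacity = capacity
        # Oldest entry first; LRU replacement is an assumption of this sketch.
        self.entries = OrderedDict()

    def lookup(self, pc, operands):
        key = (pc, tuple(operands))
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True, self.entries[key]
        return False, None

    def insert(self, pc, operands, result):
        key = (pc, tuple(operands))
        self.entries[key] = result
        self.entries.move_to_end(key)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used


def run(trace, rb):
    """Run (pc, operands, compute_fn) tuples; count hits and bus transfers."""
    hits = 0
    bus_transfers = 0
    results = []
    for pc, operands, compute in trace:
        hit, value = rb.lookup(pc, operands)
        if hit:
            # Assumed model: a reused result is not sent over the result bus.
            hits += 1
        else:
            value = compute(*operands)
            bus_transfers += 1  # an executed result is driven over the bus
            rb.insert(pc, operands, value)
        results.append(value)
    return results, hits, bus_transfers
```

In this model every reuse hit is one fewer result-bus transfer, which is why the saving scales with the number of result-producing instructions that are reused.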
As a domain-specific study, we examine the impact of multithreading on IR in the context of packet header processing applications. Specifically, sharing the RB among threads can lead to either constructive or destructive interference, thereby increasing or decreasing the amount of IR that can be uncovered. Further, packet header processing applications are unique in the sense that repetition in data values within “flows” is quite prevalent, which can be exploited to improve IR. We find that an architecture that uses this “flow” information to govern accesses to the RB improves IR by as much as 4.6% for header processing kernels.
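One way flow information could govern RB accesses is sketched below. This is an illustrative assumption rather than the architecture proposed here: a direct-mapped buffer whose index optionally mixes in a flow identifier, so that two interleaved flows executing the same instruction with different operand values stop evicting each other. `DirectMappedRB` and `count_hits` are hypothetical names.

```python
class DirectMappedRB:
    """Toy direct-mapped reuse buffer with optional flow-aware indexing."""

    def __init__(self, num_entries, use_flow):
        self.num_entries = num_entries
        self.use_flow = use_flow
        self.slots = [None] * num_entries

    def _index(self, pc, flow_id):
        # Assumed indexing scheme: XOR the flow id into the pc when enabled.
        if self.use_flow:
            return (pc ^ flow_id) % self.num_entries
        return pc % self.num_entries

    def access(self, pc, flow_id, operands):
        """Return True on a reuse hit; otherwise install the entry."""
        idx = self._index(pc, flow_id)
        tag = (pc, operands)
        if self.slots[idx] == tag:
            return True
        self.slots[idx] = tag
        return False


def count_hits(rb, accesses):
    """Count reuse hits over a list of (pc, flow_id, operands) accesses."""
    return sum(rb.access(pc, flow, ops) for pc, flow, ops in accesses)


# Two flows interleaved at the same pc, each repeating its own operand value.
accesses = [(8, 1, (10,)), (8, 2, (20,))] * 4
shared_hits = count_hits(DirectMappedRB(16, use_flow=False), accesses)
flow_hits = count_hits(DirectMappedRB(16, use_flow=True), accesses)
```

With pc-only indexing the two flows map to the same slot and overwrite each other on every access (destructive interference); with the flow identifier in the index, each flow keeps its own entry and only the first access per flow misses.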
