Abstract
This paper evaluates and analyzes multilevel parallelism on a chip multiprocessor (CMP) architecture. The environment is based on the experimental IBM BG/Cyclops architecture, on which we have run the multi-zone parallel benchmarks. Multilevel parallelism is spawned using the Nanos OpenMP execution environment. We have performed the analysis with different execution parameters in order to evaluate different hardware thread distributions, cache utilization, and thread grouping configurations. Our results demonstrate that a large number of thread groups and good balancing algorithms are critical for high performance. We also show that a small number of threads can share the same data cache to increase performance, but that a large number of threads should not share the same data cache.
