Abstract
Most HPC systems are clusters of shared-memory nodes. Parallel programming on such systems must combine distributed-memory parallelization across the node interconnect with shared-memory parallelization inside each node. The hybrid MPI+OpenMP programming model is compared with pure MPI, compiler-based parallelization, and other parallel programming models on hybrid architectures. The paper focuses on bandwidth and latency aspects, and on whether the programming paradigms allow the optimization of communication and computation to be separated. Benchmark results are presented for hybrid and pure MPI communication, and the strengths and weaknesses of several parallel programming models on clusters of SMP nodes are analyzed.
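To make the hybrid model concrete, the following is a minimal C sketch (not taken from the paper) of the MPI+OpenMP combination the abstract describes: OpenMP threads share memory within each node's MPI process, while MPI communicates between processes over the node interconnect. The masteronly/funneled style, the loop body, and all variable names are illustrative assumptions, not the paper's benchmark code.

/* Minimal hybrid MPI+OpenMP sketch (illustrative only). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nprocs;

    /* Request funneled threading: only the master thread calls MPI. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    const int n = 1000000;
    double local_sum = 0.0;

    /* Shared-memory parallelization inside the node: OpenMP threads
     * of this MPI process split the local loop iterations. */
    #pragma omp parallel for reduction(+:local_sum)
    for (int i = 0; i < n; i++)
        local_sum += 1.0 / (double)n;   /* placeholder computation */

    /* Distributed-memory parallelization across nodes: MPI combines
     * the per-process partial results over the node interconnect. */
    double global_sum = 0.0;
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("processes=%d threads/process=%d global_sum=%f\n",
               nprocs, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}

In this style one MPI process is typically started per SMP node (or per socket), with the number of OpenMP threads per process chosen to match the cores available inside the node.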
