Abstract
We study various implementations of block Gaussian elimination on full matrices and examine their performance on three parallel computers, the Alliant FX/80, the CRAY-2, and the IBM 3090-400/VF. These implementations are expressed in terms of Level 3 BLAS matrix-matrix kernels. We consider the use of parallel Level 3 BLAS kernels and compare the parallelism obtained within the computational kernels with that obtained when parallelizing over the kernels. We show that the use of parallel Level 3 BLAS allows portability without sacrifice of efficiency, even in a parallel environment, and that high speeds can be obtained if tuned versions of the kernels are available.
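The abstract refers to block Gaussian elimination formulated so that the bulk of the work is done by Level 3 BLAS matrix-matrix kernels (triangular solves and matrix multiplies). As a rough illustration of that formulation, here is a minimal sketch in Python/NumPy of a right-looking blocked LU factorization without pivoting; the function name `blocked_lu`, the block size `nb`, and the use of NumPy in place of actual BLAS calls are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def blocked_lu(A, nb=2):
    """Right-looking blocked LU factorization without pivoting (illustrative
    sketch, not the paper's code). Returns L and U packed in one matrix.
    The trailing-submatrix update is a single matrix-matrix product,
    i.e. the GEMM-like Level 3 BLAS operation that dominates the work."""
    A = np.array(A, dtype=float, copy=True)
    n = A.shape[0]
    for k in range(0, n, nb):
        e = min(k + nb, n)
        # Unblocked LU of the nb-by-nb diagonal block (Level 2 work).
        for j in range(k, e):
            A[j + 1:e, j] /= A[j, j]
            A[j + 1:e, j + 1:e] -= np.outer(A[j + 1:e, j], A[j, j + 1:e])
        if e < n:
            L11 = np.tril(A[k:e, k:e], -1) + np.eye(e - k)
            U11 = np.triu(A[k:e, k:e])
            # Triangular solves for the row and column panels (TRSM-like):
            # U12 = L11^{-1} A12,  L21 = A21 U11^{-1}.
            A[k:e, e:] = np.linalg.solve(L11, A[k:e, e:])
            A[e:, k:e] = np.linalg.solve(U11.T, A[e:, k:e].T).T
            # Trailing update A22 -= L21 @ U12: the GEMM-like kernel.
            A[e:, e:] -= A[e:, k:e] @ A[k:e, e:]
    return A

# Usage: factor a diagonally dominant matrix (safe without pivoting)
# and verify that L @ U reconstructs the original.
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6)) + 6 * np.eye(6)
F = blocked_lu(M, nb=2)
L = np.tril(F, -1) + np.eye(6)
U = np.triu(F)
assert np.allclose(L @ U, M)
```

Casting the update step as one large matrix-matrix multiply is what lets a tuned (and possibly parallel) GEMM kernel carry most of the flops, which is the portability-with-efficiency point the abstract makes.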
