Level 3 BLAS in LU Factorization on the CRAY-2, ETA-10P, and IBM 3090-200/VF

Abstract

We study various implementations of block Gaussian elimination on full matrices and examine their performance on three vector supercomputers, the CRAY-2, the ETA-10P, and the IBM 3090-200/VF. We show that the use of Level 3 BLAS kernels allows portability without sacrifice of efficiency and that good speeds can be obtained if tuned versions of the kernels are available. Indeed our results show that without using any assembler language outside the kernels we can approach the performance of assembler-coded routines on all machines.
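
To make the structure of block Gaussian elimination concrete, the following is a minimal sketch of one right-looking block LU variant, written in pure Python for readability. It is not the code studied in the article, and it rests on several assumptions: pivoting is omitted (so it is only safe for matrices such as diagonally dominant ones), the block size `nb` is a free parameter, and the helper names `lu_unblocked`, `trsm_lower_unit`, `trsm_upper`, and `gemm_update` are hypothetical stand-ins for the kind of work that Level 3 BLAS kernels (triangular solve with multiple right-hand sides, matrix-matrix multiply) would perform in a tuned implementation.

```python
# Sketch only: right-looking block LU without pivoting, in pure Python.
# The helper names are hypothetical; they mark where Level 3 BLAS style
# kernels (TRSM-like and GEMM-like operations) would be called instead.

def lu_unblocked(a, k0, k1):
    """Factor the diagonal block a[k0:k1][k0:k1] in place, A11 = L11 * U11."""
    for k in range(k0, k1):
        for i in range(k + 1, k1):
            a[i][k] /= a[k][k]  # multiplier stored in the lower triangle
            for j in range(k + 1, k1):
                a[i][j] -= a[i][k] * a[k][j]


def trsm_lower_unit(a, k0, k1, n):
    """Overwrite A12 with U12 by solving L11 * U12 = A12 (L11 unit lower)."""
    for j in range(k1, n):
        for i in range(k0, k1):
            for p in range(k0, i):
                a[i][j] -= a[i][p] * a[p][j]


def trsm_upper(a, k0, k1, n):
    """Overwrite A21 with L21 by solving L21 * U11 = A21 (U11 upper)."""
    for i in range(k1, n):
        for j in range(k0, k1):
            for p in range(k0, j):
                a[i][j] -= a[i][p] * a[p][j]
            a[i][j] /= a[j][j]


def gemm_update(a, k0, k1, n):
    """Trailing update A22 <- A22 - L21 * U12 (the matrix-matrix multiply)."""
    for i in range(k1, n):
        for j in range(k1, n):
            s = 0.0
            for p in range(k0, k1):
                s += a[i][p] * a[p][j]
            a[i][j] -= s


def block_lu(a, nb):
    """Factor the square matrix a (list of lists of floats) in place.

    On return the strict lower triangle holds L (unit diagonal implied)
    and the upper triangle holds U. No pivoting is performed.
    """
    n = len(a)
    for k0 in range(0, n, nb):
        k1 = min(k0 + nb, n)
        lu_unblocked(a, k0, k1)        # small diagonal block
        trsm_lower_unit(a, k0, k1, n)  # block row of U
        trsm_upper(a, k0, k1, n)       # block column of L
        gemm_update(a, k0, k1, n)      # bulk of the arithmetic
    return a
```

In this arrangement most of the floating-point work falls in `gemm_update` and the two triangular solves. That is the motivation for expressing the algorithm in terms of Level 3 kernels: if tuned versions of those kernels are substituted, the driver loop in `block_lu` can stay unchanged across machines.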
