Anderson, E., Bai, Z., Bischof, C., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., and Sorensen, D. 1990. LAPACK: a portable linear algebra library for high-performance computers. Technical Report CS-90-105 (LAPACK Working Note No. 20). Knoxville: University of Tennessee.
2.
Anderson, E., and Dongarra, J. 1990a. Implementation guide for LAPACK. Technical Report CS-90-101 (LAPACK Working Note No. 18). Knoxville: University of Tennessee.
3.
Anderson, E., and Dongarra, J. 1990b. Evaluating block algorithm variants in LAPACK. In Parallel processing for scientific computing, edited by J. Dongarra, P. Messina, D. Sorensen, and R. Voigt. Philadelphia: SIAM Publications, pp. 3-8.
4.
Bischof, C. 1989. Adaptive blocking in the QR factorization. J. Supercomput. 3:193-208.
5.
Bischof, C., Demmel, J., Dongarra, J., Du Croz, J., Greenbaum, A., Hammarling, S., McKenney, A., and Sorensen, D. 1988. LAPACK provisional contents. Math. Comp. Sci. Report ANL-88-38 (LAPACK Working Note No. 5). Argonne, Illinois: Argonne National Laboratory.
6.
Bischof, C., and Van Loan, C. 1987. The WY representation for products of Householder matrices. SIAM J. Sci. Statist. Comput. 8:s2-s13.
7.
Cohen, E., King, G., and Brady, J. 1988. Storage hierarchies. IBM Systems J. 28(1):62-76.
8.
Daydé, M., and Duff, I. 1990. Use of level 3 BLAS in LU factorization in a multiprocessing environment on three vector multiprocessors: the Alliant FX/80, the CRAY-2, and the IBM 3090 VF. Technical Report. CERFACS.
9.
Dongarra, J., Du Croz, J., Hammarling, S., and Hanson, R. 1988. An extended set of Fortran basic linear algebra subprograms. ACM Trans. Math. Software 14:1-17, 18-32.
10.
Dongarra, J., Du Croz, J., Hammarling, S., and Duff, I. 1990. A set of level 3 basic linear algebra subprograms. ACM Trans. Math. Software 16:1-17, 18-28.
11.
Dongarra, J., Duff, I., Sorensen, D., and Van der Vorst, H. 1991. Solving linear systems on vector and shared memory computers. Philadelphia: SIAM Publications.
12.
Dongarra, J., and Sorensen, D. 1986. Linear algebra on high-performance computers. In High-performance computers 85, edited by U. Schendel. New York: North-Holland, pp. 3-32.
13.
Eriksson, J., Jacobson, P., Kågström, B., and Lindström, E. 1990. The CONLAB environment: algorithm design for and simulation of MIMD architectures. In Parallel processing for scientific computing, edited by J. Dongarra, P. Messina, D. Sorensen, and R. Voigt. Philadelphia: SIAM Publications, pp. 406-412.
15.
Gallivan, K., Jalby, W., Meier, U., and Sameh, A. 1988. Impact of hierarchical memory systems on linear algebra algorithm design. Int. J. Supercomput. Appl. 2:12-48.
16.
Gallivan, K., Plemmons, R., and Sameh, A. 1990. Parallel algorithms for dense linear algebra computations. SIAM Rev. 32:54-135.
17.
Golub, G., and Van Loan, C. 1989. Matrix computations, 2nd ed. Baltimore: Johns Hopkins Press.
18.
IBM. 1988. Engineering and scientific subroutine library guide and reference. SC23-0184-3.
19.
Jacobson, P. 1990. The CONLAB environment. Report UMINF-173.90. Umeå, Sweden: University of Umeå, Institute of Information Processing.
20.
Kågström, B., and Ling, P. 1989. Level 2 and 3 BLAS routines for IBM 3090 VF: implementation and experiences. In Vector and parallel computing, edited by J. Dongarra, I. Duff, P. Gaffney, and S. McKee. Chichester, England: Ellis-Horwood, pp. 229-240.
21.
Kågström, B., and Van Loan, C. 1989. GEMM-based level-3 BLAS. Technical Report. Ithaca, New York: Cornell University, Department of Computer Science.
22.
Lawson, C., Hanson, R., Kincaid, R., and Krogh, F. 1979. Basic linear algebra subprograms for Fortran usage. ACM Trans. Math. Software 5:308-323.
23.
Ling, P. 1990. A set of high performance level-3 BLAS structured and tuned for the IBM 3090 VF and implemented in Fortran 77. Report UMINF-179.90. Umeå, Sweden: University of Umeå, Institute of Information Processing.
24.
Liu, B., and Strother, N. 1988. Programming in VS Fortran on the IBM 3090 for maximum vector performance. IEEE Computer June (1988): 65-76.
25.
Moler, C., Little, J., and Bangert, S. 1987. PRO-MATLAB user's guide. The MathWorks Inc.
26.
Radicati, G., Robert, Y., and Sguazzero, P. 1988. Block processing in linear algebra on the IBM 3090 vector multiprocessor. Supercomputer 5(1):15-25.
27.
Sheikh, Q., and Liu, J. 1990. Performance of block matrix factorization algorithms and LAPACK on CRAY Y-MP and CRAY-2. Cray Research Inc.
28.
Schreiber, R., and Van Loan, C. 1989. A storage efficient WY representation for products of Householder transformations. SIAM J. Sci. Statist. Comput. 10:53-57.
29.
Toomey, L., Plachy, E., Scarborough, R., Sahulka, R., Shaw, J., and Shannon, A. 1988. IBM Parallel Fortran. IBM Systems J. 27:416-435.
30.
Tucker, S. 1986. The IBM 3090 system: an overview. IBM Systems J. 25(1):4-20.