LAPACK (Linear Algebra PACKage) is a subroutine library for solving the most common problems in numerical linear algebra, designed to run efficiently on shared-memory vector and parallel processors. Only the general sequential code of LAPACK is available on the Internet, and optimizing it for a particular machine is very burdensome. To address this problem, we developed an automatic parallelizing tool on the SGI POWER Challenge, and it shows good results.
This paper presents two approaches to improving the performance of numerical algebra software by describing the block algorithms in LAPACK. Block algorithms can be built from higher-level, more efficient BLAS routines. The paper further examines the relation between the efficiency of a block algorithm and the block size, and shows that this relation depends not only on the scale of the algorithm and problem but also on the architecture and characteristics of the target machine. Finally, the paper gives test results on the Hitachi SR2201 and SR8000.