Wikipedia

GotoBLAS

In scientific computing, GotoBLAS and GotoBLAS2 are open source implementations of the BLAS (Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types. GotoBLAS was developed by Kazushige Goto at the Texas Advanced Computing Center. As of 2003, it was used in seven of the world's ten fastest supercomputers.[1]

GotoBLAS
Original author(s): Kazushige Goto
Final release: 2-1.13 / 5 February 2010
Type: Linear algebra library; implementation of BLAS
License: BSD License

GotoBLAS remains available, but development ceased with a final version touting optimal performance on Intel's Nehalem architecture (contemporary in 2008).[2] OpenBLAS is an actively maintained fork of GotoBLAS, developed at the Lab of Parallel Software and Computational Science, ISCAS.

GotoBLAS was written by Goto during his sabbatical leave from the Japan Patent Office in 2002. It was initially optimized for the Pentium 4 processor and managed to immediately boost the performance of a supercomputer based on that CPU from 1.5 TFLOPS to 2 TFLOPS.[1] As of 2005, the library was available at no cost for noncommercial use.[1] A later open source version was released under the terms of the BSD license.

GotoBLAS's matrix-matrix multiplication routine, called GEMM in BLAS terms, is highly tuned for the x86 and AMD64 processor architectures by means of handcrafted assembly code.[3] It follows a decomposition into smaller "kernel" routines similar to the one other BLAS implementations use, but where earlier implementations streamed data from the L1 processor cache, GotoBLAS uses the L2 cache.[3] The kernel used for GEMM is a routine called GEBP, for "General block-times-panel multiply",[4] which was experimentally found to be "inherently superior" to several other kernels that were considered in the design.[3]
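The decomposition described above can be illustrated with a minimal pure-Python sketch. It is not GotoBLAS code: the function names `gebp` and `gemm_blocked`, the block-size parameters `mc` and `kc`, and their default values are assumptions made for illustration. The real library implements the kernel in hand-written assembly and picks block sizes so that the packed block of A stays in the L2 cache, which a Python sketch cannot demonstrate. The sketch only shows the loop structure: the product is split into row panels of B, and each panel is multiplied by one packed block of A at a time.

```python
def gebp(a_block, b_panel, c_panel):
    """Block-times-panel update: c_panel += a_block @ b_panel.

    a_block is a small (mc x kc) block, b_panel a (kc x n) row panel,
    and c_panel the matching (mc x n) rows of the result.
    """
    for i, a_row in enumerate(a_block):
        c_row = c_panel[i]
        for p, a_ip in enumerate(a_row):
            b_row = b_panel[p]
            for j in range(len(c_row)):
                c_row[j] += a_ip * b_row[j]


def gemm_blocked(a, b, mc=2, kc=2):
    """Compute a @ b by repeated block-times-panel updates.

    mc and kc are illustrative block sizes (assumed values); a tuned
    library would derive them from the cache sizes of the target CPU.
    """
    m, k = len(a), len(b)
    n = len(b[0]) if b else 0
    c = [[0.0] * n for _ in range(m)]
    for p0 in range(0, k, kc):
        b_panel = b[p0:p0 + kc]  # kc x n row panel of B
        for i0 in range(0, m, mc):
            # "Pack" the mc x kc block of A into its own contiguous copy.
            a_block = [row[p0:p0 + kc] for row in a[i0:i0 + mc]]
            # The slice shares row objects with c, so gebp updates c in place.
            gebp(a_block, b_panel, c[i0:i0 + mc])
    return c


if __name__ == "__main__":
    a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    b = [[1, 0], [0, 1], [2, 3]]
    print(gemm_blocked(a, b))
```

Because every element of the packed block is reused across the whole width of the panel, the block is the natural candidate to keep resident in a larger cache level, which is the design point the paragraph above attributes to GotoBLAS.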

Several other BLAS routines are, as is customary in BLAS libraries, implemented in terms of GEMM.[4]
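As a rough illustration of building another routine on top of GEMM, the sketch below computes a symmetric matrix product (the operation BLAS calls SYMM) by expanding the stored triangle of A and then delegating to a general multiply. The names `gemm` and `symm_lower` are hypothetical, and the plain `gemm` here is only a stand-in for a tuned kernel. This shows the general idea of reuse, under these assumptions, rather than how GotoBLAS itself arranges the data.

```python
def gemm(a, b):
    """Reference matrix product; stands in for a tuned GEMM kernel."""
    n = len(b[0]) if b else 0
    return [[sum(a_ip * b[p][j] for p, a_ip in enumerate(row))
             for j in range(n)]
            for row in a]


def symm_lower(a_lower, b):
    """Compute A @ B for symmetric A, reading only A's lower triangle.

    Entries of a_lower above the diagonal are never accessed, mirroring
    the BLAS convention that only one triangle of A is referenced.
    """
    size = len(a_lower)
    # Rebuild the full symmetric matrix by mirroring the lower triangle.
    a_full = [[a_lower[i][j] if j <= i else a_lower[j][i]
               for j in range(size)]
              for i in range(size)]
    return gemm(a_full, b)


if __name__ == "__main__":
    a_lower = [[1, None, None], [2, 3, None], [4, 5, 6]]
    b = [[1, 0], [0, 1], [1, 1]]
    print(symm_lower(a_lower, b))
```

All of the arithmetic happens inside `gemm`, so any speedup in the general multiply carries over to the derived routine without further tuning.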

As of January 2022, the Texas Advanced Computing Center website[5] states that GotoBLAS is no longer maintained and suggests the use of BLIS or MKL.

See also

  * Automatically Tuned Linear Algebra Software (ATLAS)
  * Intel Math Kernel Library (MKL)

References

  1. Markoff, John Gregory (2005-11-28). "Writing the Fastest Code, by Hand, for Fun: A Human Computer Keeps Speeding Up Chips". New York Times. Seattle, Washington, USA. Archived from the original on 2020-03-23. Retrieved 2010-03-04.
  2. Milfeld, Kent. "GotoBLAS2". Texas Advanced Computing Center. Archived from the original on 2020-03-23. Retrieved 2013-08-28.
  3. Goto, Kazushige; van de Geijn, Robert A. (2008). "Anatomy of High-Performance Matrix Multiplication". ACM Transactions on Mathematical Software. 34 (3): 12:1–12:25. CiteSeerX 10.1.1.111.3873. doi:10.1145/1356052.1356053. ISSN 0098-3500.
  4. Goto, Kazushige; van de Geijn, Robert A. (2008). "High-performance implementation of the level-3 BLAS" (PDF). ACM Transactions on Mathematical Software. 35 (1): 1–14. doi:10.1145/1377603.1377607.
  5. "BLAS-LAPACK at TACC". Texas Advanced Computing Center.
