developer.amd.com

AMD ACL 1.0 Beta 2: A Slew of Features & Improvements

Oct 21, 2015 Karthik

In August of this year, AMD released the first version of the AMD Compute Library (ACL), a consolidated package providing the clBLAS, clFFT, clSPARSE, and clRNG libraries under one roof. Encouraged by the response to the Beta 1 release, the team worked hard (and fast!) to release AMD ACL 1.0 Beta 2, which provides a slew of improvements in almost all of the libraries.

The main improvements in this release include:

clBLAS

AutoGemm, the new high-performance GEneral Matrix-Matrix multiplication (GEMM) backend for clBLAS, is a suite of Python scripts that:

  • Generates thousands of optimized GEMM OpenCL kernels.
  • Benchmarks these kernels on a particular GPU across different matrix sizes to determine the fastest ones.
  • Automatically chooses the optimal kernel within clBLAS for peak performance.
  • Allows applications with unique GEMM requirements (such as very small or very skinny matrices) to generate customized application-specific GEMM kernels for additional performance.

Stay tuned for a detailed blog post in our ACL blog series that covers the significant performance improvements.
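The generate, benchmark, and select loop above can be illustrated with a toy Python model. This is not AutoGemm. The "kernels" are hypothetical pure-Python blocked matrix multiplies, each specialized for one tile size. The names `make_kernel`, `benchmark`, and `select_fastest` are illustrative and are not part of clBLAS. The sketch only shows the idea of timing many variants per problem size and keeping the winner in a lookup table.

```python
import random
import time


def make_kernel(tile):
    """Hypothetical 'kernel generator': returns a blocked GEMM (C = A @ B)
    specialized for one square tile size. A is m x k, B is k x n."""
    def gemm(a, b, m, n, k):
        c = [[0.0] * n for _ in range(m)]
        for i0 in range(0, m, tile):
            for j0 in range(0, n, tile):
                for p0 in range(0, k, tile):
                    # Multiply one tile of A by one tile of B and accumulate into C.
                    for i in range(i0, min(i0 + tile, m)):
                        row_a, row_c = a[i], c[i]
                        for p in range(p0, min(p0 + tile, k)):
                            aip, row_b = row_a[p], b[p]
                            for j in range(j0, min(j0 + tile, n)):
                                row_c[j] += aip * row_b[j]
        return c
    return gemm


def random_matrix(rows, cols, rng):
    return [[rng.random() for _ in range(cols)] for _ in range(rows)]


def benchmark(kernel, m, n, k, rng, repeats=3):
    """Best wall-clock time of `kernel` on random m x k and k x n inputs."""
    a, b = random_matrix(m, k, rng), random_matrix(k, n, rng)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        kernel(a, b, m, n, k)
        best = min(best, time.perf_counter() - start)
    return best


def select_fastest(tiles, sizes, seed=0):
    """Benchmark every candidate tile size for every (m, n, k) and record
    the fastest one, producing a per-size lookup table."""
    rng = random.Random(seed)
    kernels = {t: make_kernel(t) for t in tiles}
    table = {}
    for (m, n, k) in sizes:
        timings = {t: benchmark(kernels[t], m, n, k, rng) for t in tiles}
        table[(m, n, k)] = min(timings, key=timings.get)
    return table


if __name__ == "__main__":
    table = select_fastest([4, 8, 16, 32], [(32, 32, 32), (64, 8, 64)])
    print(table)  # chosen tile per (m, n, k); varies by machine
```

At run time, a library built this way looks up the problem size in the table and dispatches to the matching pre-generated kernel. Unusual shapes, such as very skinny matrices, can be given their own entries.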

An updated DTRSM (double-precision triangular solve) algorithm that:

  • Is optimized for modern GPU architectures.
  • Is available for both online and offline compilation.
  • Leverages the DGEMM performance improvements from AutoGemm.
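A common way to make a triangular solve benefit from a fast GEMM is to block it. Each small diagonal block is solved directly, and the update of all remaining rows is a matrix multiply, which is where most of the arithmetic goes. The plain-Python sketch below shows this for a lower-triangular system L X = B. It is a generic illustration, not clBLAS's DTRSM implementation. The names `block_trsm`, `gemm_update`, `solve_diag_block` and the default block size are hypothetical.

```python
def gemm_update(b, l, x, r0, r1, c0, c1):
    """GEMM-shaped update: B[r0:r1, :] -= L[r0:r1, c0:c1] @ X[c0:c1, :]."""
    ncols = len(b[0])
    for i in range(r0, r1):
        bi = b[i]
        for p in range(c0, c1):
            lip, xp = l[i][p], x[p]
            for j in range(ncols):
                bi[j] -= lip * xp[j]


def solve_diag_block(b, l, x, r0, r1):
    """Forward substitution within the small triangular block L[r0:r1, r0:r1].
    Rows r0..r1 of B must already include updates from earlier blocks."""
    ncols = len(b[0])
    for i in range(r0, r1):
        for j in range(ncols):
            s = b[i][j]
            for p in range(r0, i):
                s -= l[i][p] * x[p][j]
            x[i][j] = s / l[i][i]


def block_trsm(l, b, block=2):
    """Solve L @ X = B for X, where L is n x n lower triangular and B is n x m."""
    n = len(l)
    b = [row[:] for row in b]  # work on a copy of the right-hand sides
    x = [[0.0] * len(b[0]) for _ in range(n)]
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)
        solve_diag_block(b, l, x, k0, k1)
        # Update every remaining row at once; on a GPU this is a DGEMM call.
        gemm_update(b, l, x, k1, n, k0, k1)
    return x


if __name__ == "__main__":
    L = [[2, 0, 0], [1, 3, 0], [4, 5, 6]]
    B = [[2, 4], [10, 14], [49, 64]]  # B = L @ [[1, 2], [3, 4], [5, 6]]
    print(block_trsm(L, B))  # [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
```

Larger blocks shift more of the work from the serial diagonal solves into the GEMM updates. That is why a faster GEMM can speed up the whole solve.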

clFFT

AMD ACL 1.0 Beta 2 provides significant improvements in the clFFT library, specifically:

  • Support for transform sizes that are powers of 7 or products of powers of 2, 3, 5, and 7 (previously only powers of 2, 3, and 5 and their combinations were supported), significantly increasing the range of supported input sizes.
  • A pre-callback feature that lets the library apply custom pre-processing to input data through a user-supplied callback function. New APIs let the user pass the callback function as a string, so that pre-processing such as data format conversion runs within the clFFT library itself. This eliminates an extra kernel call and gives a roughly 1.5x to 1.6x improvement over running a separate pre-processing kernel before passing the data to clFFT, as detailed in a related blog post.
  • Support for large 1D transforms that, for certain sizes, require no extra memory allocation. The memory saved allows the library to process even larger transforms.
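The new size rule comes down to a simple check. A length qualifies when its only prime factors are 2, 3, 5, and 7. The helper below is a hypothetical illustration of that rule, not a clFFT API. It covers only the radix condition described above and ignores any other limits the library may impose, such as a maximum transform length.

```python
def is_supported_length(n, radices=(2, 3, 5, 7)):
    """Return True if n >= 1 has no prime factors other than `radices`."""
    if n < 1:
        return False
    for r in radices:
        while n % r == 0:
            n //= r
    return n == 1


# Lengths below 60 that the 2/3/5/7 rule accepts but the older 2/3/5 rule rejected.
newly_supported = [n for n in range(2, 60)
                   if is_supported_length(n) and not is_supported_length(n, (2, 3, 5))]
print(newly_supported)  # [7, 14, 21, 28, 35, 42, 49, 56]
```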

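Conceptually, a pre-callback moves the pre-processing into the FFT's own input-loading step. The data is converted as it is read, rather than in a separate pass that writes a converted copy back to memory. The pure-Python sketch below models this with a naive DFT whose loader accepts an optional callback. The names `dft` and `int16_to_complex` and the `(data, index, userdata)` callback signature are hypothetical. They are not clFFT's API, where the callback is supplied as a source string.

```python
import cmath


def dft(data, n, pre_callback=None, userdata=None):
    """Naive O(n^2) DFT of n complex points.

    If pre_callback is given, element i is produced by
    pre_callback(data, i, userdata) as it is loaded. The converted values
    exist only inside this call (standing in for on-chip storage in a real
    kernel), so the caller never writes a separate converted buffer.
    """
    if pre_callback is None:
        x = [data[i] for i in range(n)]
    else:
        x = [pre_callback(data, i, userdata) for i in range(n)]
    out = []
    for k in range(n):
        acc = 0j
        for j in range(n):
            acc += x[j] * cmath.exp(-2j * cmath.pi * j * k / n)
        out.append(acc)
    return out


def int16_to_complex(raw, i, scale):
    """Example pre-callback: interleaved int16 (re, im) pairs -> scaled complex."""
    return complex(raw[2 * i], raw[2 * i + 1]) * scale


if __name__ == "__main__":
    raw = [16384, 0, 0, 16384, -16384, 0, 0, -16384]  # 4 complex samples as int16 pairs
    scale = 1.0 / 32768.0

    # Separate pass: convert the whole buffer first, then transform it.
    converted = [int16_to_complex(raw, i, scale) for i in range(4)]
    separate = dft(converted, 4)

    # Fused: the transform applies the conversion as it loads each element.
    fused = dft(raw, 4, pre_callback=int16_to_complex, userdata=scale)

    print(all(abs(a - b) < 1e-12 for a, b in zip(separate, fused)))  # True
```

Both paths give the same result. On a GPU, the saving comes from skipping the extra kernel launch and the extra round trip through global memory that the separate pass requires.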
clSPARSE

AMD ACL 1.0 Beta 2 includes a much improved version of clSPARSE, v0.8, which provides:

Have fun.

All that said, it’s still a beta. Please share your feedback in our forum: https://community.amd.com/community/devgurus/amd-compute-libraries.

Your feedback will help us define and improve the quality and capability of the AMD Compute Library.


Karthik Dakshinamoorthy is the program manager for AMD Compute Libraries. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.
