AMD ACL 1.0 Beta 2: A Slew of Features & Improvements

In August of this year, AMD released the first version of the AMD Compute Library (ACL), a consolidated package providing the clBLAS, clFFT, clSPARSE, and clRNG libraries under one roof. Encouraged by the response to the Beta 1 release, the team worked hard (and fast!) to release AMD ACL 1.0 Beta 2, which provides a slew of improvements in almost all of the libraries.

The main improvements in this release include:

clBLAS

AutoGemm, the new high-performing GEneral Matrix-Matrix multiplication (GEMM) backend for clBLAS, is a suite of Python scripts that:

Generates thousands of optimized GEMM OpenCL kernels.
Benchmarks these kernels for a particular GPU and different matrix sizes to determine the fastest kernels.
Automatically chooses the optimal kernel within clBLAS for peak performance.
Allows applications with unique GEMM requirements (such as very small or very skinny matrices) to generate customized application-specific GEMM kernels for additional performance.
Stay tuned for a detailed blog post in our ACL blog series that covers significant performance improvements.
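The generate, benchmark, and select flow described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not AutoGemm's actual scripts: the "kernels" here are pure-Python functions that differ in a hypothetical tile size, and the names `make_tiled_gemm`, `benchmark`, and `build_selection_table` are made up for this example.

```python
import time


def make_tiled_gemm(tile):
    """Return a pure-Python GEMM that walks the k dimension in blocks of `tile`.

    Stands in for one generated kernel variant (assumption for illustration).
    """
    def gemm(a, b):
        n, k, m = len(a), len(b), len(b[0])
        c = [[0.0] * m for _ in range(n)]
        for k0 in range(0, k, tile):
            for i in range(n):
                row_a, row_c = a[i], c[i]
                for kk in range(k0, min(k0 + tile, k)):
                    a_ik, row_b = row_a[kk], b[kk]
                    for j in range(m):
                        row_c[j] += a_ik * row_b[j]
        return c
    return gemm


def benchmark(fn, a, b, repeats=3):
    """Return the best wall-clock time of `fn(a, b)` over a few repeats."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(a, b)
        best = min(best, time.perf_counter() - start)
    return best


def build_selection_table(candidates, sizes):
    """Map each square matrix size to the name of its fastest candidate."""
    table = {}
    for n in sizes:
        a = [[float(i + j) for j in range(n)] for i in range(n)]
        b = [[float(i - j) for j in range(n)] for i in range(n)]
        timings = {name: benchmark(fn, a, b) for name, fn in candidates.items()}
        table[n] = min(timings, key=timings.get)
    return table


# "Generate" a few variants, then benchmark them per matrix size.
candidates = {f"tile{t}": make_tiled_gemm(t) for t in (1, 4, 16)}
table = build_selection_table(candidates, sizes=(8, 24))

# At call time, a library would look up the winner for the requested size.
fastest_for_24 = candidates[table[24]]
```

Which variant wins depends on the machine and the matrix size, which is exactly why the selection is driven by measurement rather than fixed in advance.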

An updated DTRSM algorithm, which:

Is optimized for modern GPU architectures.
Is available for both online and offline compilation.
Leverages the DGEMM performance improvement from AutoGemm.

clFFT

AMD ACL 1.0 Beta 2 provides significant improvements in the clFFT library, specifically:

Support for transform sizes that are powers of 7, or products of powers of 2, 3, 5 and 7 (earlier releases supported only products of powers of 2, 3 and 5), significantly increasing the range of input sizes supported.
A pre-callback feature that enables custom pre-processing of input data directly by the library through a user-supplied callback function. This feature provides new APIs through which a user can pass a callback function as a string to incorporate pre-processing functionality (such as data format conversion) within the clFFT library itself. This eliminates an extra kernel call and provides roughly a 1.5x-1.6x improvement over calling a pre-processing kernel separately before passing the data to the clFFT library, as this blog details.
Support for large 1D transforms that need no extra memory allocation for certain sizes. The memory saved allows larger transforms to be processed by the library.
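To make the new size rule concrete, here is a minimal Python sketch that checks whether a transform length factors entirely into 2, 3, 5 and 7, and finds the next such length for padding. The helper names `is_supported_length` and `next_supported_length` are hypothetical and not part of clFFT; the check is purely arithmetic, and the library may impose other limits (maximum size, for instance) that it does not model.

```python
def is_supported_length(n, radices=(2, 3, 5, 7)):
    """Return True if n is a product of powers of the given radices."""
    if n < 1:
        return False
    for r in radices:
        while n % r == 0:
            n //= r
    return n == 1


def next_supported_length(n, radices=(2, 3, 5, 7)):
    """Return the smallest length >= n that factors into the given radices."""
    candidate = max(n, 1)
    while not is_supported_length(candidate, radices):
        candidate += 1
    return candidate


# 343 = 7**3 only qualifies once radix 7 is available.
now_ok = is_supported_length(343)
before_ok = is_supported_length(343, radices=(2, 3, 5))

# Padding a length-97 signal: 98 = 2 * 7**2 with radix 7, 100 without it.
pad_with_7 = next_supported_length(97)
pad_without_7 = next_supported_length(97, radices=(2, 3, 5))
```

Adding radix 7 makes the set of valid lengths denser, so an application that pads its input to the nearest supported size generally pads less.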

clSPARSE

AMD ACL 1.0 Beta 2 includes a much improved version of clSPARSE, v0.8, which provides:

A new single-precision SpM-SpM (SpGEMM) function that multiplies two sparse matrices, saving significant memory and computation in applications such as graph theory, linear solvers, and sparse signal processing. Stay tuned for a blog in our ACL blog post series that details how this works and the significant performance improvements over NVIDIA’s cuSPARSE SpGEMM.
Enhanced sparse matrix conversion routines.
API documentation: (http://clmathlibraries.github.io/clSPARSE/)
Improved numerical accuracy of the SpM-dV routines (https://github.com/clMathLibraries/clSPARSE/wiki/Precision)
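For readers new to SpGEMM, the following pure-Python sketch shows what multiplying two sparse matrices involves. It assumes CSR storage (row pointers, column indices, values) and uses a simple row-by-row accumulation; it is not how clSPARSE implements the routine on the GPU, and `spgemm_csr` is a name invented for this example.

```python
def spgemm_csr(a, b):
    """Multiply two CSR matrices, each given as (row_ptr, col_idx, values).

    Only stored entries are visited, so work and memory scale with the
    number of nonzeros rather than with the dense matrix dimensions.
    """
    a_ptr, a_idx, a_val = a
    b_ptr, b_idx, b_val = b
    c_ptr, c_idx, c_val = [0], [], []
    for i in range(len(a_ptr) - 1):
        acc = {}  # column index -> accumulated value for row i of the result
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_idx[p], a_val[p]
            # Row k of B contributes a_ik * B[k, j] to C[i, j].
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_idx[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        for j in sorted(acc):
            c_idx.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val


# A = [[1, 0], [0, 2]] and B = [[0, 3], [4, 0]] in CSR form.
A = ([0, 1, 2], [0, 1], [1.0, 2.0])
B = ([0, 1, 2], [1, 0], [3.0, 4.0])
C = spgemm_csr(A, B)  # represents [[0, 3], [8, 0]]
```

The result stays in sparse form, so neither input nor output is ever expanded to a dense array.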

Important Links

clMath Github: https://github.com/clMathLibraries
clBLAS: https://github.com/clMathLibraries/clBLAS

AutoGemm functionality: https://github.com/clMathLibraries/clBLAS/wiki/AutoGemm
AutoGemm performance: https://github.com/clMathLibraries/clBLAS/tree/master/doc/performance/clBLAS_2.7.1/S9150

clFFT: https://github.com/clMathLibraries/clFFT
clSPARSE: https://github.com/clMathLibraries/clSPARSE
clRNG: https://github.com/clMathLibraries/clRNG
AMD Compute Library Blogs

Have fun.

All said, it’s still a beta. Please provide your feedback in our forum at: https://community.amd.com/community/devgurus/amd-compute-libraries.

Your feedback will help us define and improve the quality and capability of the AMD Compute Library.

Karthik Dakshinamoorthy is the program manager for AMD Compute Libraries. His postings are his own opinions and may not represent AMD’s positions, strategies or opinions. Links to third party sites, and references to third party trademarks, are provided for convenience and illustrative purposes only. Unless explicitly stated, AMD is not responsible for the contents of such links, and no third party endorsement of AMD or any of its products is implied.
