January 24, 2022
Editor's Note: This content is contributed by Rehan Tahir, Sr. Product Line Manager for Versal AI Edge ACAPs at Xilinx
Up until very recently, any mention of TOPS always referred to dense TOPS. However, with the recent push to support zero-compression in sparse matrices, the term sparse TOPS has appeared. What is the difference between dense TOPS and sparse TOPS? And why should you care about sparsity? Let’s dive into these topics.
Artificial Intelligence (AI) is heavily dependent on Machine Learning (ML), and ML is almost entirely performed by multiplying matrices together. A matrix can represent an object, where nonzero values refer to pixels in an image, for example, and zero values represent blank space. These zero values can be compressed or eliminated, and that compression reduces the number of operations needed to multiply two matrices. The compression and elimination of these zero values is called sparsity.
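To make the effect of zero values concrete, here is a minimal Python sketch that multiplies two small matrices and counts the multiply-accumulate steps, once computing every term and once skipping terms where the left-hand value is zero. The function name `matmul_count` and the example matrices are illustrative assumptions. Real hardware compresses and skips zeros in its own way, so this shows the principle only.

```python
def matmul_count(a, b, skip_zeros):
    """Multiply a (rows x inner) by b (inner x cols); return (result, MAC count)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0] * cols for _ in range(rows)]
    macs = 0
    for i in range(rows):
        for k in range(inner):
            # A zero in `a` contributes nothing to the sum, so it can be skipped.
            if skip_zeros and a[i][k] == 0:
                continue
            for j in range(cols):
                out[i][j] += a[i][k] * b[k][j]
                macs += 1
    return out, macs


# Hypothetical example: half of the values in `a` are zero.
a = [[1, 0], [0, 2]]
b = [[3, 4], [5, 6]]

dense_result, dense_macs = matmul_count(a, b, skip_zeros=False)
sparse_result, sparse_macs = matmul_count(a, b, skip_zeros=True)

print(dense_macs, sparse_macs)        # 8 4
print(dense_result == sparse_result)  # True
```

Both runs produce the same product, but the zero-skipping run needs half as many multiply-accumulate steps.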
Tera-operations per second, or TOPS, is a rudimentary metric that simply counts the number of operations your system can compute each second. Peak TOPS is determined by multiplying the number of operations performed per clock cycle by the clock frequency of the system. For example, a device that can perform 512 multiply-accumulate (MAC) operations per clock cycle running at 1 GHz, with each MAC counted as two operations (one multiply and one add), delivers 512 x 2 x 1 GHz = 1,024 giga-operations per second, or 1.024 TOPS. This number represents dense TOPS.
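The dense TOPS arithmetic can be sketched in a few lines of Python. This assumes the MAC count is a per-clock-cycle figure and that each MAC counts as two operations (a multiply and an add). The function name `dense_tops` and its parameters are illustrative and not part of any Xilinx tool.

```python
def dense_tops(macs_per_cycle, clock_hz, ops_per_mac=2):
    """Peak dense throughput in tera-operations per second (assumed model)."""
    ops_per_second = macs_per_cycle * ops_per_mac * clock_hz
    return ops_per_second / 1e12  # 1 tera-operation = 1e12 operations


# 512 MACs per cycle at 1 GHz
example_dense = dense_tops(512, 1e9)
print(f"{example_dense:.3f} dense TOPS")  # 1.024 dense TOPS
```

This is a peak figure: it assumes every MAC unit is busy on every clock cycle.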
The number above was calculated without taking into consideration the improvement in performance that can be realized if the zero values in the matrices are compressed. If half of the values are zero and the operations on them are skipped, the number of operations actually performed falls by 50%, which results in an effective performance improvement of 2X. TOPS figures that take credit for these skipped operations are sparse TOPS. A matrix that has been compressed to eliminate zero values is a sparse matrix, whereas a matrix with zero and nonzero values is a dense matrix.
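As a rough sketch, the relationship between dense and sparse TOPS can be written as a one-line scaling. This assumes an idealized model in which every operation on a zero value is skipped at no cost. The function name `sparse_tops` and the `zero_fraction` parameter are hypothetical, and the 1.024 input is simply 512 MACs per cycle at 1 GHz with two operations per MAC.

```python
def sparse_tops(dense, zero_fraction):
    """Idealized effective TOPS when a fraction of operations is skipped."""
    if not 0.0 <= zero_fraction < 1.0:
        raise ValueError("zero_fraction must be in [0, 1)")
    # Only (1 - zero_fraction) of the work remains, so the same hardware
    # finishes the full workload proportionally faster.
    return dense / (1.0 - zero_fraction)


example_sparse = sparse_tops(1.024, 0.5)
print(f"{example_sparse:.3f} sparse TOPS")  # 2.048 sparse TOPS
```

With half of the operations skipped, the effective figure is twice the dense figure, which is an upper bound rather than a measured result.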
Sparsity is powerful because it can theoretically improve system performance by up to 2X. However, it’s important to understand the difference between sparse TOPS and dense TOPS. When comparing systems or devices, make sure you don’t fall into the trap of comparing dense TOPS to sparse TOPS. Also, the theoretical performance improvement usually cannot be fully realized in a practical system, so take any performance claims with a grain of salt. Benchmarking real ML networks such as ResNet50, YOLOv3, and MobileNet reveals much more about the performance of any AI chip than TOPS.
Results with batch size = 18, INT8 precision
Sparsity support is one of the key features in Xilinx’s AI Engines for machine learning (AIE-ML), available in the Versal® AI Edge and Versal AI Core adaptive compute acceleration platforms (ACAPs).
The TOPS numbers provided for the Versal AI Edge series in the product selection guide and all associated collateral are dense TOPS. Here is an estimate of the sparse TOPS achievable in each of the Versal AI Edge series devices: