
What's New in Vitis™

2022.1

Vitis Software Platform 2022.1 Release Highlights:

Vitis™ Flow Enhancement for Versal® ACAP and AI Engine

  • Supports Xilinx base DFX platform with one static region and one DFX region
  • AIE profiling supports stall/deadlock detection and generates AI Engine status views (including error events) in Vitis Analyzer
  • External traffic generators in x86sim, AIEsim, and SW emulation are more flexible and easier to insert into simulation and emulation flows
  • Vitis Model Composer supports hardware validation (now including Linux) and HW emulation

Vitis for DC and Vitis HLS

  • Vitis provides additional reporting for the dynamic region generation process; flow reporting enhancements include three new or updated reports
  • Vitis improves PL profiling with the choice of offloading trace data to memory resources (preferred) or to a FIFO in the PL, for better performance
  • A new Timeline Trace Viewer, available after simulation, shows the runtime profile and allows users to remain in the Vitis HLS GUI
  • Vitis HLS now supports a higher-level type of "smart" construct via the new PERFORMANCE pragma or the set_directive_performance command
  • Vitis Graph Library L3 API enhancements for performance (1 ms saved per kernel call)

Vitis What's New by Category

The sections below describe the new features and enhancements in Vitis 2022.1. For information on supported platforms, changed behavior, and known issues, refer to the Vitis 2022.1 Release Notes for the Application Acceleration Flow and the Embedded Software Development Flow.

  • New Genomics Accelerator Library added (L1, L2, and L3)
  • Graph Library, L3 enhancements for performance
  • Vitis Database Library, GQE Multi-Functional Kernel
  • New functions added in Vision Library
  • Vitis AIE Vision Library: new functions, additions, and enhancements
  • Vitis AIE DSP library, FIR resampler supersedes FIR fractional interpolator
  • Vitis Codec Library new APIs: jxlEnc, leptonEnc, resize, and WebpEnc
  • ZLIB Compress Improvement, Customized Octa-Core compression for 8KB solution
  • ZLIB Decompression Improvement, Customized IP for 8KB file size
  • Platform Capability Query Improvement
  • HBM Ease-of-Use Improvement: ability to choose a specific S_AXI entry point to the HMSS for a kernel M_AXI; RAMA insertion supported from the configuration files
  • AI Engine Automated Stall/Deadlock Detection & Analysis in Hardware
  • Analyzing the Automated Status Output
  • Analyzing the Automated Status Output – Buffers
  • Analyzing the Manual Status Output in Hardware
  • Analyzing the Manual Status Output
  • AI Engine Event Trace Enhancements
  • External traffic generators in AIEsim
  • AI Engine Profiling Improvements on HW
  • AI Engine support for Broadcast windows
  • Vitis AI Engine Compiler Enhanced Graph Programming Model
  • Vitis AI Engine Compiler - PLIO/GMIO in ADF Graphs
  • Analysis Enhancements, New Timeline Trace Viewer
  • Coding Style Enhancements, Array Partition support for Stream of Blocks type
  • Pragma Abstraction, New Performance Pragma (and directive)
  • Vitis Core “one liner”, Vitis HLS - New Timeline Trace Viewer,  new PERFORMANCE pragma,  Stream of Blocks support windows
  • New Viewer introduced
    • Shows the runtime profile of all surviving functions in your design - i.e., those that get converted into modules
    • Especially useful to see the behavior of dataflow regions after Co-simulation
    • Native to Vitis HLS - No need to launch the xsim waveform viewer anymore (external tool)
  • Vitis Analyzer Improvement, Save/Restore Timeline Customization
  • Reporting Enhancement, report_qor_assessment, xclbin Clocking Information, Vivado Automation Summary
  • Profiling Enhancement: new PL profiling infrastructure enabled; multiple trace_memory options can be added to insert multiple memory monitors (HW only), for example in a v++ linker config file that offloads trace data for all CUs in SLR0 to DDR0 and for all CUs in SLR1 to DDR1
  • Updated Bootgen GUI for Versal
  • Toolchain Update
  • XSCT, Support STAPL, Add Linker script generation command
  • System Compile Flow, Refer to system compile doc
  • Add software emulation support for auto-restart and mailbox on always-running kernels
  • Free-running kernels no longer need a while(1) loop in software emulation
  • Add software emulation support for external traffic generators
  • Hardware Emulation can use HLS C source code function model for Streaming IP.
  • Add API xrt::system for Probing number of devices
  • Add API xrt::message for Logging messages
  • XRT native API host code now requires -std=c++17 or above
  • Add experimental xrt::queue APIs for asynchronous execution of synchronous operations
  • xbutil can show AIE FIFO counters that help debug AIE deadlock scenarios
  • The xbutil --legacy option is removed
  • xclbinutil --info provides clock information for embedded platforms
  • xbutil on ARM can load SOM images
  • xbtop: standalone utility that shows Linux top-like output (replacing the legacy xbutil -top)
  • XRT utilities support Bash auto-completion with the Tab key
  • Alveo Platform Updates, Platform Updates for improved stability, Card Management Updates, SC Firmware Update Tool
  • Embedded Platform, New VCK190 DFX Platform: xilinx_vck190_base_dfx_202210_1, Embedded Platforms are now installed with Vitis, Vivado adds a new Customizable Example Design: Vitis Platform for MPSoC
  • Major overhaul of the Vitis Model Composer hub block for scalability and ease of use
  • Hardware validation flow now supports Linux in addition to bare-metal
  • "AIE to HDL" and "HDL to AIE" blocks no longer include the HDL gateway blocks
  • 2022.1 now ships with a snapshot of the examples for customers who do not have access to the internet. The tool will prompt the user to download a new revision of the examples from GitHub if available
  • For ease of use, utility blocks that are not part of code generation are now presented with a white background color
  • Enhanced and reorganized the library browser for ease of use
  • RHEL 8.x support
  • MATLAB Support - R2021a and R2021b
2021.2

Vitis Software Platform 2021.2 Release Highlights:

  • New domain specific development environments
    • Vitis™ Video Analytics SDK on Kria™ SOM, Alveo™ U30/U50, and VCK5000 Versal™ development card
    • Vitis Blockchain solution on Varium™ C1100 card with Vitis libs
  • Full end to end flow support for VCK5000 and Varium C1100 cards
  • Enhanced core tool features 
    • Vitis AI Engine Compiler C/C++ high level abstraction API, Auto Pragma Inference, Area Group Constraints 
    • Vitis AI Engine x86simulator enhancements: Trace Report, Memory Access Violation and Deadlock Detection
    • Vitis HLS ease-of-use (EoU), timing, and QoR enhancements; HLS APIs for user-controlled burst inference
    • Enhanced Vitis Analyzer for better timeline trace report, data visualization, stall analysis
    • Vitis XRT for AI Engine: multi-process and multi-thread support for AI Engine graph control
    • Vitis IDE and emulation support for AI Engine trace and SW emulation of AI Engine applications
  • 39 new C/C++ library functions in diverse domains covering DSP, Data Analytics, Vision, Compression, Database, Graph, Security, …, for a total of over 1000 library functions
  •  Vitis Model Composer 
    • 3x reduction in compile-plus-simulation time and 7x reduction in compilation time with parallel compilation
    • New Hardware Validation Flow and Enhanced Functional Co-simulation

Vitis What's New by Category

The sections below describe the new features and enhancements in Vitis 2021.2. For information on supported platforms, changed behavior, and known issues, refer to the Vitis 2021.2 Release Notes for the Application Acceleration Flow and the Embedded Software Development Flow.

Note: Vitis Accelerated Libraries are available as a separate download. They can be downloaded from GitHub or directly from within the Vitis IDE as well.

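| Library | 2021.1 | 2021.2 | New functions in 2021.2 |
|---|---|---|---|
| xf_blas | 167 | 167 | 0 |
| xf_codec | 3 | 3 | 0 |
| xf_DataAnalytics | 33 | 36 | 3 |
| xf_database | 62 | 65 | 3 |
| xf_compression | 78 | 93 | 15 |
| xf_dsp | 94 | 96 | 2 |
| xf_graph | 53 | 59 | 6 |
| xf_hpc | 37 | 37 | 0 |
| xf_fintech | 116 | 116 | 0 |
| xf_security | 135 | 140 | 5 |
| xf_solver | 11 | 11 | 0 |
| xf_sparse | 11 | 11 | 0 |
| xf_utils_hw | 55 | 57 | 2 |
| xf_opencv | 147 | 150 | 3 |
| Total | 1002 | 1041 | 39 |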

Note: For the vision library (xf_opencv), the count is the number of subfolders in L*/tests rather than the number of tests, because each API has multiple tests for different types.
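As a quick consistency check on the table, the C++17 sketch below re-adds the per-library counts at compile time. The struct and function names are illustrative only; the numbers are copied from the table, and the "new functions" figure is computed as the difference between releases, which assumes no functions were removed.

```cpp
#include <array>
#include <cassert>
#include <string_view>

// One row of the table: library name and function counts per release.
struct LibCount {
    std::string_view name;
    int v2021_1;
    int v2021_2;
};

constexpr std::array<LibCount, 14> kCounts{{
    {"xf_blas", 167, 167},        {"xf_codec", 3, 3},
    {"xf_DataAnalytics", 33, 36}, {"xf_database", 62, 65},
    {"xf_compression", 78, 93},   {"xf_dsp", 94, 96},
    {"xf_graph", 53, 59},         {"xf_hpc", 37, 37},
    {"xf_fintech", 116, 116},     {"xf_security", 135, 140},
    {"xf_solver", 11, 11},        {"xf_sparse", 11, 11},
    {"xf_utils_hw", 55, 57},      {"xf_opencv", 147, 150},
}};

struct Totals {
    int v2021_1;
    int v2021_2;
    int added;  // difference between releases (assumes nothing was removed)
};

constexpr Totals sum_counts() {
    Totals t{0, 0, 0};
    for (const LibCount& c : kCounts) {
        t.v2021_1 += c.v2021_1;
        t.v2021_2 += c.v2021_2;
        t.added += c.v2021_2 - c.v2021_1;
    }
    return t;
}

// Checked at compile time: the Total row matches the per-library rows.
static_assert(sum_counts().v2021_1 == 1002);
static_assert(sum_counts().v2021_2 == 1041);
static_assert(sum_counts().added == 39);
```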

Programmable Logic (PL)

  • End-to-end Mono Image Processing (ISP) with CLAHE TMO
  • RGB-IR, along with an RGB-IR Image Processing (ISP) pipeline
  • Global Tone Mapping(GTM) along with an ISP pipeline using GTM
| New Feature | Category | Customer/Strategic | Segments | Description |
|---|---|---|---|---|
| RGB-IR | ISP | Seeing Machines | Automotive, ISM | Support for 4x4 RGB-IR demosaicking; primarily for in-cabin monitoring systems; low-light surveillance cameras |
| Mono (CCCC) | ISP | Strategic | Automotive, ISM, A&D | Machine vision; low-light applications |
| Global Tone Mapping (GTM) | ISP | Strategic | Automotive, ISM, A&D | Improved dynamic range and contrast; lower-cost version compared to local tone mapping (LTM) |
| Dense Optical Flow TV-L1 | CV | NTT | ISM | Improved robustness (against illumination, noise, occlusions) for optical flow |

AI Engine (AIE)

  • BlobFromImage
  • Back to back filter2D with batch size three support
| New Feature | Category | Customer/Strategic | Segments | Description |
|---|---|---|---|---|
| RGB-IR | ISP | Seeing Machines | Automotive, ISM | Support for 4x4 RGB-IR demosaicking; primarily for in-cabin monitoring systems; low-light surveillance cameras |
| ML+X | ISP | Strategic | Automotive, ISM, A&D | ML inference pre-processing |
| Gaussian Pyramid | CV | Strategic | Automotive, ISM, A&D | Fundamental for multi-scale image processing |
| Box Filter | CV | Strategic | Automotive, ISM, A&D | Fundamental for smoothing (low-pass filtering) |
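To make the BlobFromImage function listed above more concrete, here is a host-side C++ sketch of the preprocessing it is usually understood to perform, following the semantics of OpenCV's cv::dnn::blobFromImage: subtract a per-channel mean, multiply by a scale factor, and reorder interleaved HWC pixels into planar CHW order. This is an illustrative reference model, not the AI Engine kernel; resizing, cropping, and R/B swapping are deliberately omitted, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert an interleaved HWC 8-bit image into a planar CHW float blob:
// blob[c][y][x] = (pixel[y][x][c] - mean[c]) * scale.
std::vector<float> blob_from_image(const std::vector<std::uint8_t>& hwc,
                                   std::size_t height, std::size_t width,
                                   std::size_t channels,
                                   const std::vector<float>& mean, float scale) {
    assert(hwc.size() == height * width * channels && mean.size() == channels);
    std::vector<float> chw(height * width * channels);
    for (std::size_t y = 0; y < height; ++y) {
        for (std::size_t x = 0; x < width; ++x) {
            for (std::size_t c = 0; c < channels; ++c) {
                float v = hwc[(y * width + x) * channels + c];  // HWC index
                chw[c * height * width + y * width + x] = (v - mean[c]) * scale;  // CHW index
            }
        }
    }
    return chw;
}
```

Keeping the mean subtraction and scaling in the same pass as the layout change avoids a second sweep over the image, which is the usual motivation for fusing these steps into one preprocessing function.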

Vitis Blockchain Solution based on Vitis libraries

  • Out-of-Box Mining solutions for Ethereum
  • Open-Source & easy to use and deploy with Vitis Libs using C++
  • Flexible & Scalable with Vitis Libs
  • Flexible enough to mine multiple coins
  • Customize and compile into hardware
  • Highly optimized design

CSV parser API added to the library

  • The CSV parser parses comma-separated value files and generates an object stream that can easily be connected to the DataFrame APIs
  • New L2 libraries added
  • Louvain with renumber
  • Renumbering
  • The ‘weight’ feature is supported for Cosine Similarity
  • GQE now supports asynchronous input/output, along with multi-card support.
    • Asynchronous support allows the FPGA to start processing as soon as part of the input data is ready.
    • Multi-card support identifies multiple Alveo cards that are suitable for the work.
  • ZSTD Multi-Core Compression
    • New ZSTD multi-core architecture delivering more than 1 GB/s throughput with a quad-core configuration.
  • ZSTD Decompress optimization
    • ZSTD decompression optimized for performance (increased by 20%) and resources (reduced by < 30%)
  • GZIP/ZLIB Stream Core Improvement for IBM
    • Customized static and dynamic compression streaming IP (4KB and 8KB)
    • Added functionality to provide the compressed size on the TUSER port
  • GZIP/ZLIB Decompress Improvement for IBM
    • Optimized Huffman decoder, reducing latency to under 1.5K cycles
    • Reduced resources significantly, to 6.9K (from more than 9K previously)
    • Added Adler-32 checksum functionality
  • GZIP System Compiler PoC
    • Created a System Compiler PoC for GZIP Compress solution and benchmarked against OpenCL Host.
  • DSPLib on Github since 2021
  • Fast Fourier Transform (FFT/iFFT)
    • Point size increased to 32K (data type dependent)
    • Support for stream API as well as window API.
    • Parallel Power (0-4)
      • Allows higher throughput and extends range of supported point sizes
  • FIR Filters
    • Initial Stream support for Single Rate asymmetric / symmetric FIR
  • DDS/Mixer
    • New library unit  in 2021.2
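The Adler-32 checksum functionality added to the GZIP/ZLIB decompressor is easy to cross-check in software. The sketch below is a plain sequential reference model of Adler-32 as defined in RFC 1950 (the ZLIB format), which could serve as a golden model for a hardware decompressor's output; it is not the Vitis library implementation, and the function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Largest prime below 2^16, as specified for Adler-32.
constexpr std::uint32_t kAdlerMod = 65521;

// Update an Adler-32 value with `len` bytes. Pass the previous checksum as
// `prev` to continue over chunked input; the initial value is 1.
std::uint32_t adler32(const unsigned char* data, std::size_t len,
                      std::uint32_t prev = 1) {
    std::uint32_t a = prev & 0xFFFF;          // running sum of bytes (+1)
    std::uint32_t b = (prev >> 16) & 0xFFFF;  // running sum of the a values
    for (std::size_t i = 0; i < len; ++i) {
        a = (a + data[i]) % kAdlerMod;
        b = (b + a) % kAdlerMod;
    }
    return (b << 16) | a;
}

// Convenience overload for text.
std::uint32_t adler32(std::string_view s) {
    return adler32(reinterpret_cast<const unsigned char*>(s.data()), s.size());
}
```

For example, the checksum of "Wikipedia" is 0x11E60398, and the checksum of empty input is 1.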

 

KECCAK-256 (hash function) and CRC32C (checksum function) are released
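For checking results from the CRC32C checksum function, a bitwise software model is often the simplest golden reference. The sketch below implements standard CRC-32C (Castagnoli) with the reflected polynomial 0x82F63B78, initial value 0xFFFFFFFF, and final XOR 0xFFFFFFFF. It is an illustrative model, not the Vitis Security Library code, and favors clarity over speed (no lookup table).

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Bit-at-a-time CRC-32C over a byte string.
std::uint32_t crc32c(std::string_view data) {
    std::uint32_t crc = 0xFFFFFFFFu;
    for (unsigned char byte : data) {
        crc ^= byte;
        for (int bit = 0; bit < 8; ++bit) {
            // Shift right; if the bit shifted out was 1, XOR in the polynomial.
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
        }
    }
    return crc ^ 0xFFFFFFFFu;
}
```

The conventional check value for CRC-32C is crc32c("123456789") == 0xE3069283, which is a convenient first test for any hardware or software implementation.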

Two data-mover implementations are added for debugging hardware issues.

  • LoadDdrToStreamWithCounter: loads data from PL DDR to the AI Engine through an AXI stream and records the count of data sent to the AI Engine.
  • StoreStreamToMasterWithCounter: receives data from the AI Engine through an AXI stream, saves it to PL DDR, and records the count of data sent to DDR.
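The value of LoadDdrToStreamWithCounter and StoreStreamToMasterWithCounter lies in their counters: comparing the count sent with the count received shows where data stopped flowing. The host-side C++ model below illustrates that idea under stated assumptions: a bounded std::queue stands in for the AXI stream and a std::vector for PL DDR, and the function names and signatures are hypothetical, not the library's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <vector>

using Stream = std::queue<std::int32_t>;  // stand-in for an AXI stream

// Model of the load mover: push up to `count` words from "DDR" into the
// stream. Real hardware would stall on a full stream; the model stops
// instead, so a returned count below `count` indicates back-pressure.
std::size_t load_ddr_to_stream_with_counter(const std::vector<std::int32_t>& ddr,
                                            std::size_t count, Stream& out,
                                            std::size_t stream_capacity) {
    assert(count <= ddr.size() && "request exceeds the source buffer");
    std::size_t sent = 0;
    while (sent < count && out.size() < stream_capacity) {
        out.push(ddr[sent]);
        ++sent;  // the debug counter
    }
    return sent;
}

// Model of the store mover: pop up to `expected` words into "DDR". Real
// hardware would block on an empty stream; the model stops, so a returned
// count below `expected` shows how much data actually arrived.
std::size_t store_stream_to_master_with_counter(Stream& in, std::size_t expected,
                                                std::vector<std::int32_t>& ddr) {
    std::size_t stored = 0;
    while (stored < expected && !in.empty()) {
        ddr.push_back(in.front());
        in.pop();
        ++stored;  // the debug counter
    }
    return stored;
}
```

In this model, loading four words into a stream that can hold only two returns 2, and the store side then reports 2 of the 4 expected words, pointing at the producer side as the place where data stalled.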

AI Engine API

  • Implemented as a C++ header-only library that provides types and operations that get translated into efficient AI Engine intrinsics.
  • Provides parametrizable data types that enable generic programming
  • Implements most common operations in a uniform way for different data types
  • Transparently translates higher-level primitives into optimized AI Engine intrinsics
  • Improves portability across AI Engine architectures

The AI Engine API will be the primary method for AI Engine kernel programming

High Level Optimizations

AI Engine compiler optimization options

  • --xlopt=0, no optimization applied.
  • --xlopt=1, automatic computation of heap size, guidance generation from LLVM IR analysis.
  • --xlopt=2, automatic inlining, loop peeling for unrolled loops, pragma insertion.

Introducing --xlopt=2 to improve performance, default remains --xlopt=1

  • Automatic inline
    • Automatically inlines functions if it is practical and possible to do so, even if the functions are not declared as __inline or inline
  • Automatic pragma insertion
    • Automatically inserts pragmas into kernel code (see Pragma Inference)

Pragma Inference

Pragmas are necessary for optimizing kernels

  • Pragma inference relieves users of the responsibility of adding effective and correct chess pragmas

Support to auto-infer five pragmas in 2021.2

  • for performance:
    • chess_prepare_for_pipelining for innermost loop, and outer loops with known trip count
    • chess_loop_range for loops with known trip count
    • chess_unroll_loop/chess_flatten_loop for innermost loops with known trip count
  • for correctness:
    • chess_unroll_loop_preamble when trip count is not a multiple of unroll factor
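To show what the inferred performance pragmas correspond to, the sketch below writes them out by hand on an innermost loop with a known trip count, the case the list above says --xlopt=2 can handle automatically. The placement (after the for header, before the loop body) follows the usual chess pragma style. Two assumptions to note: the __chess__ guard macro is assumed to be predefined by the AI Engine compiler (adjust it for your toolchain), and the function itself is a hypothetical example kernel body, not library code.

```cpp
#include <cassert>
#include <cstdint>

// The chess_* keywords are understood only by the AI Engine (chess)
// compiler. For a host build, define them away so the code still compiles.
// ASSUMPTION: __chess__ is what the AI Engine compiler predefines.
#ifndef __chess__
#define chess_prepare_for_pipelining
#define chess_loop_range(...)
#endif

// Hypothetical kernel body: sum of squares over a fixed 32-sample block.
// The trip count is known (32), so pipelining and loop-range hints apply.
std::int64_t sum_of_squares_32(const std::int16_t* x) {
    std::int64_t acc = 0;
    for (int i = 0; i < 32; ++i)
        chess_prepare_for_pipelining
        chess_loop_range(32, 32)
    {
        acc += static_cast<std::int64_t>(x[i]) * x[i];
    }
    return acc;
}
```

With pragma inference enabled, the same loop written without the two hints is the kind of code the compiler is meant to annotate for you.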

Updated Graph Programming Model PLIO and GMIO

Model Changes Include:

  • Changes to usage of “simulation::platform”
  • Interaction with PLIO/GMIO objects in the graph, position determines input/output.
  • Changes to global PLIO/GMIO objects in the graph.
  • Changes around graph connect<> statements.

PLIO/GMIO in ADF Graphs

Current

  • Write PLIO, GMIO, simulation::platform, and connections at global scope

GMIO gm0("GMIO_In0", 64, 1);
GMIO gm1("GMIO_In1", 64, 1);
// …
GMIO gm7("GMIO_In7", 64, 1);

PLIO pl0("PLIO_Out0", plio_32_bits, "data/output0.txt", 250.0);
PLIO pl1("PLIO_Out1", plio_32_bits, "data/output1.txt", 250.0);
// …
PLIO pl7("PLIO_Out7", plio_32_bits, "data/output7.txt", 250.0);

simulation::platform<8,8> plat(&gm0, &gm1, /* … */ &gm7, &pl0, &pl1, /* … */ &pl7);

subgraph g;

connect<> net0(plat.src[0], g.in[0]);
connect<> net1(plat.src[1], g.in[1]);
// …
connect<> net7(plat.src[7], g.in[7]);

connect<> net8(g.out[0], plat.sink[0]);
connect<> net9(g.out[1], plat.sink[1]);
// …
connect<> net15(g.out[7], plat.sink[7]);

Alternative method

  • Create a top-level graph and move the PLIO, GMIO, and connections inside it
  • Allows connections to be managed within a for loop

#include <adf.h>
#include <string>

using namespace adf;

// Top-level graph: owns the GMIO inputs, PLIO outputs, and the subgraph.
class topgraph : public graph
{
public:
  input_gmio gm[8];
  output_plio pl[8];
  subgraph sg;

  topgraph()
  {
    for (int i = 0; i < 8; i++)
    {
      gm[i] = input_gmio::create("GMIO_In" + std::to_string(i), 64, 1);
      pl[i] = output_plio::create("PLIO_Out" + std::to_string(i), plio_32_bits,
                                  "data/output" + std::to_string(i) + ".txt", 250.0);
      connect<>(gm[i].out[0], sg.in[i]);
      connect<>(sg.out[i], pl[i].in[0]);
    }
  }
};

topgraph g;

Area Group Constraints Improvements

Ability to use flags in the ADF graph or constraints file to control the mapper and router

  • -contain_routing – when specified true ensures all routing, including nets between nodes contained in the nodeGroup, is contained within the area group.
  • -exclusive_routing - when specified true ensures all routing, excluding nets between nodes from the nodeGroup, is excluded from the area group.
  • -exclusive_placement - when specified true prevents all nodes not included in the nodeGroup from being placed within the area group bounding box.

Snapshots

Snapshots are text files containing comments and data for all kernel ports

  • streams, packet streams, cascade streams
  • windows, buffer
  • RTP

They also include all platform ports

  • PLIO, GMIO, RTP

Allows users to inspect data traffic at kernel ports without using the debugger and without requiring instrumentation of kernel code

Deadlock Detection

  • Detects deadlocks in x86 simulations whether this situation arises from insufficient input data, or an imbalanced FIFO depth on a re-convergent path
  • The stop-on-deadlock feature must be enabled during x86 simulation by specifying the --stop-on-deadlock option
  • If the simulation stops because of a deadlock, the error message suggests rerunning with the -trace and --timeout options

Memory Access Violation Detection

Integration with Valgrind for Memory Access Violation Detection

  • Detect
    • out-of-bounds read and write
    • read of uninitialized memory
  • No specific flag required for compilation
  • Simulation flags can be either
    • --valgrind : simulation runs as usual and valgrind displays a report
    • --valgrind-gdb : the same, with GDB debugging at the same time

Trace report

A deadlock produces little useful simulation output, which makes the origin of the bug difficult to analyze

The x86 simulation trace option makes the simulator log various timestamped information:

  • Start/End of Kernel iterations
  • Start/End of Stream stalls
  • Start/End of lock stall

Timestamps differ between x86 simulation and AI Engine simulation

User Controlled Burst Inference 

  • For use cases where automatic burst inference in Vitis HLS does not succeed, users can adopt the newly introduced manual burst optimization
  • A new class, hls::burst_maxi, supports manual control of burst behavior. New HLS APIs are provided for use with the new class
  • Users need to understand the AMBA AXI protocol and hardware transaction-level modeling in HLS designs
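The manual burst style separates announcing a burst from transferring its data. The sketch below illustrates that call order with a hypothetical host-side stand-in, MockBurstMaxi, so the pattern can be compiled and checked without Vitis HLS. The method names read_request, read, write_request, write, and write_response follow the hls::burst_maxi API as we understand it from the Vitis HLS documentation; verify them against your release. Everything else (the mock class, the add_one kernel, the buffer size) is an illustrative assumption, not the real class or an official example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL mock of hls::burst_maxi<T>: models only the
// request-then-transfer protocol over a host vector.
template <typename T>
class MockBurstMaxi {
public:
    explicit MockBurstMaxi(std::vector<T>& mem) : mem_(mem) {}

    // Announce a read burst of `len` elements starting at element `offset`.
    void read_request(std::size_t offset, std::size_t len) {
        assert(offset + len <= mem_.size());
        rd_pos_ = offset;
        rd_left_ = len;
    }
    // Return the next element of the outstanding read burst.
    T read() {
        assert(rd_left_ > 0 && "read() without an outstanding request");
        --rd_left_;
        return mem_[rd_pos_++];
    }
    // Announce a write burst of `len` elements starting at element `offset`.
    void write_request(std::size_t offset, std::size_t len) {
        assert(offset + len <= mem_.size());
        wr_pos_ = offset;
        wr_left_ = len;
    }
    // Write the next element of the outstanding write burst.
    void write(const T& value) {
        assert(wr_left_ > 0 && "write() beyond the requested length");
        --wr_left_;
        mem_[wr_pos_++] = value;
    }
    // Wait for the write burst to finish (here: check all beats were written).
    void write_response() {
        assert(wr_left_ == 0 && "response requested before all beats were written");
    }

private:
    std::vector<T>& mem_;
    std::size_t rd_pos_ = 0, rd_left_ = 0, wr_pos_ = 0, wr_left_ = 0;
};

// HYPOTHETICAL kernel body in manual-burst style: one read burst into a
// fixed-size local buffer, then one write burst of incremented values.
template <typename T>
void add_one(MockBurstMaxi<T>& in, MockBurstMaxi<T>& out, int n) {
    constexpr int kMaxLen = 256;
    assert(n >= 0 && n <= kMaxLen);
    T buf[kMaxLen];

    in.read_request(0, n);
    for (int i = 0; i < n; ++i) buf[i] = in.read();

    out.write_request(0, n);
    for (int i = 0; i < n; ++i) out.write(buf[i] + 1);
    out.write_response();
}
```

Issuing one request per burst, rather than relying on the tool to infer bursts from loop structure, is what gives the user explicit control over burst length and timing.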

Timing and QoR Enhancements

  • Provide support for user to input high level throughput constraints
  • Improve HLS timing estimation accuracy: when HLS reports timing closure, RTL synthesis in Vivado should also be expected to meet timing

EoU Enhancements

Add interface adaptors report in the C synthesis reports

  • Users need to know the resource impact that interface adaptors have on their design
  • Interface adaptors have variable properties that impact design QoR
  • Some of these properties have associated user controls which should be reported to users
  • Text version of bind_op and bind_storage reports are provided

Add new section in synthesis report to show list of pragmas and warnings on pragmas

  • Users can easily see which of the pragmas they added have issues.

Analysis and Reporting Enhancements

The Function Call Graph Viewer has some new features

  • New mouse drag based zoom in and out capability
  • New Overview feature that shows the full graph and allows the user to zoom in on parts of the overall graph
  • All functions and loops are shown along with their simulation data

A new Timeline Trace Viewer is now available after simulation. This viewer shows the runtime profile of your design and allows the user to remain in the Vitis HLS GUI.

Link Summary Enhancement

  • Provide clock frequency information for the AI Engine, platform and compute units
  • Provide a new table called Clocks in system diagram and platform diagram

Platform Export Enhancement

  • XSA export from Vivado no longer requires source files to be local to the project
  • XSA export from Vivado makes no change to the project structure
  • Package the IPs that are used in the hardware platform project instead of packaging the whole IP repo

AI Engine application emulation enhancements

  • Provide support for external testbench integration with aiesimulation
  • Provide support for external testbench integration with x86simulation
  • Support for GDB debugging with x86simulation
  • Provide support for snapshots of the data between kernels in a graph for x86simulation
  • Provide support for access violation checking to x86sim
  • Provide support for stop on deadlock to x86sim

Support AI Engine Trace

Support SW Emulation for AI Engine applications

Support external traffic generators in Verilog/SystemVerilog

Extend Profiling Monitor insertion to Monitor Memory

  • Until now, profiling monitor logic could be inserted on a kernel/CU port basis. This feature gives users the option to insert monitor logic directly on the memory interface
  • The memory bandwidth achieved directly on the memory interfaces is reflected in the profile summary report
  • DDR memory and PLRAM are supported
  • The hardware flow is supported
  • To enable this feature, both the linking phase and XRT need to be set up:
    • memory=all (linking phase)
    • data_transfer_trace=coarse|fine or
    • opencl_device_counter=true (XRT, in xrt.ini)


  • A vadd example that enables memory interface monitoring
    • A new table ‘Memory Bank Data Transfer’ is included

Vitis Analyzer Enhancements

Generic profile summary report generated for non-OpenCL applications

  • Provide the same level of support for XRT API and HAL API applications.
  • Users select which types of reports they want to create; the tool automatically generates them and visualizes them in Vitis Analyzer

Add OpenCL commands to PL event timeline

  • Because profiling adds overhead, XRT provides the capability to dump OpenCL events onto the timeline trace without that overhead.
  • Vitis Analyzer can process the XRT output and show it in the timeline trace view.
  • xocl_debug=true must be set in xrt.ini.

Flatten signal hierarchy in timeline trace report

  • By default, the timeline trace report displays the signal trace in hierarchical way
  • Vitis Analyzer provides the capability of flattening the hierarchy by toggling the “Flatten Signal” symbol
  • Comparing the waveform is supported for flattened timeline trace

Vitis Analyzer – Data Visualization

  • Display input/output data to AI Engine kernels in an AI Engine design
    • Helps debug AI Engine designs to show input/output data along with timeline
  • Works with aiesimulator
  • Supports
    • Window/stream/cascade data types
    • Packet streams
    • Templated kernels
    • data-dump utility

Vitis Analyzer – AI Engine Stall Analysis

  • Vitis Analyzer provides visualization capabilities that help users identify the root cause of stalls
  • Support
    • Performance Metrics
    • Lock Stall Analysis
    • Stream Stall Analysis
    • Cascade Stall Analysis
    • Memory Stall Analysis
  • Support Flow
    • aiesimulator
    • HW emulation

Xilinx Runtime Library (XRT): www.xilinx.com/xrt

  • XRT API
    • The XRT native API supports user managed kernel control with xrt::ip
  • XRT Utilities
    • The xbutil and xbmgmt tools are now the default
      • To use the legacy utilities, use xbutil --legacy or xbmgmt --legacy with the legacy sub-commands
    • New utility, xball
      • Applies xbutil or xbmgmt commands to all, or a filtered subset, of the installed data center cards. Check xball --help for details
    • A new command, xbutil configure
      • Allows you to enable, disable, or configure the PCIe Host Memory and PCIe Peer-to-Peer features. See the XRT documentation for more details
    • All XRT utilities now globally support the --force option to skip user interactive confirmation
  • Profiling
    • A profile summary report is generated when any profiling option is enabled.
    • All applicable summary tables and guidance are generated based on the profiling options enabled in the xrt.ini file
    • New data transfer summary table for aggregate information on a memory resource when monitors are added to memory resources in the design
    • New AIE profiling metric sets to count different AIE events including (1) floating point exceptions in AIE, (2) tile execution counts, and (3) stream puts and gets
  • Embedded
    • zocl memory manager improvements to support any sptag

Vitis XRT for AI Engine Multiple Process Support

  • C and C++ APIs to define access modes for multiple processes to share access to the same AI Engine array and graphs.
    • Protect the AI Engine array and graphs from unwanted access.
  • Three modes are supported for opening the AI Engine array and graphs
    • Exclusive mode (prevents any other process from accessing them)
    • Primary mode (allows other processes only nondestructive access)
    • Shared mode (nondestructive access only)
  • Consider these modes when multiple-process support is needed. For example:
    • Preventing others from accessing the AI Engine array (exclusive access)
    • Multiple users controlling different graphs separately (multiple application support)
    • One primary user controlling a graph while others probe its running status (primary and shared access)

Vitis XRT for AI Engine Support Status

C and C++ APIs

  • C version API 
    • For AI Engine array:
      • xrtAIEDeviceOpenExclusive (Exclusive mode)
      • xrtAIEDeviceOpen (Primary mode)
      • xrtAIEDeviceOpenShared (Shared mode)
    • For AI Engine graph:
      • xrtGraphOpenExclusive (Exclusive mode)
      • xrtGraphOpen (Primary mode)
      • xrtGraphOpenShared (Shared mode)
  • C++ version API 
    • The xrt::aie::device class supports an access mode in its constructor
      • enum class access_mode : uint8_t { exclusive = 0, primary = 1, shared = 2 };
    • The xrt::graph class supports an access mode in its constructor
      • enum class access_mode : uint8_t { exclusive = 0, primary = 1, shared = 2, none = 3 };
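The three access modes can be read as a small admission policy. The C++ sketch below encodes the rules as described above (exclusive blocks everyone else, primary admits other processes only for nondestructive access, shared is nondestructive only) using the same enumerators as the device-level enum. It is a simplified, hypothetical model for reasoning about multi-process scenarios, not XRT's implementation; how XRT actually enforces these rules may differ.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Same enumerators as the device-level access_mode listed above.
enum class access_mode : std::uint8_t { exclusive = 0, primary = 1, shared = 2 };

// HYPOTHETICAL model: can a new process open the array/graph in `requested`
// mode, given the modes already held by other processes?
bool can_open(const std::vector<access_mode>& holders, access_mode requested) {
    for (access_mode h : holders) {
        if (h == access_mode::exclusive) return false;          // exclusive blocks everyone
        if (requested == access_mode::exclusive) return false;  // others already hold it
        if (h == access_mode::primary && requested == access_mode::primary)
            return false;  // a primary holder admits others only for nondestructive access
    }
    return true;
}

// In this model only exclusive and primary holders may perform destructive
// (state-changing) operations; shared holders may only observe.
bool may_modify(access_mode m) { return m != access_mode::shared; }
```

For the "one primary user, others probing" scenario above, can_open({access_mode::primary}, access_mode::shared) is true, while a second primary request is refused.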

Access latest Vitis Target Platforms for Alveo Cards:

Refer to UG1120 - Alveo Data Center Accelerator Card Platforms User Guide 

AI Engine DSP Library – New Blocks

  • AIE DDS
  • AIE Mixer

Parallel Compilation

Reduced times vs. 2021.1 (As an example, the following numbers are for the 200 MHz TX Chain):

  • Time to compile and simulate reduced by factor of 3
  • Compilation times reduced by a factor of 7
  • Dead time after simulation reduced from 25s to ~0s

Constraint Editor Enhancement

  • 2021.2 Improved Navigation

To Fixed Size Improvements

To Variable Size Block  Improvements

Enhanced Functional Co-simulation Capabilities

  • Export Matlab data for AI Engine input – xmcVitisWrite
  • Import AI Engine Data into Matlab – xmcVitisRead

Others

  • Import an AI Engine or HLS Kernel block with no input (Source block)
  • New Data Type Support
    • Simulink native int64 and uint64 for AI Engine development, instead of the Xilinx data types x_sfix64 and x_ufix64.
    • accfloat and caccfloat for AI Engine Development
  • Support for Ubuntu 20.04
  • Support for MATLAB R2020a, R2020b, and R2021a (no support for MATLAB R2021b)
  • Addition of new examples
    • Dual stream SSR filter example with 64 kernels
    • Pseudo inverse(64x32) – commslib example.
  • Use xmcLibraryPath command to point to a custom DSPLib location.
  • Many more enhancements and bug fixes
2021.1

Vitis Software Platform 2021.1 Release Highlights:

  • Support for the Xilinx Kria System-on-Module (SOM) KV260 Vision AI Starter Kit: the full Vitis flow for ML (DPU inference engine) + X (RTL kernels and Vitis HLS-based computer vision kernels).
  • Support for new performance-optimized C/C++ libraries for Vision, DSP, Graph (Louvain Modularity), image-processing codecs, and compression (GZIP, Facebook ZSTD, ZLIB whole-application acceleration), accelerated on FPGAs and/or Versal ACAP over CPUs/GPUs
  • Enhanced Vitis™ core development kit design flow on Versal ACAP devices: visualization improvements for the AI Engine design trace report, AI Engine event tracing via GMIO, incremental recompile, a new boot image wizard, and encrypted AI Engine source file support
  • The new Vitis Model Composer tool enables rapid design exploration and verification within the MathWorks MATLAB and Simulink® environment, enabling co-simulation of blocks targeting AI Engines and Programmable Logic, code generation, and test bench creation.
  • New Vitis HLS Flow Navigator GUI for quick access to flow phases and reports. Merge synthesis, analysis, and debug views into a general default context

Vitis What's New by Category

The sections below describe the new features and enhancements in Vitis 2021.1. For information on supported platforms, changed behavior, and known issues, refer to the Vitis 2021.1 Release Notes for the Application Acceleration Flow and the Embedded Software Development Flow.

Note: Vitis Accelerated Libraries are available as a separate download. They can be downloaded from GitHub or directly from within the Vitis IDE as well.

  • AIE DSP
    • DSPLib published as part of the Vitis Acceleration Library set on Github
    • DSPLib contains common parameterizable DSP functions used in many advanced signal processing applications. All functions currently support window interfaces with streaming interface support.
      • FIR Filters

        | Function | Namespace |
        |---|---|
        | Single rate, asymmetrical | dsplib::fir::sr_asym::fir_sr_asym_graph |
        | Single rate, symmetrical | dsplib::fir::sr_sym::fir_sr_sym_graph |
        | Interpolation, asymmetrical | dsplib::fir::interpolate_asym::fir_interpolate_asym_graph |
        | Decimation, halfband | dsplib::fir::decimate_hb::fir_decimate_hb_graph |
        | Interpolation, halfband | dsplib::fir::interpolate_hb::fir_interpolate_hb_graph |
        | Decimation, asymmetric | dsplib::fir::decimate_asym::fir_decimate_asym_graph |
        | Interpolation, fractional, asymmetric | dsplib::fir::interpolate_fract_asym::fir_interpolate_fract_asym_graph |
        | Decimation, symmetric | dsplib::fir::decimate_sym::fir_decimate_sym_graph |

      • FFT/iFFT - The DSPLib contains one FFT/iFFT solution: a single-channel, single-kernel, decimation-in-time (DIT) implementation with configurable point size, complex data types, cascade length, and FFT/iFFT function.

        | Function | Namespace |
        |---|---|
        | Single Channel FFT/iFFT | dsplib::fft::fft_ifft_dit_1ch_graph |

      • Matrix Multiply (GeMM) - The DSPLib contains one Matrix Multiply/GEMM (GEneral Matrix Multiply) solution. This supports the multiplication of two matrices, A and B, with configurable input data types resulting in a derived output data type.

        | Function | Namespace |
        |---|---|
        | Matrix Mult / GeMM | dsplib::blas::matrix_mult::matrix_mult_graph |
      • Widget Utilities - These widgets convert between windows and streams on the input to a DSPLib function, and between streams and windows on its output, where desired; an additional widget converts between real and complex data types.

        | Function | Namespace |
        |---|---|
        | Stream to Window / Window to Stream | dsplib::widget::api_cast::widget_api_cast_graph |
        | Real to Complex / Complex to Real | dsplib::widget::real2complex::widget_real2complex_graph |
      • DSP Library functions are supported in Vitis Model Composer, enabling users to easily plug these functions into the Matlab/Simulink environment to ease AI Engine DSP Library evaluation and overall AI Engine ADF graph development.
  • Vitis HPC Library release introduces HLS primitives, prebuild kernles and software APIs for HPC applications on FPGAs. These applications are:

    • 2D Acoustic RTM (Reverse Time Migration) FDTD (Finite Difference Time Domain) algorithm, including forward kernel and backward kernel

    • 3D Acoustic RTM (Reverse Time Migration) FDTD (Finite Difference Time Domain) algorithm, including forward kernel

    • MLP (Multi-Layer Perceptron) components: activation functions and fully connected network kernels

    • PCG (Preconditioned Conjugate Gradient) Solvers for both dense matrix and sparse matrix

  • First release of selected vision functions for Versal AI Engines. Available functions:

    • Filter2D

    • absdiff

    • accumulate

    • accumulate_weighted

    • addweighted

    • blobFromImage

    • colorconversion

    • convertscaleabs

    • erode

    • gaincontrol

    • gaussian

    • laplacian

    • pixelwise_mul

    • threshold

    • zero

  • xfcvDataMovers: Utility data movers that tile high-resolution images and transfer the tiles to the local memory of AI Engine cores. Two flavors are available:

    • Using a PL kernel: higher throughput at the expense of additional PL resources.
    • Using GMIO: lower throughput than the PL kernel version, but uses the Versal NoC (Network on Chip) and no PL resources.
  • New Programmable Logic (PL) functions and features 
  • ISP pipeline and functions:
    • Updated 2020.2 Non-HDR Pipeline 
      • Several ISP parameters can be changed at runtime: the gain parameters for the red and blue channels, the AWB enable/disable option, the gamma tables for R, G, and B, and the percentage of pixels used to compute min and max for AWB normalization.
      • Gamma Correction and Color Space conversion (RGB2YUYV) made part of the pipeline.
    • New 2021.1 HDR Pipeline: the 2020.2 pipeline plus HDR support
      • HDR merge for two exposures, supporting sensors with digital overlap between the short-exposure frame and the long-exposure frame.
        • Four Bayer patterns supported: RGGB, BGGR, GRBG, GBRG
      • HDR merge + ISP pipeline with runtime configuration, returning RGB output.
      • Extraction function: an HDR preprocessing function that takes a single digitally overlapped stream as input and returns the two exposure frames (SEF, LEF).
    • 3DLUT: provides an input-output mapping to control complex color operators, such as hue, saturation, and luminance.
    • CLAHE: Contrast Limited Adaptive Histogram Equalization limits the contrast while performing adaptive histogram equalization, so that it does not over-amplify contrast in near-constant regions. This also reduces noise amplification.
  • Flip: flips the image about its horizontal and/or vertical axis.
  • Custom CCA: a custom version of the Connected Component Analysis algorithm for defect detection in fruit. In addition to computing the defective portion of the fruit, it counts the defective pixels and the total fruit pixels.
  • Canny updates: the Canny function now supports any image resolution.
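The CLAHE entry above says contrast is limited while equalizing. The step that does this is clipping each tile's histogram at a limit and redistributing the excess before building the equalization mapping. The following is a minimal single-tile sketch in plain C++, not the Vitis Vision implementation: the helper names `clipHistogram` and `equalizationLut`, the 256-bin 8-bit histogram, and the even redistribution policy are assumptions for illustration. Full CLAHE also divides the image into tiles and interpolates between neighboring tiles' mappings, which is omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: clip a tile histogram at clipLimit and spread the
// clipped-off counts evenly over all bins (one common CLAHE variant).
// Capping bin heights caps the slope of the mapping, which is what keeps
// near-constant regions from being over-amplified.
std::vector<uint32_t> clipHistogram(std::vector<uint32_t> hist, uint32_t clipLimit) {
    assert(!hist.empty());
    uint64_t excess = 0;
    for (auto &h : hist) {
        if (h > clipLimit) {
            excess += h - clipLimit;
            h = clipLimit;
        }
    }
    const uint64_t n = hist.size();
    const uint64_t perBin = excess / n;     // share every bin receives
    const uint64_t remainder = excess % n;  // leftovers go to the first bins
    for (uint64_t i = 0; i < n; ++i) {
        hist[i] += static_cast<uint32_t>(perBin + (i < remainder ? 1 : 0));
    }
    return hist;
}

// Hypothetical helper: 8-bit lookup table from the cumulative distribution
// of a (clipped) 256-bin histogram.
std::vector<uint8_t> equalizationLut(const std::vector<uint32_t> &hist) {
    assert(hist.size() == 256);
    uint64_t total = 0;
    for (auto h : hist) total += h;
    assert(total > 0);
    std::vector<uint8_t> lut(256);
    uint64_t cdf = 0;
    for (int v = 0; v < 256; ++v) {
        cdf += hist[v];
        lut[v] = static_cast<uint8_t>((cdf * 255) / total);
    }
    return lut;
}
```

Because the excess is spread evenly, a bin can end slightly above the clip limit; some implementations repeat the clip step to tighten this.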

Library Related Changes

  • All tests have been upgraded from using OpenCV 3.4.2 to OpenCV 4.4
  • Added support for Versal Edge series (VCK190) 
  • A new benchmarking section has been published, with benchmarking collateral for selected pipelines and functions.
  • The 2021.1 release provides Two-Gram text analytics:

    • Two Gram Predicate (TGP) searches an inverted index whose terms are two-character strings. For a dataset on which an inverted index has been built, it finds the IDs of the matching records in the index.

  • Community Detection: Louvain Modularity
  • 2-Hop Search
  • Adds a double-precision SpMV (sparse matrix-dense vector multiplication) implementation with L2 kernels
  • In the 2021.1 release, GQE receives early-access support for the following features:

    • 64-bit join support: the gqeJoin kernel and its companion gqePart kernel have been extended to 64-bit keys and payloads, so larger data sets can be supported.

    • Initial Bloom-filter support: the gqeJoin kernel now ships with a mode in which it executes Bloom-filter probing. This improves efficiency on certain multi-node flows where minimizing data size in the early stage is important.

    • Both features are currently offered as pure-software L3 APIs; see the corresponding L3 test cases.

  • GZIP Multi Core Compression:
    • A new GZIP Multi-Core Compress Streaming Accelerator: a purely stream-based solution (free-running kernel), available in variants supporting block sizes of 4KB, 8KB, 16KB, and 32KB.
  • Facebook ZSTD Compression Core:
    • A new Facebook ZSTD single-core compression accelerator with a 32KB block size. Multi-core ZSTD compression (for higher throughput) is in progress.
  • GZIP low latency Decompression:
    • A new version of GZIP decompression with lower per-block latency, fewer resources (35% fewer LUTs, 83% less BRAM), and improved Fmax.
  • ZLIB Whole Application Acceleration using U50:
    • An L3 GZIP solution for the U50 platform, containing six compression cores to saturate the full PCIe bandwidth. It comes with an efficient GZIP software solution that accelerates the CPU libz.so library, providing seamless API-level integration of inflate and deflate into end-customer software without recompiling.
  • Versal platform support.
  • Added AIE support (see above).
  • The 2021.1 release provides support for:
    • RIPEMD160
    • BLS (initial support, not complete)
  • In the 2021.1 release, a Data-Mover is added to this library. Unlike the other C++-based APIs, this addition targets users who are less experienced in HLS-based kernel design and just want to test their stream-based designs. The Data-Mover is a kernel source code generator that creates common helper kernels to drive or validate designs, such as those on AIE devices.
  • Produce QoR metrics (Vitis QoR Generation API)
    • Cycles taken by the application kernel
    • Stall cycles (computed from VCD file)
    • Measure overhead cycles in the wrapper (time spent in functions other than the kernel itself)
    • Throughput
  • Three levels of optimization: XLOPT=0, 1 (default), and 2
  • New functionality for xlopt=2:
    • Loop fusion, flattening of single-iteration outer loops, and enhanced loop-peeling heuristics
  • Analyze "__restrict" usage and give guidance
  • Incremental recompile: when the graph does not change, only the kernels that have been modified are recompiled
  • Packet-switched data: up to a 32-way split (previously limited to 4)
  • New DMA FIFO location constraint (mapper/router changes between releases do not impact performance)
  • Use a mapping solution as a constraint in a new compilation to prevent future mapping variations that impact performance
  • x86sim feature support brought to the level of aiesim
  • Start of deprecation of PL kernels in ADF graphs (complete deprecation in 2021.2)
  • New “Flow Navigator” in the GUI for quick access to flow phases and reports. The contextual “synthesis”, “analysis”, and “debug” views are merged into a general default context
  • New synthesis report section for the BIND_OP and BIND_STORAGE directives
  • A new post-synthesis text report reflects the information provided in the GUI synthesis report
  • The IP export and Vivado implementation run widgets have been redesigned with options to pass settings and constraint files to Vivado
  • New function call graph viewer to visualize functions and loops which can be highlighted with an optional heatmap to detect II, latency, or DSP/BRAM utilization hot spots
  • Versal timing calibration and new controls for DSP block native floating-point operations (the -precision option for config_op)
  • The Vitis HLS Migration guide (former UG1391) is now a chapter in UG1399
  • New methodology sections in user guide (UG1399 and web)
  • The alternate flushable pipeline option (free-running pipeline, aka "frp") has been improved
  • In Vitis, a top-level port pointer can now be mapped onto the AXI-Lite adapter rather than onto global memory
  • The aggregate directive now provides a "-compact bit" option for maximum packing
  • Adds back a "Leave Feedback" entry in Help menu with optional survey
  • Fixed a bug where the "Man Pages" tab did not display information on some Linux systems
  • In Vitis, reshaping m_axi interfaces should be done via the hls::vector types
  • New customization options for s_axilite and m_axi data storage, which can be "auto", "uram", "bram", or "lutram", allowing you to tweak RAM utilization in your design
  • In Vitis, a new continuously running (aka "never-ending") mode is introduced for kernels
  • The axi_lite secondary clock option has been re-instated
  • Enhanced support for RTL kernel packaging in the Vivado IP packager
    • Now a public, productized feature with proper methodology and documentation.
    • XRT-managed kernels are the default flow.

  • Support encrypted AIE source files as input

    • The AIE compiler can accept encrypted AIE source files, and v++ supports the rest of the flow.

  • Add Create Boot Image Wizard support for Versal devices
  • Multiple improvements for AI Engine programming and debugging
    • Turn microcode labels on and off
    • Static cross-probing between the source code and the microcode
    • Full view of the microcode
    • Bring the last PC into the visible area whenever the Pipeline view updates its data
    • Aligned instruction data in the Pipeline view
    • New "Single Instruction Mode" action in the disassembly view
  • Generate a default BIF file for a platform project
  • Program Flash for SD and eMMC adds raw mode support
  • In-context help messages are added to AI Engine development flow
  • Upgraded GCC toolchain version to 10.2
  • Users can emulate an AXI-MM master/slave through an external process, such as a Python or C++ program. This helps users emulate a design quickly, without investing resources in developing an AXI master/slave or VIP. AXI-MM inter-process communication can also emulate the chip-to-chip connection between two FPGAs.
  • Versal models can now be compiled for VCS.
  • Platform developers can run hardware emulation on the platform with standalone applications to test the platform at an early stage.
  • User range profiling information and user event information are aggregated into the profile summary report
  • Vitis Analyzer shows critical timing paths.

    • Vitis Analyzer will display a simplified version of the Vivado GUI timing report, without the need to open a Vivado project or netlist. This allows users to quickly navigate to the failing timing path.

  • Vitis Analyzer multiple strategies support

    • Results from multiple strategy runs can be visualized in Vitis Analyzer.

  • New xrt.ini switches for profiling and debug
  • Reduce memory and loading time for large applications

    • The new profiling tool uses fewer resources to process large CSV files, which reduces loading time and the occurrence of crashes.

  • PL continuous trace offloading improvement

    • Use DDR or HBM as memory resource to store trace data

    • Circular buffer support for large data offloading

    • Trace buffer size and offloading interval can be set in xrt.ini

  • Improvements to the visualization of the AIE design trace report

    • All AIE inputs are displayed (window, stream, cascaded stream, etc.)

    • Support all IO data types

  • Stable native XRT API, with C++ APIs for AIE graph control and execution, Software Emulation and tracing support.
  • XRT provides new helper APIs in $XILINX_XRT/include/CL/cl2xrt.hpp to help users move from the OpenCL API to the XRT native API.
  • The new XRT API xrt::device::get_info() can extract device properties
  • Greatly improved next generation xbutil and xbmgmt utilities are now the default.
  • xbutil can report power status
  • xbmgmt supports runtime clock scaling and setting a user power threshold to protect the board and server.
  • sysfs, xbmgmt and xbutil can report the MAC address of an Alveo board
  • The KDS scheduler in xocl has been refactored to significantly improve throughput across hundreds of processes exercising multiple compute units across multiple devices concurrently. With legacy shells you may notice a small percentage of throughput degradation; see the Answer Record (AR) for the proper solution.
  • XRT driver debug trace support through debugfs /sys/kernel/debug/xclmgmt/ and /sys/kernel/debug/xocl/
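To make the Two Gram Predicate idea from the list above concrete (an inverted index whose terms are two-character strings), here is a small host-side sketch in standard C++. It is not the Vitis Data Analytics Library API: the class name `TwoGramIndex` and the matching rule "a record is a candidate if it contains every 2-gram of the pattern" are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical two-gram inverted index: each 2-character term maps to the
// set of record ids whose text contains that term.
class TwoGramIndex {
public:
    void add(std::size_t id, const std::string &text) {
        for (std::size_t i = 0; i + 1 < text.size(); ++i)
            postings_[text.substr(i, 2)].insert(id);
    }

    // Candidate ids: records containing every 2-gram of the pattern.
    // Patterns shorter than two characters yield no 2-grams and return empty.
    std::set<std::size_t> query(const std::string &pattern) const {
        std::set<std::size_t> result;
        bool first = true;
        for (std::size_t i = 0; i + 1 < pattern.size(); ++i) {
            auto it = postings_.find(pattern.substr(i, 2));
            if (it == postings_.end()) return {};  // a 2-gram occurs nowhere
            if (first) {
                result = it->second;
                first = false;
            } else {
                std::set<std::size_t> kept;  // intersect with this posting list
                for (auto id : result)
                    if (it->second.count(id)) kept.insert(id);
                result = std::move(kept);
            }
        }
        return result;
    }

private:
    std::map<std::string, std::set<std::size_t>> postings_;
};
```

Intersecting posting lists only yields candidates; a real predicate would typically verify or score them against the full record afterwards.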

Access the latest Vitis Target Platforms for Alveo Accelerator cards at www.xilinx.com/alveo. Please refer to the Getting Started section of the accelerator card you want to deploy your applications on.

Please refer to UG1120 - Alveo Data Center Accelerator Card Platforms User Guide for more details and to keep up-to-date on the latest Vitis Target Platform releases, as they become available.

New Platforms 

  • Alveo U200 Gen3x16 XDMA 1RP
    • Name: xilinx_u200_gen3x16_xdma_1_202110_1
    • Features: Slave Bridge, P2P, GT Kernel, DDR Self-Refresh
  • Alveo U50 Gen3x16 noDMA 1RP 
    • Name: xilinx_u50_gen3x16_nodma_1_202110_1
    • Features: Slave Bridge, P2P, GT Kernel, Clock Throttling
  • The VCK190 Base Platform enables ECC on DDR and LPDDR, with more concise constraints.
  • The MPSoC base platforms increase the CMA size to 1536M. All Vitis-AI models can run with this CMA size.
  • The embedded platform creation flow is simplified: the Device Tree Generator can automatically generate a ZOCL node, and XSCT can generate BIF files. Base platform source files are reduced.
  • Support for Kubernetes (K8s) clusters: the Xilinx FPGA Resource Manager (XRM) can now be used together with Kubernetes to run and manage compute units (CUs) across a pool of multiple Alveo accelerator cards attached to a server, and to scale applications to multiple servers with Alveo cards.
  • A comprehensive constraint editor enables users to specify any constraint for AI Engine kernels in Vitis Model Composer. The generated ADF graph will contain these constraints. 
  • Addition of AI Engine FFT and IFFT blocks to the library browser. 
  • Users now have access to many variations of AI Engine FIR blocks in the library browser. 
  • Ability to specify filter coefficients using input ports for FIR filters. 
  • Addition of two new utility blocks "RTP Source" and "To Variable Size".
  • Enhanced AIE Kernel import block now also supports importing templatized AI Engine functions. 
  • Ability to specify Xilinx platforms for AI Engine designs in the Hub block.
  • Through the Hub block, users can relaunch Vitis Analyzer at any time after running AIE Simulation. 
  • Users can now plot cycle approximate outputs and see estimated throughput for each output using Simulink Data Inspector. 
  • Enhanced usability to import a graph as a block using only the graph header file. 
  • Revamped progress bar with a cancel button
  • Usability improvements when importing an AI Engine kernel or simulating a design while the MATLAB working directory and the model directory are not the same.
  • New TX Chain 200MHz example.
  • New 2D FFT examples showcasing designs with HLS, HDL, and AI Engine blocks.
  • Simulation speed enhancement for SSR FIR (more than 10x improvement) and SSR FFT.
  • Simulation speed enhancement for memory blocks such as RAMs and FIFOs
  • Questa Simulator updated with VHDL 2008 in the Black-box import flow
  • Vitis Model Composer now contains the functionality of Xilinx System Generator for DSP. Users who have been using Xilinx System Generator for DSP can continue development using Vitis Model Composer.
  • MATLAB Support - R2020a, R2020b & R2021a

 

2020.2

Vitis Software Platform 2020.2 Release Highlights:

  • Vitis 2020.2 supports application acceleration and embedded software development for Versal ACAP Platforms
  • Vitis Core Development Kit now includes the AI Engine Compiler to compile C/C++ applications for Versal AI Engines. AI Engine, part of Versal AI Core Series, is a vector processor for compute-intensive applications
  • Vitis HLS is default for both accelerated-kernel compilation (Vitis) and C/C++ to RTL IP creation flow (Vivado)
  • 600+ FPGA-accelerated functions across 13 performance-optimized libraries. 2020.2 introduces the new Vitis HPC library for accelerating high-performance computing applications and several enhancements & additions to the Data Analytics, Graph, BLAS, Sparse, Security & Database libraries
  • Support for evaluating multiple implementation strategies for final FPGA binary creation & enhancements for easier RTL-kernel integration within Vitis applications
  • Other enhancements this release include support for AI Engine application profiling, Git version control for Vitis projects, Vitis AI profiler data integration within Vitis Analyzer and enhancements for emulation modes.
  • Add-on for MATLAB® and Simulink® : Unification of Xilinx Model Composer and System Generator for DSP. AI Engine is a new domain in Add-On for MATLAB and Simulink.

Vitis What's New by Category

Expand the sections below to learn more about the new features and enhancements in Vitis 2020.2. For information on Supported Platforms, Changed Behavior & Known Issues, please refer to Vitis 2020.2 Release Notes for Application Acceleration Flow and Embedded Software Development Flow.

Note: Vitis Accelerated Libraries are available as a separate download. They can be downloaded from GitHub or directly from within the Vitis IDE as well.

  • FPGA-accelerated library for HPC workloads. Initial release focuses on Seismic Imaging & Geophysics Simulation use-cases
    • Reverse Time Migration (RTM) – Seismic imaging technique for accurate representation of the subsurface
    • High-precision Multi-layer Perceptron (MLP) - Reconstruction of subsurface properties using seismic reflection data (Seismic Inversion)
  • Optimized for single-precision floating-point data (FP32), a key requirement in HPC applications
  • Version 1 of the library offers the following:
    • L1 Stencil primitive, L1 MLP activation functions including Sigmoid, Relu, and Prelu
    • L2 2D RTM forward kernel, 2D RTM backward kernels, and 3D RTM forward kernel
    • L3 2D RTM APIs for supporting shot parallelism
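For reference, the L1 MLP activation functions listed above (Sigmoid, ReLU, PReLU) have the standard single-precision definitions sketched below, together with a fully connected layer of the kind an MLP stacks. This is plain C++ describing the math, not the Vitis HPC Library's L1 primitives; the function names, the row-major weight layout in `denseRelu`, and passing the PReLU slope `alpha` per call are assumptions for illustration (in a trained network `alpha` is a learned parameter).

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic sigmoid: 1 / (1 + e^-x), maps any FP32 input into (0, 1).
inline float sigmoid(float x) { return 1.0f / (1.0f + std::exp(-x)); }

// Rectified linear unit: passes positive inputs, zeroes negative ones.
inline float relu(float x) { return x > 0.0f ? x : 0.0f; }

// Parametric ReLU: negative inputs are scaled by slope alpha instead of zeroed.
inline float prelu(float x, float alpha) { return x > 0.0f ? x : alpha * x; }

// Hypothetical fully connected layer with ReLU: y = relu(W x + b),
// where W is row-major with b.size() rows and x.size() columns.
std::vector<float> denseRelu(const std::vector<float> &W, const std::vector<float> &b,
                             const std::vector<float> &x) {
    const std::size_t out = b.size(), in = x.size();
    assert(W.size() == out * in);
    std::vector<float> y(out);
    for (std::size_t o = 0; o < out; ++o) {
        float acc = b[o];
        for (std::size_t i = 0; i < in; ++i) acc += W[o * in + i] * x[i];
        y[o] = relu(acc);
    }
    return y;
}
```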

New Functions and Features

  •  2020.2 ISP Pipeline example design supports pixel depths up to 16 bits
  • Local tone mapping
  • Auto Exposure Correction
  • Quantization & Dithering
  • Color Correction Matrix
  • Black Level Correction
  • Lens Shading correction
  • Brute Force Feature Matching
  • Mode Filter
  • blobFromImage
  • Laplacian Operator
  • Distance Transform
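Of the new functions listed above, the Mode Filter is the least self-explanatory: each output pixel takes the most frequent value in its neighborhood, which removes isolated outliers while keeping existing pixel values (unlike a mean filter). A minimal 3x3 sketch for 8-bit grayscale images in plain C++ follows. The function name, the clamp-to-edge border policy, and the tie-breaking rule (smallest value wins) are assumptions and may differ from the Vitis Vision library implementation.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical 3x3 mode filter on a row-major 8-bit image. Border pixels use
// clamp-to-edge neighbors; ties go to the smaller pixel value.
std::vector<uint8_t> modeFilter3x3(const std::vector<uint8_t> &src, int rows, int cols) {
    assert(rows > 0 && cols > 0 && src.size() == static_cast<std::size_t>(rows) * cols);
    auto clampi = [](int v, int lo, int hi) { return v < lo ? lo : (v > hi ? hi : v); };
    std::vector<uint8_t> dst(src.size());
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            std::array<int, 256> counts{};  // zero-initialized neighborhood histogram
            for (int dr = -1; dr <= 1; ++dr)
                for (int dc = -1; dc <= 1; ++dc) {
                    int rr = clampi(r + dr, 0, rows - 1);
                    int cc = clampi(c + dc, 0, cols - 1);
                    ++counts[src[rr * cols + cc]];
                }
            int best = 0;
            for (int v = 1; v < 256; ++v)
                if (counts[v] > counts[best]) best = v;  // strict '>' keeps the smaller value on ties
            dst[r * cols + c] = static_cast<uint8_t>(best);
        }
    }
    return dst;
}
```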

Library Infrastructure & Other Enhancements

  • All library functions support Alveo U50 platform
  • GUI support for both Edge and Data Center platforms
  • Color Conversion: support for RGBX (a fourth channel)
  • Line Stride support in Data Converters
  • Removed xf_axi_sdata.hpp file. Axiconverter functions now use the HLS ap_axi_sdata.h file instead.

Ready-to-Evaluate Apps in New Xilinx App Store

The following FPGA-accelerated applications, developed using the Vitis Vision library, are now available on the new Xilinx App Store as containers for easy evaluation and deployment on Alveo accelerator cards on the Nimbix cloud or on-premises:

  • Image Classification using ML-inference engine from Vitis AI Library and Vitis Vision Pre Processing Function
  • Image Sensor Processing (ISP) Pipeline
  • Stereo Block Matching
  • Text Processing APIs. Two major APIs included - the regular expression match and geo-IP lookup. The former API can be used to extract content from unstructured data like logs, while the latter is often used in processing web logs, to annotate with geographic information by IP address. A demo tool that converts Apache HTTP server log in batch into JSON file is provided with the library.
  • DataFrame APIs for in-memory Data Abstraction: DataFrame is a widely popular in-memory data abstraction in the data analytics domain. The DataFrame write and read APIs should make it easier for data analytics kernel developers to store temporal data or interact with open-source software using Apache Arrow DataFrame.
  • Tree Ensemble Method: Random forest is extended to include regression. Gradient boosted trees, based on the boosting method, are added to support both classification and regression. Support for XGBoost on classification and regression is also included, exploiting the second-order derivative of the loss function and regularization.
  • Single-Source Shortest Path API (singleSourceShortestPath): 2020.2 version now supports the Alveo U50 platform and provides a new output ‘pred32’ for the shortest path information.
  • Page Rank APIs: the 2020.2 version now supports the Alveo U50 platform and includes two APIs, both named ‘pageRankTop’: one leverages a single memory channel and the other utilizes multi-bank memories.
  • Similarity APIs: three new APIs cover different applications: ‘denseSimilarityKernel’ for dense graph applications, ‘sparseSimilarityKernel’ for sparse graph applications, and ‘generalSimilarityKernel’ for both types of applications with a single kernel.
  • The following APIs now support Alveo U50 platform:
    • Breadth-First search bfs API (bfs)
    • Degree calculation API (calcuDegree)
    • Connected component API (connectedComponents)
    • Converting format from CSC to CSR API (convertCsrCsc)
    • Label propagation API (labelPropagation)
    • Strongly connected component API (stronglyConnectedComponents)
    • Triangle count API (triangleCount)
  • New L2 GEMM Kernel
  • For FP32 data types, the L3 GEMM performance has been improved from 280 GFLOPS to 340 GFLOPS
  • Introduced FP32 L2 CSCMV kernel (sparse matrix vector multiplication for CSC - Compressed Sparse Column - format matrices) that utilizes 16 HBM channel support on the Alveo U280 accelerator card.
  • The 2020.2 release brings major enhancements and updates to the General Query Engine (GQE) kernel design, and brand-new Level 3 APIs for JOIN and GROUP-BY AGGREGATE.
    • Columns as Input Buffers: The GQE kernels treat each column as an input buffer, simplifying the data preparation in the host code. Additionally, allocating multiple buffers on the host side reduces out-of-memory issues compared to big contiguous memory allocations, especially when the server is under heavy load.
    • Command Classes for Generating Configuration Bits: The L2 layer now provides command classes to generate the configuration bits for GQE kernels. Developers no longer have to dive into the bitmap table to understand which bit(s) to toggle to enable or disable a function in the GQE pipeline. Thus, the host code can be more maintainable and less error-prone.
    • New Level-3 APIs: New experimental L3 APIs for JOIN and GROUP-BY AGGREGATE are built to scale the problem size that GQE can handle. They can break down the tables into parts based on hash and call the GQE kernels over multiple rounds in a well-scheduled fashion. The execution strategy is separated from the execution itself, so database experts can fine-tune the execution based on table statistics without touching the OpenCL execution part.
  • LIBZ Library Acceleration using Alveo U50             
    • Seamless acceleration of libz standard APIs : deflate, compress2 and uncompress
    • Ready-to-use libz.so library to accelerate any host code without any code change
    • xzlib standalone executable for both gzip/zlib compress & decompress
  • ZSTD Decompression : New implementation of Facebook ZSTD algorithm available
  • Snappy Dual Core Kernel : New implementation of Google snappy Dual Core decompression algorithm achieves 2x throughput improvement for single file decompress.
  • GZIP Compress Kernel: New GZIP Quad Core Compress Kernel implementation (built-in LZ77, TreeGen, and Huffman encoder) available. More than 20% reduction in overall resources and 50% reduction in DDR bandwidth requirement.
  • GZIP Compress Streaming Kernel: Fully standard-compliant GZIP implementation (including header and footer) available as free-running streaming kernels.
  • GZIP/ZLIB L3 Application on Alveo U50: GZIP/ZLIB application available as an L3 API, optimized for Alveo U50 (HBM) and Alveo U250 cards. A single FPGA binary (xclbin) supports both zlib and gzip formats for compress and uncompress.
  • Support for Alveo U50: Library functions (LZ4, Snappy, GZIP, ZLIB) ported to support the Alveo U50 platform.
  • Low Latency GZIP/ZLIB Decompress : Initial decompression latency reduced from 5K to 2.5K for 4KB/8KB/16KB block sizes
  • APIs revised to fully support Vitis HLS compiler
  • New Signature Generation and Verification Algorithms: DSA, ECC, ECDSA(secp256k1) and EdDSA(ed25519)
  • New Checksum Algorithms: Adler32 and CRC32.
  • Verifiable delay function (VDF) evaluation and verification: Pietrzak's VDF and Wesolowski's VDF.
  • Commercial Cryptography constituted by CAS: SM2, SM3 and SM4.
  • Stream Cipher: XChacha20.
  • Optimization on RSA, GMAC, AES-GCM and SHA3 to improve their performance and resource utilization.
  • Argument parser (Beta): Parses the options and flags passed from command line and offers automatic help information generation enabling developers to create unified experience on test cases and user applications.
  • FIFO multiplexer: This module wraps around a FIFO (implemented through hls::stream in kernel code) to enable passing data of different types through the same hardware resource. When the data is too wide, it is automatically transferred over multiple cycles. This module is expected to make dataflow code more compact and readable.
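The CSC (Compressed Sparse Column) format behind the FP32 CSCMV kernel listed above stores, for each column, the row indices and values of its nonzeros. The sketch below is a plain C++ reference model of y = A·x in that format, for understanding the data layout only; the struct `CscMatrix`, its field names, and `cscSpmv` are hypothetical and are not the Vitis Sparse Library API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CSC container: colPtr has nCols + 1 entries, and the nonzeros
// of column j are rowIdx[k], values[k] for k in [colPtr[j], colPtr[j + 1]).
struct CscMatrix {
    std::size_t nRows = 0, nCols = 0;
    std::vector<std::size_t> colPtr;
    std::vector<std::size_t> rowIdx;
    std::vector<float> values;
};

// y = A * x. CSC scatters each column's contribution x[j] * A(:, j) into y,
// so entries of y are updated in a data-dependent order.
std::vector<float> cscSpmv(const CscMatrix &A, const std::vector<float> &x) {
    assert(A.colPtr.size() == A.nCols + 1 && x.size() == A.nCols);
    std::vector<float> y(A.nRows, 0.0f);
    for (std::size_t j = 0; j < A.nCols; ++j)
        for (std::size_t k = A.colPtr[j]; k < A.colPtr[j + 1]; ++k)
            y[A.rowIdx[k]] += A.values[k] * x[j];
    return y;
}
```

For example, the 2x3 matrix [[1, 0, 2], [0, 3, 0]] is stored as colPtr = {0, 1, 2, 3}, rowIdx = {0, 1, 0}, values = {1, 3, 2}.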

ADF: Adaptive Data Flow

  • Compiler:
    • Event tracing on PLIO or GMIO
    • Event tracing also on Hardware
    • Heat Map generation: %utilization of all AI Engines
    • Supports different PL frequencies for PL kernels and PLIOs
  • Vitis IDE for AI Engines
    • Pipeline view
    • Vector register view
    • Internal memory views: East, West, North, South
    • External memory
  • Vitis HLS replaces Vivado HLS in Vivado (it was already default for Vitis and C based kernel compilation in 2020.1)
    • Adds array reshape and partitioning pragmas for top function ports
  • The tool is now installed in its own directory ./Vitis_HLS/2020.2 alongside Vitis and Vivado
  • HLS design migration information has been updated in UG1391
  • Vitis HLS user guide is UG1399, the full content is also available in HTML
  • Updated design examples on GitHub, they can also be loaded automatically from the Vitis HLS GUI (from the "Git Repositories" sub-window) for direct access
  • Support for SIMD programming
  • Support for on-chip block RAM ECC flags via the bind_storage pragma (Vivado flow only) to monitor error correction logic generated by the RAM blocks
  • GUI has a simplified toolbar icon layout, new reporting sections for interfaces and AXI4 including bursts
  • Non-default options can be filtered for quick review in "Solution Settings"→"General" then "Show only non-defaults" tick mark
  • Users can create and open a project in the GUI directly from a Tcl script by using the -p option and passing the Tcl file as an argument: vitis_hls -p <file>.tcl
  • Interactive FIFO depth sizing in GUI
  • Constrained random testing for AXI interfaces now visible in the GUI
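As an illustration of the top-function port partitioning added in 2020.2 (listed above), the kernel below completely partitions its array arguments, so that in hardware each becomes eight independent scalar ports and the unrolled loop can access all elements in the same cycle. This is a sketch assuming the 2020.2-era pragma spelling from UG1399 (later releases also accept a `type=complete` form); the kernel name `vadd8` is hypothetical. A regular C++ compiler ignores the `#pragma HLS` lines (possibly with a warning), so the same source also serves as a C-simulation model.

```cpp
#include <cassert>

// Hypothetical top-level kernel: element-wise add of two 8-element arrays.
// Completely partitioning the port arrays turns each into 8 scalar ports;
// unrolling the loop then lets all 8 additions run in parallel.
void vadd8(const int a[8], const int b[8], int out[8]) {
#pragma HLS array_partition variable=a complete dim=1
#pragma HLS array_partition variable=b complete dim=1
#pragma HLS array_partition variable=out complete dim=1
    for (int i = 0; i < 8; ++i) {
#pragma HLS unroll
        out[i] = a[i] + b[i];
    }
}
```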

Versal Only Features

  • Vitis HLS now infers the dedicated single clock cycle accumulation for floating point (adder or multiplier) of the DSP58 block to implement efficient high throughput accumulation
  • Timing libraries updated for Versal production target devices
  • Improved RTL-Kernel Integration:  Enhancements for packaging & integrating RTL IPs as kernels within Vitis applications, including support for user-managed RTL kernels (not controlled by XRT APIs) and improvements to IP Packager within Vivado to support this flow.
  • Multiple Implementation Strategies for Timing Closure: Vitis compiler & linker (v++) now supports launching & running multiple Vivado implementation strategies at the same time during hardware builds. This enables users to explore & assess all results and select the best strategy for final FPGA binary (xclbin) creation.

Versal Only Features

  • In 2020.2, as long as the hardware design stays the same, aiecompiler recompiles and updates only the software when the AIE program is modified. The v++ linking stage is not re-run; the flow goes directly to the package step. This allows users to easily and quickly iterate on the AIE program after the hardware has been fixed.
  • A system-level template is provided, which includes AIE, PL, and PS design files.
  • AIE tools features integrated into Vitis IDE, such as displaying pipeline information, storage view, parallel compilation etc.
  • Version Control for Vitis Projects: Integration with Git version control for Vitis Projects enables collaboration across multiple developers and teams.
  • Improvements to Project Hierarchy: Acceleration kernel and host applications are now separate projects under top-level System Project enabling a user to compile the host application and hardware kernels separately.
  • Improvements to Board Support Package (BSP) Build times: For platform projects with standalone domains, the Board Support Package (BSP) drivers compiles in parallel to speed up application build time.
  • Ease-of-Use for Host Application Debug: Processing System registers can now be exported as a file from the Vitis GUI for debug.
  • Profiling System Projects: Top-level System Projects now offer more control over specifying profiling features via the Vitis GUI for the Vitis application acceleration flows.
  • Improved Support for Platform Creation with Hardware Emulation: In addition to the Block Diagram as the top level, Hardware Emulation mode now also supports RTL sources in the platform as the top module, or referenced RTL inside the block diagram without packaging. You can add an RTL testbench as in Vivado. This offers more flexibility for validating designs before deployment.
  • Save Signals during Emulation for Debug: Save signals to Xilinx Simulator (XSIM) waveform file during emulation. User can pass -wcfg-file-path to launch_hw_emu.sh when rerunning hardware emulation.
  • Emulation Support for Slave Bridge Feature (Alveo Platforms) : Please refer to the Alveo Platform Documentation for more details on Slave Bridge features.
  • Python/C++ APIs for emulating AXI Stream IOs : Mimic data streaming through IO ports on platform using simple Python or C++ APIs while emulating AXI Stream kernels enabling you to emulate and debug complete system with programmed traffic patterns much earlier in the design cycle
  • Questa Simulator support for U250 Alveo Platform: In addition to the Xilinx Simulator (XSIM), hardware emulation in Vitis for U250 Alveo platforms now also supports Questa. Setup is done via V++ configuration files or Vitis IDE.
  • HLS kernel deadlock detection: Deadlock or livelock code in HLS kernel can be detected during hardware emulation by compiling HLS kernel with v++ config param=compiler.deadlockDetection=true

Versal Only Features

  • 3rd party simulator support (Questa, Xcelium, VCS): In addition to the Xilinx Simulator (XSIM), hardware emulation in Vitis for Versal embedded platforms now also supports 3rd party simulators like Questa and Xcelium on Linux. VCS is supported in Early Access stage. Setup is done via V++ configuration files or Vitis IDE.
  • Vitis AI Profiler Data Integration: For applications that use the Deep Learning Processing Unit (DPU) for AI inference, you can access Vitis AI profiler information including DPU throughput, DDR read/write rates and timeline trace information within Vitis Analyzer to assess end-to-end application acceleration. 
  • View Package Summary Report: View the Package Summary Report within Vitis Analyzer for an overall view of application’s status from a performance and optimization perspective.  The package summary is created by v++ command after linking to build a package that can be run for software or hardware emulation or can be booted and run on the hardware device.
  • Integrated Host & Kernel Profiling: Vitis 2020.2 adds the capability to provide user event API profiling. Beyond the profiling capabilities inherently available for accelerated kernels, you can call Xilinx Runtime Library (XRT) APIs in your host code to profile arbitrary sections of the design and make decisions on overall application performance optimization.
  • Other Enhancements: global search across all reports in Vitis Analyzer; the flexibility to save and restore custom layouts for viewing performance reports; intuitive grouping of guidance messages so that related information appears in one place; and utilization reports that show statistics on a per Super Logic Region (SLR) basis for deeper insight.
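
Conceptually, user-range profiling brackets an arbitrary section of host code with start and end markers and accumulates the time spent inside it. The Python sketch below mimics that idea with a context manager. It is not the XRT user event API; RangeProfiler and its section method are hypothetical names used only to show the bracketing pattern.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import Dict, Iterator, Tuple


class RangeProfiler:
    """Hypothetical host-side profiler that accumulates wall-clock time per labeled section."""

    def __init__(self) -> None:
        self.totals: Dict[str, float] = defaultdict(float)  # label -> total seconds
        self.counts: Dict[str, int] = defaultdict(int)      # label -> times entered

    @contextmanager
    def section(self, label: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the section even if the profiled code raises.
            self.totals[label] += time.perf_counter() - start
            self.counts[label] += 1

    def report(self) -> Dict[str, Tuple[int, float]]:
        """Return {label: (call_count, total_seconds)}."""
        return {label: (self.counts[label], self.totals[label]) for label in self.totals}
```

A host loop would wrap, for example, buffer preparation in "with prof.section('prepare_buffers'):" and read prof.report() afterwards; the XRT APIs play the equivalent role while also placing the ranges on the Vitis Analyzer timeline.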

Versal Only Features

  • Profile summary report includes a dedicated AIE design entry. The compile and run summary reports show more AIE-related data, such as an AIE heatmap that displays kernel active/stall cycles when running on hardware.
  • Improved Visibility for Debug: An AXI-S transaction-level view is available in the Xilinx Simulator (XSIM) Transaction Viewer for the SystemC portions of hardware emulation designs, providing better visibility into the design at the transaction level for debug.
  • View FIFO Status in Live Waveform Viewer: The status of user-level FIFOs (declared as hls::stream in kernel code) can be viewed in the Live Waveform Viewer during hardware emulation, providing visibility into static FIFO depths, FIFO elements and FIFO usage to identify performance bottlenecks in acceleration kernels.
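
To make FIFO depth and usage concrete, the following plain-Python model of a bounded stream FIFO records its high-water mark and the number of stalled writes and reads, the kind of statistics used to judge whether a FIFO is undersized or whether the consumer is the bottleneck. It is not hls::stream and does not read data from the waveform viewer; TrackedFifo is a hypothetical class.

```python
from collections import deque
from typing import Any, Deque, Tuple


class TrackedFifo:
    """Hypothetical bounded FIFO model that records occupancy statistics."""

    def __init__(self, depth: int) -> None:
        if depth < 1:
            raise ValueError("depth must be at least 1")
        self.depth = depth
        self._items: Deque[Any] = deque()
        self.high_water = 0    # maximum number of elements held at once
        self.full_stalls = 0   # writes refused because the FIFO was full
        self.empty_stalls = 0  # reads refused because the FIFO was empty

    def write(self, value: Any) -> bool:
        """Try to push a value; return False (a producer stall) if the FIFO is full."""
        if len(self._items) >= self.depth:
            self.full_stalls += 1
            return False
        self._items.append(value)
        self.high_water = max(self.high_water, len(self._items))
        return True

    def read(self) -> Tuple[bool, Any]:
        """Try to pop a value; return (False, None) (a consumer stall) if the FIFO is empty."""
        if not self._items:
            self.empty_stalls += 1
            return False, None
        return True, self._items.popleft()
```

A producer writing two values per cycle into a depth-4 FIFO drained at one value per cycle reaches the high-water mark of 4 in the third cycle and then stalls once per cycle; a deeper FIFO would only delay that, which points at the consumer rate rather than the depth.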

Versal Only Features

  • Event trace enhancements: Vitis 2020.2 incorporates several enhancements to AIE event trace, such as support for trace offloading by XRT, enhanced flow support for multiple trace streams, and the ability to monitor the PL/AIE boundary even when a PL kernel is defined in the graph. In addition, PL, PS and AIE event traces are combined into a common timeline for better visualization of the whole design.
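
Combining PL, PS and AIE event traces into one timeline is, at its core, a k-way merge of per-source event streams that are each already sorted by time. The Python sketch below shows that step with heapq.merge, assuming every event is a (timestamp, source, name) tuple already converted to a shared time base. The actual Vitis trace formats and clock alignment are not modeled, and merge_traces is a hypothetical helper.

```python
import heapq
from typing import Iterable, List, Tuple

# (timestamp in a shared time base, source such as "PL"/"PS"/"AIE", event name)
Event = Tuple[int, str, str]


def merge_traces(*traces: Iterable[Event]) -> List[Event]:
    """Merge per-source traces, each sorted by timestamp, into one ordered timeline."""
    return list(heapq.merge(*traces, key=lambda event: event[0]))
```

heapq.merge only compares the current head of each input, so merging n events from k sources costs O(n log k); each input must already be sorted by timestamp for the result to be ordered.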

Note: Xilinx Runtime Library (XRT) is available as a separate download. Please refer to the Getting Started information for download and install instructions.

  • Improved Support for HBM-enabled Platforms: Leverage the benefits of high-bandwidth memory (HBM) enabled platforms by specifying kernel port connections to HBM banks through v++ --sp HBM[#:#]. Xilinx Runtime Library (XRT) APIs can also assign HBM banks automatically and enable the host application to allocate arbitrarily sized buffers (256MB+) spanning one or more HBM segments, on HBM segment bounds.
  • Next Generation Xilinx Board Management Utilities (Preview): The next-generation board management utilities (xbutil, xbmgmt) are available for preview. They can enable the Slave Bridge and DDR retention features on Xilinx platforms that support them. Note: The current generation of board management utilities will move to maintenance mode in 2021.1, and new features will only be added to the next-generation utilities.
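
The HBM[#:#] notation names an inclusive range of HBM banks. As a rough sketch of the bookkeeping behind bank assignment, the Python helpers below expand such a range and count how many 256MB segments a buffer occupies when rounded up to segment bounds. Both helpers are hypothetical and are not part of v++ or XRT; the segment size is taken from the description above.

```python
import re
from typing import List

SEGMENT_BYTES = 256 * 1024 * 1024  # 256MB HBM segment, as described above

_BANK_SPEC = re.compile(r"HBM\[(\d+)(?::(\d+))?\]")


def expand_hbm_banks(spec: str) -> List[int]:
    """Expand 'HBM[2]' or an inclusive range such as 'HBM[0:3]' into bank indices."""
    match = _BANK_SPEC.fullmatch(spec.strip())
    if match is None:
        raise ValueError(f"not an HBM bank spec: {spec!r}")
    lo = int(match.group(1))
    hi = int(match.group(2)) if match.group(2) is not None else lo
    if hi < lo:
        raise ValueError(f"empty bank range: {spec!r}")
    return list(range(lo, hi + 1))


def segments_needed(size_bytes: int) -> int:
    """Whole 256MB segments a buffer occupies when rounded up to segment bounds."""
    if size_bytes <= 0:
        raise ValueError("size must be positive")
    return -(-size_bytes // SEGMENT_BYTES)  # ceiling division
```

For example, expand_hbm_banks("HBM[0:3]") gives [0, 1, 2, 3], and a 600MB buffer needs segments_needed(600 * 1024 * 1024) == 3 segments.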

Versal Only Features

  • AIE support is added for run-time parameters (RTP), error handling, full array reconfiguration and the graph API.

Access the latest Vitis Target Platforms for Alveo Accelerator cards from the Alveo Packages Download Tab

Please refer to UG1120 - Alveo Data Center Accelerator Card Platforms User Guide for more details and to stay up to date on the latest Vitis Target Platform releases as they become available.

U200/U250 XDMA Platforms

  • Alveo Platform U200 XDMA 2RP - Production
    • Features: ERT, CMC, PLRAM, DRM capable floorplan, XDMA, 2RP, P2P, M2M, GT Kernel, PCIe Slave Bridge, DDR Self-Refresh
  • Alveo Platform U250 XDMA 2RP - Production
    • Features: ERT, CMC, PLRAM, DRM capable floorplan, XDMA, 2RP, P2P, M2M, GT Kernel, PCIe Slave Bridge, DDR Self-Refresh

Shell Upgrade DFX - 2RP (2 Reconfiguration Partitions)

  • Small static region: Base
    • PCIe functionality
    • In-band FPGA partial reconfiguration 
  • New reconfiguration partition: Shell
    • Update DMA and utility functions
    • Dynamic swapping between platforms without rebooting the server
  • 2nd reconfiguration partition: User Logic
    • Accelerator kernel functions

AXI Slave Bridge

  • Direct host memory access by the kernel 
  • DMA bypass capability through a 512-bit AXI slave interface, so users can provide their own data mover
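
As a back-of-the-envelope illustration of what a user-provided data mover on a 512-bit interface has to plan, the Python sketch below splits a transfer into bursts of 64-byte beats. It assumes standard AXI4 rules (at most 256 beats per INCR burst, no burst crossing a 4KB address boundary) and a 64-byte-aligned start address; plan_bursts is a hypothetical helper, not part of the platform or XRT.

```python
from typing import List, Tuple

BUS_BYTES = 512 // 8  # 64 bytes per beat on a 512-bit data bus
BOUNDARY = 4096       # AXI bursts must not cross a 4KB address boundary
MAX_BEATS = 256       # AXI4 INCR burst length limit


def plan_bursts(addr: int, nbytes: int) -> List[Tuple[int, int]]:
    """Split a transfer into (start_address, beats) bursts for a 512-bit AXI master.

    Assumes addr is bus-aligned (a multiple of 64 bytes)."""
    if addr % BUS_BYTES:
        raise ValueError("address must be 64-byte aligned")
    bursts = []
    end = addr + nbytes
    while addr < end:
        to_boundary = BOUNDARY - (addr % BOUNDARY)
        chunk = min(end - addr, to_boundary, MAX_BEATS * BUS_BYTES)
        beats = -(-chunk // BUS_BYTES)  # a partial final beat still costs one beat
        bursts.append((addr, beats))
        addr += chunk
    return bursts
```

For a 10,000-byte transfer from address 0, plan_bursts returns [(0, 64), (4096, 64), (8192, 29)]: at this bus width the 4KB boundary rule, not the 256-beat limit, sets the burst size.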

Data Retention - DDR4 self-refresh

  • Data context retained in FPGA memory using DDR4 self-refresh during reconfiguration
  • Eliminates copying data to host RAM as temporary storage when switching between XCLBINs
  • Minimizes movement of large data sets

Note: Vitis Target Platforms for Embedded Platforms (including pre-built Linux kernels, root file system and sysroot) are available as a separate download on the Vitis Embedded Platforms Tab

  • Zynq-7000 and Zynq UltraScale+ MPSoC base platform functions are unchanged, but the platform source code has been restructured. Directories have been renamed for clarity, and source files common to multiple platforms are grouped together, which makes it easier to reuse the platform source code and port it to a new platform.
  • When building a platform from source code, a new end-to-end build method is available in addition to compiling PetaLinux from scratch: if you use the downloaded common software components, you can point to them and skip the PetaLinux build.

The VCK190 platform has a flexible DDR + LPDDR memory subsystem and supports 63 interrupts for acceleration kernels. It is available for use with the Vitis core development kit, for both application acceleration and embedded processor software development, as described in Versal AI Engine Programmers Guide (UG1076). The platform enables development of designs that include:

  • AI Engine graphs and kernels
  • Programmable Logic kernels
  • Host applications targeting Linux or a bare-metal OS running on the Arm processor in the Versal device

Please refer to Getting Started with Vitis and Versal ACAP platforms to learn more.

  • Support for Kubernetes (K8s) clusters: Xilinx FPGA Resource Manager (XRM) can now be used with Kubernetes to run and manage compute units (CUs) across a pool of multiple Alveo accelerator cards attached to a server, and to scale applications to multiple servers with Alveo cards.
2020.1