Flexible DSP Solutions

Massive Parallel Computation

With their inherent flexibility, Xilinx FPGAs and All Programmable SoCs are ideal for high-performance or multi-channel digital signal processing (DSP) applications that can take advantage of hardware parallelism. Xilinx FPGAs and SoCs combine this processing bandwidth with comprehensive solutions, including easy-to-use design tools for hardware designers, software developers, and system architects.

Hardware Parallelism

A standard Von Neumann DSP architecture requires 256 cycles to complete a 256 tap FIR filter while Xilinx FPGAs can achieve the same result in a single clock cycle.
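
The contrast above can be sketched in a few lines of Python. This is only a behavioral model, not a hardware description: `fir_sequential` and `fir_parallel` are hypothetical names, the sequential version assumes one shared multiply-accumulate unit, and the parallel version assumes one multiplier per tap feeding an adder tree (one possible structure; a real design would usually pipeline it, so a new result is produced every clock after an initial latency).

```python
def fir_sequential(coeffs, samples):
    """Model a single shared MAC unit: one tap is processed per cycle."""
    acc = 0
    cycles = 0
    for c, x in zip(coeffs, samples):
        acc += c * x
        cycles += 1
    return acc, cycles


def fir_parallel(coeffs, samples):
    """Model one multiplier per tap, summed by a binary adder tree."""
    level = [c * x for c, x in zip(coeffs, samples)]  # all products at once
    depth = 0
    while len(level) > 1:
        if len(level) % 2:
            level.append(0)  # pad odd levels so values pair up
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        depth += 1
    return level[0], depth


coeffs = list(range(256))   # illustrative 256-tap coefficient set
samples = [1] * 256         # illustrative input window
seq_result, seq_cycles = fir_sequential(coeffs, samples)
par_result, tree_depth = fir_parallel(coeffs, samples)
```

Both functions return the same filter output; the sequential model needs 256 iterations, while the parallel model needs 256 multipliers and an adder tree eight levels deep.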

This massive parallelism translates into exceptional levels of DSP performance:

  • 22 TeraMACs of Fixed Point performance
  • 7.3 TeraFLOPs for Single Precision Floating Point
  • 11 TeraFLOPs for Half Precision Floating Point

Comprehensive DSP Solutions

Xilinx DSP solutions include silicon, IP, reference designs, development boards, tools, documentation, and training to enable a wide range of applications across many markets, including, but not limited to, Wireless Communications, Data Center, and Aerospace and Defense.

Comprehensive Development Flows

Various tool flows are available for different use models and different levels of design abstraction:

Hardware designers can design in:

Software developers accustomed to developing in C/C++ can design using:

System architects can rapidly evaluate new algorithms with:

With Xilinx FPGAs and SoCs, designers can use multiple flows to deploy their DSP applications depending on design approach and level of abstraction.

Based on an ASIC-class architecture, Xilinx FPGAs combine multi-hundred gigabit-per-second I/O bandwidth with over 20 TeraMACs of fixed point DSP performance in the Virtex® UltraScale+™ family. The Xilinx DSP slice and its parallelism are key to the achievable DSP performance in the latest generation of Xilinx FPGAs.

DSP Slice Architecture

The UltraScale™ DSP48E2 slice is the 5th generation of DSP slices in Xilinx architectures.

This dedicated DSP processing block is implemented in full custom silicon that delivers industry-leading power/performance, allowing efficient implementations of popular DSP functions such as a multiply-accumulator (MACC), multiply-adder (MADD) or complex multiply.

The slice also provides capabilities to perform different kinds of logic operations, such as AND, OR and XOR operations (UG579).

UltraScale architecture builds on the success of 7 series (DSP48E1), with further enhancements:

  • Wider multiplier (27 x 18 bits)
  • Ability to square the output of the Pre-Adder via the Squaring MUX
  • New wide MUX feature allowing for a true 3 input adder after the multiplier

These enhancements help DSP critical applications perform more computation within the DSP48E2 slice before going into the FPGA fabric, ultimately leading to both resource and power savings.

DSP48E1 (7-Series) vs DSP48E2 (UltraScale) Slice Features

Function 7 Series UltraScale
DSP Tile/Slice Type DSP48E1 DSP48E2
Multiple Add/Sub/Acc operations
Multiplier and MACC 25x18 27x18
Squaring: [(A or B) +/- D]^2
WMUX Feedback Ultra Efficient Complex Multiply CMACC 5 x DSP48E1 3 x DSP48E2
SIMD Support
Integrated Pattern Detect Circuitry
Integrated Logic Unit
Wide Mux Functions (48 bit)
Wide XOR (96 bit)  
Optional 96-bit Output
Cascade Routing
Pipeline Registers
D Pre-adder
Sequential Complex Multiply, AB dyn access
AB Register Pipeline Balancing Improved
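
The table lists a complex multiply in three DSP48E2 slices. One well-known way to go from four real multiplications to three trades a multiplier for extra additions, which is the kind of work a pre-adder can absorb. The sketch below shows only that arithmetic. Whether the DSP48E2 mapping uses exactly this factoring is an assumption, and `complex_mult_3` is a hypothetical name.

```python
def complex_mult_3(a_re, a_im, b_re, b_im):
    """Multiply (a_re + j*a_im) by (b_re + j*b_im) with three real multiplies."""
    k1 = b_re * (a_re + a_im)   # multiply 1, fed by an addition
    k2 = a_re * (b_im - b_re)   # multiply 2, fed by a subtraction
    k3 = a_im * (b_re + b_im)   # multiply 3, fed by an addition
    real = k1 - k3              # equals a_re*b_re - a_im*b_im
    imag = k1 + k2              # equals a_re*b_im + a_im*b_re
    return real, imag


# (3 + 4j) * (5 + 6j) = -9 + 38j
result = complex_mult_3(3, 4, 5, 6)
```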

Tools and Flows

Depending on your design preferences, Xilinx has tools supporting RTL, C/C++ and model-based design entry. This flexibility in the design flow, along with an extensive DSP IP catalog, eases adoption of Xilinx tools and devices.

The Vivado IDE serves as a design cockpit for system-level design: you can build your complete design, implement it, and write out a bit file to program your device.

Diagram showing Xilinx tools that support RTL, C/C++ and model-based design entry for programming your FPGA

Visit Tools, Libraries & Frameworks for more information.

DSP Performance Metrics

The following table shows some of the key DSP performance metrics for 7-Series, UltraScale and UltraScale+ families. For SoC device performance, see the Software Developer section.

Metric | Artix-7 | Kintex-7 | Kintex UltraScale | Kintex UltraScale+ | Virtex-7 | Virtex UltraScale | Virtex UltraScale+
Logic Cells (K) / System Logic Elements (K) (1) | 13–215 | 65–478 | 318–1,451 | 356–1,143 | 326–1,424 | 783–5,541 | 862–3,780
DSP Slices | 40–740 | 240–1,920 | 768–5,520 | 1,368–3,528 | 1,120–3,600 | 600–2,880 | 2,280–12,288
18x18 Multipliers | 40–740 | 240–1,920 | 768–5,520 | 1,368–3,528 | 1,120–3,600 | 600–2,880 | 2,280–12,288
Fixed Point Performance (GMACs) | 25–464 | 178–1,424 | 507–4,090 | 1,218–3,143 | 831–2,671 | 444–2,134 | 2,031–10,948
Fixed Point Performance For Symmetric Filters (GMACs) (2) | 50–928 | 356–2,848 | 1,014–8,180 | 2,436–6,286 | 1,662–5,342 | 888–4,268 | 4,062–21,896
INT8 GOPs (3) | 50–928 | 356–2,848 | 1,774–14,315 | 4,263–11,000 | 1,662–5,342 | 1,554–7,469 | 7,108–38,318
INT16 GOPs | 50–928 | 356–2,848 | 1,014–8,180 | 2,436–6,286 | 1,662–5,342 | 888–4,268 | 4,062–21,896
Single Precision Floating Point (GFLOPs) (4) | 10–196 | 96–770 | 320–2,685 | 800–1,673 | 449–1,444 | 294–1,411 | 1,354–7,299
Single Precision Floating Point (GFLOPs) (5) | 7–147 | 72–577 | 240–2,028 | 609–1,571 | 337–1,083 | 220–1,058 | 1,015–5,474
Half Precision Floating Point (GFLOPs) (6) | 15–295 | 144–1,154 | 480–4,056 | 1,218–3,142 | 674–2,166 | 440–2,116 | 2,030–10,948

Notes:

  1. Logic Cells used for 7-Series only
  2. Using the pre-adder, DSP performance can be increased 2x for symmetric filters
  3. Please refer to WP486 – Deep Learning with INT8 Optimization on Xilinx Devices
  4. Single Precision Floating Point performance using Floating Point Operator core with 3 DSP slices
  5. Single Precision Floating Point performance using Floating Point Operator core with 4 DSP slices
  6. Half Precision Floating Point performance using Floating Point Operator core with 2 DSP slices
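
Several rows of the table above appear to be related by simple ratios. The sketch below reproduces them under stated assumptions: the symmetric-filter and INT16 figures are taken as twice the base GMACs figure, and the INT8 figure as a factor times the symmetric figure, rounded down. The factor of 1.75 for UltraScale and UltraScale+ devices and 1.0 for 7 series is inferred from the published numbers, not stated on this page, and `derived_metrics` is a hypothetical helper.

```python
def derived_metrics(gmacs, int8_factor=1.75):
    """Derive the symmetric-filter, INT16 and INT8 figures from base GMACs.

    int8_factor is an assumption inferred from the table:
    1.75 for UltraScale/UltraScale+ columns, 1.0 for 7 series columns.
    """
    symmetric = 2 * gmacs  # note 2: the pre-adder doubles symmetric-filter throughput
    return {
        "symmetric_gmacs": symmetric,
        "int16_gops": symmetric,
        "int8_gops": int(symmetric * int8_factor),  # rounded down
    }


# Largest Virtex UltraScale+ entry in the table: 10,948 GMACs
vu_plus_max = derived_metrics(10948)
# Largest Artix-7 entry in the table: 464 GMACs
artix_max = derived_metrics(464, int8_factor=1.0)
```

With these inputs the helper returns the 21,896 GMACs and 38,318 INT8 GOPs listed for Virtex UltraScale+, and the 928 figure listed for Artix-7.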

 

Useful Design Techniques and Information

To use the DSP48 slices in Xilinx FPGAs as efficiently as possible, review the following information and apply these techniques where possible.

  • Use the DSP slice user guides as a cumulative resource (AR68594)
  • The Xilinx LogiCORE DSP48 Macro provides an easy-to-use interface to configure the DSP48 slice
  • Time-division multiplex the DSP slice to improve efficiency and throughput (AR68595)
  • For floating-point work, Xilinx provides the Floating-Point Operator IP core, which can also convert between data types, e.g. floating point to fixed point
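
As a rough illustration of the time-division multiplexing item above: when the fabric clock is much faster than the sample rate, one DSP slice can serve several taps or channels in turn. The estimate below is a back-of-the-envelope sketch, not a Xilinx formula. `dsp_slices_needed` is a hypothetical helper, and it ignores control logic, coefficient storage and pipeline overhead.

```python
def dsp_slices_needed(taps, channels, sample_rate_hz, clock_hz):
    """Estimate slices needed when each slice performs one MAC per clock."""
    macs_per_second = taps * channels * sample_rate_hz
    return -(-macs_per_second // clock_hz)  # ceiling division on integers


# Illustrative numbers: 8 channels of a 64-tap filter at 1 MSPS.
shared = dsp_slices_needed(64, 8, 1_000_000, 500_000_000)  # 500 MHz clock (assumed)
unshared = dsp_slices_needed(64, 8, 1_000_000, 1_000_000)  # clock equals sample rate
```

In this example, sharing at an assumed 500 MHz clock needs 2 slices where a one-multiplier-per-tap design needs 512.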

Xilinx has introduced software development environments and a comprehensive set of familiar and powerful tools, libraries and methodologies which allow software developers to target Xilinx FPGAs and SoCs with ease. With high-level abstraction environments such as Vivado High Level Synthesis (HLS), SDAccel and SDSoC, Xilinx offers familiar, GPU-like embedded application development and runtime experiences for C, C++ and/or OpenCL development.

Xilinx All Programmable SoCs and MPSoCs

The Zynq UltraScale+ MPSoC and the Zynq-7000 families combine a powerful processing system (PS), incorporating ARM® Cortex® processors, and user-programmable logic (PL), in a single device.

Application Profiling for Acceleration

SDSoC provides the ability to profile a given application and allows for the creation of hardware accelerators to run more efficiently in the Programmable Logic (PL), where the flexibility and parallelism of the FPGA are leveraged to provide large performance improvements. This also enables other functions of the application to run in the Processing System (PS) in parallel if desired.

By targeting Xilinx FPGAs and All Programmable SoCs, many DSP and embedded applications can improve efficiency and reduce power.

Features and DSP Performance of Xilinx All Programmable SoC Devices

The following table shows some of the key features and DSP performance metrics for both Xilinx Zynq-7000 SoC and Zynq UltraScale+ MPSoC families. For non-SoC device performance, visit the Hardware Designer section.

PROCESSING SYSTEM

Application Processing Unit (APU)
  Zynq-7000 SoC:
    • Single/Dual-core ARM Cortex-A9 MPCore™ up to 1GHz
    • ARMv7-A architecture
    • NEON™ media-processing engine
    • Single and double precision Vector Floating Point Unit (VFPU)
  Zynq UltraScale+ MPSoC:
    • Dual/Quad-core ARM Cortex-A53 MPCore up to 1.5GHz
    • ARMv8-A Architecture
    • Neon Advanced SIMD media processing engine
    • Single/Double Precision Floating Point Unit (FPU)

Real-Time Processing Unit (RPU)
  Zynq-7000 SoC: -
  Zynq UltraScale+ MPSoC:
    • Dual-core ARM Cortex-R5 MPCore up to 600MHz
    • ARMv7-R Architecture
    • Single/Double Precision Floating Point Unit (FPU)

Multimedia Processing
  Zynq-7000 SoC: -
  Zynq UltraScale+ MPSoC:
    • GPU ARM Mali™-400 MP2 up to 667MHz
      • OpenGL ES 1.1 and 2.0 support
      • OpenVG 1.1 support
    • Video Codec supporting H.264/H.265 (EV devices only)

Feature | Zynq-7000 SoC | Zynq UltraScale+ MPSoC
Dynamic Memory Interface | DDR3, DDR3L, DDR2, LPDDR2 | DDR4, LPDDR4, DDR3, DDR3L, LPDDR3
High-Speed Peripherals | USB 2.0, Gigabit Ethernet, SD/SDIO | PCIe® Gen2, USB3.0, SATA 3.1, DisplayPort, Gigabit Ethernet, SD/SDIO
Security | RSA, AES, and SHA, ARM TrustZone® | RSA, AES, and SHA, ARM TrustZone
Max I/O Pins | 128 | 214

PROGRAMMABLE LOGIC | Zynq-7000 SoC | Zynq UltraScale+ MPSoC
System Logic Elements (K) | 23–444 | 103–1,045
Max Memory (Mb) | 1.8–26.5 | 5.3–70.6
Max I/O Pins | 100–362 | 252–668
DSP Slices | 60–2,020 | 240–3,528
18x18 Multipliers | 60–2,020 | 240–3,528
Fixed Point Performance (GMACs) (1) | 42–1,313 | 213–3,143
Fixed Point Performance For Symmetric Filters (GMACs) (1) (2) | 84–2,626 | 426–6,286
INT8 GOPs (1) (3) | 84–2,626 | 745–11,000
INT16 GOPs (1) | 84–2,626 | 426–6,286
Single Precision Floating Point (GFLOPs) (1) (4) | 23–716 | 142–1,673
Single Precision Floating Point (GFLOPs) (1) (5) | 17–537 | 106–1,571
Half Precision Floating Point (GFLOPs) (1) (6) | 34–1,074 | 212–3,142

Notes:

  1. All performance calculations based on -2 speed grade parts for Zynq-7000 SoC and -3 for Zynq UltraScale+ MPSoC
  2. Using the pre-adder, DSP performance can be increased 2x for symmetric filters
  3. Please refer to WP486 – Deep Learning with INT8 Optimization on Xilinx Devices (Not applicable for Zynq devices)
  4. Single Precision Floating Point performance using Floating Point Operator core with 3 DSP slices
  5. Single Precision Floating Point performance using Floating Point Operator core with 4 DSP slices
  6. Half Precision Floating Point performance using Floating Point Operator core with 2 DSP slices
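
Note 2 credits the pre-adder with a 2x gain for symmetric filters. The idea is that when the coefficients mirror around the center, the two samples that share a coefficient can be added first and multiplied once. The sketch below is a behavioral illustration with hypothetical function names and made-up data, not a description of how the slice is configured.

```python
def fir_direct(coeffs, window):
    """One multiply per tap."""
    return sum(c * x for c, x in zip(coeffs, window)), len(coeffs)


def fir_folded(coeffs, window):
    """Exploit coeffs[k] == coeffs[n-1-k]: pre-add the paired samples, then multiply."""
    n = len(coeffs)
    half = n // 2
    acc = 0
    multiplies = 0
    for k in range(half):
        acc += coeffs[k] * (window[k] + window[n - 1 - k])  # pre-adder, then multiplier
        multiplies += 1
    if n % 2:  # an odd-length filter has an unpaired center tap
        acc += coeffs[half] * window[half]
        multiplies += 1
    return acc, multiplies


coeffs = [1, 3, 5, 7, 7, 5, 3, 1]     # symmetric coefficient set (illustrative)
window = [2, -1, 4, 0, 6, 3, -2, 5]   # illustrative input samples
direct_result, direct_mults = fir_direct(coeffs, window)
folded_result, folded_mults = fir_folded(coeffs, window)
```

Both versions produce the same output, but the folded form uses four multiplies instead of eight, which is where the doubling of effective throughput per multiplier comes from.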

To learn more about Xilinx SoCs and MPSoCs, go to:

DSP in the Processing Subsystem

The Processing System (PS) provides DSP processing capabilities by way of the different ARM processing cores.

For more information on DSP capabilities in the ARM processors, visit:

Some useful examples can be found at the following locations:

For Zynq UltraScale+ MPSoC, see UG1211 for a demonstration of an FFT using the ARM NEON instruction set.

For Zynq-7000 SoC, the following Tech Tips are available on Xilinx wiki when targeting the Cortex-A9 and ARM SIMD:

Xilinx Data-type Support

Xilinx has very flexible data-type support in its All Programmable devices. Varying precisions of Fixed Point, Floating Point and Integer are supported natively in Xilinx tools, with Floating Point implemented with the aid of the Floating Point Operator IP core.

Floating Point designs implemented on FPGAs will always lead to higher resource and power usage compared to Fixed Point or Integer implementations. Converting to a fixed point solution where possible will bring large benefits:

  • Fewer FPGA resources
  • Lower power
  • Lower cost

For more details on the benefits of converting from floating point to fixed point data types, please read WP491.
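
To make the conversion concrete, the sketch below quantizes a floating-point value to a signed 16-bit fixed-point word with 15 fractional bits and converts it back. The word length, the saturation behavior and the names `to_fixed` and `from_fixed` are illustrative assumptions, not taken from WP491 or from any Xilinx tool.

```python
def to_fixed(x, frac_bits=15, total_bits=16):
    """Quantize x to a signed fixed-point integer, saturating at the range limits."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    # Python's round() rounds exact halves to the nearest even integer.
    return max(lo, min(hi, round(x * scale)))


def from_fixed(q, frac_bits=15):
    """Convert a fixed-point integer back to a float."""
    return q / (1 << frac_bits)


half = to_fixed(0.5)          # exactly representable
saturated = to_fixed(1.0)     # just out of range, clamps to the largest code
error = abs(from_fixed(to_fixed(0.1)) - 0.1)  # quantization error for 0.1
```

The quantization error is bounded by half of one least significant bit (2^-16 here), which is the precision traded for the smaller and lower-power integer datapath.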

Benchmarks

The tables below show a small selection of algorithms and the performance improvements possible when a Xilinx All Programmable device, and in particular the fabric in the programmable logic (PL), is used to accelerate the design.

Algorithm | CPU/GPU | Zynq UltraScale+ MPSoC | Advantage
Stereo LocalBM @ 2K | ARM: 0.5 FPS/Watt; Nvidia: 3.5 FPS/Watt | 146 FPS/Watt | 292x; 42x
Optical Flow (Lucas-Kanade) | ARM: 0.1 FPS/Watt; Nvidia: 0.8 FPS/Watt | 7.1 FPS/Watt | 9.3x
GoogleNet (Batch=1) | ARM: 0.1 Imgs/s/W; Nvidia: 8.8 Imgs/s/W | 53 Imgs/s/W | 530x; 6x

Note 1: ARM: Quad-core A53 run on Raspberry Pi @ 1200MHz
Note 2: Nvidia benchmarks were done using Tegra X1
Note 3: Optical Flow (LK) – Window Size 11x1

Algorithm | CPU/DSP | Zynq-7000 | Advantage
Forward Projection | ARM: 3 sec/view | 0.016 sec/view | 188x
Motion Detection | ARM: 0.7 FPS | 67 FPS | 90x
Noise Reduction-Sobel | ARM: 1 FPS | 67 FPS | 60x
Canny Edge Detection | ARM: 0.66 FPS | 40 FPS | 45x
3D Image Reconstruction | ARM: 75k | 8k | 9x
DPD | ARM: 506 ms | 31.3 ms | 16x
FIR | TI DSP: 64020 ns | 1200 ns | 53x
FFT | TI DSP: 1036 ns | 128 ns | 8x

Note 1: Cortex-A9 core used only on the Zynq devices when targeting ARM
Note 2: TI benchmarks were done using C66 DSP core

Xilinx high-level design tools like Vivado System Generator for DSP and Vivado High Level Synthesis provide a level of abstraction that empowers system architects and domain experts to rapidly evaluate new algorithms and focus on developing the differentiating parts of their design. The complete Xilinx DSP solution is a combination of these design tools, IP, reference designs, methodologies and boards that work together to get to a working production design in the shortest time possible.

Vivado System Generator for DSP

The Vivado System Generator for DSP is a Model-Based design tool that leverages the MATLAB and Simulink environment to define, test and implement production quality DSP algorithms in programmable logic in a fraction of traditional RTL development times.

Diagram showing the model-based design workflow of the Vivado System Generator for DSP

The tool provides:

  • 100+ optimized DSP blocks, many with C simulation models for 2-3X faster simulation vs RTL
  • Integration of RTL, IP, Simulink, MATLAB and C/C++ components of a DSP system
  • Bit and cycle accurate floating and fixed-point simulations
  • Hardware co-simulation to accelerate simulation and validate algorithm on working hardware
  • Automatic code generation from Simulink to packaged IP or low-level HDL
  • Automatic generation of HDL test bench, including test vectors

Learn more about Vivado System Generator for DSP:

Vivado High Level Synthesis

Vivado High-Level Synthesis, included as a no-cost upgrade in all Vivado HLx Editions, enables portable C, C++ and SystemC algorithm specifications to be directly targeted into Xilinx devices without the need to create RTL. Just as there are compilers from C/C++ to different processor architectures, the HLS compiler provides the same functionality from C/C++ to Xilinx FPGAs.

Learn more about Vivado High Level Synthesis:

Tools

Xilinx provides best-in-class tools to enable Digital Signal Processing (DSP) applications to be implemented efficiently and at low power on a Xilinx FPGA or All Programmable SoC. Whether you are designing with RTL, C/C++/SystemC or MATLAB/Simulink, the Xilinx tools below can facilitate your DSP design and reduce your time-to-market.

Libraries and Frameworks

Xilinx offers a range of libraries which are optimized for performance, resource utilization and ease of use.

Embedded Vision Solutions

  • OpenCV libraries
  • Deep learning inference framework supporting Caffe
  • Design examples such as optical flow, stereo vision and CNN-based scene segmentation

For more information, visit the reVISION Zone.

Reconfigurable Acceleration Stack

The Xilinx Reconfigurable Acceleration Stack enables the world’s largest cloud service providers to develop and deploy acceleration platforms at cloud scale and delivers ultimate flexibility for complex cloud computing applications like machine learning, data analytics, and video transcoding.

For more information, visit the Acceleration Zone.

GitHub Repositories

Xilinx has created GitHub repositories which contain useful examples for many applications including DSP related functions.

Xilinx and its partners work together to produce tools and boards to ease the adoption of Xilinx FPGAs and SoCs for DSP applications across many market segments.

Partners

Avnet DSP-Centric Development Kits and Modules

Through long-established collaboration with Xilinx, MathWorks and leading high-speed analog suppliers, Avnet offers DSP-centric development kits and production-ready system-on-modules (SOM) for embedded vision, software-defined radio and high-performance motor control. With a global team of over 150 field applications engineers and DSP specialists across multiple design centers, Avnet can support your DSP designs from concept to production.

Visit Avnet DSP solutions

MathWorks Computing Software

MathWorks MATLAB® and Simulink® can reduce FPGA and SoC system development time significantly by enabling users to:

  • Create complex signal and image processing, communications, and control algorithms, incorporating custom and pre-defined functions and blocksets
  • Validate system requirements early in the development process through Model-Based Design and system-level simulation
  • Generate and verify HDL and C code targeting Xilinx All-Programmable FPGA and SoC platforms for rapid prototyping
  • Tune and optimize FPGA performance by incorporating System Generator for DSP blocks in your design

More information

Analog Devices Add-On Boards

The AD-FMCDAQ2-EBZ FMC board, in conjunction with Xilinx FPGAs, provides wideband data conversion, internal clocking, and power to closely approximate real-world hardware and software for DSP design. The board is a self-contained, easy-to-use data acquisition and signal synthesis prototyping platform that speeds up end-system signal processing development.

  • FMC footprint of 84 mm × 69 mm
  • AD9680: 14-bit, 1.0 GSPS, JESD204B ADC
  • AD9144: quad, 16-bit, 2.8 GSPS, JESD204B DAC
  • AD9523-1: 14-output, 1 GHz clock generator
  • Power management components

More information

Xilinx Boards and Kits

Whether you are in the concept phase or want to speed time-to-market with a production board or module, Xilinx All Programmable FPGA and SoC boards, kits, and modules, along with Xilinx ecosystem partners, offer a comprehensive set of hardware platforms.

Kintex UltraScale DSP Kit with 8 Lane JESD204B interface

The video highlights the Xilinx Kintex UltraScale FPGA Analog Devices JESD204B DSP Kit featuring the Xilinx Kintex UltraScale KCU105 development board with the KU40 device paired up with the Analog Devices AD-FMCDAQ2-EBZ high-speed analog FMC module.

Xilinx is also working closely with a range of partners to provide an extensive range of FPGA Mezzanine Cards.
