# Machine Learning for Embedded Demo

Quenton Hall Avnet Field Applications Engineer | ML Specialist Boston | March 14

Slide and diagram credits: Kaiming He | Xiangyu Zhang | Shaoqing Ren | Jian Sun | Clayton Cameron | Michaela Blott | Andy Luo



# **Neural Nets - A Nickel Tour**



# Convolutional Neural Networks (CNNs) from a computational point of view

- > CNNs are usually feed forward\* computational graphs constructed from one or more layers
  - >> Up to 1000s of layers
- > Each layer consists of neurons *n<sub>i</sub>* which are interconnected by synapses, each associated with a weight *w<sub>ij</sub>*
  - >> Typically a linear transform (dot product over the receptive field)
  - >> Followed by a non-linear "activation" function
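The two steps a neuron performs (dot product over its receptive field, then a non-linear activation) can be sketched in a few lines of plain Python. This is an illustrative sketch only: the ReLU activation, the optional `bias` term, and the sample numbers are assumptions chosen for the example and do not come from the slides.

```python
def relu(x):
    """Rectified linear unit, one common non-linear activation (assumed here)."""
    return max(0.0, x)


def neuron(inputs, weights, bias=0.0, activation=relu):
    """Linear transform (dot product over the receptive field), then activation."""
    pre_activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(pre_activation)


# Hypothetical receptive field and synapse weights.
receptive_field = [0.5, -1.0, 2.0]
weights = [0.2, 0.1, 0.4]

out = neuron(receptive_field, weights)                # positive sum passes through ReLU
neg = neuron(receptive_field, [-w for w in weights])  # negative sum is clipped to zero
print(out, neg)
```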



Synapse with weight *wji* 




\* With exception of RNNs

### Fully Connected Layers (aka inner product or dense layers)

#### > Each input activation is connected to every output activation

- >> Receptive field encompasses the full input
- > Can be written as a matrix-vector product with an elementwise non-linearity applied afterwards.

#### > Implementation Challenges

- >> Connectivity
- >> High weight memory requirement: #IN \* #OUT \* BITS (in bits)
- >> Low arithmetic intensity assuming weights off-chip
  - (2 \* #IN \* #OUT) / (#IN \* #OUT \* BITS / 8) = 16 / BITS operations per byte of weights

| Symbol | Meaning                         |
|--------|---------------------------------|
| IN     | number of input channels        |
| OUT    | number of output channels       |
| BITS   | bit precision of the data types |
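To make the memory and arithmetic-intensity expressions above concrete, here is a small worked example in Python. The layer size (2048 inputs, 1000 outputs) is an assumption chosen to match the 2.048 M FC weights listed for ResNet50 in this deck; the function names are hypothetical.

```python
def fc_weight_bytes(n_in, n_out, bits):
    """Weight storage of a fully connected layer in bytes: IN * OUT * BITS / 8."""
    return n_in * n_out * bits / 8


def fc_arithmetic_intensity(n_in, n_out, bits):
    """Operations per byte of weights fetched, assuming weights live off-chip."""
    ops = 2 * n_in * n_out  # one multiply and one add per weight
    return ops / fc_weight_bytes(n_in, n_out, bits)


# Assumed layer shape: 2048 inputs, 1000 outputs.
int8_bytes = fc_weight_bytes(2048, 1000, 8)
int8_intensity = fc_arithmetic_intensity(2048, 1000, 8)
fp32_intensity = fc_arithmetic_intensity(2048, 1000, 32)

print(int8_bytes, int8_intensity, fp32_intensity)
```

The intensity simplifies to 16 / BITS regardless of layer size, which is why reducing precision is one of the few ways to raise it for a fully connected layer.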

$$\begin{pmatrix} i_0 & i_1 & i_2 \end{pmatrix} \times \begin{pmatrix} W_{00} & W_{01} & W_{02} & W_{03} \\ W_{10} & W_{11} & W_{12} & W_{13} \\ W_{20} & W_{21} & W_{22} & W_{23} \end{pmatrix} = \begin{pmatrix} n_0' & n_1' & n_2' & n_3' \end{pmatrix}$$

$$\begin{pmatrix} n_0 & n_1 & n_2 & n_3 \end{pmatrix} = \mathrm{Act}\begin{pmatrix} n_0' & n_1' & n_2' & n_3' \end{pmatrix}$$
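The same vector-matrix product followed by an elementwise activation can be written as a short Python sketch. The 3-input, 4-output shape mirrors the equations above; the weight values and the choice of ReLU for `Act` are assumptions for illustration.

```python
def relu(x):
    """Elementwise non-linearity used as Act in this sketch (an assumption)."""
    return max(0.0, x)


def fully_connected(inputs, weights, activation=relu):
    """Compute Act(i x W), where weights[i][j] connects input i to output j."""
    n_out = len(weights[0])
    pre_activations = [
        sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        for j in range(n_out)
    ]
    return [activation(v) for v in pre_activations]


# Hypothetical values: 3 inputs (i0..i2), 4 outputs (n0..n3).
inputs = [1.0, 2.0, 3.0]
weights = [
    [1, 0, -1, 2],
    [0, 1, -1, 0],
    [1, 1, -1, -1],
]
outputs = fully_connected(inputs, weights)
print(outputs)
```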

| MODEL    | CONV WEIGHTS (M)    | FC WEIGHTS (M) |
|----------|---------------------|----------------|
| ResNet50 | 23.454912           | 2.048          |
| AlexNet  | 2.332704            | 58.621952      |
| VGG16    | 14.710464           | 123.633664     |



### **Convolutional Layers Example 2D Convolution**

- > Convolutions capture some kind of locality, spatial or temporal, that we know exists in the domain
- > Receptive field of each neuron reduced
  - >> Applying convolution to all images in the previous layer
- > Weights represent the filters used for convolutions



# **2D Convolutional Layers**

#### > Slide the window till one feature map is complete

>> With a given stride size
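As a rough illustration of sliding a window across one input with a given stride, the sketch below computes a single feature map from a single input channel. It assumes a square kernel and no padding, and it uses the cross-correlation form common in CNN frameworks; the function name and sample data are hypothetical. A full layer would also sum over all input channels and repeat this for each output channel.

```python
def conv2d_single(image, kernel, stride=1):
    """Slide one square kernel over one 2D input (no padding); return one feature map."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    feature_map = []
    for oy in range(out_h):
        row = []
        for ox in range(out_w):
            y0, x0 = oy * stride, ox * stride  # top-left corner of the window
            acc = sum(
                image[y0 + dy][x0 + dx] * kernel[dy][dx]
                for dy in range(k)
                for dx in range(k)
            )
            row.append(acc)
        feature_map.append(row)
    return feature_map


image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
kernel = [[1, 0], [0, 1]]

stride1 = conv2d_single(image, kernel, stride=1)  # 3x3 feature map
stride2 = conv2d_single(image, kernel, stride=2)  # 2x2 feature map
print(stride1)
print(stride2)
```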



# **2D Convolutional Layers**

#### > Compute next channel



**NNs in More Detail** 



Activation & Batch Normalization


# **ResNet – A brief history**





## **Image Classification - ImageNet**

#### > In 2009, Fei-Fei Li introduced the ImageNet dataset

>> More than 14 million images, more than 20,000 object classes

#### > ImageNet Large Scale Visual Recognition Challenge – ILSVRC

» Subset of 1000 object classes. 1.2 Million images



Image Credit: Kaiming He http://kaiminghe.com/

#### imagenet1000\_clsid\_to\_human.txt

| Class ID | Label                                                                               |
|----------|-------------------------------------------------------------------------------------|
| 0        | tench, Tinca tinca                                                                  |
| 1        | goldfish, Carassius auratus                                                         |
| 2        | great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias |
| 3        | tiger shark, Galeocerdo cuvieri                                                     |
| 4        | hammerhead, hammerhead shark                                                        |
| 5        | electric ray, crampfish, numbfish, torpedo                                          |
| 6        | stingray                                                                            |
| 7        | cock                                                                                |
| 8        | hen                                                                                 |
| 9        | ostrich, Struthio camelus                                                           |
| 10       | brambling, Fringilla montifringilla                                                 |
| 11       | goldfinch, Carduelis carduelis                                                      |
| 12       | house finch, linnet, Carpodacus mexicanus                                           |
| 13       | junco, snowbird                                                                     |
| 14       | indigo bunting, indigo finch, indigo bird, Passerina cyanea                         |
| 15       | robin, American robin, Turdus migratorius                                           |
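A class-ID table like this is typically used to turn a network's output vector into readable names. The sketch below is a minimal illustration, assuming the final layer produces one score per class and that a softmax converts scores to probabilities; the three-entry `labels` dictionary reuses entries from the file above, while the scores and helper names are hypothetical.

```python
import math

# First three entries of the class-ID mapping shown above.
labels = {
    0: 'tench, Tinca tinca',
    1: 'goldfish, Carassius auratus',
    2: 'great white shark, white shark, man-eater, man-eating shark, Carcharodon carcharias',
}


def softmax(scores):
    """Convert raw scores to probabilities (shifted by the max for numerical stability)."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(probs, names, k):
    """Return (class_id, probability, name) for the k most likely classes."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [(i, probs[i], names[i]) for i in ranked]


scores = [0.5, 3.0, 1.0]  # hypothetical network outputs for three classes
probs = softmax(scores)
results = top_k(probs, labels, 2)
for rank, (class_id, prob, name) in enumerate(results):
    print(f"top[{rank}] prob = {prob:.6f} name = {name}")
```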


# **ResNets**

#### > Deep networks suffer from the "vanishing gradient" problem

- During backpropagation, gradients shrink as they pass backward through many layers, so the weights of early layers in deep networks may barely change
  - Impacts our ability to train deep networks
- > ResNet popularized "skip connections", which made it possible to train deeper networks with higher accuracy
  - https://arxiv.org/abs/1512.03385
- > ResNet50 is so called because the architecture includes 50 weighted layers (49 convolution layers on the main path plus one fully connected layer)
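The idea of a skip connection can be sketched as `y = ReLU(F(x) + x)`: the block's input is added back to the output of its transform, so the signal (and its gradient) has a direct path around the layers. The code below is a simplified illustration on plain lists, not ResNet50 itself; `transform` stands in for the block's convolution layers and is a hypothetical placeholder.

```python
def relu_vec(values):
    """Apply ReLU elementwise."""
    return [max(0.0, v) for v in values]


def residual_block(x, transform):
    """Compute y = ReLU(F(x) + x), where the '+ x' term is the skip connection."""
    fx = transform(x)
    return relu_vec([f + s for f, s in zip(fx, x)])


x = [1.0, -2.0, 3.0]

# Even if the transform contributes nothing, the input still passes through.
passthrough = residual_block(x, lambda v: [0.0 for _ in v])

# A hypothetical transform that scales its input by 0.5.
scaled = residual_block(x, lambda v: [0.5 * a for a in v])

print(passthrough, scaled)
```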







# **DNNDK ResNet Inference**



For ResNet50:

70 Layers

7.7 Billion operations

25.5 MBytes of weight storage\*

10.1 MBytes for activations\*

\*Assuming int8
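The weight-storage figure can be cross-checked against the CONV and FC weight counts listed for ResNet50 in the table earlier in this deck. The short calculation below assumes one byte per weight (int8), takes "MBytes" to mean 10^6 bytes, and ignores biases and batch-normalization parameters; those are assumptions of this sketch.

```python
# Weight counts in millions, taken from the ResNet50 row of the earlier table.
conv_weights_m = 23.454912
fc_weights_m = 2.048

bytes_per_weight = 1  # int8

total_weights_m = conv_weights_m + fc_weights_m
weight_storage_mbytes = total_weights_m * bytes_per_weight

print(f"{total_weights_m:.6f} M weights -> {weight_storage_mbytes:.1f} MBytes at int8")
```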


# Network Inference with DNNDK





DPU Data Flow



Slide and animation credit – Clayton Cameron and family



[Animation frames: DPU data flow. Labels: PS DDR, Memory Controller, PE, DDR. Sample output: top[4] prob = 0.000005 name = Irish water spaniel]

# **DNNDK Highlights**





# Quantization

#### **Quantization Strategy**

#### > Our Quantization Strategy

- >> Uniform Symmetric Quantization → 8Bit for Our DPU
- >> Scale = 2<sup>N</sup>



#### > Advantages

- » Hardware friendly
- >> High efficiency: all fixed-point calculation
- » Makes use of the redundancy of CNN models (especially with BatchNorm layers)

#### **Two quantization methods**

- > Non-Overflow Method:
  - » Choose the quantization position (pos) so that no value overflows
  - » No saturation
  - » Sensitive to large values
- > Min-Diff Method:
  - » pos = argmin ∑(X<sub>qi</sub> − X<sub>fi</sub>)<sup>2</sup>
  - » Needs saturated truncation
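The two methods can be compared with a small sketch of symmetric uniform quantization using a power-of-two scale. This is not the decent\_q implementation: it assumes that "pos" is the exponent N in Scale = 2<sup>N</sup>, that X<sub>q</sub> and X<sub>f</sub> are the quantized and floating-point values, and that pos is found by a simple search over a candidate range. All function names are hypothetical, and the example uses 4 bits rather than 8 only so that the difference shows up on a handful of values.

```python
def quantize(values, pos, bits=8):
    """Symmetric uniform quantization with scale 2**pos, saturating to the signed range."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    scale = 2.0 ** pos
    return [max(qmin, min(qmax, round(v * scale))) for v in values]


def dequantize(quantized, pos):
    """Map integer codes back to real values."""
    scale = 2.0 ** pos
    return [q / scale for q in quantized]


def squared_error(values, pos, bits=8):
    """Sum of (X_q - X_f)^2 for a given pos."""
    restored = dequantize(quantize(values, pos, bits), pos)
    return sum((r - v) ** 2 for r, v in zip(restored, values))


def non_overflow_pos(values, bits=8, candidates=range(-8, 16)):
    """Largest pos for which the largest magnitude still fits without saturation."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values)
    return max(p for p in candidates if round(peak * 2.0 ** p) <= qmax)


def min_diff_pos(values, bits=8, candidates=range(-8, 16)):
    """pos that minimizes the squared quantization error; may saturate large values."""
    return min(candidates, key=lambda p: squared_error(values, p, bits))


# Hypothetical data: mostly small values plus one larger one.
values = [0.1, 0.2, -0.3, 0.25, -0.15, 0.05, 1.0]

pos_no_overflow = non_overflow_pos(values, bits=4)
pos_min_diff = min_diff_pos(values, bits=4)
print(pos_no_overflow, squared_error(values, pos_no_overflow, bits=4))
print(pos_min_diff, squared_error(values, pos_min_diff, bits=4))
```

On this data the non-overflow search settles on a coarser step to keep the value 1.0 representable, while the min-diff search accepts saturating that value in exchange for finer resolution on the small ones.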



#### Quantization Tool - decent\_q



# **Compilation – ResNet50 Example**

[DNNC][Warning] layer [prob] is not supported in DPU, deploy it in CPU instead.

Fail to convert gv file to jpg because 'dot' is not installed in current system. Try to install it using 'sudo apt-get instal… The original gv file is saved in 'resnet50_kernel_graph.gv'.

**DNNC Kernel Information**

| Overview        | Value                      |
|-----------------|----------------------------|
| Kernel numbers  | 4                          |
| Kernel topology | resnet50_kernel_graph.jpg  |

Kernel Description in Detail

| Kernel id | Kernel name | Type      | Nodes | Input node(s) | Output node(s)    |
|-----------|-------------|-----------|-------|---------------|-------------------|
| 0         | resnet50_0  | DPUKernel | NA    | conv1(0)      | res5c_branch2c(0) |
| 1         | resnet50_1  | CPUKernel | NA    | pool5         | pool5             |
| 2         | resnet50_2  | DPUKernel | NA    | fc1000(0)     | fc1000(0)         |
| 3         | resnet50_3  | CPUKernel | NA    | prob          | prob              |

# **B4096 ResNet Deployment in DNNDK v2.08**



### **Typical Evaluation / Development Environment**



**Live Demo** 




#### **DeePhi DSight**

DPU Utilization: Core0: 40.2%

Schedule Efficiency: Core0: 28.3%





# What have we accomplished

- > Demonstrated DECENT model quantization flow
- > Demonstrated DNNC model compilation flow
- > Demonstrated ResNet50 model deployment on the ZCU102
- > Demonstrated DSight profiling flow

# Resources

#### Edge AI Resources

The following resources are available to help you start developing with the Edge AI Platform.

#### Edge AI Tools

| Product            | Documentation                          | Tool Download                              | File Size  | MD5 Checksum                     |
|--------------------|----------------------------------------|--------------------------------------------|------------|----------------------------------|
| DNNDK              | DNNDK User Guide (UG1327)              | xinx_dnndk_v2.08_1902.tar.gz               | 1007 MB    | cf4dade1b3af14437ae97c09691ba381 |
| DNNDK for SDSoC    | DNNDK User Guide for SDSoC (UG1331)    | xilinx_dnndk_v2.08_for_sdsoc_190214.tar.gz | 667 MB     | 7f165aff5062497e4bb69b70773c49b1 |

#### Edge AI Evaluation Boards

| Product        | Documentation              | Image Download                                    | File Size | MD5 Checksum                     |
|----------------|----------------------------|---------------------------------------------------|-----------|----------------------------------|
| ZCU102 Kit     | ZCU102 User Guide (UG1182) | 2018-12-04-zcu102-desktop-stretch.img.zip         | 571 MB    | d0d5faf8ece80b96f5591d09756d5a5d |
| ZCU104 Kit     | ZCU104 User Guide (UG1267) | 2018-12-04-zcu104-desktop-stretch.img.zip         | 571 MB    | ada2420c4afbd89efdeea741e0917e26 |
| Avnet Ultra 96 | Ultra 96 User Guide        | xilinx-ultra96-desktop-stretch-2018-12-10.img.zip | 566 MB    | c5d2422063213b4bc4c18a3223c6adc8 |

#### Edge AI Targeted Reference Designs (TRD)

| Product | Image Download & Docs          | File Size | MD5 Checksum                     |
|---------|--------------------------------|-----------|----------------------------------|
| DPU TRD | zcu102-dpu-trd-2018-2-1903.zip | 459 MB    | 872170d1038d0c824cb2c808743930e4 |

#### Platform Downloads

| Product                                | Download                         | File Size | MD5 Checksum                     |
|----------------------------------------|----------------------------------|-----------|----------------------------------|
| ZCU102 SDSoC 2018.3 Platform for DNNDK | zcu102-rv-ss-2018-3-dnndk.tar.gz | 1.3 GB    | 7102c6942eb65d8b9d258914f69c6eaa |
| ZCU104 SDSoC 2018.3 Platform for DNNDK | zcu104-rv-ss-2018-3-dnndk.tar.gz | 1.3 GB    | d5bc80aa8135a719e273e2ff6ca85762 |

https://www.xilinx.com/products/design-tools/ai-inference/ai-developer-hub.html#edge

https://forums.xilinx.com/t5/Deephi-DNNDK/bd-p/Deephi


#### Announcements

Welcome to the Deephi DNNDK Community Forum. This community should serve as a resource to ask and learn about using Deephi DNNDK on all supported platforms, new feature announcements and troubleshooting AI applications.


## **Resources**

**AI Tutorials**

| Tutorial                        | Description                                                                                                                    |
|---------------------------------|--------------------------------------------------------------------------------------------------------------------------------|
| CIFAR10 Caffe Tutorial (UG1335) | Train, quantize, and prune custom CNNs with the CIFAR10 dataset using Caffe and the Xilinx® DNNDK tools.                       |
| Cats vs Dogs Tutorial (UG1336)  | Train, quantize, and prune a modified AlexNet CNN with the Kaggle Cats vs Dogs dataset using Caffe and the Xilinx DNNDK tools. |
|                                 | Train, quantize, and compile SSD using PASCAL VOC                                                                              |

#### https://github.com/Xilinx/Edge-AI-Platform-Tutorials


### XDF 2018 Workshop Machine Learning for Embedded on the Ultra96

#### Introduction

This lab is based on the XDF 2018 Machine learning for Embedded Workshop. It has been modified to run on the Ultra96 board.

During this session you will gain hands-on experience with the Xilinx DNNDK, and learn how to quantize, compile and deploy pre-trained network models to Xilinx embedded SoC platforms.

#### Overview of DNNDK flow

The architecture of DNNDK and its development flow are pictured below:

Elements of DNNDK:



https://github.com/jimheaton/Ultra96_ML_Embedded_Workshop

# **Getting Started**



Purchase a supported Xilinx evaluation board (e.g. ZCU102, ZCU104, Ultra96)



Configure a suitable build environment



Experience and modify Xilinx DNN examples



Evaluate quantization and compilation of Xilinx examples or custom models

# Key Takeaways



DNNDK is able to deploy pre-trained DNN models to Xilinx SoC easily & quickly without writing any RTL




DNNDK supports both local and AWS build environments



DNNDK supports deployment of DNN models with no FPGA experience