Comparison of Edge AI Platforms

Debmalya Biswas May 31, 2021

Abstract

This article compares several popular Edge AI platforms in terms of hardware, performance, price and development environment.

Edge AI is a very exciting field today, with a lot of development and innovation under way. For years there has been a clear trend of moving machine learning inference down to embedded hardware that sits closer to the user, does not need a network connection, and can solve complex problems in real time (e.g., autonomous driving). There are many new frameworks and engines with much smaller model footprints designed specifically to run on Edge devices. It is also much easier to address the very important issues of user privacy and security when personal data does not leave the edge device: complex algorithms analyzing inference results can be executed on the device, sending only final, obfuscated information to the cloud (for example, an alarm for some unusual situation).

For the performance demo of the boards I used NCNN, a well-known open-source high-performance neural network inference framework optimized for mobile and embedded platforms (https://github.com/Tencent/ncnn). Cross-compiling it for the platforms in this article did not require much effort (a standard cmake configure and build, with the cross-compilation tools and sysroot configured), and for the purpose of a simple board performance comparison the default NCNN inference benchmarks (benchncnn) were executed. The NCNN benchmark is built statically with most of its dependencies, so in general it is not hard to run on a standard embedded Linux system.
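For reference, the cross-compilation and benchmark run can be sketched roughly as follows. The host paths, board address and user are placeholders; the toolchain file name assumes the CMake toolchain files that NCNN ships in its toolchains/ directory, so adjust to your own setup:

```shell
# Clone NCNN and build it for a 64-bit ARM Linux target.
git clone https://github.com/Tencent/ncnn.git
cd ncnn

# NCNN ships CMake toolchain files under toolchains/; the aarch64 one
# expects the GNU cross toolchain (aarch64-linux-gnu-*) on the PATH.
mkdir -p build-aarch64 && cd build-aarch64
cmake -DCMAKE_TOOLCHAIN_FILE=../toolchains/aarch64-linux-gnu.toolchain.cmake \
      -DNCNN_BUILD_BENCHMARK=ON ..
make -j"$(nproc)"

# Copy the statically linked benchmark and the model parameter files to
# the board, then run it there. benchncnn arguments are:
# loop count, thread count, powersave mode, GPU device (-1 = CPU only).
scp benchmark/benchncnn ../benchmark/*.param pi@raspberrypi:~/bench/
ssh pi@raspberrypi 'cd ~/bench && ./benchncnn 8 4 0 -1'
```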

In our development we often utilize the Bonseyes Developer Platforms (https://www.bonseyes.com/), which enable easy cross compilation of various machine learning applications using the same tools. The Bonseyes platform comes with tools and Docker images providing a ready-to-use cross-compilation environment for many embedded platforms. It supports building the platform image, setting up the target board and cross-compiling custom applications, all in a platform-transparent way by building the applications inside a container.

Raspberry Pi 4

This famous platform, which made embedded hobby applications easily accessible and popular, can also be used for moderately complex machine learning applications. At an affordable price of ~$50 it is a nice tool for hobbyists entering Edge computing.

Raspberry Pi 4B

Hardware Specifications

CPU       Broadcom BCM2711B0 quad-core Cortex-A72 (ARMv8-A) 64-bit @ 1.5GHz
GPU       Broadcom VideoCore VI
Memory    1GB, 2GB, or 4GB LPDDR4 SDRAM; microSD slot
Display   micro-HDMI 2.0
Misc.     Ethernet, Bluetooth, GPIO, USB 2.0, USB 3.0, Camera Serial Interface (CSI), Display Serial Interface (DSI)
Raspberry Pi 4 Hardware Specifications

There are two Linux-based operating system distributions available: the official Raspberry Pi OS (Raspbian), and the Ubuntu Raspberry Pi port. Raspbian is a bit more user-friendly for configuring and managing the target board (easier for newcomers), but Ubuntu has wider support for external applications and libraries, with access to the standard Ubuntu ARM package repositories, which may be more convenient for porting and cross-compiling AI applications. Both distributions have 32-bit and 64-bit versions. In our experience, 64-bit systems are up to 50% faster in machine learning benchmarks than their 32-bit counterparts.
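Note that a 64-bit kernel can run a 32-bit userland (a common configuration on Raspbian), so it is worth checking both on the board before drawing conclusions from benchmarks:

```shell
# Machine hardware name: aarch64 for a 64-bit ARM kernel,
# armv7l for a 32-bit one.
uname -m

# Word size of the current userspace ABI: prints 32 or 64.
getconf LONG_BIT
```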

The open source initiative to develop a Vulkan driver for the Raspberry Pi 4 GPU has reached its official release, passing more than 100,000 tests in the Khronos Conformance Test Suite for Vulkan 1.0. Unfortunately, the driver development has focused on the 32-bit platform, and support for 64-bit OSes is very scarce.

In general, Ubuntu (or another well-supported Linux distribution) is a good cross-compilation environment for the Raspberry Pi 4. The GNU cross-compilation toolchain can be easily installed from the official repositories (g++-aarch64-linux-gnu and gcc-aarch64-linux-gnu). One good way to implement the cross-compilation environment is to create a Docker image, install all the host development tools there, use Ubuntu's multi-architecture repository support to create the target system root, and let the apt tool resolve and download all the target-board library dependencies needed for that sysroot to be fully functional (more on this topic in a future article).
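A minimal sketch of such a setup, whether run directly or baked into a Dockerfile, could look like the following. The target library packages are illustrative examples, not a fixed list:

```shell
# Host cross toolchain from the official repositories.
sudo apt-get update
sudo apt-get install -y gcc-aarch64-linux-gnu g++-aarch64-linux-gnu

# Let apt also track arm64 packages. On an amd64 host the arm64 packages
# live on ports.ubuntu.com, so the apt sources need matching
# "deb [arch=arm64] http://ports.ubuntu.com/ubuntu-ports ..." entries.
sudo dpkg --add-architecture arm64
sudo apt-get update

# Install target libraries; apt resolves their arm64 dependency chains,
# populating /usr/lib/aarch64-linux-gnu and the matching headers.
sudo apt-get install -y libvulkan-dev:arm64 libopencv-dev:arm64

# Point an out-of-tree CMake build at the cross compilers.
cmake -DCMAKE_SYSTEM_NAME=Linux -DCMAKE_SYSTEM_PROCESSOR=aarch64 \
      -DCMAKE_C_COMPILER=aarch64-linux-gnu-gcc \
      -DCMAKE_CXX_COMPILER=aarch64-linux-gnu-g++ ..
```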

NCNN benchmark results for the Raspberry Pi 4B board are given in the following table:

Model            1 thread CPU (avg ms)    4 thread CPU (avg ms)
mobilenet        150.20                   76.95
shufflenet       69.20                    54.03
resnet18_int8    301.16                   144.45
resnet50         722.21                   424.93
NCNN benchmark results on Raspberry Pi 4B

Jetson AGX Xavier Developer Kit

Nvidia's flagship product targeting mainly Edge machine learning applications is based on the NVIDIA® Jetson AGX Xavier module. It is a capable platform that comes with a rich Nvidia ecosystem of tools and libraries for developing AI applications.

NVIDIA Jetson AGX Xavier Developer Kit

Hardware Specifications

CPU       8-core ARM v8.2 64-bit CPU, 8MB L2 + 4MB L3
GPU       512-core Volta GPU with Tensor Cores
Memory    32GB 256-bit LPDDR4x | 137GB/s, 32GB eMMC 5.1
Display   HDMI
Misc.     Deep learning accelerators (NVDLA engines), Ethernet, 4K hardware video decoders and encoders, 2x USB 3.1, camera connector (CSI-2 lanes), PCIe x16, other extensions
Jetson AGX Xavier hardware specifications

The easy-to-use NVIDIA SDK Manager, which runs on a PC workstation, supports downloading and installing the latest available operating system (currently Ubuntu 18.04) together with additional libraries and drivers for the on-board hardware. Many pre-installed libraries have additional support for Nvidia hardware (for example, gstreamer hardware encoding/decoding).

The Volta GPU with 512 CUDA cores gives decent computation and processing power, supporting CUDA Compute Capability 7.0; in comparison, the famous NVIDIA GTX 1060 PC GPU supports Compute Capability 6.1 and has 1280 CUDA cores. Also, as CUDA has been the dominant general-purpose GPU programming platform in machine learning over the previous decade, this platform has the unique advantage of running existing AI frameworks and libraries directly on the board, with only a simple cross compilation. The Ubuntu system running on the target supports easy installation, via the apt tool, of many packages available in the Ubuntu ARM repositories. In general, due to its power, easy setup and package availability, the target board itself can be used to compile custom applications natively, without setting up a cross-compilation environment on a PC workstation. The process usually involves cloning a GitHub repository directly on the board, running cmake to check for missing dependencies, installing those dependencies with apt-get and reconfiguring, and then executing make to build the application. Compilation on the AGX Xavier is an order of magnitude slower than on a high-end modern PC workstation, but for moderately sized projects it is an easy way to test and try new things.
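The native on-board build flow described above is just the usual Linux routine. Sketched out (the repository URL and dependency package names below are placeholders for whatever your project actually needs):

```shell
# Typical native build session on the AGX Xavier itself.
git clone https://github.com/some-org/some-ml-app.git
cd some-ml-app && mkdir build && cd build

# First cmake run reports any missing dependencies...
cmake ..

# ...install whatever it complained about, then reconfigure.
sudo apt-get install -y libopencv-dev protobuf-compiler
cmake ..

# Xavier has 8 cores; the build works, just noticeably slower than a PC.
make -j8
```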

The AGX Xavier also supports TensorRT, NVIDIA's engine for optimized AI inference. It compiles standard model formats (for example, ONNX) into highly optimized code that can be executed on the GPU or on the deep learning accelerators. More information about TensorRT can be found on the Nvidia website.
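For illustration, TensorRT's bundled trtexec tool can compile an ONNX model into a serialized engine and report inference timings. The model file name is a placeholder; on Jetson images trtexec typically lives under /usr/src/tensorrt/bin:

```shell
# Build a TensorRT engine from an ONNX model and benchmark it.
# --fp16 enables half-precision kernels on the Volta Tensor Cores.
/usr/src/tensorrt/bin/trtexec \
    --onnx=resnet50.onnx \
    --saveEngine=resnet50.trt \
    --fp16

# Later runs can load the prebuilt engine directly, skipping compilation.
/usr/src/tensorrt/bin/trtexec --loadEngine=resnet50.trt
```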

The price of the AGX Xavier Developer Kit is around $850, which hardly makes it your first hobby kit for Edge AI. Its cheaper ($100) cousin, the NVIDIA Jetson Nano, lacks such a good GPU (it is based on the Maxwell architecture: CUDA Compute Capability 5.3 with 128 cores) and the deep learning accelerators, but the development environment, tools and operating system are the same. A good strategy is therefore to start simple with the Nano board and then, when machine learning application complexity starts hurting performance, easily switch to the AGX Xavier platform.

NCNN benchmark results for the AGX Xavier Developer Kit are given in the following table:

Model            8 thread CPU (avg ms)    1 thread Vulkan (avg ms)
mobilenet        26.64                    10.47
shufflenet       46.62                    6.11
resnet18_int8    92.74                    327.73
resnet50         118.53                   18.47
NCNN benchmark results for the AGX Xavier Developer Kit

NXP i.MX 8 Multisensory Enablement Kit (MEK)

NXP has the i.MX 8 series of application processors for advanced graphics, imaging, machine vision and safety-critical applications, and a family of demo/target platforms with various peripherals based on these CPUs. The NXP i.MX 8 Multisensory Enablement Kit (MEK) is a convenient platform for machine learning applications.

NXP i.MX 8 Multisensory Enablement Kit (MEK)

It is quite powerful with many included hardware/extension capabilities:

CPU       NXP i.MX 8: 2x Cortex-A72 and 4x Cortex-A53 cores, 2x Cortex-M4
GPU       16 Vec4-shader GPU (32 compute units, OpenGL ES 3.2 and Vulkan)
Memory    5.5 GB LPDDR4, 32 GB eMMC 5.1, 64 MB Octal SPI flash
Display   2x MIPI/LVDS connectors, camera MIPI-CSI (4 lanes each)
Misc.     Ethernet, USB 3.0 Type-C, serial-to-USB, SD card slot, etc.
NXP i.MX 8 MEK hardware specifications

With a price above $1,000, it is meant to be used as a professional platform.

The NXP website has extensive documentation for this board, as well as Linux images that can easily be burned to an SD card on a PC with a tool like Etcher. The operating system used on the board is NXP's customization of Yocto Linux, based on the default Poky distribution. This customization adds many libraries common in Edge AI applications (opencv, gstreamer, qt, a working Vulkan driver).
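For those who prefer the command line over Etcher, the same image can be written with dd. The image file name below is a placeholder, and /dev/sdX must be replaced with the actual SD card device; picking the wrong device destroys its data:

```shell
# Identify the SD card device first -- double-check before writing!
lsblk

# Decompress the downloaded Yocto .wic image and stream it to the card.
bunzip2 -c imx-image-full-imx8qmmek.wic.bz2 | \
    sudo dd of=/dev/sdX bs=1M conv=fsync status=progress

# Make sure all buffers are flushed before removing the card.
sync
```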

Unboxing and booting the MEK board from the pre-burned SD card is easy, but setting up the workstation PC as a cross-compilation environment is much more work. It requires using Yocto to build the sysroot and SDK that can cross-compile custom applications for this board. Yocto is a very complex build system that gives great power to customize every aspect of the board image and system libraries, but it has a steep learning curve and requires a lot of time to understand and practise. Customizing the default NXP i.MX 8 build (for example, when some additional open source library is missing) is quite tedious: Yocto downloads source code and builds everything from scratch (including a GNU cross-compilation toolchain tailored to the build machine's environment), and even on a high-end modern personal computer this initial build job takes a whole day to complete.
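As a rough outline of that Yocto flow: the manifest branch, release manifest and image/machine names below are examples from BSP releases of that period; the NXP i.MX Yocto User's Guide lists the exact ones for each release:

```shell
# Fetch the NXP Yocto BSP layers with the repo tool.
mkdir imx-yocto-bsp && cd imx-yocto-bsp
repo init -u https://source.codeaurora.org/external/imx/imx-manifest \
          -b imx-linux-zeus -m imx-5.4.70-2.3.0.xml
repo sync

# Set up a build directory for the i.MX 8QuadMax MEK.
DISTRO=fsl-imx-xwayland MACHINE=imx8qmmek source imx-setup-release.sh -b build

# Build the full board image (this is the step that can take a whole day).
bitbake imx-image-full

# Build the cross-compilation SDK (toolchain + sysroot) for the PC.
bitbake imx-image-full -c populate_sdk
```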

NCNN benchmarks executed on the platform give the following results:

Model            4 thread A53 (avg ms)    6 thread A53+A72 (avg ms)    1 thread Vulkan (avg ms)
mobilenet        79.17                    53.96                        202.99
shufflenet       54.65                    41.64                        290.38
resnet18_int8    220.75                   154.37                       262.73
resnet50         427.56                   427.56                       427.56
NCNN benchmark results executed on NXP i.MX8

Conclusion

For those who want to jump into the exciting field of Edge AI, there are many readily available target platforms to choose from, and we have compared a few of them. For hobby/demo purposes there are the cheap but machine-learning-ready Raspberry Pi 4 and Nvidia Jetson Nano. The Nvidia AGX Xavier is Nvidia's flagship product, with convenient tooling, Ubuntu OS, a good GPU and CUDA support. The NXP i.MX 8 is a more complex platform to handle and targets industrial and commercial applications.
