LVC21-113: TensorFlow Lite Delegates on Arm-based Devices

Session Abstract

TensorFlow Lite is a popular open-source framework that enables machine learning on mobile, embedded, and IoT devices, using the Arm NEON instruction set by default. The delegate mechanism takes TensorFlow Lite one step further by providing a way to use on-device hardware accelerators such as the GPU, NPU, or Digital Signal Processor (DSP).

Session Detail: The session focuses on TensorFlow Lite, and especially on its delegate mechanism. TensorFlow Lite is a widely used open-source framework developed by Google for mobile, embedded, and IoT machine learning. By default, TensorFlow Lite's kernel implementations are optimized with the Arm NEON instruction set for Arm Cortex-A cores. The delegate mechanism offloads the execution of machine learning kernels to another framework or to an optimized kernel library, which makes it possible to leverage the computational power of on-device accelerators such as the GPU, NPU, or DSP. Examples include the Arm NN delegate, developed by Arm for the Linaro Machine Learning Initiative, the NN API delegate, and the XNNPACK delegate.

We will briefly cover what TensorFlow Lite is and then dive deeper into the delegate mechanism, going over several examples of TF Lite delegates: the new Arm NN delegate, released in Arm NN 20.11 (Arm NN is an inference engine for machine learning developed under the Linaro Machine Learning Initiative), and the NN API delegate, one of the default delegates, which was designed for Android and is also used for acceleration on the Arm-based NXP i.MX8 platform. We will also explain why a developer might consider implementing a custom delegate and how to do so. Finally, we will discuss delegate performance and operation support.

Flow: 1) Introduction to TF Lite: A brief introduction to Google’s embedded machine learning framework.

2) Introduction to Delegates: An introduction to the delegate mechanism, which offloads the execution of TF Lite machine learning kernels to another framework, driver, piece of hardware, or other mechanism that enables acceleration.

3) NN API delegate: A default delegate integrated into TF Lite, which offloads execution using the Android NN API.

4) Arm NN delegate (new in Arm NN 20.11): A newly implemented delegate, released in Arm NN 20.11, which offloads execution to the Arm NN framework.

5) Performance benchmarking and operation support discussion: Options for benchmarking delegate performance, handling of operators the delegate does not support, and graph partitioning.

6) Implementing a custom delegate: Why a developer might implement a custom delegate, how to implement one, and a simple example.
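Benchmarking a delegate usually means comparing inference latency with and without it. TF Lite ships its own `benchmark_model` tool for this; as a complementary sketch, the Python helper below times any zero-argument callable, for example the `invoke` method of a TF Lite `Interpreter` whose tensors have already been allocated and filled. The function name `benchmark`, the default warm-up count, and the number of timed runs are illustrative assumptions, not values from the session. Warm-up runs are excluded from the statistics because the first inferences through a delegate often include one-off costs such as kernel compilation or graph preparation.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark(run: Callable[[], None], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time a zero-argument callable, excluding warm-up iterations.

    Hypothetical helper: warmup/runs defaults are arbitrary choices.
    """
    if runs < 1:
        raise ValueError("runs must be >= 1")
    # Warm-up: absorb one-off costs (e.g. delegate graph preparation).
    for _ in range(warmup):
        run()
    timings_ms = []
    for _ in range(runs):
        start = time.perf_counter()
        run()
        timings_ms.append((time.perf_counter() - start) * 1000.0)
    return {
        "min_ms": min(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "max_ms": max(timings_ms),
    }


if __name__ == "__main__":
    # Stand-in workload. With TF Lite you would pass interpreter.invoke
    # after allocate_tensors() and set_tensor(...) (not shown here).
    stats = benchmark(lambda: sum(i * i for i in range(10_000)))
    print({name: round(value, 3) for name, value in stats.items()})
```

Reporting the median rather than the mean keeps a single slow run (for example, one interrupted by the OS scheduler) from skewing the comparison between the CPU path and the delegated path.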
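The handling of operators a delegate does not support, and the resulting graph partitioning, can be illustrated with a small, hypothetical Python sketch. It is not TF Lite's actual implementation: it assumes the model is a linear chain of operators in execution order, and that a delegate is described only by the set of operator names it accepts (loosely analogous to the per-node `IsNodeSupportedByDelegate` check in TF Lite's C++ `SimpleDelegateInterface`). `ToyDelegate`, `partition`, and the operator names are made up for illustration. Consecutive supported operators are grouped into one delegated segment; everything else falls back to the default CPU kernels.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class ToyDelegate:
    """Hypothetical delegate that only declares which operators it can run."""
    name: str
    supported_ops: Set[str] = field(default_factory=set)

    def is_node_supported(self, op: str) -> bool:
        # Per-node support check performed before the delegate claims a node.
        return op in self.supported_ops


def partition(ops: List[str], delegate: ToyDelegate) -> List[Tuple[str, List[str]]]:
    """Group consecutive operators into delegated or CPU-fallback segments.

    Simplification: assumes a linear chain of ops, not a general graph.
    """
    segments: List[Tuple[str, List[str]]] = []
    for op in ops:
        target = delegate.name if delegate.is_node_supported(op) else "cpu"
        if segments and segments[-1][0] == target:
            segments[-1][1].append(op)       # extend the current segment
        else:
            segments.append((target, [op]))  # start a new segment
    return segments


if __name__ == "__main__":
    model = ["CONV_2D", "RELU", "CUSTOM_NMS", "CONV_2D", "SOFTMAX"]
    npu = ToyDelegate("npu", {"CONV_2D", "RELU", "SOFTMAX"})
    for target, segment in partition(model, npu):
        print(f"{target:>4}: {segment}")
```

Here the unsupported `CUSTOM_NMS` splits the chain into three segments (NPU, CPU, NPU). Every boundary between the delegate and the CPU implies handing tensors back and forth, which is one reason a single unsupported operator in the middle of a model can noticeably reduce the benefit of a delegate. Real TF Lite partitions a general graph rather than a chain, so treat this only as intuition.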

Session Speakers

Pavel Macenauer

NXP (Senior Software Engineer)

Pavel currently develops accelerated ML backends running on GPUs/NPUs and helps enable NXP's eIQ Machine Learning platform. He actively contributes to Linaro's Arm NN framework and, as such, was one of the developers who contributed to the Python enablement in its latest release. His past experience includes developing safety-critical RTOS/display systems for Honeywell Aerospace and image processing applications for photographers.
