LVC21F-211 Tiny ONNC: MLIR-based AI compiler for ARM IoT devices

Session Abstract

Level: Intermediate

Tiny ONNC is an MLIR-based compiler that exports deep neural networks (DNNs) as function calls to the ARM CMSIS-NN library. MLIR is a high-quality compiler framework that addresses software fragmentation: by supporting multiple intermediate representations within a single infrastructure, a compiler can transform a variety of input languages into a common output form. Tiny ONNC leverages this unique power of MLIR to support a rich set of neural network frameworks, including PyTorch, the Open Neural Network Exchange format (ONNX), TensorFlow, TensorFlow Lite, and even TVM Relay. Whatever the input format, Tiny ONNC transforms the DNN into a single function composed of a series of calls to the CMSIS-NN library. MLIR makes a genuine “one fits all” design possible.

Tiny ONNC provides a range of optimizations, such as automatic operator splitting and tensor splitting, that address the memory constraints of microcontrollers. When an operator or a tensor is too big to fit in the cache, Tiny ONNC splits the large object into smaller pieces and reorganizes the network so that the same memory can be reused. Operators that CMSIS-NN does not support directly are lowered through mathematically equivalent or approximate transformations onto operators it does support.

These optimizations deliver strong empirical results, achieving high memory utilization and high performance at the same time. On the MLPerf Tiny benchmark, Tiny ONNC matches TensorFlow Lite for Microcontrollers (TFLM) in performance and accuracy (within 2%). In the best case, the memory footprint of the generated program is only 3/5 that of TFLM, and its code size is only 1/10.

In this talk, we will first introduce MLIR and show how it is used inside Tiny ONNC. We will then dive into the memory optimization strategies and approaches. Finally, we will walk through the experimental results to see how Tiny ONNC outperforms its rivals.
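To make the “series of CMSIS-NN calls” output concrete, the sketch below shows what compiler-generated inference code for a small quantized network could look like. It is not Tiny ONNC's actual output: the two-layer MLP, its dimensions, the buffer names, and the use of the legacy fixed-point q7 CMSIS-NN API are all illustrative assumptions.

```c
/*
 * Hypothetical sketch of compiler-emitted inference code: the whole network
 * becomes one C function that is a straight-line sequence of CMSIS-NN calls.
 * Weights are zero-filled placeholders; a real compiler would emit constants.
 */
#include "arm_nnfunctions.h"

#define IN_DIM  64
#define HID_DIM 32
#define OUT_DIM 10

/* Quantized weights and biases, emitted by the compiler as constants. */
static const q7_t fc1_weights[HID_DIM * IN_DIM];
static const q7_t fc1_bias[HID_DIM];
static const q7_t fc2_weights[OUT_DIM * HID_DIM];
static const q7_t fc2_bias[OUT_DIM];

/* Statically planned buffers, reused across layers. */
static q7_t  hidden[HID_DIM];
static q7_t  logits[OUT_DIM];
static q15_t scratch[IN_DIM];  /* temp buffer required by the q7 FC kernel */

void network_run(const q7_t *input, q7_t *output)
{
    /* fc1: input[64] -> hidden[32] */
    arm_fully_connected_q7(input, fc1_weights, IN_DIM, HID_DIM,
                           0 /* bias_shift */, 7 /* out_shift */,
                           fc1_bias, hidden, scratch);

    /* ReLU applied in place on the hidden activations */
    arm_relu_q7(hidden, HID_DIM);

    /* fc2: hidden[32] -> logits[10] */
    arm_fully_connected_q7(hidden, fc2_weights, HID_DIM, OUT_DIM,
                           0, 7, fc2_bias, logits, scratch);

    /* softmax over the logits */
    arm_softmax_q7(logits, OUT_DIM, output);
}
```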
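Operator splitting can be illustrated in the same style. In this hypothetical sketch, a fully-connected layer whose weight matrix and output are too large for a small scratch arena is computed in row tiles, and the tile buffer is reused for every tile; because each output row depends only on its own slice of the weights, the split is mathematically exact. All sizes, names, and the flush-to-memory step are made up for the example.

```c
/* Hypothetical sketch of operator splitting for a memory-constrained MCU. */
#include <string.h>
#include "arm_nnfunctions.h"

#define IN_DIM    128
#define OUT_DIM   256          /* full output: too big for our tiny arena */
#define TILE_ROWS 32           /* output rows computed per step */

static const q7_t weights[OUT_DIM * IN_DIM]; /* placeholder constants */
static const q7_t bias[OUT_DIM];

static q7_t  tile_out[TILE_ROWS];  /* reused for every tile             */
static q15_t vec_buf[IN_DIM];      /* scratch required by the q7 kernel */

void fc_relu_tiled(const q7_t *input, q7_t *big_output /* e.g. off-chip */)
{
    for (int row = 0; row < OUT_DIM; row += TILE_ROWS) {
        /* Each tile is an independent fully-connected over a slice of
         * the weight matrix and bias vector.                          */
        arm_fully_connected_q7(input,
                               weights + (size_t)row * IN_DIM,
                               IN_DIM, TILE_ROWS,
                               0 /* bias_shift */, 7 /* out_shift */,
                               bias + row, tile_out, vec_buf);

        /* Apply the next operator to the tile while it is still hot. */
        arm_relu_q7(tile_out, TILE_ROWS);

        /* Flush the finished tile; tile_out is then reused. */
        memcpy(big_output + row, tile_out, TILE_ROWS);
    }
}
```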
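Finally, lowering an operator that CMSIS-NN lacks onto operators it has can be sketched with PReLU, using the exact identity PReLU(x) = ReLU(x) - alpha * ReLU(-x). PReLU is our choice of example here, not necessarily one Tiny ONNC uses, and the helper below is a simplified, hypothetical implementation.

```c
/*
 * Hypothetical sketch: implement PReLU, which CMSIS-NN does not provide,
 * from two ReLU calls plus a fixed-point scale.
 *     PReLU(x) = ReLU(x) - alpha * ReLU(-x)
 * alpha_q7 is a Q0.7 multiplier in [0, 1); saturation of -x at the q7
 * boundary (-128) is ignored for brevity.
 */
#include "arm_nnfunctions.h"

#define N 64  /* scratch capacity; this sketch assumes size <= N */

static q7_t neg_branch[N];  /* scratch for the ReLU(-x) branch */

void prelu_q7(q7_t *data, uint16_t size, q7_t alpha_q7)
{
    for (uint16_t i = 0; i < size; i++)
        neg_branch[i] = (q7_t)(-data[i]);   /* -x */

    arm_relu_q7(data, size);        /* data       <- ReLU(x)  */
    arm_relu_q7(neg_branch, size);  /* neg_branch <- ReLU(-x) */

    for (uint16_t i = 0; i < size; i++) {
        /* Q0.7 multiply with rounding, then combine the two branches. */
        int32_t scaled = ((int32_t)alpha_q7 * neg_branch[i] + 64) >> 7;
        data[i] = (q7_t)(data[i] - scaled);
    }
}
```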

Session Speakers

Peter Chang

Co-founder and Technical Marketing Manager, Skymizer

Peter is the co-founder of Skymizer Taiwan Inc. His research interests span operating systems, virtualization, and computer architecture. Currently, he focuses on AI hardware/software co-design and AI system software, including AI compilers and runtimes. He was also the maintainer of SkyPat, an open-source performance unit-test suite, and one of the maintainers of ARMvisor, a Kernel-based Virtual Machine (KVM) solution for the ARM architecture.
