LVC21F-201 Apache Arrow: A unified language across Big Data

Session Abstract

Level: Intermediate

Apache Arrow is a software development platform for building high-performance applications that process and transport large data sets. It is designed to improve both the performance of analytical algorithms and the efficiency of moving data from one system to another. Apache Arrow defines a language-independent columnar memory format for efficient analytic operations without serialization overhead. The contiguous columnar layout also enables vectorization using SIMD instructions in modern processors.

We have been actively contributing to Apache Arrow for about two years. Our main focus is performance optimization in the Arrow compute kernels, Flight RPC, and Parquet modules. With our work, Arrow is seeing a big performance boost, especially on Arm platforms.

In this talk, we will:

- Introduce the Apache Arrow project and its ecosystem
- Deep dive into the Arrow columnar data format
- Share our experience in Arrow performance optimization

Through this talk, the audience will learn Apache Arrow essentials and how it pushes OLAP workload performance to the extreme. Our software optimization practices may also be helpful to other projects.
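To make the columnar idea in the abstract concrete, here is a minimal sketch in plain Python. It uses only the standard library and is not the Arrow API or the Arrow memory format itself; the records and field names are invented for illustration. It only shows the general principle: each field lives in its own contiguous, fixed-width buffer, so a scan over one field touches one buffer, and the bytes can be exposed without re-encoding.

```python
from array import array

# Hypothetical records, invented for illustration only.
rows = [
    {"id": 1, "price": 9.5},
    {"id": 2, "price": 20.0},
    {"id": 3, "price": 12.25},
]

# Columnar layout: one contiguous, fixed-width buffer per field,
# instead of one object per record.
ids = array("q", (r["id"] for r in rows))        # 64-bit signed integers
prices = array("d", (r["price"] for r in rows))  # 64-bit floats

# An aggregate over a single field scans a single contiguous buffer.
total_price = sum(prices)

# The underlying bytes can be viewed without copying or serializing,
# which is the property that lets systems share columnar buffers.
price_view = memoryview(prices)
print(total_price, price_view.nbytes)
```

Arrow itself adds much more on top of this idea, such as validity bitmaps for nulls, variable-length and nested types, and a cross-language specification of the layout.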

Session Speakers

Yibo Cai

Principal Software Engineer, Arm

Yibo is a principal software engineer at Arm. He is an Apache Arrow committer and an active contributor to big data and cloud storage open source projects.

