The big data world uses many data formats and representations, such as Parquet files read with Python (pandas), Spark DataFrames, JSON, Avro, and CSV.
Moving data between projects that use different formats can waste about 70-80% of computation on data conversion and serialization/deserialization.
Apache Arrow addresses these issues and facilitates communication between many components with its high-speed in-memory representation for flat and hierarchical data. It can deliver a 10-100x speedup on in-memory analytics workloads.
In collaboration with Linaro LDCG, we validated Apache Arrow on Arm64 and delivered Arm-specific optimizations for Arrow.
This session gives an overview of Apache Arrow, briefly introduces the Arrow optimizations that use the Arm crypto and Neon extensions, and reports the status of the patches submitted to the community. You will see benchmark results and learn how to take advantage of Armv8 features to make your data fly.
A software engineer