LVC21-204: New Apache Bigtop 1.5 and Wikimedia: Empower BigData in the real world

Session Abstract

Bigtop is the Apache project for infrastructure engineers and data scientists who are looking for packaging, testing, and configuration of open-source Big Data components. Its latest version, 1.5.0, was released in December 2020 and offers fuller Arm64 support for the Big Data stack. As a real-world use case of Bigtop, the Analytics/Data-Engineering team at the Wikimedia Foundation has been collaborating with the project to move from the Cloudera CDH 5.x Hadoop distribution to Apache Bigtop. In this session, the speakers provide an overview of the newly implemented features, the recent upgrade, and some basic usage from a user's point of view. Moving away from CDH is not easy in the Wikimedia ecosystem, where several tools gravitate around Hadoop-related projects. The co-speakers will therefore also give a brief introduction to the work done with upstream to support Wikimedia's use case and to the migration/upgrade plan devised.

Session Speakers

Yuqi Gu

Arm (Senior SW Engineer)

Yuqi Gu is currently a Linaro assignee and works at Arm. He is a committer and PMC member of Apache Bigtop. He is also an active contributor to Apache Arrow, MariaDB and RocksDB, mainly focusing on performance optimization on Arm64.

Luca Toscano

Wikimedia Foundation (Site Reliability Engineer)

Luca Toscano is a Site Reliability Engineer at the Wikimedia Foundation, currently working on the Data / Analytics Engineering team.
