2.19.19
The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, today announced momentum with Apache Arrow, the Open Source Big Data in-memory columnar layer.
Since the founding of the project in January 2016, Apache Arrow has quickly become the de facto standard for representing and processing analytical data in memory, accelerating analytical processing and interchange by more than 100x.
“When we became a Top-Level Project, we projected that the majority of the world’s data will be processed through Arrow within the next decade,” said Jacques Nadeau, Vice President of Apache Arrow. “In just three years’ time, we are proud to see Arrow’s substantial industry adoption and increased value across a wide range of analytical, machine learning, and artificial intelligence workloads.”
Highlights of Apache Arrow’s success include:
Industry Adoption: more than 20 major technologies have adopted Arrow to accelerate in-memory analytics, including Apache Spark, NVIDIA RAPIDS, pandas, and Dremio.
Millions of Downloads: integration of Apache Arrow into many other technologies has pushed downloads past 1,000,000 each month.
New Language Support: for a cross-language development platform, support for multiple programming languages is paramount. Apache Arrow has grown from supporting one language to eleven today, including C++, Java, Python, R, C#, JavaScript, and Ruby.
Seamless Data Format Support: Arrow supports different data types, both simple and nested, located in arbitrary memory such as regular system RAM, memory-mapped files, or on-GPU memory. In addition, it can ingest data from popular storage formats such as Apache Parquet, CSV files, Apache ORC, JSON, and more.
Major Code Donations: Apache Arrow’s new features and expanded functionality are due in part to code and component donations that include:
• C# Library
• Gandiva LLVM-based Expression Compiler
• Go Library
• JavaScript Library
• Plasma Shared Memory Object Store
• Ruby Libraries (Apache Arrow and Apache Parquet)
• Rust Libraries (Parquet and DataFusion Query Engine)
Community and Contributor Growth: over the past 12 months, nearly 300 individuals have submitted more than 3,000 contributions that have grown the Apache Arrow code base by 300,000 lines of code. The Arrow community is welcoming approximately 10 new contributors each month.
In January, the project announced its most recent release, Apache Arrow 0.12.0, which reflects more than 600 enhancements developed during Q4 2018. The Apache Arrow community is actively working on a number of new initiatives that include solving high-performance analytical problems and allowing for more efficient data distribution across entire clusters.
“Apache Arrow’s rapid industry adoption and developer community growth support our original thesis of the importance of a language-independent open standard for columnar data,” said Wes McKinney, member of the Apache Arrow Project Management Committee, and creator of Python’s pandas project. “Additionally, we are seeing productive collaborations take place not only between programming languages but also between the database systems and data science worlds. We look forward to welcoming more data system developers into our community.”