4.3.18
MapR Technologies, Inc., a pioneer in delivering one platform for all data, across every cloud, today announced breakthrough capabilities for building powerful, real-time streaming and global IoT applications. New enhancements to the MapR Converged Data Platform release 6.0.1 and MapR Expansion Pack 5.0, including Event Streams (MapR-ES), Apache Spark and Apache Drill release 1.13, support streaming pipelines that can stretch across millions of endpoints while also supporting rich analytics that can be used to divide and aggregate the streams into hundreds of thousands of logical topics.
“MapR has redefined streaming with a ground-breaking approach. Companies no longer have to think about streams as a separate, short-lived flow that needs to be put in a lake or warehouse before its significance can be analyzed and understood,” said Howard Marks, founder and chief scientist, DeepStorage. “MapR has embedded Kafkaesque streams directly into the data fabric. Streams can be persisted in the fabric for years with the same enterprise security and reliability; and now with event time-stamping a single historical event can be retrieved instantly. But perhaps more importantly, a single stream can be logically divided into hundreds of thousands of topics with no impact on performance. No other streaming approach provides this scale, flexibility, and performance.”
With release 6.0.1, MapR exposes new functionality through multiple APIs. The MapR-ES API adds support for an event-time timestamp as part of an update to the Kafka 1.0 API, and the release adds Structured Streaming in Apache Spark 2.2.1, which uses this timestamp for new stream processing capabilities such as windowing and aggregation. For IoT applications, this helps ensure that data from a globally distributed network of devices and sensors can be flexibly separated into logical topics and correctly aggregated for real-time analytics and applications. For companies adopting a “streaming system of record” that is reliably persisted for extended periods, whether for compliance or developer productivity, MapR-ES now also maintains a time index, so applications can easily seek to a specific point in time from which to consume.
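The time-index lookup described above can be modeled in a few lines. The following Python sketch is a conceptual, in-memory model, not MapR-ES code: the class `TimeIndexedPartition` and its methods are hypothetical. It mirrors the documented semantics of the Kafka consumer's `offsetsForTimes` call, which returns the earliest offset whose timestamp is greater than or equal to a target time; a real client would then pass that offset to `seek`. Whether MapR-ES exposes exactly these calls is an assumption here.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical in-memory model of a single topic partition. Offsets are list
# indices; each record is (event_time_ms, payload). Not MapR-ES code.
Record = Tuple[int, str]


class TimeIndexedPartition:
    """Conceptual model of a partition with a time index."""

    def __init__(self) -> None:
        self._log: List[Record] = []
        # Running maximum of event times; stays sorted even when events
        # arrive slightly out of order, so it can be binary-searched.
        self._max_ts: List[int] = []

    def append(self, event_time_ms: int, payload: str) -> int:
        """Append a record and return its offset."""
        self._log.append((event_time_ms, payload))
        prev = self._max_ts[-1] if self._max_ts else event_time_ms
        self._max_ts.append(max(prev, event_time_ms))
        return len(self._log) - 1

    def offset_for_time(self, target_ms: int) -> Optional[int]:
        """Earliest offset whose event time is >= target_ms, or None if none is."""
        i = bisect_left(self._max_ts, target_ms)
        return i if i < len(self._log) else None

    def read_from(self, offset: int) -> List[Record]:
        """Replay all records from the given offset onward."""
        return self._log[offset:]


p = TimeIndexedPartition()
for ts, msg in [(1000, "a"), (2000, "b"), (1500, "c"), (3000, "d")]:
    p.append(ts, msg)

start = p.offset_for_time(1500)  # 1: "b" (t=2000) is the first record at or after t=1500
replay = p.read_from(start) if start is not None else []
```

The running maximum is one simple way to keep the index binary-searchable when event times are slightly out of order; a production index would likely be sparse and persisted to disk, which this sketch does not attempt.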
“We are very excited about the new features,” said Eric Keister, advanced analytics and emerging technologies manager at Anadarko. “Spark structured streaming allows us to use advanced analytics on real-time oil well data while Drill allows us to explore the same data using SQL. This helps us make operational decisions faster.”
Apache Drill 1.13 has also been integrated (alpha) with MapR-ES for exploration of historical message data, whether for simple use cases like data pipelines or dashboards, or more complex ones like anomaly detection. Additionally, the MapR-DB REST gateway and native MapR-DB exploration in the Data Science Refinery notebook aim to make developers and data scientists more productive.
“The importance of integrating streams into a high scale, high performance data fabric cannot be overstated,” said Jack Norris, SVP data and applications, MapR Technologies. “The ability to quickly analyze and understand the significance of events as they are collected and created is a huge advantage. And when streams are integrated into a rich fabric that includes rich historical data and the ability to understand the context and generate an immediate action, that advantage is translated directly to business value.”
Benefits in MapR 6.0.1 and MEP 5.0 include:
• Updates to the MapR-ES API, allowing more accurate analytics of data generated by IoT devices and sensors, and more efficient use of a “streaming system of record” by letting applications seek to an exact point in time when reprocessing historical data.
• Spark 2.2 with Structured Streaming, providing stream processing with a simpler API and advanced analytics that use the event time now added in MapR-ES.
• MapR-DB REST Gateway, allowing developers to use their preferred language(s) to access MapR-DB JSON and easing the integration of MapR-DB JSON with third-party tools that support REST.
• Audits to Streams: with MapR 6.0.1, audit logs and events are forwarded to streams, opening up a wider array of real-time use cases such as real-time security analytics and anomaly detection, data synchronization apps, and triggered processing (e.g., classification) of new data.
• Native MapR-DB Exploration in Data Science Refinery Notebook, allowing for easier access to explore operational data.
• Drill 1.13, allowing for better handling of memory intensive queries, performance improvements when working with Parquet files, optimization of primary table joins, and cgroups support for YARN.
• Alpha Feature: Drill on MapR-ES, letting Drill 1.13 users query MapR-ES, a streaming system of record, directly to get ad-hoc insights faster and without duplicating data. It also allows easy experimentation with application or analytic logic before writing real-time applications.
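To illustrate why an event-time stamp matters for the windowed analytics mentioned for Spark 2.2 Structured Streaming, here is a plain-Python sketch of a tumbling-window aggregation keyed on event time rather than arrival time. It is not Spark code; in Structured Streaming the comparable grouping would use the `window` function from `pyspark.sql.functions` on an event-time column. The event tuple layout, the function name `tumbling_window_sums`, and the device names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape: (device_id, event_time_ms, value). Illustrative only.
Event = Tuple[str, int, float]


def tumbling_window_sums(
    events: Iterable[Event], window_ms: int
) -> Dict[Tuple[int, str], float]:
    """Sum values per (window_start_ms, device_id) using each event's own time.

    An event is assigned to the fixed-size window containing its event time,
    so a reading that arrives late still lands in the window where it occurred.
    """
    if window_ms <= 0:
        raise ValueError("window_ms must be positive")
    totals: Dict[Tuple[int, str], float] = defaultdict(float)
    for device_id, event_time_ms, value in events:
        window_start = event_time_ms - (event_time_ms % window_ms)
        totals[(window_start, device_id)] += value
    return dict(totals)


# Events listed in arrival order; the third arrives after the 61 s reading
# but carries an earlier event time, so it belongs to the first window.
arrivals = [
    ("well-7", 1_000, 2.0),
    ("well-7", 61_000, 5.0),
    ("well-7", 59_000, 1.5),
    ("well-9", 10_000, 4.0),
]
result = tumbling_window_sums(arrivals, window_ms=60_000)
# {(0, 'well-7'): 3.5, (60000, 'well-7'): 5.0, (0, 'well-9'): 4.0}
```

Grouping by arrival time instead would have put the late 1.5 reading in the second window; keying on event time is what keeps aggregates correct when devices report with delay. This sketch keeps all windows in memory and does not model watermarks or state cleanup.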
The updated MapR Converged Data Platform and additional enhancements to Apache Drill are available this week.