MapR Tries To Separate From Hadoop Pack With New Streams Product

8 Dec 2015

Kafka Streaming Pulled Into Converged MapR Platform.

MapR is one of several companies built on the open source Hadoop platform, and as such it has a bit of competition in the space. MapR Technologies, Inc. today announced what it calls the industry’s first and only converged data platform and introduced MapR Streams, a reliable, global event streaming system that connects data producers and data consumers across shared topics of information. Time and again, commercial Hadoop distributor MapR Technologies has demonstrated the value of the MapR-FS file system that underpins its Hadoop stack and differentiates it, more than any other feature, from the other Hadoop platforms with which it competes. With the new platform branding, MapR joins fellow Hadoop distributors Hortonworks and Cloudera, which offer the Hortonworks Data Platform (HDP) and Cloudera’s Distribution Including Apache Hadoop (CDH). It is the underlying MapR-FS that provides NFS access, with random reads and writes, while storing data in a mode that is compatible with the Hadoop Distributed File System (HDFS). This file system is what has allowed MapR to accelerate the HBase distributed database overlay for HDFS, and it is also the key technology that the company exploited to create its own NoSQL database, MapR-DB, which got support for native JSON documents in the fall and which can now take on MongoDB.

MapR Streams is based on the same publish-and-subscribe method that underlies Apache Kafka, and is fully compatible with streaming analytics applications such as Apache Storm and Spark Streaming. The new product handles constant streams of data, such as feeding consumer data to advertisers to create custom offers or distributing health data to medical professionals to tailor medication or treatment options, all in near real time. Streams, which will become generally available in early 2016, is an event-streaming architecture pitched at those gathering and analyzing data in real time. The integration of MapR Streams into a converged platform enables organizations in any industry to continuously collect, analyze and act on streaming data.
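To make the publish-and-subscribe idea concrete, here is a minimal in-memory sketch in Python. It is not the MapR Streams or Kafka API; the `TopicBroker` class, the topic names, and the event fields are all hypothetical and exist only to show producers and consumers meeting on shared topics.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class TopicBroker:
    """Toy broker (hypothetical): producers publish to named topics,
    and every consumer subscribed to that topic receives the event."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[dict], None]) -> None:
        # Register a consumer callback for one topic.
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        # Deliver the event to each subscriber of the topic and
        # return how many consumers received it.
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(event)
        return len(handlers)


broker = TopicBroker()
offers: List[dict] = []      # stands in for an advertiser's offer engine
treatments: List[dict] = []  # stands in for a clinician-facing application

broker.subscribe("consumer-activity", offers.append)
broker.subscribe("health-readings", treatments.append)

delivered = broker.publish("consumer-activity", {"user": "u1", "viewed": "running shoes"})
```

In this sketch the producer never names a consumer: it only names a topic, which is what lets new consumers be added later without touching the producers.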

Now, MapR is pulling another workload that formerly ran on separate clusters – Kafka data streaming – into the core MapR file system. “This has been under development for quite some time,” Jack Norris, chief marketing officer at MapR, explains to The Next Platform. “The way we have done that is not as a separate component, or a separate cluster, unlike every other message queue or event streaming product out there.” The converged platform combines file, database, stream processing and analytics to pull data from a wide variety of sources and deliver information on a publish-and-subscribe basis to people and machines.

Overall, MapR is pitching its Converged Data Platform (CDP) as the way to run an integrated big-data stack that sidesteps the need to run separate nodes for Kafka, Tibco, Spark, Storm or Hadoop. “Spark and Storm are excellent but there’s more analytics and processing than just the stream analytics,” director of product marketing Will Ochandarena told The Reg. “We are starting to see data silos emerge as different processing requirements with different clusters for data in motion versus data at rest and separate cluster for special analytics,” Ochandarena said. From advertisers providing relevant real-time offers, to healthcare providers improving personalized treatment, to retailers optimizing inventory, to telecom carriers dynamically adjusting mobile service areas, organizations must improve their responsiveness to critical events with the continuous analysis of big data.

“This is integrated, and we are moving from these different silos to a new converged platform, which you are kind of front and center and in the middle of.” Broadly speaking, there are two kinds of analytics processing going on out there today, but that does not necessarily mean companies need two kinds of systems to perform them, at least according to MapR. For example, data can come from a combination of sensors, newsfeeds, log files and database queries and be delivered directly to user dashboards, analytics engines, report generators and batch database engines from a single stream, depending on how subscriptions are defined. A maintenance program could subscribe to the data coming from the shop floor of a manufacturer and learn about usage, production, bottlenecks and wear and tear, or IT could subscribe to a data stream with log information, looking for anomalies that signal maintenance issues or a security breach.
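The idea that one stream can feed different consumers "depending on how subscriptions are defined" can be sketched as a filter attached to each subscription. This is an illustrative Python sketch only; the `Subscription` type, the `fan_out` function, and the event fields are assumptions, not part of any MapR interface.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, List


@dataclass
class Subscription:
    """Hypothetical subscription: a name plus a predicate that decides
    which events from the shared stream this consumer wants."""
    name: str
    wants: Callable[[dict], bool]
    received: List[dict] = field(default_factory=list)


def fan_out(events: Iterable[dict], subscriptions: List[Subscription]) -> None:
    # Every event is offered to every subscription; each one keeps
    # only the events its own predicate accepts.
    for event in events:
        for sub in subscriptions:
            if sub.wants(event):
                sub.received.append(event)


# One mixed stream carrying shop-floor sensor readings and IT log lines.
stream = [
    {"source": "sensor", "machine": "press-4", "vibration": 0.91},
    {"source": "log", "level": "INFO", "msg": "backup finished"},
    {"source": "log", "level": "ERROR", "msg": "repeated failed logins"},
]

maintenance = Subscription("maintenance", lambda e: e["source"] == "sensor")
it_security = Subscription(
    "it-security",
    lambda e: e["source"] == "log" and e["level"] == "ERROR",
)

fan_out(stream, [maintenance, it_security])
```

Here the maintenance consumer ends up with only the sensor reading and the IT consumer with only the error log line, even though both read the same stream.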

As a result, it enables developers to create new applications that reduce data duplication and movement, lower the cost of integration and maintenance associated with multiple platforms, and accelerate business results. Developers work with MapR Streams using the same OJAI API that MapR adopted when it delivered JSON support in a document-based database earlier this year. One particularly interesting aspect of MapR’s streaming product is that it can act as a system of record, creating a persistent record that you can even rewind like a recording to any point in time and review what happened. The split between the two kinds of analytics has meant that companies need to have a fast event processing system like Spark or Storm and a much slower analytics system based on Hadoop for broader and deeper data sets and more complex queries to derive value from the data.
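The "rewind like a recording" behavior described above can be pictured as an append-only log in which reading never removes events. The following Python sketch assumes a hypothetical `ReplayableLog` class with timestamped events; it illustrates the concept only and does not reflect how MapR Streams stores data.

```python
import bisect
from typing import Any, List, Tuple


class ReplayableLog:
    """Hypothetical append-only event log that can be replayed
    from any point in time. Reads never delete events."""

    def __init__(self) -> None:
        self._timestamps: List[float] = []
        self._events: List[Any] = []

    def append(self, timestamp: float, event: Any) -> int:
        # Keep the log ordered by time so replay can use binary search.
        if self._timestamps and timestamp < self._timestamps[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self._timestamps.append(timestamp)
        self._events.append(event)
        return len(self._events) - 1  # offset of the new event

    def replay_from(self, timestamp: float) -> List[Tuple[float, Any]]:
        # Find the first event at or after the requested time and
        # return everything from there to the end of the log.
        start = bisect.bisect_left(self._timestamps, timestamp)
        return list(zip(self._timestamps[start:], self._events[start:]))


log = ReplayableLog()
log.append(100.0, "order placed")
log.append(105.0, "payment authorized")
log.append(110.0, "order shipped")

replayed = log.replay_from(105.0)
```

Because the log is never consumed destructively, calling `replay_from` again with the same timestamp returns the same events, which is the property an audit trail relies on.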

That means it’s fully auditable, and that could prove useful, especially in regulated industries that have to keep track of all transactions as they happen. “This really addresses the evolving flows that we are seeing that are being driven by the evolving Internet of Things and these new applications,” said Norris. Kafka sits on the front end, splitting the streams of incoming data and passing off the bits that the fast-and-furious and slow-and-deep parts of the analytics stack need. Because of format inconsistencies, data originating from disparate sources needs to be duplicated and stored in different processing engines depending upon intended use. However, while some Hadoop distributors include Kafka with their distributions and there are some points of integration between Kafka and Hadoop (including at YARN), Kafka itself isn’t a “native” Hadoop application, and Kafka clusters are typically separate from Hadoop clusters.

In addition, MapR works closely with technology partners, such as dataArtisans, Databricks, DataTorrent, Streamsets, and Syncsort, to provide customers with the flexibility to choose the components they want in their real-time analytics data platforms. MapR says one of its beta test customers – a global advertising technology company – has been using Streams to deliver customized views of real-time advertising data to employees and external clients across the globe. Previously, reports were delivered daily in batch, but customers can now access real-time click streams, perform analysis and change their advertising programs dynamically.

MapR uses JSON (JavaScript Object Notation) – which is known for its flexibility and adaptability – as a common data interchange layer. “It’s polyglot persistence.” That kind of action could put pressure on competitors (although Cloudera CEO Tom Reilly told me recently in an interview at the Intel Capital Global Summit that all Hadoop vendors benefit from a healthy ecosystem because no one vendor can support Hadoop alone). Regardless, MapR CEO John Schroeder rejects the notion that he’s chasing the competition with these updates, saying it’s more a result of customer requirements. “I always run my company based on what customers need, not what the competitors are doing,” he said.

Given this, the MapR stack can now be used as a back end for real-time billing applications, to drive operational alerting systems and dashboards, or to provide feedback loops into production applications for real-time optimization. “You always have to worry about, is my shovel up, is my shovel reliable.” In many ways, this is a continuation of the Hadoop message, which has always been about filling massive lakes with data, and then bringing different compute engines to work on that lake. Moreover, it is not clear what the ratio of Kafka compute to Hadoop and Spark compute is at enterprises today, which would be interesting to see, but clearly the cluster will be expanded as Kafka is being brought into the MapR fold even as it puts overhead on all of the nodes in a cluster. The MapR Enterprise Edition adds high availability, security, global synchronization, and other features that are necessary for large-scale production workloads, and it comes with per-node annual support contracts. The Streams function will carry a supplemental licensing cost above and beyond the basic Enterprise Edition license, as is the case with the NoSQL database.
