Hadoop: why enterprises need something to aspire to

Merv Adrian wrote a blog boast recently bemoaning the “aspirational marketing” that surrounds Hadoop, in particular the fact that current deployments are a long way from delivering on the vision.

While I completely agree that many enterprises are struggling to translate tactical use-cases into the business use-cases required to drive more strategic adoption beyond the proof of concept stage, I don’t think that aspirational marketing around Hadoop is necessarily a bad thing.

It is certainly true that part of the problem lies in clearly understanding how Hadoop can be used as a complement to traditional relational database technologies deployed as an enterprise data warehouse.

That is why we recently asked Is Hadoop a planet? – comparing confusion around Hadoop’s classification to that of Pluto – while also describing Hadoop as a framework in search of a metaphor.

Given the confusion, however, I believe it is incumbent on Hadoop providers to describe not just the functional use-cases that are driving tactical adoption, but also the bigger vision that will drive more strategic adoption.

The data management industry has become accustomed to thinking about the storage, processing and analysis of data in analytical databases as akin to warehousing, to the extent that the phrase ‘data warehouse’ no longer requires an explanation.

We believe that a good understanding of the potential strategic role of Hadoop, even if it is only aspirational at this stage, will be important in encouraging broader and deeper adoption of Hadoop.

In addition, it is not as if there are no enterprises deploying Hadoop more strategically. Cloudera estimates that about 20% of its 300 subscription customers are already deploying Hadoop as what it calls an Enterprise Data Hub.

I’m not personally convinced that Enterprise Data Hub is really the right term, (not least since we previously used the term Data Hub in a slightly different context). Other potential terms include data lake and data refinery.

Although the latter better describes Hadoop’s role in aggregating and processing data and the industrial-scale processes used to make data more acceptable for different analytic use-cases, it appears to have quickly passed out of fashion compared to the former.

I have begun using the term ‘data treatment plant’ as a combination of the two concepts to describe how Hadoop can be used as a single ‘logical’ unified data platform into which you simply poor data, while industrial-scale processes – the multiple data processing and analytic engines that will be supported by Hadoop 2.0: such as MapReduce, streaming processing, SQL and NoSQL – are used to make data more acceptable for a desired end-use.

451 clients can get more detail on the ‘data treatment plant’ and why we believe a bit of aspirational marketing may not be a bad thing for Hadoop, from our recent report, Hadoop: a framework in search of a metaphor.

The Data Day, A few days: January 11-17 2014

Neo updates Neo4j. Cloudera sharpens focus on Accumulo. And more

And that’s the data day, today.

The Data Day, A few days: December 13-10 2013

VC funding for Hadoop and NoSQL tops $1bn. And more

And that’s the data day, today.

Visualizing the $1bn+ VC investment in Hadoop and NoSQL

Cumulative VC funding for Hadoop and NoSQL vendors broke through the $1bn barrier in 2013, according a Spotlight report published by 451 Research, based on data provided by The 451 M&A KnowledgeBase.

The data indicates that there was a substantial increase in funding in 2013 ($530.5m, not including RethinkDB’s $8m announced yesterday) compared to 2012 ($190.9m), thanks to major rounds for the likes of MongoDB, Pivotal, Hortonworks and DataStax.

The report includes a visualization created by 451’s Director of Data Strategy and Solutions, Barbara Peng, that illustrates the connections between the various investors and the NoSQL and Hadoop vendors in which they have invested.

A snapshot of the visualization is shown below but the the original is interactive, enabling 451 Research clients to drag the various elements around for greater emphasis, as well as isolate the NoSQL or Hadoop categories.

vc-firms

451 Research clients can also scroll over the blue circles to see the total amount of funding raised by the individual Hadoop and NoSQL vendors, and scroll over the smaller orange circles to see which investors have backed which companies.

The sample set was limited to 16 vendors for visual clarity, but the six Hadoop and 10 NoSQL providers cited account for more than 87% of funding to date (with Pivotal representing the vast majority of the remaining 13%).

This visualization illustrates that investment in Hadoop and NoSQL providers comes from a relatively small group of VC firms (52 to be specific, excluding individual seed investors), resulting in a relatively tightly clustered graph.

However, the visualization also enables us to put to the test the recent blog post by MarkLogic’s Adam Fowler in which he stated:

“Just look at the number of investors who are investing in multiple NoSQL companies. They’re hedging their bets because they’re not sure themselves which businesses will survive.”

In fact investment in multiple Hadoop and NoSQL vendors is relatively rare. Only 11 out of the 52 VC firms have invested in more than one Hadoop and/or NoSQL vendor, with seven of those picking one Hadoop vendor and one NoSQL provider. Less hedging their bets as picking a winner in each category.

Of the remaining four investment shops, two have invested in one Hadoop distributor, one NoSQL specialist and one Hadoop-as-a-service provider (MapR, DataStax and Qubole for Lightspeed Venture Partners; Cloudera, Couchbase and Altiscale for Accel Partners), while In-Q-Tel has invested in one Hadoop supplier, one NoSQL vendor and one NoSQL-as-a-service provider (Cloudera, MongoDB and Cloudant).

Only Sequoia Capital has invested in multiple NoSQL vendors (as well as Hadoop-as-a-service provider Altiscale) having invested in MongoDB, DataStax and – hold onto your hats, irony fans – MarkLogic. It should be noted however that Sequoia has not invested in DataStax since its series A round in late 2010.

The full report, Venture funding for Hadoop and NoSQL vendors tops $1bn is available now to 451 Research clients and also includes our perspective on when combined Hadoop and NoSQL revenue might begin to exceed combined Hadoop and NoSQL VC funding, as well as the potential for M&A and IPO activity in 2014.

The Data Day, A few days: December 6-12 2013

Talend raises $40m. GridGain names new CEO. And more

And that’s the data day, today.

The Data Day, A few days: October 26-November 1 2013

Cloudera launches Enterprise Data Hub. And more

And that’s the data day, today.

The Data Day, A few days: October 19-25 2013

Hadoop and Teradata go to the cloud. And more.

And that’s the data day, today.

7 Hadoop questions. Q7: Hadoop’s role

What is the point of Hadoop? It’s a question we’ve asked a few times on this blog, and continues to be a significant question asked by users, investors and vendors about Apache Hadoop. That is why it is one of the major questions being asked as part of our 451 Research 2013 Hadoop survey.

hadoop-elephant

As I explained during our keynote presentation at the inaugural Hadoop Summit Europe earlier this year, our research suggests there are hundreds of potential workloads that are suitable for Hadoop, but three core roles:

  • Big data storage: Hadoop as a system for storing large, unstructured, data sets
  • Big data processing/integration: Hadoop as a data ingestion/ETL layer
  • Big data analytics: Hadoop as a platform new new exploratory analytic applications

And we’re not the only ones that see it that way. This blog from Cloudera CTO Amr Awadallah outlines three very similar, if differently-named use-cases (Transformation, Active Archive, and Exploration).

In fact, as I also explained during the Hadoop Summit keynote, we see these three roles as a process of maturing adoption, starting with low cost storage, moving on to high-performance data aggregation/ingestion, and finally exploratory analytics.

survey

As such it is interesting to view the current results of our Hadoop survey, which show that the highest proportion of respondents that have implemented or plan to implement Hadoop (63%) for data analytics, followed by 48% for data integration and 43% for data storage.

This would suggest that our respondents include some significantly early Hadoop adopters. I look forward to properly analysing the results to see what they can tell us, but in the meantime it is interesting to note that the percentage of respondents using Hadoop for analytics is significantly higher among those that adopted Hadoop prior to 2012 (88%) compared to those that adopted in in 2012 or 2013 (65%).

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

The Data Day, A few days: October 12-18 2013

Apache Hadoop 2 goes GA. Teradata cuts guidance. And more

And that’s the data day, today.

7 Hadoop questions. Q6: Hadoop’s shortcomings

What are the major shortcomings of Hadoop? The answer to that questions looks set to shape the future development roadmap for the open source data processing framework, which is why it is one of the major questions being asked as part of our 451 Research 2013 Hadoop survey.

hadoop-elephant

The limitations of Hadoop have been widely reported over the years, but as the Apache Hadoop community and related vendors have responded to issues such as reliability and high availability – not least via the now generally available Apache Hadoop 2 – so attention turns to other areas such as security, administration and performance, as well as more advanced functionality requirements, including graph processing, stream processing, improved SQL support and virtualization support.

survey

The list of potential improvements is therefore fairly long, and as we near the end of our survey it is interesting to see that the list of key advances respondents are looking for in order to increase adoption of Hadoop is fairly widespread.

So far the responses to our Hadoop survey suggest administration tooling and performance top the list, followed by reliability, SQL support and backup and recovery, but development tools and authentication and access control are not far behind.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.