7 Hadoop questions. Q7: Hadoop’s role

What is the point of Hadoop? It’s a question we’ve asked a few times on this blog, and it continues to be a significant question asked by users, investors and vendors about Apache Hadoop. That is why it is one of the major questions being asked as part of our 451 Research 2013 Hadoop survey.


As I explained during our keynote presentation at the inaugural Hadoop Summit Europe earlier this year, our research suggests there are hundreds of potential workloads that are suitable for Hadoop, but three core roles:

  • Big data storage: Hadoop as a system for storing large, unstructured, data sets
  • Big data processing/integration: Hadoop as a data ingestion/ETL layer
  • Big data analytics: Hadoop as a platform for new exploratory analytic applications

And we’re not the only ones that see it that way. This blog from Cloudera CTO Amr Awadallah outlines three very similar, if differently-named use-cases (Transformation, Active Archive, and Exploration).

In fact, as I also explained during the Hadoop Summit keynote, we see these three roles as a process of maturing adoption, starting with low cost storage, moving on to high-performance data aggregation/ingestion, and finally exploratory analytics.


As such, it is interesting to view the current results of our Hadoop survey, which show that the highest proportion of respondents (63%) have implemented or plan to implement Hadoop for data analytics, followed by 48% for data integration and 43% for data storage.

This would suggest that our respondents include some significantly early Hadoop adopters. I look forward to properly analysing the results to see what they can tell us, but in the meantime it is interesting to note that the percentage of respondents using Hadoop for analytics is significantly higher among those that adopted Hadoop prior to 2012 (88%) compared to those that adopted in 2012 or 2013 (65%).

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

What is the point of Hadoop?

Among the many calls we have fielded from users, investors and vendors about Apache Hadoop, the most common underlying question we hear could be paraphrased ‘what is the point of Hadoop?’.

It is a more fundamental question than ‘what analytic workloads is Hadoop used for’ and really gets to the heart of uncovering why businesses are deploying or considering deploying Apache Hadoop. Our research suggests there are three core roles:

– Big data storage: Hadoop as a system for storing large, unstructured, data sets
– Big data integration: Hadoop as a data ingestion/ETL layer
– Big data analytics: Hadoop as a platform for new exploratory analytic applications

Much of the attention paid to Apache Hadoop use-cases focuses on the innovative new analytic applications it has enabled in this latter role, thanks to its high-profile adoption at Web properties. For more traditional enterprises and later adopters, however, the first two, more mundane, roles are more likely to be the trigger for initial adoption. Indeed, there are some good examples of these three roles representing an adoption continuum.

We also see the multiple roles playing out at a vendor level, with regard to strategies for Hadoop-related products. Oracle’s Big Data Appliance (451 coverage), for example, is focused very specifically on Apache Hadoop as a pre-processing layer for data to be analyzed in Oracle Database.

While Oracle focuses on Hadoop’s ETL role, it is no surprise that the other major incumbent vendors showing interest in Hadoop can be grouped into three main areas:

– Storage vendors
– Existing database/integration vendors
– Business intelligence/analytics vendors

The impact of these roles on vendor and user adoption plans will be reflected in my presentation at Hadoop World in November, The Blind Men and the Elephant.

You can help shape this presentation, and our ongoing research into Hadoop adoption drivers and trends, by taking our survey into end user attitudes towards the potential benefits of ‘big data’ and new and emerging data management technologies.