Weird Science – Darwinian theory and emerging Hadoop vendor business strategies

Dan Woods recently opined that Apache Hadoop has had a weird beginning thanks to its “Three Headed Open Core” model and warned that there is a danger that it will fragment – à la Unix – thanks to competing commercial forces.

There are a couple of points to address here. The first is the assumption that the vendor community developing Hadoop is in some way ‘weird’. It isn’t, at least not to those of us who have studied the evolution of open source-related business strategies.

In fact, Hadoop’s multi-vendor community is a prime example of the corporate-dominated development communities we saw emerging as the fourth stage of commercial open source back in 2010.

Some people still have trouble understanding, as I wrote two years ago, that

being successful is about sharing your code development with the competition via multi-vendor open source projects in order to benefit from improved code quality and lower research and development costs for non-differentiating features AND beating your competition with proprietary complementary technologies.

This isn’t weird. I firmly believe in the not-too-distant future this will be seen as entirely normal.

Another issue to address is the suggestion that these competing vendors pose a danger to the core project. In the blog linked above I argued that the contrary is true, comparing the impact of competing players in collaborative communities to that of competing environmental factors – climate, habitat, the presence or absence of predators, etc. – in Darwin’s evolutionary process: they make the project stronger.

I would be much more concerned about the potential fragmentation of Hadoop if we were looking at four or five different competing implementations of Google’s MapReduce and file system research. Instead, you could compare the differentiating features that Cloudera, Hortonworks, MapR, IBM and EMC have introduced to the result of natural selection based on a need to evolve to certain conditions.

So long as there remains a single core Apache Hadoop project upon which these differentiating features are based I believe Hadoop will not only survive, but will thrive. If I may quote myself again: “As long as they continue to collaborate on the non-differentiating code, the project should benefit from being stretched in multiple directions.”

I believe that, as with Linux, the vendors involved have learned the lessons of the Unix wars and understand that it is in their best interests – let alone everyone else’s – not to repeat them.

Another key point when we look at the Hadoop ecosystem is that we see multiple vendors building on others’ differentiating features and often supporting multiple distributions. It’s not a case of a herd of individually differentiated Hadoops, but more like a stack of Russian Hadoop dolls.

To my mind there are (currently) eight main Hadoop business strategies, each of which has the potential to build on those before it:

  • Hadoop distributors – e.g. Cloudera, Hortonworks, MapR, EMC, IBM

  • Hadoop cloud services – e.g. Amazon EMR, Google Compute Engine

  • Hadoop-based deployment services – e.g. Infochimps, Metascale

  • Hadoop-based deployment stack/appliances – e.g. Zettaset, Oracle BDA, Dell

  • Hadoop-based development services – e.g. Continuuity, Mortar Data

  • Hadoop-based application stacks – e.g. NGDATA, Guavus

  • Hadoop-based database stacks – e.g. Drawn to Scale, Splice Machine

  • Hadoop-based analytic services – e.g. Treasure Data, Qubole
