Entries from November 2012 ↓

Weird Science – Darwinian theory and emerging Hadoop vendor business strategies

Dan Woods recently opined that Apache Hadoop has had a weird beginning thanks to its “Three Headed Open Core” model and warned that there is a danger that it will fragment – à la Unix – thanks to competing commercial forces.

There are a couple of points to address here. The first is the assumption that the vendor community developing Hadoop is in some way ‘weird’. To those of us who have studied the evolution of open source-related business strategies, it isn’t.

In fact, Hadoop’s multi-vendor community is a prime example of the corporate-dominated development communities we saw emerging as the fourth stage of commercial open source back in 2010.

Some people still have trouble understanding, as I wrote two years ago, that

being successful is about sharing your code development with the competition via multi-vendor open source projects in order to benefit from improved code quality and lower research and development costs for non-differentiating features AND beating your competition with proprietary complementary technologies.

This isn’t weird. I firmly believe that in the not-too-distant future this will be seen as entirely normal.

Another issue to address is the suggestion that these competing vendors pose a danger to the core project. In the blog linked above I argued that the contrary is true: the various competing players in a collaborative community have an impact on the development of a project similar to that of the various competing factors (climate, habitat, existence or dearth of predators etc.) in Darwin’s evolutionary process, i.e. they make it stronger.

I would be much more concerned about the potential fragmentation of Hadoop if we were looking at four or five different competing implementations of Google’s MapReduce and file system research. Instead, you could compare the differentiating features that Cloudera, Hortonworks, MapR, IBM and EMC have introduced to the result of natural selection based on a need to adapt to certain conditions.

So long as there remains a single core Apache Hadoop project upon which these differentiating features are based I believe Hadoop will not only survive, but will thrive. If I may quote myself again: “As long as they continue to collaborate on the non-differentiating code, the project should benefit from being stretched in multiple directions.”

I believe that, as with Linux, the vendors involved have learned the lessons of the Unix wars and understand that it is in their best interests – let alone everyone else’s – not to repeat them.

Another key point when we look at the Hadoop ecosystem is that we see multiple vendors building on others’ differentiating features and often supporting multiple distributions. It’s not a case of a herd of individually differentiated Hadoops, but more like a stack of Russian Hadoop dolls.

To my mind there are (currently) eight main Hadoop business strategies, each of which has the potential to build on those before it:

  • Hadoop distributors: e.g. Cloudera, Hortonworks, MapR, EMC, IBM

  • Hadoop cloud services: e.g. Amazon EMR, Google Compute Engine

  • Hadoop-based deployment services: e.g. Infochimps, Metascale

  • Hadoop-based deployment stack/appliances: e.g. Zettaset, Oracle BDA, Dell

  • Hadoop-based development services: e.g. Continuuity, Mortar Data

  • Hadoop-based application stacks: e.g. NGDATA, Guavus

  • Hadoop-based database stacks: e.g. Drawn to Scale, Splice Machine

  • Hadoop-based analytic services: e.g. Treasure Data, Qubole

    The Data Day, Two days: November 28/29 2012

    Amazon and BitYota launch DWaaSes (DWaaSi?). Continuuity’s funding and plans. And more.

    And that’s the Data Day, today.

    Forthcoming webinar: Big Data Best Practices with NGDATA

    On December 13 at 1pm ET/10am PT I’ll be taking part in a webinar to discuss Big Data Best Practices – Realizing True Business Value from Your Big Data.

    Big Data has rapidly become a transformational business trend. Most business leaders understand that not being able to tap into the power of their Big Data could mean losing business to the competition. However, most organizations are not fully aware of how to embrace it.

    I’ll discuss how you can overcome these hurdles and tap into your Big Data to transform your business, while Naren Patil, SVP of Product Marketing at NGDATA, will provide some real-life examples of successful deployment projects.

    To register, click here.

    The Data Day, A few days: November 22-27 2012

    Actian acquires Versant. GoodData’s hosted analytics. And more.

    And that’s the Data Day, today.

    The Data Day, Today: November 21 2012

    HP/Autonomy fall-out. Behind 10gen’s strategic funding. And more.

    And that’s the Data Day, today.

    The Data Day, Two days: November 19/20 2012

    HP uncovers Autonomy irregularity. Pentaho ups big data commitment. And more.

    And that’s the Data Day, today.

    The Data Day, Two days: November 15/16 2012

    Jaspersoft gets visual. MemSQL gets distributed. And more.

    And that’s the Data Day, today.

    The Data Day, Today: November 14 2012

    Funding for Continuuity and 10gen. WibiData launches the Kiji. And more.

    And that’s the Data Day, today.

    The Data Day, Two days: November 12/13 2012

    Platfora raises $20m. IBM trumpets ‘integration anywhere’. And more.

    And that’s the Data Day, today.

    The Data Day, Two days: November 8/9 2012

    Funding for Neo, Elasticsearch and Hadapt. And more.

    And that’s the Data Day, today.