The Data Day, A few days: March 20-22 2013

MongoDB goes Enterprise. Riak CS goes open source. And more.

And that’s the data day, today.

The Data Day, A few days: March 18-19 2013

Splunk adds structure. MapR raises $30m. And more.

And that’s the data day, today.

Forthcoming webinar: The New Path to Performance. No Sharding!

On Tuesday March 26th at 10am PT I’ll be taking part in a webinar with NuoDB on the subject of The New Path to Performance. No Sharding!

As part of the webinar I’ll be explaining the various strategies used by enterprises to attempt to achieve scalability of relational databases, why they fail to meet modern distributed processing requirements, and why companies are increasingly open to looking at alternatives to the traditional relational database.

Wiqar Chaudry from NuoDB will also be discussing how to eliminate technical acrobatics, including:

Sharding
Clustering
Performance tuning
Replication
And other kinds of 20th century database tricks.

To register, click http://go.nuodb.com/no-sharding-webinar-register-s.html

The Data Day, A few days: March 11-14 2013

SAP’s predictive analytics plans. Dell’s Boomi MDM. And more.

And that’s the data day, today.

What it means to be “all in” on Hadoop

Pivotal HD is not Hadoop.
Neither is Cloudera’s Distribution, including Apache Hadoop.
Nor the Hortonworks Data Platform.
Nor the MapR Distribution.
Nor IBM’s InfoSphere BigInsights.
Nor the WANdisco Distro.
Nor Intel’s Distribution for Apache Hadoop.

Apache Hadoop is Hadoop. And Hadoop is Apache Hadoop.

I don’t write that to be pedantic, or controversial, but because it is the only logical conclusion you can reach after reading Defining Apache Hadoop from the Apache Hadoop Wiki.

“The key point is that the only products that may be called Apache Hadoop or Hadoop are the official releases by the Apache Hadoop project as managed by that Project Management Committee (PMC)… Products that are derivative works of Apache Hadoop are not Apache Hadoop, and may not call themselves versions of Apache Hadoop, nor Distributions of Apache Hadoop.”

It is with this in mind that one should view the reaction to EMC Greenplum’s recent launch of Pivotal HD, and in particular this statement from Scott Yara, EMC Greenplum senior Vice President, Products and Co-Founder:

“We’re all in on Hadoop, period.”

What does it mean to be “all in on Hadoop”? Based on a strict reading of Defining Apache Hadoop (a document that demands by its own words to be read strictly), being “all in” on Hadoop means only one thing: being “all in” on Apache Hadoop.

I have no doubt that EMC Greenplum is “all in” on Pivotal HD, but that’s not the same thing at all.

Not a purity debate

There is nothing wrong with offering additional functionality beyond the scope of Apache Hadoop – the licensing terms clearly encourage it.

As my fellow analyst Merv Adrian notes:

“Having some components of your solution stack provided by the open source community is a fact of life and a benefit for all. So are roads, but nobody accuses Fedex or your pizza delivery guy of being evil for using them without contributing some asphalt.”

That is true. However, to continue the analogy, you would expect any company that claimed to be “all in on roads” to be getting involved in laying and maintaining them, rather than just driving on top of them.

Despite what some people may think, this isn’t a matter of arguing about which vendor has the most Hadoop committers. It is a matter of defining what users understand Hadoop to be, and what they understand it not to be. It is a matter of drawing a line between Hadoop – Apache Hadoop – and additional, proprietary functionality beyond the scope of the project.

User preference

Whether users will choose to go with a pure approach to Hadoop-based products and services is another matter. Dan Woods, for one, clearly believes that products like Pivotal HD will drive further mainstream adoption beyond “the limits of open source.”

The idea is that most enterprises don’t care if it meets the Apache definition of Hadoop or not, as long as it works.

While I have no doubt that some companies will be drawn to the additional features and confidence that vendors such as EMC and Intel can provide, I have also spoken to multiple enterprises – including one very large enterprise just last week – for which the preference is to default to open in order to avoid any potential for lock-in and vendor-specific architecture choices.

There are many users that do very much care whether what they are adopting meets the Apache definition of Hadoop.

Which of these attitudes will dominate? I’m not going to pretend I know the answer to that question at this point, but our previous coverage of open source adoption suggests that once the door to openness has been unlocked it’s very hard to force it shut again.

Dan Woods responded to my (sarcastic) comment about this as follows:

I would dispute that players like IBM, HP, and Intel “took Linux over”, but in any case it is undeniable that they had a significant role to play – alongside Red Hat, Novell et al., and individual developers – in turning Linux into an enterprise-grade operating system.

The point, though, is that they did so by engaging with the Linux project, not by launching their own differentiated versions of Linux.

The Data Day, A few days: March 6-8 2013

Ayasdi emerges. Amazon slashes DynamoDB prices. And more.

And that’s the data day, today.

The Data Day, A few days: March 1-5 2013

SQL and Hadoop: ascloseasthis. Splunk revenue up 64%. And more.

And that’s the data day, today.

Forthcoming webinar: Searching for Value in Big Data

On March 14th at 10:00am PT I’ll be taking part in a webinar in association with LucidWorks on the role search has to play in big data.

I’ll be joined by Grant Ingersoll, Chief Technology Officer for LucidWorks, who will provide a brief overview of LucidWorks Big Data, a development platform designed specifically for building these new applications.

Also presenting will be Tony Jewitt, Vice President of Big Data Solutions for Avalon Consulting LLC, who will demonstrate how Avalon’s Unified Search and Analytics platform leverages the LucidWorks Big Data platform to discover and analyze data maintained in Hadoop.

Between us we’ll also be discussing:

  • How Big Data is changing the database landscape
  • Total data – new approaches to accessing and analyzing data
  • Search and analytics – two sides of the same coin
  • The role that search plays in discovering new insights and generating value

For more details, and to register, go to http://programs.lucidworks.com/451Group032013_signuppage.html

Forthcoming webinar: Security and Manageability: Key Criteria In Selecting Enterprise-Grade Big Data Stacks

I’m taking part in a webinar on Wednesday, March 13 at 9 AM PT / 12 PM ET with DataStax. The focus of the webinar is Security and Manageability: Key Criteria In Selecting Enterprise-Grade Big Data Stacks.

I’ll be joined by DataStax CEO Billy Bosworth and Hallo CTO Adrian Rodriguez. Between us we will explain the importance of security and visual management tools when selecting a big data stack, and discuss how DSE 3.0 addresses these two key criteria.

Additionally, attendees receive a chance to win free passes to Cassandra training. For full details and to register, go to http://learn.datastax.com/DSE3launch.html

The Data Day, Two days: February 27/28 2013

Rackspace buys ObjectRocket. Intel delivers Hadoop distro. And more.

And that’s the data day, today.