Entries from February 2013 ↓

The Data Day, Two days: February 27/28 2013

Rackspace buys ObjectRocket. Intel delivers Hadoop distro. And more.

And that’s the data day, today.

The Data Day, Two days: February 25/26 2013

EMC Pivotal HD. Hortonworks Hadoop for Windows. And more.

And that’s the data day, today.

The Data Day, Two days: February 21/22 2013

Aster Discovery. Delphix and SAP. Hadoop use-cases. And more.

And that’s the data day, today.

Hadoop Summit keynote preview: What is the point of Hadoop?

I am very pleased and honoured to have been asked to provide a keynote presentation at the inaugural Hadoop Summit Europe, which will be held in Amsterdam on March 20-21.

The title of my talk is “What is the point of Hadoop?” which isn’t as derogatory as it sounds. Our research suggests there are hundreds of potential workloads that are suitable for Hadoop, but three core roles:

  • Big data storage: Hadoop as a system for storing large, unstructured data sets
  • Big data processing/integration: Hadoop as a data ingestion/ETL layer
  • Big data analytics: Hadoop as a platform for new exploratory analytic applications

The flexibility of Apache Hadoop is one of its biggest assets – enabling businesses to generate value from data that was previously considered too expensive to be stored and processed in traditional databases – but also results in Hadoop meaning different things to different people.

As early adopters press ahead with innovative new analytic applications, many mainstream enterprises are still scratching their heads trying to demonstrate Hadoop’s value. While it is very tempting to try to run before you can walk when you see others demonstrating the potential for Hadoop-based analytics, it is my view that trying to jump ahead to Hadoop-based analytics without first understanding Hadoop’s storage and integration roles runs the risk of confusion and, potentially, disillusionment.

My keynote presentation at Hadoop Summit Europe will explore the impact that Hadoop is having on the traditional data processing landscape, examining the expanding ecosystem of vendors and their relationships with Apache Hadoop, exploring adoption trends around the world, and highlighting how an understanding of the roles Hadoop can play will be essential to helping Hadoop cross the chasm from early adopters to mainstream adoption.

Anyone interested in attending the event can get a 20% discount, using the registration code 13aslett20.

The Data Day, Two days: February 19/20 2013

Tableau IPO rumour. Funding for Elasticsearch. And more.

And that’s the data day, today.

Forthcoming webinar: Strategies for scaling MySQL

On February 28 at 1pm EST I’ll be taking part in a webinar, sponsored by ScaleBase, on strategies for scaling MySQL.

Scalability is one of the primary drivers we’ve seen for database users considering alternatives to traditional relational databases. That could mean adopting an entirely new database for new projects or – more likely for existing applications – looking at various strategies for improving the scalability of an existing database.

During the webinar I will be joined by Doron Levari and Paul Campaniello, both from ScaleBase, which enables applications to scale without disruption to the existing infrastructure. We’ll be discussing, amongst other things:

  • Scaling-out your MySQL databases
  • New high availability strategies
  • Centrally managing a distributed MySQL environment

For further details, and to register, click here.

The Data Day, Two days: February 15/18 2013

Redshift goes GA. Pivotal’s Google in a box. And more.

And that’s the data day, today.

The Data Day, Two days: February 13/14 2013

TempoDB’s timely DBaaS for the Internet of Things. ScaleBase 2.0. And more.

And that’s the data day, today.

NoSQL on MySQL: stating the obvious

Some of the NoSQL vendors seem to have stirred up a mild controversy with their reactions to the launch of NoSQL access to InnoDB in MySQL 5.6 and their suggestions that NoSQL access is only a part of the NoSQL story.

Mark Leith, software development senior manager at Oracle, has described the criticism as laughable and Oracle’s director of MySQL product marketing, Mat Keep, accused the NoSQL vendors of “trying to stand on the shoulders of giants” (which is pretty ironic given we are talking about Oracle adding NoSQL capabilities to one of its databases).

In any case I don’t see what the fuss is all about.

Sure, Couchbase and DataStax laid it on a bit thick, but these are corporate blog posts – it goes with the territory.

Besides, while it might seem churlish to criticise NoSQL access to InnoDB in MySQL 5.6 for not being a document database or for not enabling masterless multi-datacenter replication, the responses are valid in the context of hyperbolic claims that “MySQL can provide the best of both worlds… You don’t have to split your data and manage two databases.”

The caveat to all these claims, and indeed probably any claim ever made in a corporate blog, is “if it suits your particular application requirement.”

Back in early 2011 when we first considered the momentum behind NoSQL development and adoption we highlighted six key drivers:

  • Scalability
  • Performance
  • Relaxed consistency
  • Agility
  • Intricacy
  • Necessity

How many of those are addressed by key value access to the InnoDB storage engine? Query performance and agility, certainly. Necessity, perhaps – but only if your application workload requires both SQL and key value access.

As we stated when Oracle first began previewing key value access to the InnoDB storage engine:

“Support for data access using the memcached API by no means alleviates the need for NoSQL alternatives, but it will provide additional flexibility and agility for existing MySQL adopters.”

I also have to agree with Couchbase that this is a point that is illustrated by the existence of Oracle’s own NoSQL Database. As we stated at the time of its launch:

“The launch of Oracle NoSQL is… a clear indication that there are trends at work here that cannot be solved by adding non-SQL querying to existing relational databases.”

And that’s really all Couchbase and DataStax are pointing out.

If you’re looking for an offering that provides direct, key value insertion and querying of data in addition to SQL-based access to relational database tables, then MySQL 5.6 is clearly a leading contender. If that’s all you’re looking for, then you could arguably forget the need to manage two databases.
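To make that dual-access idea concrete, here is a minimal sketch of key value reads and writes sitting alongside SQL queries over the same table. It is only an illustration of the concept: it uses Python's built-in sqlite3 module as a stand-in, not MySQL 5.6 or its memcached interface, and the class name `KeyValueView` and table name `kv_demo` are hypothetical.

```python
import sqlite3


class KeyValueView:
    """Hypothetical key value facade over a single relational table."""

    def __init__(self, conn, table="kv_demo"):
        self.conn = conn
        self.table = table
        conn.execute(
            f"CREATE TABLE IF NOT EXISTS {table} (k TEXT PRIMARY KEY, v TEXT)"
        )

    def set(self, key, value):
        # Insert or overwrite the row for this key.
        self.conn.execute(
            f"INSERT OR REPLACE INTO {self.table} (k, v) VALUES (?, ?)",
            (key, value),
        )

    def get(self, key):
        # Look up a single value by primary key; None if the key is absent.
        row = self.conn.execute(
            f"SELECT v FROM {self.table} WHERE k = ?", (key,)
        ).fetchone()
        return row[0] if row else None


conn = sqlite3.connect(":memory:")  # in-memory stand-in database
kv = KeyValueView(conn)
kv.set("user:1", "alice")
kv.set("user:2", "bob")

# The same rows remain visible to ordinary SQL queries.
sql_rows = conn.execute("SELECT k, v FROM kv_demo ORDER BY k").fetchall()
```

The point of the sketch is simply that one data set serves both access styles, so there is no second database to keep in sync. It says nothing about document, wide-column or graph workloads.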

That doesn’t necessarily make MySQL 5.6 suitable for use as a pure key value store, let alone a document database, or wide-column store, or graph database. If those are your requirements, MySQL 5.6 isn’t the best of any world, let alone both.

The Data Day, Two days: February 11/12 2013

ClearStory sheds light on data analysis service. Illuminating ‘dark data’. More.

And that’s the data day, today.