The Data Day, Two days: October 29/30 2012

Pentaho raises $23m. Exploring CAP Theorem. And more.

And that’s the Data Day, today.

The Data Day, Today: August 24 2012

Facebook’s Prism. CAP Theorem. Keeping MySQL open. And more.

And that’s the Data Day, today.

How to to provide a strongly consistent distributed database and not break CAP Theorem

In the months since we coined the term NewSQL we have come to define it as referring to a new breed of relational database products designed to meet scalability requirements of distributed architectures, or improve performance so horizontal scalability is no longer a necessity, while maintaining support for SQL and ACID.

During the recent round of NoSQL Road Show events it has emerged that this description could be taken to suggest that NewSQL products are able to provide consistency, availability and partition tolerance and therefore contravene the common understanding of CAP Theorem that “a distributed system can satisfy any two of these guarantees at the same time, but not all three.”

How is possible to provide strongly consistent distributed systems and not break CAP Theorem?

For a start, CAP Theorem is not that simple. As others have pointed out – Cloudera’s Henry Robinson for example – CAP Theorem isn’t simply a case of “consistency, availability, partition tolerance. Pick two.”

In fact the father of CAP Theorem, Dr Eric Brewer, has clarified that the “2 of 3” explanation is misleading: “First, because partitions are rare, there is little reason to forfeit C or A when the system is not partitioned. Second, the choice between C and A can occur many times within the same system at very fine granularity; not only can subsystems make different choices, but the choice can change according to the operation or even the specific data or user involved. Finally, all three properties are more continuous than binary. Availability is obviously continuous from 0 to 100 percent, but there are also many levels of consistency, and even partitions have nuances, including disagreement within the system about whether a partition exists.”

We know that CAP is not simply a case of “pick two”, since while Amazon’s Dynamo (and the many NoSQL databases it has inspired) sacrifices consistency for availability, it does so with eventual consistency, not the total absence of consistency.

Clearly is possible to have systems that are partition tolerant, highly available and offer *a degree of consistency* (although as Fred Holahan points out, whether that degree is suitable for you particular workload is another matter).

Partition tolerance is not necessarily something that can be relaxed in the same manner – in fact the proof of CAP Theorem relies on an assumption of partition tolerance. As Yammer engineer Coda Hale explains: “Partition Tolerance is mandatory in distributed systems. You cannot not choose it.”

Daniel Abadi has previously explained how CAP is not really about choosing two of three states, but about answering the question “if there is a partition, does the system give up availability or consistency?”

Just as systems that sacrifice consistency retain a degree of consistency, Daniel also makes the point that systems that give up availability also do not do so in totality, noting that “availability is only sacrificed when there is a network partition.”

As such, Daniel makes the point that the roles of consistency and availability in CAP are asymmetric, and that latency is the forgotten factor that re-balances the equation.

Daniel has also returned to the issue of the tradeoff between latency and consistency in a more recent post, noting that, unlike availability vs consistency, “the latency vs. consistency tradeoff is present even during normal operations of the system.”

The Apache Cassandra wiki actually makes this point very well:

“The CAP theorem… states that you have to pick two of Consistency, Availability, Partition tolerance: You can’t have the three at the same time and get an acceptable latency. Cassandra values Availability and Partitioning tolerance (AP). Tradeoffs between consistency and latency are tunable in Cassandra. You can get strong consistency with Cassandra (with an increased latency).”

This suggests that you can, in fact, have consistency, partition tolerance and availability at the same time, but that latency will suffer. ScaleDB’s Mike Hogan made that argument earlier this year in describing the ‘CAP event horizon’ – “the point at which latency for a clustered system exceeds that which is acceptable and then you must decide what concessions you are willing to make”.

See also Brian Bulkowski’s explanation of how Citrusleaf can claim to deliver immediate consistency by relaxing availability in the event of partition failure: “During this period, Citrusleaf will seem less highly available – that is, latencies will be higher – until the reconfiguration completes. Transactions still flow during this period – they are queued and forwarded at different places in the client and in the servers – but the cluster has, in theoretical terms, lower availability.”

Like Citrusleaf’s ACID-compliant NoSQL database, NewSQL databases are not designed to avoid the CAP event horizon by being as available as eventually consistent systems – that *would* break CAP Theorem – but arguably they are designed to delay that CAP event horizon as much as possible by delivering systems that, in the event of a partition, are highly consistent and offer *a degree of availability*.

Whether that degree of availability is suitable for your application will depend on your tolerance – not for partitions but for latency.

How will pro-SQL respond to NoSQL?

Gear6’s Mark Atwood is less than impressed with my recent statement: “Memcached is not a key value store. It is a cache. Hence the name.”

Mark has responded with a post in which he explains how memcached can be used as a key value store with the assistance of “persistent memcached” from Gear6, or by combining memcached with something like Tokyo Cabinet.

As much as I agree with Mark that other technologies can be used to turn memcached into a key value store I can’t help thinking his post actually proves my point: that memcached itself is not a key value store.

Either way it brings me to the next post in the NoSQL series (see also The 451 Group’s recent Spotlight report), looking at what the existing technology providers are likely to do in response.

I spent last week in San Francisco at the Open Source Business Conference where David Recordon, head of open source initiatives at Facebook, outlined how the company makes use of various open source projects, including memcached and MySQL, to scale its infrastructure.

It was an interesting presentation, although the thing that stood out for me was that Recordon didn’t once mention Cassandra, the open source key value store created by Facebook, despite being asked directly about the company’s plans for what was rather quaintly referred to as “non-relational databases”.

In fact, this recent post from Recordon puts Cassandra in context: “we use it for Inbox search, but the majority of development is now being led by Digg, Rackspace, and Twitter”. It is technologies like MySQL and memcached that Facebook is scaling to provide its core horsepower.

The death of memcached, as they say, has been greatly exaggerated.

That said, it is clear that to some extent the rise of NoSQL can be explained by CAP Theorem and the inability of the MySQL database to scale consistently. Sharding is a popular method of increasing the scalability of the MySQL database to serve the requirements of high-traffic websites, but it’s manually intensive. The memcached distributed memory object-caching system can also be used to improve performance, but does not provide persistence.

An alternative to throwing out investments in MySQL and memcached in favor of NoSQL is to improve the MySQL/memcached combination, however. A number of vendors, including Gear6 and NorthScale, are developing and delivering technologies that add persistence to memcached (see recent 451 Group coverage on Gear6 and NorthScale), while appliance providers such as Schooner Information Technology (451 coverage) and Virident Systems (451 coverage) have taken an appliance-based approach to adding persistence.

Another approach would be to improve the performance of MySQL itself. ScaleDB (451 coverage) has a shared-disk storage engine for MySQL that promises to improve its scalability. We have also recently come across GenieDB, (451 coverage) which is promising a massively distributed data storage engine for MySQL. Additionally, Tokutek’s TokuDB MySQL storage engine is based on Fractal Tree indexing technology that reduces data-insertion times, improving the performance of MySQL for both read and write applications, for example.

As we noted in our recent assessment of Tokutek, while TokuDB is effectively an operational database technology, it does blur the line between operations and analytics since the company claims it delivers a performance improvement sufficient to run ad hoc queries against live data.

Beyond MySQL, while we expect the database incumbents to feel the impact of NoSQL in certain use cases, the lack of consistency (in the CAP Theorem sense) inevitably enables quick dismissal of their wider applicability. Additionally, we expect to see the data management vendors take steps to improve performance and scalability. One method is through the use of in-memory databases to improve performance for repeatedly accessed data, another is through the use of in-memory data grid caching technologies, which are designed to solve both performance and scalability issues.

Although these technologies do not provide the scalability required by Facebook, Amazon, et al., the question is, how many applications need that level of scalability? Returning again to CAP Theorem, if we assume that most applications do not require the levels of partition tolerance seen at Google, expect the incumbents to argue that what they lack in partition tolerance they can make up for in consistency and availability.

Somewhat inevitably, the requirements mandated by NoSQL advocates will be watered down for enterprise adoption. At that level, it may arguably be easier for incumbent vendors to sacrifice a little consistency and availability for partition tolerance than it will be for NoSQL projects to add consistency and availability.

Much will depend on the workload in question, which is something that is being hidden by debates that assume a confrontational relationship between SQL and NoSQL databases. As the example of Facebook suggests, there is room for both MySQL/memcached and NoSQL

Categorizing the “Foo” fighters – making sense of NoSQL

One of the essential problems with the covering the NoSQL movement is that it describes not what the associated databases are, but what they are not (and doesn’t even do that very well since SQL itself is in many cases orthogonal to the problem the databases are designed to solve).

It is interesting to see fellow analyst Curt Monash facing the same problem. As he notes, while there seems to be a common theme that “NoSQL is Foo without joins and transactions,” no one has adequately defined what “Foo” is.

Curt has proposed HVSP (High-Volume Simple Processing) as an alternative to NoSQL, and while I’m not jumping on the bandwagon just yet, it does pass the Ronseal test (it does what it says on the tin), and it also matches my view of what defines these distributed data store technologies.

Some observations:

  • I agree with Curt’s view that object-oriented and XML databases should not be considered part of this new breed of distributed data store technologies. There is a danger that NoSQL simply comes to mean non-relational.
  • I also agree that MapReduce and Hadoop should not be considered part of this category of data management technologies (which is somewhat ironic since if there is any technology for which the terms NoSQL or Not Only SQL are applicable, it is MapReduce).
  • The vendors associated with the NoSQL movement (Basho, Couchio and MongoDB) are in a problematic position. While they are benefiting from, and to some extent encouraging, interest in NoSQL, the overall term masks their individual benefits. My sense is they will look to move away from it sooner rather than later.
  • Memcached is not a key value store. It is a cache. Hence the name.
  • .
    There are numerous categorizations of the various NoSQL technologies available on the Internet. Without wishing to add yet another to the mix, I have created another one – more for my benefit than anything else.

    It includes a list of users for the various projects (where available), and also some sense of whether the various projects fit into CAP Theorem, an understanding of which is, to my mind, essential for understanding how and why the NoSQL/HVSP movement has emerged (look out for more on CAP Theorem in a follow-up post on alternatives to NoSQL).

    Here’s my take, for those that are interested. As you can see there’s a graph database-shaped whole in my knowledge. I’m hoping to fill that sooner rather than later.

    By the way, our Spotlight report introducing The 451 Group’s formal coverage of NoSQL databases will be available here imminently.

    Update: VMware has announced that it has hired Redis creator Salvatore Sanfilippo, and is taking on the Redis key value store project. The image below has been updated to reflect that, as well as the launch of NorthScale’s Membase.