Categorizing the “Foo” fighters – making sense of NoSQL

One of the essential problems with covering the NoSQL movement is that the name describes not what the associated databases are, but what they are not (and doesn’t even do that very well, since SQL itself is in many cases orthogonal to the problem the databases are designed to solve).

It is interesting to see fellow analyst Curt Monash facing the same problem. As he notes, while there seems to be a common theme that “NoSQL is Foo without joins and transactions,” no one has adequately defined what “Foo” is.

Curt has proposed HVSP (High-Volume Simple Processing) as an alternative to NoSQL, and while I’m not jumping on the bandwagon just yet, it does pass the Ronseal test (it does what it says on the tin), and it also matches my view of what defines these distributed data store technologies.

Some observations:

  • I agree with Curt’s view that object-oriented and XML databases should not be considered part of this new breed of distributed data store technologies. There is a danger that NoSQL simply comes to mean non-relational.
  • I also agree that MapReduce and Hadoop should not be considered part of this category of data management technologies (which is somewhat ironic since if there is any technology for which the terms NoSQL or Not Only SQL are applicable, it is MapReduce).
  • The vendors associated with the NoSQL movement (Basho, Couchio and MongoDB) are in a problematic position. While they are benefiting from, and to some extent encouraging, interest in NoSQL, the overall term masks their individual benefits. My sense is they will look to move away from it sooner rather than later.
  • Memcached is not a key value store. It is a cache. Hence the name.
    There are numerous categorizations of the various NoSQL technologies available on the Internet. Without wishing to add yet another to the mix, I have created another one – more for my benefit than anything else.

    It includes a list of users for the various projects (where available), and also some sense of where the various projects fit into CAP Theorem, an understanding of which is, to my mind, essential for understanding how and why the NoSQL/HVSP movement has emerged (look out for more on CAP Theorem in a follow-up post on alternatives to NoSQL).

    Here’s my take, for those that are interested. As you can see there’s a graph database-shaped hole in my knowledge. I’m hoping to fill that sooner rather than later.

    By the way, our Spotlight report introducing The 451 Group’s formal coverage of NoSQL databases will be available here imminently.

    Update: VMware has announced that it has hired Redis creator Salvatore Sanfilippo, and is taking on the Redis key value store project. The image below has been updated to reflect that, as well as the launch of NorthScale’s Membase.


    9 comments ↓

    #1 Nuno Job on 03.15.10 at 2:13 pm

    >> I agree with Curt’s view that object-oriented and
    >> XML databases should not be considered part of
    >> this new breed of distributed data
    >> store technologies.
    >> There is a danger that NoSQL simply comes
    >> to mean non-relational.

    Well, sorry to tell you, but what the hell does XML have to do with anything? So JSON is OK but XML is not? Or… is it ’cause it’s hip???

    I thought it was about CAP? If it is, then some XML databases freakin’ rock. They produce websites with petabytes of information, with ingestion rates that would make many websites blush. Aside from that, they exhibit the features (sharding, replication, high availability, strict consistency) that most NoSQL systems do, but at a completely different maturity level, as they have been doing this for years now!

    Having said that, your point is?

    #2 Matthew Aslett on 03.15.10 at 2:32 pm

    Cool your jets there Nuno. Nothing wrong with XML databases, which can indeed provide the levels of scalability and performance you describe. But if people are trying to figure out what these new NoSQL/HVSP databases are, then in my view including everything that is not a relational database into the category just confuses the situation even more than it already is – which is a lot. I’m not altogether clear on how it benefits the XML database providers to lump themselves in with NoSQL, either, as it obscures their strengths rather than playing to them.

    #3 Nuno Job on 03.15.10 at 3:35 pm

    To me it’s fairly obvious that JSON maps quite closely to objects (Object notation) and XML to documents – real world entities that people want to digitize. I would go a little further and say that JSON is closer to the relational model than XML, but that’s a personal opinion.

    So this really has nothing to do with being NoSQL or not, and that was my point!

    So if any XML database tries to lump in, that’s just wrong.
    I did comment on NoSQLdatabase.org that they should be aware that not all XML databases are NoSQL — just like not all JSON datastores are NoSQL.

    But some databases that store XML are NoSQL as the community sees it, because they have been using the same techniques to solve the same problems. And MarkLogic (which is the one that Curt used as an example) has been doing that consistently for some years now.

    #4 Chris Noble on 03.15.10 at 4:47 pm

    There are lots of people (myself included) trying to wrap their heads around this. I quite like Nathan Hurst’s attempt to position NoSQL systems in terms of Availability, Partition Tolerance and Consistency.

    It’s just a different way of diagramming the contents of your penultimate column, really, but perhaps worth a peek.

    http://blog.nahurst.com/visual-guide-to-nosql-systems

    Chris

    #5 Matthew Aslett on 03.15.10 at 4:57 pm

    Thanks Chris, I saw Nathan’s post this morning and agree it is very useful for explaining the role of CAP Theorem.

    #6 Dave Kellogg on 03.15.10 at 8:52 pm

    Matt,

    It strikes me that excluding XML databases from a category (or should I say un-category) called “NoSQL” is illogical.

    I agree that NoSQL needs definition. Perhaps unfairly, I’m OK with excluding object databases, most probably on grounds of irrelevance / asked-and-answered.

    However, XQuery is most definitely not SQL and unless you want to change the name of NoSQL itself — as some do — then it should be included.

    Check out the Wikipedia “structured storage” page, which includes both the “traditional” NoSQL suspects and other options, currently including XML systems.

    http://en.wikipedia.org/wiki/Structured_storage

    Best,
    Dave

    #7 Matthew Aslett on 03.16.10 at 1:21 am

    Hi Dave,

    My view is that the term NoSQL is so broad that it is essentially meaningless, and that another term is required to describe the likes of Cassandra, Redis, Voldemort, Riak et al in a meaningful way. Whatever that term is – I’ll use Curt’s HVSP here for simplicity’s sake – existing XML and object databases are not HVSP databases.

    #8 How will pro-SQL respond to NoSQL? — Too much information on 03.22.10 at 12:18 pm

    […] Mark Atwood is less than impressed with my recent statement: “Memcached is not a key value store. It is a cache. Hence the […]

    #9 区分两种不同类型的列存储数据库 | Allen's World on 05.08.12 at 12:20 am

    […] and Cassandra are being called column-stores with increasing frequency (e.g.here, here, and here), due to their ability to store and access column families separately. This makes them appear to be […]