The Data Day, A few days: February 2-6, 2015

NoSQL enters the multi-model age. And more

And that’s the data day, today.

It’s the end of NoSQL as we know it (and I feel fine)

Last week I tweeted that this week was shaping up to be a watershed week in the history of NoSQL. I was referring, of course, to MongoDB launching 3.0 and DataStax acquiring Aurelius – although more specifically what the context of these two announcements tells us about the future of NoSQL.

While each of these announcements could be considered significant in its own right, in combination they suggest a new stage in the evolution of NoSQL and send a clear signal that the future of NoSQL will be driven by database products that support multiple data models.

When we formally started covering NoSQL in 2010 it made sense to divide the various projects into four groups: key value stores, distributed (wide) column stores (or BigTable clones), graph databases, and document-oriented databases.

By early 2013 it had become obvious that there was another emerging category: multi-model databases.

Multi-model NoSQL databases have therefore been around for several years. While we have seen growing interest in them, they have still lagged behind the early specialist NoSQL databases in terms of widespread adoption. That’s what makes the recent announcements by MongoDB and DataStax so significant.

    1. Along with releasing version 3.0 of its document database, MongoDB also began to share (at least with us) its long-term multi-model vision for MongoDB, explaining how the pluggable storage engine architecture could enable the database to support multiple data models – such as key value, graph and relational.
    2. Meanwhile DataStax described how its acquisition of Aurelius will see it developing a graph database to complement Apache Cassandra’s wide column key value model, and explained its multi-model strategy.

    Multi-model momentum may have been growing for years but the fact that the commercial providers behind the two most popular NoSQL databases have detailed their plans to go multi-model confirms that the multi-model approach is the future of NoSQL.

    Indeed, since we expect to see similar moves from other NoSQL players it will become increasingly difficult to divide the NoSQL space in terms of key value stores, wide column stores, graph databases, and document-oriented databases. Instead it makes sense to divide the NoSQL projects in terms of whether they are single-model or multi-model.

    451 Research clients can read more about our perspectives on MongoDB’s strategic direction, as well as DataStax’s acquisition of Aurelius, and the wider implications for the NoSQL sector.

    NoSQL LinkedIn Skills Index – September 2014

    Time for a new look for our NoSQL LinkedIn Skills Index, which tracks mentions of NoSQL databases in LinkedIn member profiles, as it enters its third year. We’ve switched from a bar chart to a line chart to reduce clutter – at least on the horizontal plane.

    Unfortunately the dominance of MongoDB means that the chart is inevitably cluttered on the low end of the vertical plane, but the line chart at least provides a clear illustration of that dominance.

    [Figure: nosql]

    There are a few other changes of note further down the list, with FoundationDB gaining a place on Sparksee (as predicted) thanks to it having the fastest rate of growth (40.74%) in Q3. ArangoDB also gained a place on InfiniteGraph thanks to recording the second fastest growth rate (37.84%).

    We noted last time that Q3 could see OrientDB overtake Aerospike, unless the release of Aerospike as open source had an immediate impact on interest levels. That seems to have occurred, with Aerospike recording 23.80% growth to not only hold off OrientDB but gain ground on Voldemort, which looks likely to be overtaken by both Aerospike and OrientDB in Q4. Inside the top 10 there is also a chance that DynamoDB could overtake MarkLogic in Q4.

    Titan (25.97%), RethinkDB (22.88%) and DynamoDB (22.85%) also deserve a mention in terms of growth in Q3, while Neo4j was the fastest growing of the top 10 with 17.99%. MongoDB was of course the most popular NoSQL database by a considerable margin, once again accounting for 49% of all LinkedIn member profiles mentioning a NoSQL project.

    [Figure: nosql2]

    Of course, we would also note that this is not meant to be a comprehensive analysis, but rather a snapshot of one particular data source.
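
The growth rates and share figures quoted above are simple ratios of profile counts. A minimal sketch of that arithmetic follows, assuming the index stores one mention count per project per quarter. The function names and the example counts are hypothetical illustrations, not the actual index data.

```python
def growth_rate(previous: int, current: int) -> float:
    """Quarter-over-quarter growth in profile mentions, as a percentage."""
    if previous <= 0:
        raise ValueError("previous count must be positive")
    return 100 * (current - previous) / previous


def share(count: int, total: int) -> float:
    """One project's mentions as a percentage of all NoSQL mentions."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * count / total


# Hypothetical (previous quarter, current quarter) mention counts.
# These numbers are made up for illustration only.
example_counts = {
    "SmallProject": (27, 38),
    "LargeProject": (10000, 10900),
}

for project, (prev, curr) in example_counts.items():
    print(f"{project}: {growth_rate(prev, curr):.2f}% growth")
```

The sketch also shows why the fastest growers tend to sit at the tail of the list: a small absolute gain on a small base produces a large percentage, while a much larger absolute gain on a large base produces a modest one.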

    NoSQL LinkedIn Skills Index – June 2014

    There isn’t a great deal of movement in the June update to our NoSQL LinkedIn Skills Index, which tracks mentions of NoSQL databases in LinkedIn member profiles. At the tail-end of the list FoundationDB jumped a place above InfiniteGraph and can be expected to gain another place on Sparksee in the next quarter, but otherwise it’s very much ‘as you were’.

    Q3 could also see OrientDB overtake Aerospike, unless the recent release of Aerospike as open source has an immediate impact on interest levels. FoundationDB was among those with the fastest growth rates in Q2 at 35.0%, although faster growth came from ArangoDB (48.0%) and RethinkDB (36.6%), with Titan (27.1%) and Couchbase (18.9%) following.

    [Figure: nosql-Jun]

    Once again MongoDB was the most popular NoSQL database by a considerable margin, representing 49% of all LinkedIn member profiles mentioning a NoSQL project.

    [Figure: jun-donut]

    Of course, we would also note that this is not meant to be a comprehensive analysis, but rather a snapshot of one particular data source.

    NoSQL LinkedIn Skills Index – March 2014

    The latest version of our NoSQL LinkedIn Skills Index shows the continued strength of MongoDB, as the document database increased its share back to 49% of all mentions of NoSQL databases in Q1.

    [Figure: q1_donut]

    We were surprised to find MongoDB’s proportion of the Index (based on the number of LinkedIn member profiles mentioning each of the NoSQL projects) actually declined in the previous quarter: from 49% to 48%. The Q1 results suggest that was just a blip.

    We had wondered whether Couchbase’s leap of two places in our previous update might also be a blip, but in fact Couchbase retained seventh spot in Q1 and there were no changes of position within the top ten this quarter.

    [Figure: q1-chart]

    Outside the top ten, Titan gained a place on Hypertable, as expected, while RethinkDB leapfrogged AllegroGraph, thanks to recording the third fastest growth (34.94%) in the quarter. The fastest climber, in terms of mentions, was FoundationDB (42.86%), followed by ArangoDB (38.89%). DynamoDB (28.03%) and Titan (26.88%) complete the list of the top five fastest climbers.

    Of course, we would also note that this is not meant to be a comprehensive analysis, but rather a snapshot of one particular data source.

    The Data Day, A few days: March 8-14 2014

    Basho under new leadership. And more

    And that’s the data day, today.

    The Data Day, A few days: December 13-10 2013

    VC funding for Hadoop and NoSQL tops $1bn. And more

    And that’s the data day, today.

    NoSQL LinkedIn Skills Index – December 2013

    There’s an early end to the quarter for our NoSQL LinkedIn Skills Index, based on the number of LinkedIn member profiles mentioning each of the NoSQL projects, just as there was in 2012.

    We predicted in Q3 that Couchbase would overtake MarkLogic this quarter, which came to pass, but were somewhat surprised to see Couchbase also leapfrog Riak to claim 7th place. It’s almost too close to call between the three, and we wouldn’t be surprised to see those places change hands in the coming quarters.

    [Figure: december2013]

    There were no other changes of position outside the top ten, although Titan is bearing down on Hypertable having recorded the fastest growth in Q4 (49.5%) and can be expected to gain a place in Q1 2014. The second fastest climber, in terms of mentions, was FoundationDB, followed by ArangoDB, RethinkDB and Apache Cassandra (the latter being particularly notable since it was the only one of the five fastest growers to also be one of the top ten most mentioned in LinkedIn member profiles).

    That growth was of course not enough to close the gap on MongoDB as the most mentioned NoSQL database in LinkedIn member profiles, although for the first time MongoDB’s proportion of the overall total actually declined – from 49% in Q3 to 48%, upsetting our prediction that MongoDB would pass the 50% threshold in Q4.

    [Figure: Q42013]

    It will be interesting to see whether MongoDB’s dominance declines again in Q1, although either way it retains a monumental lead over all the other NoSQL databases in terms of mentions in LinkedIn profiles.

    Of course, we would also note that this is not meant to be a comprehensive analysis, but rather a snapshot of one particular data source.

    Visualizing the $1bn+ VC investment in Hadoop and NoSQL

    Cumulative VC funding for Hadoop and NoSQL vendors broke through the $1bn barrier in 2013, according to a Spotlight report published by 451 Research, based on data provided by The 451 M&A KnowledgeBase.

    The data indicates that there was a substantial increase in funding in 2013 ($530.5m, not including RethinkDB’s $8m announced yesterday) compared to 2012 ($190.9m), thanks to major rounds for the likes of MongoDB, Pivotal, Hortonworks and DataStax.

    The report includes a visualization created by 451’s Director of Data Strategy and Solutions, Barbara Peng, that illustrates the connections between the various investors and the NoSQL and Hadoop vendors in which they have invested.

    A snapshot of the visualization is shown below, but the original is interactive, enabling 451 Research clients to drag the various elements around for greater emphasis, as well as isolate the NoSQL or Hadoop categories.

    [Figure: vc-firms]

    451 Research clients can also scroll over the blue circles to see the total amount of funding raised by the individual Hadoop and NoSQL vendors, and scroll over the smaller orange circles to see which investors have backed which companies.

    The sample set was limited to 16 vendors for visual clarity, but the six Hadoop and 10 NoSQL providers cited account for more than 87% of funding to date (with Pivotal representing the vast majority of the remaining 13%).

    This visualization illustrates that investment in Hadoop and NoSQL providers comes from a relatively small group of VC firms (52 to be specific, excluding individual seed investors), resulting in a relatively tightly clustered graph.

    However, the visualization also enables us to put to the test the recent blog post by MarkLogic’s Adam Fowler in which he stated:

    “Just look at the number of investors who are investing in multiple NoSQL companies. They’re hedging their bets because they’re not sure themselves which businesses will survive.”

    In fact, investment in multiple Hadoop and NoSQL vendors is relatively rare. Only 11 out of the 52 VC firms have invested in more than one Hadoop and/or NoSQL vendor, with seven of those picking one Hadoop vendor and one NoSQL provider. That is less hedging their bets than picking a winner in each category.

    Of the remaining four investment shops, two have invested in one Hadoop distributor, one NoSQL specialist and one Hadoop-as-a-service provider (MapR, DataStax and Qubole for Lightspeed Venture Partners; Cloudera, Couchbase and Altiscale for Accel Partners), while In-Q-Tel has invested in one Hadoop supplier, one NoSQL vendor and one NoSQL-as-a-service provider (Cloudera, MongoDB and Cloudant).

    Only Sequoia Capital has invested in multiple NoSQL vendors (as well as Hadoop-as-a-service provider Altiscale), having invested in MongoDB, DataStax and – hold onto your hats, irony fans – MarkLogic. It should be noted, however, that Sequoia has not invested in DataStax since its series A round in late 2010.
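
The hedging test applied above amounts to counting, for each investor, how many vendors it has backed within a single category. A minimal sketch of that check follows, using only the four multi-vendor investors named in the preceding paragraphs (the other seven of the 11 are not named here, so they are omitted). The data structure, category labels and function names are illustrative assumptions, not the format of the 451 M&A KnowledgeBase.

```python
from collections import defaultdict

# Investments as described in the text: investor -> [(vendor, category), ...].
# Only the four firms detailed above are included.
INVESTMENTS = {
    "Lightspeed Venture Partners": [
        ("MapR", "Hadoop"),
        ("DataStax", "NoSQL"),
        ("Qubole", "Hadoop-as-a-service"),
    ],
    "Accel Partners": [
        ("Cloudera", "Hadoop"),
        ("Couchbase", "NoSQL"),
        ("Altiscale", "Hadoop-as-a-service"),
    ],
    "In-Q-Tel": [
        ("Cloudera", "Hadoop"),
        ("MongoDB", "NoSQL"),
        ("Cloudant", "NoSQL-as-a-service"),
    ],
    "Sequoia Capital": [
        ("MongoDB", "NoSQL"),
        ("DataStax", "NoSQL"),
        ("MarkLogic", "NoSQL"),
        ("Altiscale", "Hadoop-as-a-service"),
    ],
}


def vendors_per_category(investments):
    """Count how many vendors each investor has backed in each category."""
    counts = {}
    for investor, holdings in investments.items():
        per_category = defaultdict(int)
        for _vendor, category in holdings:
            per_category[category] += 1
        counts[investor] = dict(per_category)
    return counts


def hedgers(investments, category):
    """Investors that have backed more than one vendor in the given category."""
    counts = vendors_per_category(investments)
    return sorted(
        investor
        for investor, per_category in counts.items()
        if per_category.get(category, 0) > 1
    )


print(hedgers(INVESTMENTS, "NoSQL"))
```

Treating the as-a-service providers as their own categories follows the distinction drawn in the text; merging them into the parent category would change which firms count as backing multiple vendors in one category.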

    The full report, Venture funding for Hadoop and NoSQL vendors tops $1bn, is available now to 451 Research clients and also includes our perspective on when combined Hadoop and NoSQL revenue might begin to exceed combined Hadoop and NoSQL VC funding, as well as the potential for M&A and IPO activity in 2014.

    Forthcoming webinar: Beyond NoSQL – Distributed Databases in Production, with Basho

    On Tuesday, December 10th at 10:00am PT/1:00pm ET I’ll be taking part in a webinar in association with Basho on the subject of Beyond NoSQL – Distributed Databases in Production.

    I’ll be presenting a brief history of NoSQL and covering NoSQL drivers and adoption trends, as well as our perspective on the NoSQL database landscape, and the importance of scalability and distributed architecture.

    I’ll also be joined by Bobby Patrick, EVP and CMO at Basho Technologies, to discuss the benefits and future of distributed systems, while Tapjoy will also discuss how they are using distributed databases to provide reliable data locality to their customers.

    For full details, and to register, click here.