
NoSQL ≠ open source

I thought we finished with trying to define NoSQL in 2010 but Martin Fowler has raised the question again with his recent post – although he has a good reason to do so since he is collaborating on a book on the subject.

Fowler’s list of common characteristics (which he acknowledges is not definitional) is as follows:

  • Not using the relational model (nor the SQL language)
  • Open source
  • Designed to run on large clusters
  • Based on the needs of 21st century web properties
  • No schema, allowing fields to be added to any record without controls

    You could argue about whether all NoSQL databases are designed to run on large clusters, but the characteristic from the list above that I would dispute is open source.

    While it is undoubtedly true to say that most NoSQL databases are open source, I don’t believe it defines them in the same way that other common characteristics do.

    The main argument for making open source licensing a requirement of NoSQL seems to me to be historical. The first NoSQL meeting, cited by Fowler, specified that it was about “open source, distributed, non-relational databases”.

    However, making open source licensing a defining characteristic of NoSQL would also exclude a number of products that would otherwise clearly fit the definition of NoSQL, as well as projects such as Google’s BigTable and Amazon’s Dynamo which were the genesis of much – although by no means all – of the momentum behind the NoSQL database movement.

    For the sake of argument let’s assume Amazon decided to release a version of Dynamo that could be deployed on-premise and for whatever reason decided not to release “Dynamo-on-premise” under an open source license.

    Is anyone seriously going to argue that a closed source “Dynamo-on-premise” wouldn’t be a NoSQL database?

    For what it’s worth, since our NoSQL, NewSQL and Beyond report, the description of NoSQL I have been using is:

  • A new breed of non-relational database products
  • sharing a rejection of fixed table schema and join operations
  • designed to meet scalability requirements of distributed architectures
  • and/or schema-less data management requirements

    Although, like Fowler, I would not claim this to be a definition.

    The geographic distribution of NoSQL skills – just one more thing

    Hidden away amongst the details of our little tour around LinkedIn statistics on NoSQL and Hadoop skills was some interesting information on how many LinkedIn members list the various data management technologies in our sample in their profiles.

    Our original post contained the fact that there were 9,079 LinkedIn members with “Hadoop” in their member profiles, for example, compared to 366,084 with “MySQL” in their member profiles.

    Later posts showed there were 170 with “Membase” and 1,687 with “HBase”, 787 with “Apache Cassandra” and 376 with “Riak”, 6,048 with “MongoDB” and 2,152 with “Redis”, and finally, 1,844 with “CouchDB” and 268 with “Neo4j”.

    This gives us an interesting perspective on the relative adoption of the various NoSQL databases:

    If it wasn’t already obvious from the list above, the chart illustrates just how much more prevalent MongoDB skills are compared to the other NoSQL databases, followed by Redis, Apache CouchDB, Apache HBase and Apache Cassandra. The chart also illustrates that while HBase is the second most prevalent NoSQL skill set in the USA, it is only fourth overall given its lower prevalence in the rest of the world.
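    The two rankings described above can be reproduced from the figures quoted in the posts in this series. The following is a minimal sketch in Python, not the method used to build the chart: the profile counts and US percentages are those quoted in the posts, while the US member counts are estimates derived by multiplying the two, and all names in the code are illustrative.

```python
# Profile counts and US shares (%) as quoted in the posts in this series.
# The US member counts computed below are estimates (count * share),
# not figures published in the posts.
SKILLS = {
    "MongoDB": (6048, 41.0),
    "Redis": (2152, 37.0),
    "CouchDB": (1844, 36.4),
    "HBase": (1687, 57.0),
    "Apache Cassandra": (787, 52.2),
    "Riak": (376, 45.5),
    "Neo4j": (268, 36.2),
    "Membase": (170, 58.2),
}


def rank_overall(skills):
    """Database names ordered by worldwide profile count, largest first."""
    return sorted(skills, key=lambda name: skills[name][0], reverse=True)


def estimated_us_members(skills):
    """Estimated number of US-based members for each database."""
    return {name: count * pct / 100 for name, (count, pct) in skills.items()}


def rank_us(skills):
    """Database names ordered by estimated US member count, largest first."""
    us = estimated_us_members(skills)
    return sorted(us, key=us.get, reverse=True)


overall_ranking = rank_overall(SKILLS)
us_ranking = rank_us(SKILLS)

print("Overall:", overall_ranking)
print("US only:", us_ranking)
```

    On these inputs HBase sits fourth in the overall ordering but second in the US-only ordering, which is consistent with the observation above.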

    In response, a representative from a certain vendor notes “Some skills are more valued not because they are more prevalent, but because they are harder to achieve.” Make of that what you will.

    The geographic distribution of NoSQL skills: CouchDB and Neo4j

    Following last week’s post putting the geographic distribution of Hadoop skills, based on a search of LinkedIn members, in context, this week we will be publishing a series of posts looking in detail at the various NoSQL projects.

    The posts examine the geographic spread of LinkedIn members citing a specific NoSQL database in their member profiles, as of December 1, and provide an interesting illustration of the state of adoption for each.

    We’ve already taken a look at Membase and HBase; Apache Cassandra and Riak; and 10gen’s MongoDB and Redis.

    Part four brings the series to a close with a look at Apache CouchDB and Neo4j, which boast the most geographically diverse adoption of the NoSQL databases in our sample.

    The statistics showed that 36.4% of the 1,844 LinkedIn members with “CouchDB” in their member profiles are based in the US, while only 8.9% are in the Bay area, the lowest proportion of any of the NoSQL databases we looked at.

    The results also indicate that the UK is a particularly strong area for CouchDB skills, with 7.1%. Other hot-spots include Canada (4.1%), Germany (4.0%) and The Netherlands (3.1%).

    Neo4j is even more widely adopted, with only 36.2% of the 268 LinkedIn members with “Neo4j” in their member profiles based in the US, although 10.4% are in the Bay area.

    With 4.1%, Sweden is a hot-spot for Neo4j skills, as one might expect given that’s where it and Neo Technology originated. The UK is also strong with 9.7%, followed by India with 5.6% and the New York area with 4.9%.

    Since Neo4j originated in Europe it is of course an open question whether its higher adoption in the Rest of the World than the US is a sign of a greater spread of adoption, or a relative failure to infiltrate the US market. Given that the company already has an active presence in the US we are inclined towards the former.

    N.B. The size of the boxes is in proportion to the search result (click each image for a larger version). World map image: Owen Blacker

    The geographic distribution of NoSQL skills: MongoDB and Redis

    Following last week’s post putting the geographic distribution of Hadoop skills, based on a search of LinkedIn members, in context, this week we will be publishing a series of posts looking in detail at the various NoSQL projects.

    The posts examine the geographic spread of LinkedIn members citing a specific NoSQL database in their member profiles, as of December 1, and provide an interesting illustration of the state of adoption for each.

    We’ve already taken a look at Membase and HBase, and Apache Cassandra and Riak. Part three examines the geographic spread of 10gen’s MongoDB and Redis.

    The statistics showed that 41.0% of the 6,048 LinkedIn members with “MongoDB” in their member profiles are based in the US, putting MongoDB in the top half of the table for geographic spread.

    Only 11.2% are in the Bay area, fewer than Hadoop, Membase, HBase, Cassandra, Riak and Redis. The results also indicate that the New York area is a hot-spot for MongoDB skills, with 6.2% – as one might expect given the location of 10gen’s HQ. Other hot-spots include Brazil (4.2%) and Ukraine (2.8%).

    Redis is even more widely adopted, with only 37% of the 2,152 LinkedIn members with “Redis” in their member profiles based in the US, although 12.0% are in the Bay area.

    Ukraine is also a hot-spot for Redis skills (3.8%), as are France (3.6%) and Spain (2.9%).

    The series will conclude later this week with CouchDB, and Neo4j.


    The geographic distribution of NoSQL skills: Apache Cassandra and Riak

    Following last week’s post putting the geographic distribution of Hadoop skills, based on a search of LinkedIn members, in context, this week we will be publishing a series of posts looking in detail at the various NoSQL projects.

    The posts examine the geographic spread of LinkedIn members citing a specific NoSQL database in their member profiles, as of December 1, and provide an interesting illustration of the state of adoption for each.

    Following yesterday’s look at Membase and HBase, part two examines the geographic spread of Apache Cassandra and Basho Technologies’ Riak.

    The statistics showed that 52.2% of the 787 LinkedIn members with “Apache Cassandra” in their member profiles are based in the US (as previously explained, we had to use the ‘Apache’ qualifier with Cassandra to filter out people with the name Cassandra).

    A significant proportion (18.0%) of those are in the Bay area, although fewer than Hadoop, Membase and HBase. The results also indicate that Canada is a hot-spot for Apache Cassandra skills, with 4.1%, while Apache Cassandra is also making in-roads into Europe via France and Spain.

    Basho’s Riak is less dependent on the USA for adoption. The statistics showed that less than half – 45.5% – of the 376 LinkedIn members with “Riak” in their member profiles are based in the US, with only 13.0% in the Bay area.

    Riak hot-spots include the UK (6.9%) and Australia (4.3%), as well as the Boston area, in keeping with the company’s HQ.

    The series will continue later this week with MongoDB, CouchDB, Neo4j, and Redis.


    The geographic distribution of NoSQL skills: HBase and Membase

    Following last week’s post putting the geographic distribution of Hadoop skills, based on a search of LinkedIn members, in context, this week we will be publishing a series of posts looking in detail at the various NoSQL projects.

    The posts examine the geographic spread of LinkedIn members citing a specific NoSQL database in their member profiles, as of December 1, and provide an interesting illustration of the state of adoption for each.

    We begin this week’s series with Membase and HBase, the two projects that proved, like Apache Hadoop, to have significantly greater adoption in the USA compared to the rest of the world.

    The statistics showed that 58.2% of the 170 LinkedIn members with “Membase” in their member profiles are based in the US (as previously explained, we tried the same search with Couchbase, but with only 85 results we decided to use the Membase result set as it was more statistically relevant).

    As with Hadoop, a significant proportion (27.1%) of those are in the Bay area, the highest proportion of all the NoSQL databases we looked at. The results also indicate that Ukraine is a hot-spot for Membase skills, with 3.5%, while Membase adoption is lower in the UK (2.4%) than that of other NoSQL databases.

    It should not be a great surprise that Apache HBase returned similar results to Apache Hadoop. The top eight individual regions for HBase were exactly the same as for Hadoop, although the UK (3.4%) is stronger for HBase, as is India (10.7%).

    The statistics showed that 57.0% of the 1,687 LinkedIn members with “HBase” in their member profiles are based in the US, with 25.0% in the Bay area (the third highest in our sample behind Hadoop and Membase).

    The series will continue later this week with MongoDB, Riak, CouchDB, Apache Cassandra, Neo4j, and Redis.


    Forthcoming webinar: Real Enterprise NoSQL Applications

    On Wednesday, December 7, 2011 at 10am PT (6pm GMT) I’ll be taking part in a webinar with DataStax CTO and Apache Cassandra project chair Jonathan Ellis on the subject of Apache Cassandra: Real NoSQL Applications in the Enterprise Today.

    The session will shed light on real-world use cases for NoSQL databases by providing case studies from enterprise production users taking advantage of the massively scalable and highly-available architecture of Apache Cassandra.

    I’ll be summarising some of the findings from our NoSQL, NewSQL and Beyond research report, and exploring the drivers behind the development and adoption of NoSQL databases – explaining how the failure of existing suppliers to meet the performance, scalability and flexibility needs of large-scale data processing has led to the development and adoption of alternative data management technologies.

    Jonathan will provide more detail on Apache Cassandra and DataStax, and on a number of real-world projects at companies including Netflix, Backupify, Ooyala and Constant Contact.

    You can register for the event here and find more details about our NoSQL, NewSQL and Beyond research report here.

    VC funding for Hadoop and NoSQL tops $350m

    451 Research has today published a report looking at the funding being invested in Apache Hadoop- and NoSQL database-related vendors. The full report is available to clients, but below is a snapshot of the report, along with a graphic representation of the recent up-tick in funding.

    According to our figures, between the beginning of 2008 and the end of 2010 $95.8m had been invested in the various Apache Hadoop- and NoSQL-related vendors. That figure now stands at more than $350.8m, up 266%.

    That statistic does not really do justice to the sudden uptick of interest, however. The figures indicate that funding for Apache Hadoop- and NoSQL-related firms has more than doubled since the end of August, at which point the total stood at $157.5m.
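    The growth figures quoted above follow from simple arithmetic on the three totals. As a minimal sketch in Python (the variable and function names are illustrative, and the totals are the rounded figures quoted in this post, in millions of US dollars):

```python
# Cumulative funding totals quoted in the post, in millions of US dollars.
TOTAL_END_2010 = 95.8      # invested between the start of 2008 and the end of 2010
TOTAL_END_AUGUST = 157.5   # total at the end of August
TOTAL_NOW = 350.8          # total at the time of the report


def pct_increase(old, new):
    """Percentage increase from old to new."""
    return (new - old) / old * 100


growth_since_2010 = pct_increase(TOTAL_END_2010, TOTAL_NOW)
multiple_since_august = TOTAL_NOW / TOTAL_END_AUGUST

print(f"Increase since end of 2010: {growth_since_2010:.0f}%")
print(f"Multiple of end-of-August total: {multiple_since_august:.2f}x")
```

    The first figure rounds to the 266% quoted above, and the second is a little over 2.2, in line with funding having more than doubled since the end of August.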

    A substantial reason for that huge jump is the staggering $84m series A funding round raised by Apache Hadoop-based analytics service provider Opera Solutions.

    The original commercial supporter of Apache Hadoop, Cloudera, has also contributed strongly with a recent $40m series D round. In addition, MapR Technologies raised $20m to invest in its Apache Hadoop distribution, while we know that Hortonworks also raised a substantial round (unconfirmed, but reportedly $20m) from Benchmark Capital and former parent Yahoo as it was spun off in June. Index Ventures also recently announced that it has become an investor in Hortonworks.

    I am reliably informed that if you factor in Hortonworks’ two undisclosed rounds, the total funding for Hadoop and NoSQL vendors is actually closer to $400m.

    The various NoSQL database providers have also played a part in the recent burst of investment, with 10gen raising a $20m series D round and Couchbase raising $15m. DataStax, which has interests in both Apache Cassandra and Apache Hadoop, raised an $11m series B round, while Neo Technology raised a $10.6m series A round. Basho Technologies raised $12.5m in series D funding in three chunks during 2011.

    Additionally, there are a variety of associated players, including Hadoop-based analytics providers such as Datameer, Karmasphere and Zettaset, as well as hosted NoSQL firms such as MongoLab, MongoHQ and Cloudant.

    One investor name that crops up more than most in the list above is Accel Partners, which was an original investor in both Cloudera and Couchbase, and backed Opera Solutions via its Accel-KKR joint venture with Kohlberg Kravis Roberts.

    It appears that those investments have merely whetted Accel’s appetite for big data, however, as the firm last week announced a $100m Big Data Fund to invest in new businesses targeting storage, data management and analytics, as well as data-centric applications and tools.

    While Accel is the first VC shop that we are aware of to create a fund specifically for big data investments, we are confident both that it won’t be the last and that other VCs have already informally earmarked funds for data-related investments.

    451 clients can get more details on funding and M&A involving more traditional database vendors, as well as our perspective on potential M&A suitors for the Hadoop and NoSQL players.

    The significance of Oracle NoSQL

    We have previously speculated at The 451 Group about Oracle’s potential to respond to the growing adoption of NoSQL databases, noting that the company had a number of options at its disposal, including Berkeley DB and projects like HandlerSocket.

    While some may wonder about the potential impact of Oracle NoSQL (based indeed on Berkeley DB) on the existing NoSQL vendors, I believe the launch says something very significant about NoSQL itself: specifically that its adoption is driven by more than the nature of the query language.

    To get a sense of why Oracle NoSQL is significant, think about the way Oracle has traditionally responded to alternative approaches that threaten the relational model and Oracle’s dominance of it. Oracle’s approach has traditionally been to subsume the alternative approach, at least in part, into Oracle Database, nullifying the competitive threat.

    Oracle CEO Larry Ellison explained the approach himself on a recent call with investors:

    “We think that data should be integrated with a single database technology. That’s always been our strategy for Oracle. And it started as a relational database then we added objects, then we added text and then we’ve added a variety of other things like video and audio to the Oracle Database. We think that should be unified and that’s how we’re approaching the problem.”

    As we recently covered (451 clients only), Oracle is in the process of replicating this strategy with MySQL, adding the ability to directly access MySQL’s InnoDB and MySQL Cluster’s NDB storage engines using the memcached API.

    This ability to perform non-SQL querying of the database is part of the agility benefit of NoSQL, and if the term NoSQL were to be taken literally would perhaps be enough to discourage would-be NoSQL adopters from turning away from MySQL.

    As our NoSQL, NewSQL and Beyond report highlighted, however, agility is just one of six key trends we see driving adoption of NoSQL databases. Scalability, performance, relaxed consistency, intricacy and necessity will not be solved by the ability to query MySQL or MySQL Cluster using the memcached API.

    The launch of Oracle NoSQL is therefore a clear indication that there are trends at work here that cannot be solved by adding non-SQL querying to existing relational databases.

    There is another significant factor here, which is the fact that Oracle has chosen to name the product NoSQL. In one simple naming move the company has effectively disarmed the NoSQL ‘movement’.

    We have previously noted that existing NoSQL vendors were turning away from the term in favor of emphasizing their individual strengths. How many of them are going to want to self-identify with an Oracle product? I’m not convinced any of them believe the brand is worth fighting for.

    NoSQL Road Show, Hadoop Tuesdays and Hadoop World

    I’ll be taking our data management research out on the road in the next few months with a number of events, webinars and presentations.

    On October 12 I’m taking part in the NoSQL Road Show Amsterdam, with Basho, Trifork and Erlang Solutions, where I’ll be presenting NoSQL, NewSQL, Big Data…Total Data – The Future of Enterprise Data Management.

    The following week, October 18, I’m taking part in the Hadoop Tuesdays series of webinars, presented by Cloudera and Informatica, specifically talking about the Hadoop Ecosystem.

    The Apache Hadoop ecosystem will again be the focus of attention on November 8 and 9, when I’ll be in New York for Hadoop World, presenting The Blind Men and the Elephant.

    Then it’s back to NoSQL with two more stops on the NoSQL Road Show, in London on November 29 and Stockholm on December 1, where I’ll once again be presenting NoSQL, NewSQL, Big Data…Total Data – The Future of Enterprise Data Management.

    I hope you can join us for at least one of these events, and am looking forward to learning a lot about NoSQL and Apache Hadoop adoption, interest and concerns.