A different perspective on NoSQL vendor traction

Amid the reporting of 10gen’s $42m funding round yesterday a specific claim about 10gen’s success to date caught my eye.

“10gen says it’s got about half the NoSQL market wrapped up already. This is based on… indicators, such as how often LinkedIn profiles mention MongoDB.”

While our own analysis of LinkedIn profiles did indeed indicate that 10gen has a sizeable lead over its NoSQL rivals, this only accounts for the NoSQL market *to date*, and the NoSQL vendors have barely scratched the surface.

451 Research recently estimated that NoSQL software vendors between them generated revenue of just $20m in 2011 (less than half 10gen’s latest funding round), and that the market will grow at a CAGR of 82% to reach $215m by 2015.
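As a rough cross-check of those figures, the sketch below computes the growth rate implied by the two endpoints and the 2015 value implied by the quoted rate. It assumes four compounding years (2011 to 2015) and treats the $20m starting figure as rounded; the function names are illustrative only and are not part of any 451 Research methodology.

```python
def implied_cagr(start_value: float, end_value: float, years: int) -> float:
    """Compound annual growth rate that takes start_value to end_value."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value: float, rate: float, years: int) -> float:
    """Value reached after compounding start_value at `rate` for `years`."""
    return start_value * (1 + rate) ** years


START_M = 20.0        # 2011 NoSQL vendor revenue in $m, as quoted (rounded)
END_M = 215.0         # 2015 estimate in $m, as quoted
YEARS = 2015 - 2011   # assumption: four compounding periods

if __name__ == "__main__":
    print(f"Implied CAGR from the endpoints: {implied_cagr(START_M, END_M, YEARS):.1%}")
    print(f"2015 value at an 82% CAGR: ${project(START_M, 0.82, YEARS):.1f}m")
```

Under these assumptions the endpoints imply a rate of roughly 81%, and compounding $20m at 82% gives a figure slightly above $215m, which is consistent with the starting value having been rounded up to $20m.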

10gen is well placed to capitalize on this growth given its customer and revenue traction to date. While we are not breaking out individual revenue estimates, the chart below shows revenue and customer estimates for 10gen, Basho, Couchbase and DataStax, with the scale adjusted to fit on a single chart.

The chart appears to confirm 10gen’s claim to have half the NoSQL market wrapped up, at least in terms of customers. However, what this chart doesn’t address is the relative strategy stage of each vendor in terms of customer traction.

10gen has done extremely well in growing a large customer base via its focus on ease of developer adoption, and is now turning its attention to the sort of capabilities required by traditional enterprises.

Other vendors in the NoSQL space have done precisely the opposite: starting with enterprise capabilities and now turning their attention to greater ease of use and developer adoption.

We can begin to get a sense of how these strategies are playing out if we add a column for revenue per customer (again re-scaled). Here you can see that 10gen is actually doing less well than some of its rivals.

The size of the MongoDB installed base gives 10gen a big opportunity to aim at, but others are arguably ahead in terms of traction with enterprise customers. That’s why our market sizing methodology is specifically designed to take multiple (sometimes conflicting) factors into account in creating an estimate for each vendor, as well as the aggregate total.

10gen may well have about half the current NoSQL market wrapped up but this market has really only just begun.

The Data Day, Today: Apr 19 2012

Splunk goes public. SkySQL and Connotate raise funding. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Splunk Prices Initial Public Offering 13,500,000 shares at $17.00 per share = $229.5m.

* Connotate Increases Momentum and Closes $7m Series B Round

* SkySQL Raises $4 million in Series A Round

* SAND Technology Announces Exploration of Potential Strategic Alternatives

* GoodData Closes out the Quarter With Increased Revenue Growth and Expanded Market Traction

* World’s Largest Telcos Adopt Graph Databases to Solve Connected Data Issues

* Gazzang Seizes Big Data Opportunity, Announces Record Quarter and Year over Year Growth

* Hadapt Adds Big Data Industry Veteran Christopher Lynch as Chairman of the Board of Directors

* PalominoDB and SkySQL Join Forces to Offer Unparalleled Remote Database Services to Leading Companies Worldwide

* Cloudant Data Layer as a Service Adds Support for Joyent Cloud

* GoGrid Introduces a High-Performance Platform for Predictive Analytics

* MongoDB Hadoop Connector Announced

* StreamBase Releases StreamBase LiveView 1.0

* Pervasive RushAnalyzer and Cloudera Eliminate Barriers to Rapid Hadoop ROI

* Pegasystems Announces Hadoop Big Data Support

* XtremeData Hires Former IBM Analytics Leader

* Lucid Imagination Announces General Availability of LucidWorks Enterprise 2.1

* Of open data and pregnant men

* Is UNQL Dead?

* MySQL in 2012: Report from Percona Live

* For 451 Research clients

# Will new offerings and price cuts encourage greater database-as-a-service adoption? Spotlight report

# Basho expands into cloud storage with Riak CS Impact Report

# SAP modernizes its application stack at the data layer and the mobile front end Impact Report

# QlikTech takes QlikView pricing out of the dark Impact Report

# Kitenga refreshes Hadoop-based content-analysis wares; finds rollouts a slow burn Impact Report

# CoreMedia looks to NoSQL to scale social experiences for its WCM platform Impact Report

# Boundary maps monitoring for ‘big data’ as its path to enterprise Impact Report

# Orchestra to add data quality notes to MDM ensemble as it continues to eye US growth Impact Report

# Columnar database provider SAND Technology puts itself up for sale M&A Insight

# Is it time for Microsoft to ditch partners for performance management and go shopping? Acquirer IQ

And that’s the Data Day, today.

The Data Day, Today: Apr 2 2012

Basho launches cloud storage play. Opera acquisitions. And more.

* Basho Unveils Riak CS, Multi-Tenant Cloud Storage Software for Public and Private Clouds

* InsightsOne Secures $4.3 Million in Series A Round of Funding Led by Norwest Venture Partners

* Opera buys Commendo to create predictive analytics powerhouse

* Opera Solutions Increases Procurement Capabilities with Acquisition of Lexington Analytics

* How federal money will spur a new breed of big data

* Another HP org change: Vertica no longer under the purview of Autonomy boss Mike Lynch?

* New SAS Visual Analytics Helps Organizations Analyze, Visualize Big Data

* Citrusleaf Delivers Real-Time NoSQL Replication

* NuoDB Launches Open Source Initiative on Github

* Actian Teams up With FlyingBinary and Tableau to Unleash Big Data Potential

* DH2i Launches and Unveils DxConsole Next Generation Virtualization Solution to Enable the Agile, Always-On Enterprise

* Acunu Analytics Ready to Preview!

* SAND Technology Announces Second Quarter Results for Fiscal Year 2012

* Idera Announces VMware Database Performance Monitoring Solution

* Idera Announces SQL Compliance Manager 3.6

* WalmartLabs is building big data tools — and will then open source them

* The three waves of opportunities in big data

* 4 Big Data Myths – Part I

* For 451 Research clients

# Drawn to Scale raises funds for Hadoop-based real-time database Impact report

# ParElastic brings elastic parallelism to relational databases Impact report

# DH2i launches with PolyServe-inspired database-virtualization software Impact report

# Tape industry pins future on ‘big data,’ active archiving and LTFS Spotlight report

# Lucid Imagination dreams up new strategy for enterprise search Market development report

# Pentaho identifies ‘big data’ analytics as investment priority, hooks into DataStax Market development report

# GridGain positions in-memory data grid for real-time analytics Market development report

# Having earned its stripes in HPC, Panasas heads for ‘big data’ Market development report

* Google News Search outlier of the day: Top 10 Dog and Cat Medical Conditions of 2011

And that’s the Data Day, today.

The Data Day, Today: Feb 24 2012

Teradata partners with Hortonworks. New CEOs for Zettaset and VoltDB. And more.

* Teradata-Hortonworks Partnership to Accelerate Business Value from Big Data Technologies

* Skytree Unlocks the Advanced Analytics Power of Big Data with Unprecedented Performance, Scalability and Accuracy

* Big Data Innovator Zettaset Appoints Jim Vogt as New President and CEO

* Zettaset to Create Secure Hadoop with ‘SHadoop’ Initiative

* VoltDB Names Bruce Reading President and Chief Executive Officer

* Basho Unveils New Graphical Operations Dashboard, Diagnostics With Release of Riak 1.1

* Pervasive RushAnalyzer Launches ‘No Compromise’ Predictive Analytics for Hadoop and Big Data

* QlikTech Reveals Pricing for its QlikView Business Discovery Platform

* Kognitio Announces Completely Memory-Based Pricing

* Objectivity Adds New Plugin Framework, Integrated Visualizer And Support For Tinkerpop Blueprints To InfiniteGraph

* Announcing the Infochimps Platform for Big Data

* Big Data, Hadoop and StreamInsight

* Three New Cloud Providers join the MongoDB ecosystem

* Hadoop Has Promise but Also Problems

* Hortonworks: Reaffirming our Commitment to 100% Pure Open Source Despite speculation to the contrary.

* WhySQL? Evernote explains why it continues to use SQL databases.

* More on database consistency Anders Karlsson explains the different definitions of database consistency.

* Graphic proof of big demand for big data talent Or just graphic proof of use of phrase ‘big data’ in jobs ads?

* Will ‘big data’ transform your industry?

* For 451 Research clients

# CrowdFlower – it’s like Hadoop, but with people? Impact Report

# Teradata and Hortonworks strike Hadoop marketing and development deal Market Development report

# Hypertable reemerges with high-performance NoSQL database Market Development report

And that’s the Data Day, today.

The geographic distribution of NoSQL skills: Apache Cassandra and Riak

Following last week’s post putting the geographic distribution of Hadoop skills, based on a search of LinkedIn members, in context, this week we will be publishing a series of posts looking in detail at the various NoSQL projects.

The posts examine the geographic spread of LinkedIn members citing a specific NoSQL database in their member profiles, as of December 1, and provide an interesting illustration of the state of adoption for each.

Following yesterday’s look at Membase and HBase, part two examines the geographic spread of Apache Cassandra and Basho Technologies’ Riak.

The statistics showed that 52.2% of the 787 LinkedIn members with “Apache Cassandra” in their member profiles are based in the US (as previously explained, we had to use the ‘Apache’ qualifier with Cassandra to filter out people with the name Cassandra).

A significant proportion (18.0%) of those are in the Bay area, although fewer than Hadoop, Membase and HBase. The results also indicate that Canada is a hot-spot for Apache Cassandra skills, with 4.1%, while Apache Cassandra is also making in-roads into Europe via France and Spain.

Basho’s Riak is less dependent on the USA for adoption. The statistics showed that less than half – 45.5% – of the 376 LinkedIn members with “Riak” in their member profiles are based in the US, with only 13.0% in the Bay area.

Riak hot-spots include the UK (6.9%) and Australia (4.3%), as well as the Boston area, in keeping with the company’s HQ.

The series will continue later this week with MongoDB, CouchDB, Neo4j, and Redis.

N.B. The size of the boxes is in proportion to the search result (click each image for a larger version). World map image: Owen Blacker

VC funding for Hadoop and NoSQL tops $350m

451 Research has today published a report looking at the funding being invested in Apache Hadoop- and NoSQL database-related vendors. The full report is available to clients, but below is a snapshot of the report, along with a graphic representation of the recent up-tick in funding.

According to our figures, between the beginning of 2008 and the end of 2010 $95.8m had been invested in the various Apache Hadoop- and NoSQL-related vendors. That figure now stands at more than $350.8m, up 266%.

That statistic does not really do justice to the sudden uptick of interest, however. The figures indicate that funding for Apache Hadoop- and NoSQL-related firms has more than doubled since the end of August, at which point the total stood at $157.5m.
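The percentages quoted above can be reproduced from the three totals given in the text. This is only a sanity-check sketch; the helper name `pct_increase` is illustrative and the inputs are the figures as quoted, in $m.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100


END_2010_M = 95.8      # total invested, start of 2008 to end of 2010
END_AUGUST_M = 157.5   # total at the end of August
CURRENT_M = 350.8      # total at the time of writing

if __name__ == "__main__":
    # Growth since the end of 2010 (the "up 266%" figure).
    print(f"Since end of 2010: up {pct_increase(END_2010_M, CURRENT_M):.0f}%")
    # A multiple above 2.0 means funding has more than doubled since August.
    print(f"Since end of August: {CURRENT_M / END_AUGUST_M:.2f}x")
```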

A substantial reason for that huge jump is the staggering $84m series A funding round raised by Apache Hadoop-based analytics service provider Opera Solutions.

The original commercial supporter of Apache Hadoop, Cloudera, has also contributed strongly with a recent $40m series D round. In addition, MapR Technologies raised $20m to invest in its Apache Hadoop distribution, while we know that Hortonworks also raised a substantial round (unconfirmed, but reportedly $20m) from Benchmark Capital and former parent Yahoo as it was spun off in June. Index Ventures also recently announced that it has become an investor in Hortonworks.

I am reliably informed that if you factor in Hortonworks’ two undisclosed rounds, the total funding for Hadoop and NoSQL vendors is actually closer to $400m.

The various NoSQL database providers have also played a part in the recent burst of investment, with 10gen raising a $20m series D round and Couchbase raising $15m. DataStax, which has interests in both Apache Cassandra and Apache Hadoop, raised an $11m series B round, while Neo Technology raised a $10.6m series A round. Basho Technologies raised $12.5m in series D funding in three chunks during 2011.

Additionally, there are a variety of associated players, including Hadoop-based analytics providers such as Datameer, Karmasphere and Zettaset, as well as hosted NoSQL firms such as MongoLab, MongoHQ and Cloudant.

One investor company name that crops up more than most in the list above is Accel Partners, which was an original investor in both Cloudera and Couchbase, and backed Opera Solutions via its Accel-KKR joint venture with Kohlberg Kravis Roberts.

It appears that those investments have merely whetted Accel’s appetite for big data, however, as the firm last week announced a $100m Big Data Fund to invest in new businesses targeting storage, data management and analytics, as well as data-centric applications and tools.

While Accel is the first VC shop that we are aware of to create a fund specifically for big data investments, we are confident both that it won’t be the last and that other VCs have already informally earmarked funds for data-related investments.

451 clients can get more details on funding and M&A involving more traditional database vendors, as well as our perspective on potential M&A suitors for the Hadoop and NoSQL players.

NoSQL consolidation begins…

The predicted consolidation of the NoSQL database landscape has begun. Membase and CouchOne have announced that they are merging to form Couchbase.

And in more interesting NoSQL news, Danish IT company Trifork has announced that it has acquired an 8% stake in Basho as part of the NoSQL vendor’s $7.4m series D round, and has become the European distributor for Riak.

The formation of Couchbase brings together two of the leading companies in the NoSQL space, and the complementary nature of their technology and business plans highlights that the term NoSQL has been applied to many different database technologies which are being adopted for different reasons.

While Membase had focused on improving the performance of distributed applications through its Membase Server distributed database, CouchOne focused on developer interest in flexible document data stores and mobile applications, rather than performance at scale.

Additionally while Membase was focused on operational adoption with a small (albeit significant) developer community, the priority with CouchOne has been on growing adoption of Apache CouchDB, with commercial efforts only recently becoming the focus of attention.

The technology is also complementary. Couchbase will combine the Membase and CouchDB projects to form a new distributed document store project of the same name that combines the caching and clustering technology of Membase with the CouchDB document data store.

The result will be a new distributed document database covering a variety of use cases from mobile applications (Mobile Couchbase) to scalable clusters (Elastic Couchbase), with synchronization of data between the various Couchbase implementations enabled by CouchSync.

The merged company will be led by Bob Wiederhold, formerly CEO of Membase, while Damien Katz, formerly CEO of CouchOne and creator of the CouchDB database, becomes CTO.

Couchbase is claiming more than 200 customers, which would indicate phenomenal growth for both companies since the launch of their CouchOne Mobile and Membase Server products in September and October 2010 respectively.

Prior to the launch of those products they previously claimed just a handful of customers each, although CouchOne had signed up thousands of users to its free hosted services, so it had a large and willing audience ready for conversion.

Additionally the company claims millions of combined users since CouchDB has been included in every installation of the Ubuntu Linux distribution since late 2009 and Heroku (now part of Salesforce.com) offers a Membase-driven service to thousands of its hosting customers.

We previously predicted that we would see the NoSQL market both consolidate and proliferate this year, and it is worth noting that the merger of CouchOne and Membase will not result in a similar consolidation of open source projects.

While Couchbase.org can be expected to replace membase.org over time, the Couchbase project will be independent of the Apache CouchDB project, which will not be impacted by the merger. Couchbase will continue to contribute to both CouchDB and the memcached project.

While we’re on the subject of NoSQL, it is also interesting to see that Danish IT vendor Trifork has not only signed up to be European distributor of the Riak database, but has also taken a stake in Basho Technologies.

Trifork has acquired newly issued shares in Basho representing 8.35% of the company as part of its series D round, with an option to acquire an additional 3.96% at the end of Q1 2011.

The beginning of the end of NoSQL

CouchOne has become the first of the major NoSQL database vendors to publicly distance itself from the term NoSQL, something we have been expecting for some time.

While the term NoSQL enabled the likes of 10gen, Basho, CouchOne, Membase, Neo Technology and Riptano to generate significant attention for their various database projects/products, it was always something of a flag of convenience.

Somewhat less convenient is the fact that grouping the key-value, document, graph and column family data stores together under the NoSQL banner masked their differentiating features and potential use cases.

As Mikael notes in the post: “The term ‘NoSQL’ continues to lump all the companies together and drowns out the real differences in the problems we try to tackle and the challenges we face.”

It was inevitable, therefore, that as the products and vendors matured the focus would shift towards specific use cases and the NoSQL movement would fragment.

CouchOne is by no means the only vendor thinking about distancing itself from NoSQL, especially since some of them are working on SQL interfaces. Again, we would see this fragmentation as a sign of maturity, rather than crisis.

The ongoing differentiation is something we plan to cover in depth with a report looking at the specific use cases of the “database alternatives” early in 2011.

It is also interesting that CouchOne is distancing itself from NoSQL in part due to the conflation of the term with Big Data. We have observed this ourselves and would agree that it is a mistake.

While some of the use cases for some of the NoSQL databases do involve large distributed data sets, not all of them do, and we had noted that the launch of the CouchOne Mobile development environment was designed to play to the specific strengths of Apache CouchDB: peer-based bidirectional replication, including disconnected mode, and a crash-only design.

Incidentally, Big Data is another term we expect to diminish in usage in 2011, since Bigdata is a trademark of a company called SYSTAP.

Witness the fact that the Data Analytics Summit, which I’ll be attending next week, was previously the Big Data Summit. We assume that is also the reason Big Data News has been upgraded to Massive Data News.

The focus on big data sets and solving big data problems will continue, of course, but expect much less use of Big Data as a brand.

Similarly, while we expect many of the “NoSQL” databases to have a bright future, expect much less focus on the term NoSQL.

User perspectives on NoSQL

The NoSQL EU event in London this week was a great event with interesting perspectives from both vendors – Basho, Neo Technology, 10gen, Riptano – and also users – The Guardian, the BBC, Amazon, Twitter. In particular I was interested in learning from the latter about how and why they ended up using alternatives to the traditional relational database model.

Some of the reasons for using NoSQL have been well-documented: Amazon CTO Werner Vogels talked about how the traditional database offerings were unable to meet the scalability Amazon.com requires. Filling a functionality void also explains why Facebook created Cassandra, Google created BigTable, and Twitter created FlockDB (etc etc). As Werner said, “We couldn’t bet the company on other companies building the answer for us.”

As Werner also explained, however, the motivation for creating Dynamo was also about enabling choice and ensuring that Amazon was not trying to force the relational database to do something it was not designed to do. “Choosing the right tool for the job” was a recurring theme at NoSQL EU.

Given the NoSQL name it is easy to assume that this means that the relational database is by default “the wrong tool”. However, the most important element in that statement is arguably not “tool”, but “job” and The Guardian discussed how it was using non-relational data tools to create new applications that complement its ongoing investment in the Oracle database.

For example, the Guardian’s application to manage the progress of crowdsourcing the investigation of MPs’ expenses is based on Redis, while the Zeitgeist trending news application runs on Google’s AppEngine, as did its live poll during the recent leaders’ election debate. Datablog, meanwhile, relies on Google Spreadsheets to serve up usable and downloadable data – we’ll ignore for a moment whether Google Spreadsheets is a NoSQL database 😉

Long-term The Guardian is looking towards the adoption of a schema-free database to sit alongside its Oracle database and is investigating CouchDB. The overarching theme, as Matthew Wall and Simon Willison explained, is that the relational database is now just a component in the overall data management story, alongside data caching, data stores, search engines etc.

On the subject of choosing the right tool for the job, Basho’s engineering manager Brian Fink pointed out that using NoSQL technology alongside relational SQL database technology may actually improve the performance of the SQL database since storing data in a relational database that does not need SQL features slows down access to data that does need SQL features.

Another perspective on this came from Werner Vogels, who noted that unlike database administrators/systems architects, users don’t care about where data resides or what model it uses – as long as they get the service they require. Werner explained that the Amazon.com homepage is a combination of 200-300 different services, with multiple data systems. Users do not think about data sources in isolation; they care about the amalgamated service.

This was also a theme that cropped up in the presentation by Enda Farrell, software architect at the BBC, who noted that the BBC’s homepage is a PHP application integrated with multiple data sources at multiple data centers, and also Twitter’s analytics lead Kevin Weil, who described Twitter’s use of Hadoop, Pig, HBase, Cassandra and FlockDB.

While the company is using HBase for low-latency analytic applications such as people search and moving to Cassandra from MySQL for its online applications, it uses its recently open-sourced FlockDB graph database to serve up data on followers and correlate the intersection of followers to (for example) ensure that Tweets between two people are only sent to the followers of both. (As something of an aside, Twitter is using Hadoop to store the 7TB of data it generates a day from Tweets, and Pig for non-real-time analytics.)
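The follower-intersection idea can be illustrated with plain set operations. The sketch below is not FlockDB’s API or Twitter’s implementation; the `followers` mapping, the user names and the `reply_audience` function are all hypothetical, and an in-memory dictionary stands in for the graph store.

```python
from typing import Dict, Set

# Hypothetical follower graph: user -> set of users who follow them.
followers: Dict[str, Set[str]] = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "erin"},
}


def reply_audience(sender: str, recipient: str,
                   graph: Dict[str, Set[str]]) -> Set[str]:
    """Users who follow both sender and recipient.

    A Tweet from sender to recipient would be delivered only to this set.
    Unknown users are treated as having no followers.
    """
    return graph.get(sender, set()) & graph.get(recipient, set())


if __name__ == "__main__":
    # carol is the only user following both alice and bob.
    print(reply_audience("alice", "bob", followers))
```

At Twitter’s scale the two follower sets would not fit comfortably in a single process, which is presumably part of why a dedicated graph store is used for the intersection.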

Kevin noted that the company is also working with Digg to build real-time analytics for Cassandra and will be releasing the results as open source, and also discussed how Twitter has made use of open source technologies created by others such as Facebook (both Cassandra and the Scribe log data aggregation server).

One of the issues that has arisen from the fact that organizations such as Amazon and Facebook have had to create their own data management technologies is the proliferation of NoSQL databases and a certain amount of wheel re-invention.

Werner explained that SmugMug creator Don MacAskill ended up being a MySQL expert not because he necessarily wanted to be, but because he had to be to keep his applications running.

“He doesn’t want to have to become an expert in Cassandra,” noted Werner. “What he wants is to have someone run it for him and take care of that.” Presumably Riptano, the new Cassandra vendor formed by Jonathan Ellis – project chair for the Cassandra database – will take care of that, but in the meantime Werner raised another long-term alternative.

“We shouldn’t all be doing this,” he said, adding that Dynamo is not as popular within Amazon Web Services as it once was because it is a product that requires configuration and management, rather than a service, and Amazon employees “have better things to do.”

Which raises the question – don’t Twitter, Facebook, the BBC, the Guardian et al have better things to do than developing and maintaining database architecture? In a perfect world, yes. But in a perfect world they’d all have strongly consistent, scalable distributed database systems/services that are suited to their various applications.

Interestingly, describing S3 as “a better key/value store than Dynamo”, Werner noted that SimpleDB and S3 are “a good start to provide that service”.

Saying yes to NoSQL

As a company, The 451 Group has built its reputation on taking a lead in covering disruptive technologies and vendors. Even so, with a movement as hyped as NoSQL databases, it sometimes pays to be cautious.

In my role covering data management technologies for The 451 Group’s Information Management practice I have been keeping an eye on the NoSQL database movement for some time, taking the time to understand the nuances of the various technologies involved and their potential enterprise applicability.

That watching brief has now spilled over into official coverage, following our recent assessment of 10gen. I also recently had the chance to meet up with Couchio’s VP of business development, Nitin Borwankar (see coverage initiation of Couchio). I’ve also caught up with Basho Technologies sooner rather than later. A report on that is now imminent.

There are a couple of reasons why I have formally begun covering the NoSQL databases. The first is the maturing of the technologies, and the vendors behind them, to the point where they can be considered for enterprise-level adoption. The second is the demand we are getting from our clients to provide our view of the NoSQL space and its players.

This is coming both from the investment community and from existing vendors, either looking for potential partnerships or fearing potential competition. The number of queries we have been getting related to NoSQL and big data has encouraged me to articulate my thoughts, so look out for a two-part spotlight on the implications for the operational and analytical database markets in the coming weeks.

The biggest reason, however, is the recognition that the NoSQL movement is a user-led phenomenon. There is an enormous amount of hype surrounding NoSQL but for the most part it is not coming from vendors like 10gen, Couchio and Basho (although they may not be actively discouraging it) but from technology users.

A quick look at the most prominent key-value and column-table NoSQL data stores highlights this. Many of these have been created by user organizations themselves in order to fill a void and overcome the limitations of traditional relational databases – for example Google (BigTable), Yahoo (HBase), Zvents (Hypertable), LinkedIn (Voldemort), Amazon (Dynamo), and Facebook (Cassandra).

It has become clear that traditional database technologies do not meet the scalability and performance requirements of dealing with big data workloads, particularly at the scale experienced by social networking services.

That does raise the question of how applicable these technologies will be to enterprises that do not share the architecture of the likes of Google, Facebook and LinkedIn – at least in the short term. There are users, though: Cassandra users include Rackspace, Digg, Facebook, and Twitter, for example.

What there isn’t – for the likes of Cassandra and Voldemort, at least – is vendor-based support. That inevitably raises questions about the general applicability of the key-value/column table stores. As Dave Kellogg notes, “unless you’ve got Google’s business model and talent pool, you probably shouldn’t copy their development tendencies”.

Given the levels of adoption it seems inevitable that vendors will emerge around some of these projects, not least since, as Dave puts it, “one day management will say: ‘Holy Cow folks, why in the world are we paying programmers to write and support software at this low a level?'”

In the meantime, it would appear that the document-oriented data stores (Couchio’s CouchDB, 10gen’s MongoDB, Basho’s Riak) are much more generally applicable, both technologically and from a business perspective. (UPDATE – You can also add Neo Technology and its graph database technology to that list.)

In our forthcoming two-part spotlight on this space I’ll articulate in more detail our view on the differentiation of the various NoSQL databases and other big data technologies and their potential enterprise applicability. The first part, on NoSQL and operational databases, is here.