Neither fish nor fowl: the rise of multi-model databases

One of the most complicated aspects of putting together our database landscape map was dealing with the growing number of (particularly NoSQL) databases that refuse to be pigeon-holed in any of the primary database categories.

I have begun to refer to these as “multi-model databases” in recognition of the fact that they are able to take on the characteristics of multiple databases. In truth though there are probably two different groups of products that could be considered “multi-model”:

True multi-model databases that have been designed specifically to serve multiple data models and use-cases

Examples include:
FoundationDB, which is being designed to support ACID and NoSQL, but more to the point in this instance, multiple layers including key-value, document, and object layers

Aerospike, which is planning to combine SQL, key value, and document and graph database technologies in a single database by bringing together its Citrusleaf NoSQL database with the acquired AlchemyDB NewSQL project

OrientDB, which is, at heart, a document database, but can also be used as a graph database; as an object database, making use of the Java persistence API; and as a hybrid database, taking advantage of multiple models to serve different application requirements

ArangoDB, which promises to deliver the benefits of key-value, document and graph stores in a single database

Other products that could be considered true multi-model databases are:
Couchbase Server 2.0, which can be used as both a document store and a key value store, as well as a distributed cache

Riak, which is a key-value store, although it can be used as a document store since the value can be a JSON document

NuoDB, which will provide compatibility with other databases by taking on multiple ‘personalities’ – an Oracle personality via PL/SQL compatibility is in the development roadmap, as is a document store personality via JSON support.

General-purpose databases with multi-model options
What’s the difference between multi-model databases and existing general-purpose databases that have optional capabilities for serving multiple models? In my book it’s about being designed for purpose, but I’m sure that will be a debating point for the future. In the meantime, examples include:

Oracle MySQL 5.6, which can support both SQL-based access and key-value access via the Memcached API.

Oracle MySQL Cluster 7.2, which similarly supports concurrent NoSQL and SQL access to the database.

IBM DB2 10, which extends DB2’s hybrid relational and XML engine to enable the storage and management of graph triples, as well as support for the SPARQL 1.0 query language.

Akiban Server, which has the ability to treat groups of tables as objects and access them as JSON documents via SQL.

PostgreSQL hstore, which can be used for storing key-value pairs within a PostgreSQL data field, thereby enabling schema-less queries against data stored in PostgreSQL

We are also aware of other NewSQL databases that plan to adopt support for popular NoSQL data models, while IBM has also talked about plans to integrate key-value store NoSQL access capabilities with DB2 and Informix database software.

Other products that could be considered multi-model options include:
Oracle Spatial and Graph, an option for Oracle Database 11g.

One of the drivers of NoSQL database adoption has been polyglot persistence – using multiple databases depending on the specific requirements of individual applications. Multi-model databases contradict this trend, to some extent, so it will be interesting to see whether they begin to gain traction.

While we see the wisdom of selecting the best database for the job, we also recognise that it could sometimes be a matter of choosing the best data model for the job, while relying on a single storage back-end.
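To make the idea of several data models over one storage back-end a little more concrete, here is a minimal sketch in Python. It is a toy, in-memory illustration only: the class and method names are hypothetical and do not correspond to the API of any product mentioned above. A single dictionary holds the data, and it is exposed both as a key-value store and as a simple document store.

```python
import json


class TinyMultiModelStore:
    """Toy in-memory store: one back-end, two data models (hypothetical API)."""

    def __init__(self):
        # Single storage back-end: every model reads and writes this dict.
        self._data = {}

    # Key-value model: opaque values addressed by key.
    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)

    # Document model: the same values, interpreted as JSON documents.
    def put_document(self, key, document):
        self.put(key, json.dumps(document))

    def find(self, field, expected):
        """Return the keys of stored JSON objects whose `field` equals `expected`."""
        matches = []
        for key, raw in self._data.items():
            try:
                doc = json.loads(raw)
            except (TypeError, ValueError):
                continue  # not JSON, so only visible through key-value access
            if isinstance(doc, dict) and doc.get(field) == expected:
                matches.append(key)
        return matches


store = TinyMultiModelStore()
store.put("session:1", "opaque-token")                            # key-value use
store.put_document("user:1", {"name": "Ada", "city": "London"})   # document use
store.put_document("user:2", {"name": "Bob", "city": "Paris"})

london_users = store.find("city", "London")   # query by document field
raw_user = store.get("user:1")                # same record, read as a plain value
```

The point of the sketch is simply that the choice of model is made at access time, not at storage time; a real multi-model database would add indexing, persistence and transactions on top.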

Our 2013 Database survey is now live

451 Research’s 2013 Database survey is now live at http://bit.ly/451db13 investigating the current use of database technologies, including MySQL, NoSQL and NewSQL, as well as traditional relational and non-relational databases.

The aim of this survey is to identify trends in database usage, as well as changing attitudes to MySQL following its acquisition by Oracle, and the competitive dynamic between MySQL and other databases, including NoSQL and NewSQL technologies.

There are just 15 questions to answer, spread over five pages, and the entire survey should take less than ten minutes to complete.

All individual responses are of course confidential. The results will be published as part of a major research report due during Q2.

The full report will be available to 451 Research clients, while the results of the survey will also be made freely available via a presentation at the Percona Live MySQL Conference and Expo in April.

Last year’s results have been viewed nearly 55,000 times on SlideShare so we are hoping for a good response to this year’s survey.

One of the most interesting aspects of the 2012 survey results was the extent to which MySQL users were testing and adopting PostgreSQL. Will that trend continue or accelerate in 2013? And what of the adoption of cloud-based database services such as Amazon RDS and Google Cloud SQL?

Are the new breed of NewSQL vendors having any impact on the relational database incumbents such as Oracle, Microsoft and IBM? And how is SAP HANA adoption driving interest in other in-memory databases such as VoltDB and MemSQL?

We will also be interested to see how well NoSQL databases fare in this year’s survey results. Last year MongoDB was the most popular, followed by Apache Cassandra/DataStax and Redis. Are these now making a bigger impact on the wider market, and what of Basho’s Riak, CouchDB, Neo4j, Couchbase et al?

Additionally, we have been tracking attitudes to Oracle’s ownership of MySQL since the deal to acquire Sun was announced. Have MySQL users’ attitudes towards Oracle improved or declined in the last 12 months, and what impact will the formation of the MariaDB Foundation have on MariaDB adoption?

We’re looking forward to analyzing the results and providing answers to these and other questions. Please help us to get the most representative result set by taking part in the survey at http://bit.ly/451db13

NoSQL LinkedIn Skills Index – December 2012

Time again to take a look at our NoSQL LinkedIn Skills Index, based on the number of LinkedIn member profiles mentioning each of the NoSQL projects. This is the first update since we rebooted the analysis in September to account for more products and refine our search terms.

[Chart: NoSQL LinkedIn Skills Index, top ten, December 2012]

On the face of it not a lot has changed in the last quarter, although there are a few interesting statistics to pick out. For instance, Neo4j is now practically tied for sixth place with MarkLogic and can be expected to overtake it in Q1 2013. Outside the top ten shown above, Apache Accumulo has gained two places – overtaking Aerospike and Hypertable.

In fact, Apache Accumulo showed the fastest rate of growth in mentions between September and December, just ahead of DynamoDB and OrientDB, followed by Couchbase and MongoDB.

MongoDB’s growth means that it has cemented its place as the most popular NoSQL database, according to LinkedIn profile mentions. As the chart below illustrates, it now accounts for 45% of all mentions of NoSQL technologies in LinkedIn profiles, according to our sample, compared with 43% in September.

[Chart: share of all NoSQL mentions in LinkedIn profiles, December 2012]

The Data Day, Two days: December 12/13 2012

Total Data Analytics. Couchbase Server 2.0. And more.

And that’s the Data Day, today.

The Data Day, Two days: November 19/20 2012

HP uncovers Autonomy irregularity. Pentaho ups big data commitment. And more.

And that’s the Data Day, today.

The Data Day, Two days: September 21/24 2012

Alpine Data bags EMC. Infobright delivers appliance. And more.

And that’s the Data Day, today.

A different perspective on NoSQL vendor traction

Amid the reporting of 10gen’s $42m funding round yesterday a specific claim about 10gen’s success to date caught my eye.

“10gen says it’s got about half the NoSQL market wrapped up already. This is based on… indicators, such as how often LinkedIn profiles mention MongoDB.”

While our own analysis of LinkedIn profiles did indeed indicate that 10gen has a sizeable lead over its NoSQL rivals, this only accounts for the NoSQL market *to date*, and the NoSQL vendors have barely scratched the surface.

451 Research recently estimated that NoSQL software vendors between them generated revenue of just $20m in 2011 (less than half 10gen’s latest funding round), and that the market will grow at a CAGR of 82% to reach $215m by 2015.
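For readers who want to check the compound growth arithmetic, here is a quick sketch in Python. It assumes four compounding years (2011 to 2015) and uses only the rounded figures quoted above, so the rate and the end value agree approximately rather than exactly; the function names are our own.

```python
def implied_cagr(start_value, end_value, years):
    """Compound annual growth rate implied by a start value, end value and period."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding `rate` annually for `years` years."""
    return start_value * (1 + rate) ** years


start_2011 = 20.0     # $m, rounded estimate quoted above
end_2015 = 215.0      # $m, rounded estimate quoted above
years = 2015 - 2011   # assumption: four compounding periods

rate = implied_cagr(start_2011, end_2015, years)
projected = project(start_2011, 0.82, years)

print(f"Implied CAGR from the rounded figures: {rate:.1%}")
print(f"$20m compounded at 82% for {years} years: ${projected:.0f}m")
```

The small gap between the two results is what you would expect when the published base figure, growth rate and end figure have each been rounded independently.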

10gen is well placed to capitalize on this growth given its customer and revenue traction to date. While we are not breaking out individual revenue estimates the chart below shows revenue and customer estimates for 10gen, Basho, Couchbase and DataStax, with the scale adjusted to fit on a single chart.

The chart appears to confirm 10gen’s claim to have half the NoSQL market wrapped up, at least in terms of customers. However, what this chart doesn’t address is the relative strategy stage of each vendor in terms of customer traction.

10gen has done extremely well in growing a large customer base via its focus on ease of developer adoption, and is now turning its attention to the sort of capabilities required by traditional enterprises.

Other vendors in the NoSQL space have done precisely the opposite: starting with enterprise capabilities and now turning their attention to greater ease of use and developer adoption.

We can begin to get a sense of how these strategies are playing out if we add a column for revenue per customer (again re-scaled). Here you can see that 10gen is actually doing less well than some of its rivals.

The size of the MongoDB installed base gives 10gen a big opportunity to aim at, but others are arguably ahead in terms of traction with enterprise customers. That’s why our market sizing methodology is specifically designed to take multiple (sometimes conflicting) factors into account in creating an estimate for each vendor, as well as the aggregate total.

10gen may well have about half the current NoSQL market wrapped up but this market has really only just begun.

The Data Day, Today: May 8 2012

IBM acquires Vivisimo. Funding for Birst, ParAccel, Metamarkets and DataSift. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* For 451 Research clients

# IBM picks up Vivisimo to search for value in ‘big data’ Deal Analysis

# Teradata delivers on analytic cloud vision with Active Data Warehouse Private Cloud Impact Report

# The Big Blue picture for ‘big data’ analytics: IBM sheds light on BigSheets Impact Report

# Oversight Systems’ Continuous Analysis extracts actionable insight from data Impact Report

# Kalido updates MDM offering with business users, operationalizing master data in mind Impact Report

# Delphix reaps reward from agile approach to database virtualization Impact Report

# Automated Insights looks to pitch narrative, visuals and stats to enterprises Impact Report

# myDIALS eyes indirect sales in quest to be Internet access layer for analytics Impact Report

* IBM Advances Big Data Analytics with Acquisition of Vivisimo Also announces support for Cloudera.

* Teradata Announces 2012 First Quarter Results Revenue up 21% (PDF)

* Actuate Reports First Quarter 2012 Financial Results Revenue up 9% (PDF)

* Birst Secures $26 Million in Financing Led By Sequoia Capital

* ParAccel Closes Record Q1 Revenues and $20 Million Investment Round

* Metamarkets Raises $15 Million to Deliver Data Science-as-a-Service

* DataSift adds $7.2M: The story so far and focus for the future

* Teradata to Acquire eCircle (PDF)

* Google BigQuery brings Big Data analytics to all businesses

* TIBCO Spotfire Brings the Power of Data Discovery to Big Data and Extreme Information

* Jaspersoft Teams with VMware To Deliver Business Intelligence for Data-Driven Cloud Applications

* Kalido and Teradata Sign Global Reseller Agreement

* Actuate Announces Cloudera Alliance to Support Apache Hadoop and BIRT Developers in Big Data Integration

* Hortonworks and Kognitio Announce Technical Partnership Driving Apache Hadoop Adoption in Big Data Analytics Implementations

* Tokutek and PalominoDB Partner to Bring Scale, Performance to Database Deployments

* Acunu is pleased to announce v2 of the Acunu Data Platform!

* Is Yahoo really threatening memcached and Open Compute?

* Introducing Zend DBi as a MySQL Replacement on IBM i

* Zettaset and Hyve Solutions Build First Fully Integrated Enterprise OS Hadoop Solution

* Cloudera Announces New Japanese Subsidiary

* Bull Announces the Formation of Database Migration Business Unit

* Couchbase to Run Native with Key-Value API for ioMemory

* The Big Data Value Continuum

* Big Data is Business Intelligence plus Attention Deficit Disorder

* Nokia released Dempsy an open source stream data processing platform.

And that’s the Data Day, today.

Update on the relative popularity of NoSQL database skills

Back in December we ran a series of posts looking at the geographic distribution of NoSQL skills, according to the results of searching LinkedIn member profiles, culminating in a look at the relative overall popularity of the major NoSQL databases.

This week I took another look at LinkedIn to update the results for a forthcoming report, which gives us the opportunity to see how the results have changed over the past quarter:

While this provides us with an interesting opportunity to track LinkedIn profile mentions over time there isn’t a huge amount we can learn from this first update – other than that MongoDB seems to be increasing its dominance.

The only significant change that isn’t immediately obvious from looking at the chart is that Apache HBase has overtaken Apache CouchDB by a tiny margin to claim third place overall.

As we noted last time, however, Apache HBase is more reliant on the US than other NoSQL databases for its LinkedIn mentions: it is the second most prevalent NoSQL database mentioned in the USA but fourth in the rest of the world.

Two other points to take into consideration:

– The results for Apache Cassandra are probably disproportionately low since we have to search for the full phrase in order to avoid including people called Cassandra.

– Previously we only searched for Membase. This time we added together the search results for both Membase and Couchbase. This may mean the result for Couch/Membase is disproportionately high since some members probably listed both.
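The double-counting caveat for Membase and Couchbase can be illustrated with a few lines of Python. The profile IDs below are entirely hypothetical, and the search results we used return only counts, so the size of the overlap is not something we can actually observe; the sketch just shows why adding two counts can overstate the number of distinct profiles.

```python
# Hypothetical profile IDs returned by two separate searches.
membase_profiles = {"p1", "p2", "p3"}
couchbase_profiles = {"p3", "p4"}   # "p3" lists both products

# What we did: add the two result counts together.
summed = len(membase_profiles) + len(couchbase_profiles)

# What we would ideally report: distinct profiles mentioning either product.
distinct = len(membase_profiles | couchbase_profiles)

# Profiles counted twice by the simple sum.
overlap = len(membase_profiles & couchbase_profiles)

print(f"summed={summed}, distinct={distinct}, double-counted={overlap}")
```

The sum exceeds the distinct count by exactly the number of profiles that mention both names, which is why the combined Couch/Membase figure should be read as an upper bound.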

This is not meant to be a comprehensive analysis, however, but rather a snapshot of one particular data source.

The Data Day, Today: Feb 14 2012

Teradata closes best year ever. NetApp and EMC propose big data forum. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Teradata Announces 2011 Fourth Quarter and Full-Year Results (PDF)

* Hell Has Not Frozen Over: NetApp and EMC Combine to Educate for Big Data Standards

* Cray Forms New Big Data Division, Hires New General Manager

* Privacy in the Age of Big Data

* ScaleBase Unveils New Elastic Load Balancing Feature at Cloud Connect

* Introducing CDH4

* Lucid Imagination “Search-as-a-Service” Powers Flexible, Cost-Effective Enterprise-Wide Data Discovery

* Couchbase Survey Shows Accelerated Adoption of NoSQL in 2012

* Open Source OData Tools for MySQL and PHP Developers

* New Release of WhereScape’s Data Warehouse Development Environment Enables Cross-Platform Database Appliance Support

* On MongoDB, SQL and ACID

* For 451 Research clients

# IxReveal seeks opportunities as a hub for data fusion Impact Report

# 5000fish sets out to swim beyond an IT services management pond in BI Impact Report

# Zimory boosts Scale cloud database with pickup of sones development team Deal Analysis Report

# Alpine Data outlines strategy as it follows the workflow for advanced analytics Market Development report

# 10gen targets agility and flexibility for increased document database adoption Market Development report

# ScaleArc expands its database-clustering and load-balancing focus beyond MySQL Market Development report

And that’s the Data Day, today.