Entries from April 2011
April 20th, 2011 — Data management
As we noted last week, necessity is one of the six key factors that are driving the adoption of alternative data management technologies identified in our latest long format report, NoSQL, NewSQL and Beyond.
Necessity is particularly relevant when looking at the history of the NoSQL databases. While it is easy for the incumbent database vendors to dismiss the various NoSQL projects as development playthings, it is clear that the vast majority of NoSQL projects were developed by companies and individuals because the existing database products and vendors could not meet their requirements with regard to the other five factors: scalability, performance, relaxed consistency, agility and intricacy.
The genesis of much – although by no means all – of the momentum behind the NoSQL database movement can be attributed to two research papers: Google’s Bigtable: A Distributed Storage System for Structured Data, presented at the Seventh Symposium on Operating Systems Design and Implementation, in November 2006, and Amazon’s Dynamo: Amazon’s Highly Available Key-Value Store, presented at the 21st ACM Symposium on Operating Systems Principles, in October 2007.
The importance of these two projects is highlighted by The NoSQL Family Tree, a graphic representation of the relationships between (most of) the various major NoSQL projects:
Not only were the existing database products and vendors unsuitable to meet their requirements, but Google and Amazon, as well as the likes of Facebook, LinkedIn, Powerset and Zvents, could not rely on the incumbent vendors to develop anything suitable, given the vendors’ desire to protect their existing technologies and installed bases.
Werner Vogels, Amazon’s CTO, has explained that as far as Amazon was concerned, the database layer required to support the company’s various Web services was too critical to be trusted to anyone else – Amazon had to develop Dynamo itself.
Vogels also pointed out, however, that this situation is suboptimal. The fact that Facebook, LinkedIn, Google and Amazon have had to develop and support their own database infrastructure is not a healthy sign. In a perfect world, they would all have better things to do than focus on developing and managing database platforms.
That explains why the companies have also all chosen to share their projects. Google and Amazon did so through the publication of research papers, which enabled the likes of Powerset, Facebook, Zvents and LinkedIn to create their own implementations.
These implementations were then shared through the publication of source code, which has enabled the likes of Yahoo, Digg and Twitter to collaborate with each other and additional companies on their ongoing development.
The NoSQL movement also boasts a significant number of developer-led projects initiated by individuals – in the tradition of open source – to scratch their own technology itches.
Examples include Apache CouchDB, originally created by the now-CTO of Couchbase, Damien Katz, to be an unstructured object store to support an RSS feed aggregator; and Redis, which was created by Salvatore Sanfilippo to support his real-time website analytics service.
We would also note that even some of the major vendor-led projects, such as those from Couchbase and 10gen, have been heavily influenced by non-vendor experience. 10gen was founded by former DoubleClick executives to create the software they felt was needed at the digital advertising firm, while online gaming firm Zynga was heavily involved in the development of the original Membase Server memcached-based key-value store (now Elastic Couchbase).
In this context it is interesting to note, therefore, that while the majority of NoSQL databases are open source, the NewSQL providers have largely chosen to avoid open source licensing, with VoltDB being the notable exception.
These NewSQL technologies are no less a child of necessity than NoSQL, although it is a vendor’s necessity to fill a gap in the market, rather than a user’s necessity to fill a gap in its own infrastructure. It will be intriguing to see whether the various other NewSQL vendors will turn to open source licensing in order to grow adoption and benefit from collaborative development.
NoSQL, NewSQL and Beyond is available now from both the Information Management and Open Source practices (non-clients can apply for trial access). I will also be presenting the findings at the forthcoming Open Source Business Conference.
April 15th, 2011 — Data management
The 451 Group’s new long format report on emerging database alternatives, NoSQL, NewSQL and Beyond, is now available.
The report examines the changing database landscape, investigating how the failure of existing suppliers to meet the performance, scalability and flexibility needs of large-scale data processing has led to the development and adoption of alternative data management technologies.
Specifically, the report covers:
- NoSQL databases designed to meet scalability requirements of distributed architectures and/or schema-less data management requirements, including big tables, key-value stores, document databases and graph databases
- NewSQL databases designed to meet scalability requirements of distributed architectures or to improve performance such that horizontal scalability is no longer a necessity, including new MySQL storage engines, transparent sharding technologies, software and hardware appliances, and completely new databases
- Data grid/cache products designed to store data in memory to increase application and database performance, covering a spectrum of data management capabilities from non-persistent data caching to persistent caching, replication, and distributed data and compute grid functionality
You can see how these products fit into the wider data management landscape from the chart below. The shaded areas are those specifically covered in this report.
The answer to SPRAINed relational databases
SPRAIN, used in the above graphic, is an acronym for the six key factors driving the adoption of alternative data management technologies. It also describes what is happening to traditional relational databases, which are being ‘sprained’ as a result of being stretched beyond their normal capacity by the needs of high-volume, highly distributed or highly complex applications.
Those six key drivers, and their associated sub-drivers, are as follows:
- Scalability – hardware economics
- Performance – MySQL limitations
- Relaxed consistency – CAP theorem
- Agility – polyglot persistence
- Intricacy – big data, total data
- Necessity – open source
The report examines each of these drivers and sub-drivers in turn, investigating how they are driving interest in alternative database approaches in general, and how they prompted the development of specific NoSQL, NewSQL and data grid/cache products and services.
It continues with profiles of the individual database alternatives and their use cases and case studies before concluding with a discussion of the impact of these database alternatives on the wider database market and the likely consolidation, confluence and proliferation of various technologies looking forward.
Here’s a selection of some of our key findings:
- The database market remains dominated by relational databases and the incumbent industry giants, but the emergence of NoSQL and NewSQL alternatives has in part been driven by the inability of these products to address emerging distributed and schema-less data management requirements.
- Polyglot persistence, and the associated trend toward polyglot programming, is driving developers toward making use of multiple database products depending on which might be suitable for a particular task.
- The NoSQL projects were developed in response to the failure of existing suppliers to address the performance, scalability and flexibility requirements of large-scale data processing, particularly for Web and cloud computing applications.
- NewSQL and data-grid products have emerged to meet similar requirements among enterprises, a sector that is now also being targeted by NoSQL vendors.
- While NoSQL is seen as a software innovation prompted by the need to deal with large volumes of data, the software innovation was a direct response to the improved performance of commodity hardware clusters and the ability to spread data storage and processing across that hardware.
- Changing hardware economics mean that distributed server architecture is increasingly being adopted in traditional enterprise environments. The emergence of NewSQL providers is a direct response to the increasing need for scalable data management products to make more efficient use of this architecture.
- Distributed data-grid/cache products are increasingly being positioned as potential alternatives to relational databases as the primary platform for distributed data management, with a relational database relegated to a supporting role.
The report is available now from both the Information Management and Open Source practices (non-clients can apply for trial access). I will also be presenting the findings at the forthcoming Open Source Business Conference.
April 6th, 2011 — Data management
Yesterday The 451 Group published a report asking “How will the database incumbents respond to NoSQL and NewSQL?”
That prompted the pertinent question, “What do you mean by ‘NewSQL’?”
Since we are about to publish a report describing our view of the emerging database landscape, including NoSQL, NewSQL and beyond (now available), it is probably a good time to define what we mean by NewSQL. (I haven’t mentioned the various NoSQL projects in this post, but they are covered extensively in the report. More on them another day.)
“NewSQL” is our shorthand for the various new scalable/high performance SQL database vendors. We have previously referred to these products as ‘ScalableSQL’ to differentiate them from the incumbent relational database products. Since this implies horizontal scalability, which is not necessarily a feature of all the products, we adopted the term ‘NewSQL’ in the new report.
And to clarify, like NoSQL, NewSQL is not to be taken too literally: the new thing about the NewSQL vendors is the vendor, not the SQL.
So who would we consider to be the NewSQL vendors? Like NoSQL, NewSQL is used to describe a loosely affiliated group of companies (ScaleBase has done a good job of identifying some of the several NewSQL sub-types), but what they have in common is the development of new relational database products and services designed to bring the benefits of the relational model to distributed architectures, or to improve the performance of relational databases to the extent that horizontal scalability is no longer a necessity.
In the first group we would include (in no particular order) Clustrix, GenieDB, ScalArc, Schooner, VoltDB, RethinkDB, ScaleDB, Akiban, CodeFutures, ScaleBase, Translattice, and NimbusDB, as well as Drizzle, MySQL Cluster with NDB, and MySQL with HandlerSocket. The latter group includes Tokutek and JustOne DB. The associated “NewSQL-as-a-service” category includes Amazon Relational Database Service, Microsoft SQL Azure, Xeround, Database.com and FathomDB.
(Links provide access to 451 Group coverage for clients. Non-clients can also apply for trial access).
Clearly there is the potential for overlap with NoSQL. It remains to be seen whether RethinkDB will be delivered as a NoSQL key-value store for memcached or a “NewSQL” storage engine for MySQL, for example. While at least one of the vendors listed above is planning to enable the use of its database as a schema-less store, we also expect to see support for SQL queries added to some NoSQL databases. We are also sure that Citrusleaf won’t be the last NoSQL vendor to claim support for ACID transactions.
NewSQL is not about attempting to re-define the database market using our own term, but it is useful to broadly categorize the various emerging database products at this particular point in time.
Another clarification: ReadWriteWeb has picked up on this post and reported on the “NewSQL Movement”. I don’t think there is a movement in the sense that we saw the various NoSQL projects/vendors come together under the NoSQL umbrella with a common purpose. Perhaps the NewSQL players will do so (VoltDB and NimbusDB have reacted positively to the term, and Tokutek has become the first that I am aware of to explicitly describe its technology as NewSQL). As Derek Stainer notes, however: “In the end it’s just a name, a way to categorize a group of similar solutions.”
In the meantime, we have already noted the beginning of the end of NoSQL, and the lines are blurring to the point where we expect the terms NoSQL and NewSQL will become irrelevant as the focus turns to specific use cases.
The identification of specific adoption drivers and use cases is the focus of our forthcoming long-form report on NoSQL, NewSQL and beyond, from which the 451 Group report cited above is excerpted.
The report contains an overview of the roots of NoSQL and profiles of the major NoSQL projects and vendors, as well as analysis of the drivers behind the development and adoption of NoSQL and NewSQL databases, the evolving role of data grid technologies, and associated use cases.
It will be available very soon from the Information Management and CAOS practices and we will also publish more details of the key drivers as we see them and our view of the current database landscape here.
April 6th, 2011 — Content management, eDiscovery, Storage
Two companies central to our coverage of information management are having their own particular – and distinct – issues with shareholders and equity analysts.
Autonomy has been having its run-ins with London’s equity analysts for some time. Not all of them, but a core and increasingly vocal group of them. They regularly question a few things: how the company calculates organic growth of its core IDOL business; cash conversion; and why it hasn’t bought a company after saying it would do so and raising £500m of convertible debt to help it do so, back in February 2010. We’ve also weighed in on some of these issues.
Autonomy regularly takes on these doubters on its quarterly calls and also does the same during the quarter on its website, which is at least a refreshing change from companies that stay completely mute on such matters. However, the answers are often very simplistic. In a post dated March 30, 2011, entitled “How should we think about Autonomy’s penetration of its end markets, when we attempt to evaluate the opportunity for growth?”, the company states that most of the world’s top software companies OEM IDOL and thus are “building their future products with IDOL deeply embedded and paying Autonomy a royalty.” Are they? Autonomy doesn’t distinguish between its two main OEM products when it announces OEM deals, but there’s a big difference between OEMing IDOL and OEMing its document filters. And as we have discussed before, we think a lot of the OEM deals are for the latter, rather than for IDOL itself, although we have no way of proving that, except to say that we speak regularly to these leading software vendors and they don’t appear to be using IDOL as their core search and classification engine nearly as widely as Autonomy claims. Ironically, given what Autonomy does for a living, a fair bit of the to and fro on the site is semantic-related, e.g. discussion of what “early spring” or “Winter with snowdrops” scenarios mean in terms of the guidance given by the company to analysts. All will no doubt become clearer when it announces its Q1 results, due Thursday April 28.
Over at Iron Mountain, some dissident shareholders have been putting pressure on the company to take on board their slate of directors and eventually turn itself into a Real Estate Investment Trust (REIT), mainly for its beneficial tax status. We cover what used to be called the digital business – the backup and recovery, e-Discovery, archiving and other software that’s mostly been added via acquisitions over the past few years. But that doesn’t seem to hold any attraction for hedge fund Elliott Management, which owns just under 5%. It was Elliott that put forward the slate of directors and advised the company to turn itself into a REIT and in general to focus on its core – non-digital – business. Elliott and even larger shareholder Davis Advisors (it owns a shade less than 20% of the outstanding shares) were annoyed when the company adopted a poison pill on March 23 to guard against a takeover. This week Elliott laid out its grievances in another letter to the board, urging it to reverse the poison pill and generally sit up and take notice of what it has to say.
It’s hard to tell where this will end, but it has already caused disruption to Iron Mountain’s business at a time when it is trying to get some of its digital units – notably e-Discovery – back on track after a very tough 2010. We’ll know if it’s had an effect on its Q1 performance when it announces its results, most likely in the last week of April. The shares, as is common with these sorts of investor challenges, have enjoyed a strong run-up, and are currently at or around a 52-week high. The company’s annual shareholders meeting is coming up soon too. Although the date is not yet known, all shareholders of record as of April 12 will be allowed to vote at it. It could get quite lively.