Updated Data Platforms Map – January 2016

The January 2016 edition of the 451 Research Data Platforms Map is now available for download.

Initially designed to illustrate the complexity of the data platforms market, the latest version includes an updated index to help you navigate the complex array of current data platform providers.


There are numerous additions compared to the previous map, especially in the area of event/stream processing. We have also reconsidered our approach to Hadoop-as-a-service, narrowing it down to distinct Hadoop offerings rather than hosted Hadoop distributions.

We have also tried to clean up our approach to the convergence of Hadoop and search, although that remains a bit of a work in progress, to be honest. There’s also something in there for eagle-eyed Silicon Valley fans.

You can use this map to:

  • compare capabilities, offerings, and functionality.
  • understand where providers intersect and diverge.
  • identify shortlists of choices to suit enterprise needs.

The latest version of the map can be downloaded here.

Updated Database Landscape map – June 2013

I am planning on doing a major overhaul of this during the second half of the year, with a specific focus on the Hadoop sector, but in the interim, here’s the latest June 2013 update to our Database Landscape map.

Note: the latest update to the map is available here.

Updated Database Landscape map – February 2013

I have no real intention of turning our Database Landscape Map into something that is updated on a monthly basis, but there were a number of significant modifications and additions to the original December 2012 version that I wasn’t able to address for our January 2013 update for 451 clients.

So here, then, is the February 2013 version:

Note: the latest update to the map is available here.

Our 2013 Database survey is now live

451 Research’s 2013 Database survey is now live at http://bit.ly/451db13, investigating the current use of database technologies, including MySQL, NoSQL and NewSQL, as well as traditional relational and non-relational databases.

The aim of this survey is to identify trends in database usage, as well as changing attitudes to MySQL following its acquisition by Oracle, and the competitive dynamic between MySQL and other databases, including NoSQL and NewSQL technologies.

There are just 15 questions to answer, spread over five pages, and the entire survey should take less than ten minutes to complete.

All individual responses are of course confidential. The results will be published as part of a major research report due during Q2.

The full report will be available to 451 Research clients, while the results of the survey will also be made freely available via a presentation at the Percona Live MySQL Conference and Expo in April.

Last year’s results have been viewed nearly 55,000 times on SlideShare so we are hoping for a good response to this year’s survey.

One of the most interesting aspects of the 2012 survey results was the extent to which MySQL users were testing and adopting PostgreSQL. Will that trend continue or accelerate in 2013? And what of the adoption of cloud-based database services such as Amazon RDS and Google Cloud SQL?

Are the new breed of NewSQL vendors having any impact on the relational database incumbents such as Oracle, Microsoft and IBM? And how is SAP HANA adoption driving interest in other in-memory databases such as VoltDB and MemSQL?

We will also be interested to see how well NoSQL databases fare in this year’s survey results. Last year MongoDB was the most popular, followed by Apache Cassandra/DataStax and Redis. Are these now making a bigger impact on the wider market, and what of Basho’s Riak, CouchDB, Neo4j, Couchbase et al?

Additionally, we have been tracking attitudes to Oracle’s ownership of MySQL since the deal to acquire Sun was announced. Have MySQL users’ attitudes towards Oracle improved or declined in the last 12 months, and what impact will the formation of the MariaDB Foundation have on MariaDB adoption?

We’re looking forward to analyzing the results and providing answers to these and other questions. Please help us to get the most representative result set by taking part in the survey at http://bit.ly/451db13

Database Landscape Map – December 2012

As previously mentioned, one of my most popular pieces of research while at 451 has been the database landscape graphic we produced for our NoSQL, NewSQL and Beyond report.

I recently published an updated version but noted that there were a group of database vendors that had emerged in 2012 that didn’t easily fit into the segments we’d created.

In order to address that I went back to the drawing board and, taking inspiration from London Underground and The Real Story Group, set about mapping the connections between the various players in the database space.

Note: the latest update to the map is available here.

I’ll be honest – I’m not convinced that this is as practically useful as the original, although I believe it is more accurate and it was an interesting (if exhausting) exercise to put it together.

If anyone spots any glaring omissions or errors please let us know. Additionally, the image is also available on posters, mugs, t-shirts and mouse pads, for a small fee 🙂

Of course, if you’re looking for some perspective on what this all means, I can recommend one of our highly competitive subscription packages.

Updated database landscape graphic

One of the most popular pieces I have produced since joining 451 is not a research report or presentation but the database landscape graphic that accompanied our NoSQL, NewSQL and Beyond report.

We’ve seen it crop up in other presentations and websites – sometimes even with attribution 😉

We actually updated the image to accompany our more recent report MySQL vs. NoSQL and NewSQL: 2011-2015 but I realised that I haven’t made that newer version more generally available. So here it is:

We wouldn’t claim it to be perfect. There’s a whole new breed of data platform-as-a-service providers that have emerged in recent months that will need to be added, if we can find space for them.

Meanwhile there are a group of database vendors that have also emerged that don’t easily fit into the segments we’ve created: companies like Drawn to Scale, FoundationDB, Aerospike and Splice Machine.

But since the original graphic continues to be popular, I thought I’d share the latest iteration as well. Any feedback is always welcome.

Data cloud, datastructure, and the end of the EDW

There has been a spate of reports and blog posts recently postulating about the potential demise of the enterprise data warehouse (EDW) in the light of big data and evolving approaches to data management.

There are a number of connected themes that have led the likes of Colin White and Barry Devlin to ponder the future of the EDW, and as it happens I’ll be talking about these during our 451 Client event in San Francisco on Wednesday.

While my presentation doesn’t speak directly to the future of the EDW, it does cover the trends that are driving the reconsideration of the assumption that the EDW is, and should be, the central source of business intelligence in the enterprise.

As Colin points out, this is an assumption based on historical deficiencies with alternative data sources that evolved into best practices. “Although BI and BPM applications typically process data in a data warehouse, this is only because of… issues… concerning direct access [to] business transaction data. If these issues could be resolved then there would be no need for a data warehouse.”

The massive improvements in processing performance seen since the advent of data warehousing mean that it is now more practical to process data where it resides, or is generated, rather than forcing data to be held in a central data warehouse.

For example, while distributed caching was initially adopted to improve the performance of Web and financial applications, it also provides an opportunity to perform real-time analytics on application performance and user behaviour (enabling targeted ads for example) long before the data get anywhere near the data warehouse.

While the central EDW approach has some advantages for data control, security and reliability, this has always been more theoretical than practical, as there is the need for regional and departmental data marts, and users continue to use local copies of data.

As we put it in last year’s Data Warehousing 2009-2013 report:

“The approach of many users now is not to stop those distributed systems from being created, but rather to ensure that they can be managed according to the same data-quality and security rules as the EDW.

With the application of cloud computing capabilities to on-premises infrastructure, users now have the promise of distributed pools of enterprise data that marry central management with distributed use and control, empowering business users to create elastic and temporary data marts without the risk of data-mart proliferation.”

The concept of the “data cloud” is nascent, but companies such as eBay are pushing in that direction, while also making use of data storage and processing technologies above and beyond traditional databases.

Hadoop is a prime example, but so too are the infrastructure components that are generating vast amounts of data that can be used by the enterprise to better understand how the infrastructure is helping or hindering the business in responding to changing demands.

For the 451 client event we have come up with the term ‘datastructure’ to describe these infrastructure elements. What is ‘datastructure’? It’s the machines that are responsible for generating machine-generated data.

While that may sound like we’ve just slapped a new label on existing technology, we believe that those data-generating machines will evolve over time to take advantage of improved available processing power with embedded data analytics capabilities.

Just as in-database analytics has enabled users to reduce data processing latency by taking the analytics to the data in the database, it seems likely that users will look to do the same for machine-generated data by taking the analytics to the data in the ‘datastructure’.

This ‘datastructure’ with embedded database and analytics capabilities therefore becomes part of the wider ‘data cloud’, alongside regional and departmental data marts, and the central business application data warehouse, as well as the ability to spin up and provision virtual data marts.

As Barry Devlin puts it: “A single logical storehouse is required with both a well-defined, consistent and integrated physical core and a loose federation of data whose diversity, timeliness and even inconsistency is valued.”

Making this work will require new data cloud management capabilities, as well as an approach to data management that we have called “total data”. As we previously explained:

“Total data is about more than data volumes. It’s about taking a broad view of available data sources and processing and analytic technologies, bringing in data from multiple sources, and having the flexibility to respond to changing business requirements…

Total data involves processing any data that might be applicable to the query at hand, whether that data is structured or unstructured, and whether it resides in the data warehouse, or a distributed Hadoop file system, or archived systems, or any operational data source – SQL or NoSQL – and whether it is on-premises or in the cloud.”

As for the end of the EDW, both Colin and Barry argue, and I agree, that what we are seeing does not portend the end of the EDW but recognition that the EDW is a component of business intelligence, rather than the source of all business intelligence itself.

NoSQL – consolidating and proliferating in 2011

Among the numerous prediction pieces doing the rounds at the moment, Bradford Stephens, founder of Drawn to Scale, suggested we could be in for continued proliferation of NoSQL database technologies in 2011, while Redmonk’s Stephen O’Grady predicted consolidation. I agree with both of them.

To understand how NoSQL could both proliferate and consolidate in 2011 it’s important to look at the small print. Bradford was talking specifically about open source tools, while Stephen was writing about commercially successful projects.

Given the levels of interest in NoSQL database technologies, the vast array of use cases, and the various interfaces and development languages – most of which are open source – I predict we’ll continue to see cross-pollination and the emergence of new projects as developers (corporate and individual) continue to scratch their own data-based itches.

However, I think we are also beginning to see a narrowing of the commercial focus on those projects and companies that have enough traction to generate significant business opportunities and revenue, and that a few clear leaders will emerge in the various NoSQL sub-categories (key-value stores, document stores, graph databases and distributed column stores).

We can see previous evidence of the dual impact of proliferation and consolidation in the Linux market. While commercial opportunities are dominated by Red Hat, Novell and Canonical, that has not stopped the continued proliferation of Linux distributions.

The main difference between NoSQL and Linux markets, of course, is that the various Linux distributions all have a common core, and the diversity in the NoSQL space means that we are unlikely to see proliferation on the scale of Linux.

However, I think we’ll see a similar two-tier market emerge with a large number of technically interesting and differentiated open source projects, and a small number of commercially-viable general-purpose category leaders.

Webinar: navigating the changing landscape of open source databases

When we published our 2008 report on the impact of open source on the database market the overall conclusion was that adoption had been widespread but shallow.

Since then we’ve seen increased adoption of open source software, as well as the acquisition of MySQL by Oracle. Perhaps the most significant shift in the market since early 2008 has been the explosion in the number of open source database and data management projects, including the various NoSQL data stores, and of course Hadoop and its associated projects.

On Tuesday, November 9, 2010 at 11:00 am EST I’ll be joining Robin Schumacher, Director of Product Strategy from EnterpriseDB to present a webinar on navigating the changing landscape of open source databases.

Among the topics to be discussed are:

· the needs of organizations with hybrid mixed-workload environments

· how to choose the right tool for the job

· the involvement of user corporations (for better or for worse) in open source projects today.

You can find further details about the event and register here.

Sizing the data warehousing opportunity

The data warehousing market will see a compound annual growth rate of 11.5% from 2009 through 2013 to reach a total of $13.2bn in revenues.

That is the main finding highlighted by the latest report from The 451 Group’s Information Management practice, which provides market-sizing information for the data-warehousing sector from 2009 to 2013.

The report includes revenue estimates and growth projections, and examines the business and technology trends driving the market.
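As a rough sanity check on the headline figure, the compound annual growth rate can be used to back out the starting revenue it implies. The sketch below is illustrative only: the function names are our own, and it assumes that "2009 through 2013" means four annual compounding steps, which the summary above does not spell out.

```python
def implied_start_value(end_value: float, rate: float, periods: int) -> float:
    """Back out the starting value implied by an end value and a CAGR."""
    return end_value / (1.0 + rate) ** periods


def cagr(start_value: float, end_value: float, periods: int) -> float:
    """Compound annual growth rate between a start value and an end value."""
    return (end_value / start_value) ** (1.0 / periods) - 1.0


END_2013_BN = 13.2   # total revenue in 2013, $bn (from the report summary)
RATE = 0.115         # 11.5% compound annual growth rate (from the report summary)
PERIODS = 4          # ASSUMPTION: 2009 -> 2013 counted as four annual steps

# Implied 2009 revenue under the assumption above (not a figure from the report)
start_2009_bn = implied_start_value(END_2013_BN, RATE, PERIODS)

# Round trip: recomputing the growth rate from the implied base should give 11.5%
round_trip_rate = cagr(start_2009_bn, END_2013_BN, PERIODS)

print(f"Implied 2009 revenue: ${start_2009_bn:.2f}bn")
print(f"Round-trip CAGR: {round_trip_rate:.1%}")
```

Under that four-period assumption the implied 2009 base works out to about $8.5bn. That is a derived estimate rather than a number taken from the report, and counting five periods instead would give a lower base.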

It was put together with the assistance of Market Monitor – the new market-sizing service from The 451 Group and Tier1 Research. Props to Greg Zwakman and Elizabeth Nelson for their number-crunching.

Among the key findings, available via the executive summary (PDF), are:

  • Four vendors dominate the data-warehouse market, with 93.6% of total revenue in 2010. These vendors are expected to retain their advantage and generate 92.2% of revenue in 2013.
  • Analytic databases are now able to take advantage of greater processor performance at a lower cost, improving price/performance and lowering barriers to entry.
  • With the application of cloud capabilities, users now have the promise of pools of enterprise data that marry central management with distributed use and control.
  • Products that take advantage of improved hardware performance will drive revenue growth for all vendors, and will protect the market share of incumbents.
  • As a result of systems performance improvements, data-warehousing vendors are also taking advantage of the opportunity to bring more advanced analytic capabilities to the DB engine.
  • Although we expect many smaller vendors to grow at a much faster rate between now and 2013, it will not be at the expense of the market’s dominant vendors.
  • While the Hadoop Core is not a direct alternative to traditional analytic DBs, the increased maturity of associated projects means that use cases for Hadoop- and MapReduce-enabled analytic DBs will overlap.

There is, of course, much more detail in the full report. 451 Group clients can download the report here, while non-clients can also use the same link to purchase the report, or request more information.