Entries from December 2010 ↓

Information management preview of 2011

Our clients will have seen our preview of 2011 last week. For those who aren’t (yet!) clients and therefore can’t see the whole 3,500-word report, here’s the introduction, followed by the titles of the sections to give you an idea of what we think will shape the information management market in 2011 and beyond. Of course the IT industry, like most others, doesn’t rigorously follow the whims of the Gregorian calendar, so some of these things will happen next year while others may not occur until 2012 and beyond. But happen they will, we believe.

We think information governance will play a more prominent role in 2011 and in the years beyond that. Specifically, we think master data management and data governance applications will appear in 2011 to replace the gaggle of spreadsheets, dashboards and scorecards commonly used today. Beyond that, we think information governance will evolve in the coming years, kick-started by end users who are asking for a more coherent way to manage their data, driven in part by their experience with the reactive and often chaotic nature of e-discovery.

In e-discovery itself, we expect to see a twin-track adoption trend: cloud-based products have proven popular, but at the same time more enterprises will buy e-discovery appliances.

‘Big data’ has become a bit of a catchall term to describe the masses of information being generated, but in 2011 we expect to see a shift to what we term a ‘total data’ approach to data management, as well as to the analytics applications and tools that enable users to generate business intelligence from their big data sets. Deeper down, the tools used in this process will include new BI tools to exploit Hadoop, as well as a push in predictive analytics beyond the statisticians and into finance, marketing and sales departments.
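
To make that concrete, here is a minimal, purely illustrative sketch of the kind of BI-style aggregation that might be pushed down to Hadoop, written as a Hadoop Streaming mapper and reducer in Python. The input layout (CSV rows of region and sale amount) and the field names are assumptions invented for illustration, not a description of any particular product.

```python
#!/usr/bin/env python
# Illustrative only: a BI-style aggregation (total sales per region) run on
# Hadoop via the Streaming interface. The region,amount CSV layout is a
# hypothetical example, not any vendor's actual schema.
import sys

def mapper():
    # Emit region<TAB>amount for each input record.
    for line in sys.stdin:
        fields = line.strip().split(",")
        if len(fields) >= 2:
            print("%s\t%s" % (fields[0], fields[1]))

def reducer():
    # Hadoop sorts mapper output by key, so we can total one region at a time.
    current, total = None, 0.0
    for line in sys.stdin:
        region, amount = line.strip().split("\t")
        if region != current:
            if current is not None:
                print("%s\t%.2f" % (current, total))
            current, total = region, 0.0
        total += float(amount)
    if current is not None:
        print("%s\t%.2f" % (current, total))

if __name__ == "__main__":
    # Pass "map" to run the mapper; anything else runs the reducer.
    mapper() if "map" in sys.argv[1:] else reducer()
```

The same script can be tested without a cluster at all – for example, `cat sales.csv | python script.py map | sort | python script.py reduce` – which is part of the appeal of this style of processing to BI developers.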

SharePoint 2010 may have come out in the year for which it is named, but its use will become truly widespread in 2011 as the first service pack is released and the ISV community around it completes its updates from SharePoint 2007. However, we don’t think cloud-based SharePoint will grow quite as fast as some people may expect. Finally, in the Web content management (WCM) market – so affected by SharePoint, as well as the open source movement – we expect a stratification between the everyday WCM-type scenario and Web experience management (WEM) for those organizations that need to tie WCM, Web analytics, online marketing and commerce features together.

  • Governance family reunion: Information governance, meet governance, risk and compliance; meet data governance….
  • Master data management, data quality, data integration: the road to data governance
  • E-discovery post price war: affordable enough, or still too strategic to risk?
  • Data management – big, bigger, biggest
  • Putting the BI into big data in Hadoop
  • The business of predictive analytics
  • SharePoint 2010 gets real in 2011
  • WCM, WEM and stratification

And with that we’d like to wish all readers of Too Much Information a happy holiday season and a healthy and successful 2011.

Sizing the big data problem: ‘big data’ is the problem

Big data has been one of the big topics of the year in terms of client queries coming into The 451 Group, and one of the recurring questions (especially from vendors and investors) has been: “how big is the big data market?”

The only way to answer that is to ask another question: “what do you mean by ‘big data’?” We have mentioned before that the term is ill-defined, so it is essential to work out what an individual means when they use the term.

In our experience they usually mean one of two things:

  • Big data as a subset of overall data: specific volumes or classes of data that cannot be processed or analyzed by traditional approaches
  • Big data as a superset of the entire data management market, driven by the ever-increasing volume and complexity of data

Our perspective is that big data, if it means anything at all, represents a subset of overall data. However, it is not one that can be measurably defined by the size of the data volume. Specifically, as we recently articulated, we believe:

    “Big data is a term applied to data sets that are large, complex and dynamic (or a combination thereof) and for which there is a requirement to capture, manage and process the data set in its entirety, such that it is not possible to process the data using traditional software tools and analytic techniques within tolerable time frames.”

The confusion around the term big data also partly explains why we introduced the term “total data” to refer to a broader approach to data management, managing the storage and processing of all available data to deliver the necessary business intelligence.

The distinction is clearly important when it comes to sizing the potential opportunity. I recently came across a report from one of the big banks that put a figure on what it referred to as the “big data market”. However, it had used the superset definition.

Since the approach taken was to add together the revenue estimates for all data management technologies – traditional and non-traditional – the result was not a calculation of the big data market but a calculation of the total data management sector (although the method is in itself too simplistic for us to endorse the end result).

Specifically, the bank had added up current market estimates for database software, storage and servers for databases, BI and analytics software, data integration, master data management, text analytics, database-related cloud revenue, complex event processing and NoSQL databases.

In comparison, the big data market is clearly a lot smaller, and represents a subset of revenue from traditional and non-traditional data management technologies, with a leaning towards the non-traditional technologies.

It is important to note, however, that big data cannot be measurably defined by the technology used to store and process it. As we have recently seen, not every use case for Hadoop or a NoSQL database – for example – involves big data.

The market defined this way is clearly a lot smaller than the one calculated by the bank, and the calculation required is a lot more complicated. We know, for example, that Teradata generated revenue of $489m in its third quarter. How much of that was attributable to big data?

Answering that requires a stricter definition of big data than anyone currently uses. But as we have noted above, ‘big data’ cannot be defined by data volume, or by the technology used to store or process it.
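
To illustrate why the two definitions produce such different numbers, here is a small, purely illustrative Python sketch; every figure in it is a placeholder, not a market estimate. The superset approach is a straightforward sum, while the subset approach needs a big-data share of each segment’s revenue – inputs that, as argued above, nobody can currently supply.

```python
# All revenue figures below are placeholders for illustration only,
# not market estimates.
segment_revenue = {
    "database software":             100.0,
    "storage/servers for databases":  80.0,
    "BI and analytics software":      60.0,
    "data integration and MDM":       25.0,
    "text analytics, CEP, NoSQL":     10.0,
}

# Superset approach (the bank's): just add everything together.
superset_estimate = sum(segment_revenue.values())
print("'big data' as superset: %.1f" % superset_estimate)

# Subset approach: needs the fraction of each segment attributable to
# big data workloads. Those fractions are exactly what no current
# definition lets anyone measure, so they are unknown here.
big_data_share = dict.fromkeys(segment_revenue, None)

def subset_estimate(shares):
    if any(share is None for share in shares.values()):
        raise ValueError("big data share of each segment is undefined")
    return sum(segment_revenue[s] * shares[s] for s in shares)

# subset_estimate(big_data_share) raises ValueError: the required inputs
# do not exist until 'big data' is defined in a measurable way.
```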

There’s a lot of talk about the “big data problem”. The biggest problem with big data, however, is that the term has not been – and arguably cannot be – defined in any measurable way.

How big is the big data market? You may as well ask “how long is a piece of string?”

If we are to understand the opportunity for storing and processing big data sets, then the industry needs to get much more specific about what is being stored and processed, and what we are using to store and process it.

Total data: ‘bigger’ than big data

The 451 Group has recently published a spotlight report examining the trends that we see shaping the data management segment, including data volume, complexity, real-time processing demands and advanced analytics, as well as a perspective that no longer treats the enterprise data warehouse as the only source of trusted data for generating business intelligence.

The report examines these trends and introduces the term ‘total data’ to describe the total opportunity and challenge provided by new approaches to data management.


[Image: Johann Cruyff, exponent of total football, inspiration for total data. Source: Wikimedia. Attribution: Bundesarchiv, Bild 183-N0716-0314 / Mittelstädt, Rainer / CC-BY-SA]

Total data is not simply another term for big data; it describes a broader approach to data management, managing the storage and processing of big data to deliver the necessary BI.

Total data involves processing any data that might be applicable to the query at hand, whether that data is structured or unstructured, whether it resides in the data warehouse, a distributed Hadoop file system, archived systems or any operational data source – SQL or NoSQL – and whether it is on-premises or in the cloud.
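
As a purely hypothetical sketch of what that looks like in practice, the Python snippet below answers a single business question from two stores at once – sqlite3 standing in for the data warehouse and a flat file standing in for data held in a Hadoop file system. The schema and file layout are invented for illustration.

```python
# Illustrative only: one logical query answered from multiple stores.
# sqlite3 stands in for the warehouse; a flat file stands in for HDFS.
import os
import sqlite3

# "Warehouse": structured order history.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
warehouse.executemany("INSERT INTO orders VALUES (?, ?)",
                      [("acme", 120.0), ("globex", 75.5), ("acme", 30.0)])

# "Hadoop file system": semi-structured clickstream, customer<TAB>page.
with open("clicks.tsv", "w") as f:
    f.write("acme\t/pricing\nglobex\t/docs\nacme\t/checkout\n")

# One question, two sources: spend and site activity per customer.
spend = dict(warehouse.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer"))
clicks = {}
with open("clicks.tsv") as f:
    for line in f:
        customer, _page = line.rstrip("\n").split("\t")
        clicks[customer] = clicks.get(customer, 0) + 1

for customer in sorted(set(spend) | set(clicks)):
    print(customer, spend.get(customer, 0.0), clicks.get(customer, 0))

os.remove("clicks.tsv")
```

In a real deployment the flat file would be a Hadoop job and the join a federated query, but the shape of the problem – one question, several stores – is the same.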

In the report we explain how total data is influencing modern data management with respect to four key trends. To summarize:

  • beyond big: total data is about processing all your data, regardless of the size of the data set
  • beyond data: total data is not just about being able to store data, but the delivery of actionable results based on analysis of that data
  • beyond the data warehouse: total data sees organizations complementing data warehousing with Hadoop and its associated projects
  • beyond the database: total data includes the emergence of private data clouds, and the expansion of data sources suitable for analytics beyond the database

The term ‘total data’ is inspired by ‘total football,’ the soccer tactic that emerged in the early 1970s and enabled Ajax of Amsterdam to dominate European football in the early part of the decade and The Netherlands to reach the finals of two consecutive World Cups, having failed to qualify for the four preceding competitions.

Unlike previous approaches that focused on each player having a fixed role to play, total football encouraged individual players to switch positions depending on what was happening around them while ensuring that the team as a whole fulfilled all the required tactical positions.

Although total data is not meant to be directly analogous to total football, we do see a connection between the fluidity of total football – enabled by no longer requiring players to fulfill fixed roles – and total data’s desire to break down dependence on the enterprise data warehouse as the single version of the truth, while letting go of the assumption that the relational database offers a one-size-fits-all answer to data management.

Total data is about more than data volumes. It’s about taking a broad view of available data sources and processing and analytic technologies, bringing in data from multiple sources, and having the flexibility to respond to changing business requirements.

A more substantial explanation of the concept of total data and its impact on information and infrastructure management methods and technologies is available here for 451 Group clients. Non-clients can also apply for trial access.