Sizing the big data problem: ‘big data’ is the problem

Big data has been one of the big topics of the year in terms of client queries coming into The 451 Group, and one of the recurring questions (especially from vendors and investors) has been: “how big is the big data market?”

The only way to answer that is to ask another question: “what do you mean by ‘big data’?” We have mentioned before that the term is ill-defined, so it is essential to work out what an individual means when they use the term.

In our experience they usually mean one of two things:

  • Big data as a subset of overall data: specific volumes or classes of data that cannot be processed or analyzed by traditional approaches
  • Big data as a superset of the entire data management market, driven by the ever-increasing volume and complexity of data

Our perspective is that big data, if it means anything at all, represents a subset of overall data. However, it is not one that can be measurably defined by the size of the data volume. Specifically, as we recently articulated, we believe:

    “Big data is a term applied to data sets that are large, complex and dynamic (or a combination thereof) and for which there is a requirement to capture, manage and process the data set in its entirety, such that it is not possible to process the data using traditional software tools and analytic techniques within tolerable time frames.”

The confusion around the term big data also partly explains why we introduced the term “total data” to refer to a broader approach to data management, managing the storage and processing of all available data to deliver the necessary business intelligence.

The distinction is clearly important when it comes to sizing the potential opportunity. I recently came across a report from one of the big banks that put a figure on what it referred to as the “big data market”. However, the bank had used the superset definition.

The result was therefore not a calculation of the big data market, but a calculation of the total data management sector, since the approach taken was to add together the revenue estimates for all data management technologies, traditional and non-traditional. (The method is in itself too simplistic for us to endorse the end result.)


Specifically, the bank had added up current market estimates for database software, storage and servers for databases, BI and analytics software, data integration, master data management, text analytics, database-related cloud revenue, complex event processing and NoSQL databases.

In comparison, the big data market is clearly a lot smaller, and represents a subset of revenue from traditional and non-traditional data management technologies, with a leaning towards the non-traditional technologies.

It is important to note, however, that big data cannot be measurably defined by the technology used to store and process it. As we have recently seen, not every use case for Hadoop or a NoSQL database – for example – involves big data.

Clearly this is a much smaller market than the one calculated by the bank, and the calculation required is a lot more complicated. We know, for example, that Teradata generated revenue of $489m in its third quarter. How much of that was attributable to big data?

Answering that requires a stricter definition of big data than is currently in use (by anyone). But as we have noted above, ‘big data’ cannot be defined by data volume, or by the technology used to store or process it.

There’s a lot of talk about the “big data problem”. The biggest problem with big data, however, is that the term has not been, and arguably cannot be, defined in any measurable way.

How big is the big data market? You may as well ask “how long is a piece of string?”

If we are to understand the opportunity for storing and processing big data sets then the industry needs to get much more specific about what it is that is being stored and processed, and what we are using to store and process it.


2 comments ↓

#1 Total Data: Size Doesn’t Matter | Splunk Blogs on 12.09.10 at 3:30 pm

[…] inspiration comes from two sources: 1. Aslett’s dissatisfaction with the nebulous term “Big Data” and 2. the Dutch “Total Football” (totaalvoetbal) system, pioneered in the […]

#2 srw on 12.09.10 at 7:19 pm

Beyond the [non] definition of the term, I see the Big Data concept used for a “new” type of massive information, presumably more subjective: text, networks (links, social, etc). Organizations have been using data warehouse/analytics/mining tools for years (if not decades), but that information was more objective ($).

There is a lot of noise in this new branded market, as you say it speaks mainly about tools but in my [humble] experience it’s more a craftwork and when you push the tools to the limit they don’t respond as you expect (from the buzz).