Updated: sizing the big data problem: ‘big data’ is *still* the problem

In late 2010 I published a post discussing the problems associated with trying to size the ‘big data’ market, given the lack of clarity about the definition of the term and the technologies it applies to.

In that post we discussed a 2010 Bank of America Merrill Lynch report that estimated that ‘big data’ represented a total addressable market worth $64bn. This week Wikibon estimated that the big data market stands at just over $5bn in factory revenue, growing to over $50bn by 2017, while Deloitte estimated that industry revenues will likely be in the range of $1-1.5bn this year.

To put that in perspective (each estimate was originally illustrated with a chart scaled to its size):

- Bank of America Merrill Lynch estimated the total addressable market for ‘big data’ in 2010 at $64bn
- Wikibon estimates the ‘big data’ market in 2012 at just over $5bn
- Deloitte estimates the ‘big data’ market in 2012 at $1-1.5bn

UPDATE – IDC has become the first of the big analyst firms to break out its big data abacuses (abaci?). IDC thinks the ‘big data’ market in 2010 was $3.2bn.

Not surprisingly, they came to their numbers by different means. BoA added up market estimates for database software, storage and servers for databases, BI and analytics software, data integration, master data management, text analytics, database-related cloud revenue, complex event processing and NoSQL databases.

Wikibon came to its estimate by adding up revenue associated with a select group of technologies and a select group of vendors. Deloitte, for its part, added up revenue estimates for database, ERP and BI software, reduced the total by 90% to reflect the proportion of data warehouses holding more than five terabytes of data, and then reduced that figure by a further 80-85% to reflect the low level of current adoption.
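Deloitte's back-of-the-envelope method is easy to sketch in code. A minimal illustration follows; the $60bn combined-revenue input is a hypothetical figure chosen for the example, not Deloitte's actual number:

```python
def deloitte_style_estimate(combined_revenue_bn, big_warehouse_share=0.10,
                            adoption_range=(0.15, 0.20)):
    """Apply Deloitte-style haircuts to a combined database/ERP/BI revenue figure.

    Cuts the total by 90% (only warehouses over five terabytes count), then by
    a further 80-85% (low current adoption), returning a (low, high) range in $bn.
    """
    big_warehouse_revenue = combined_revenue_bn * big_warehouse_share
    return (big_warehouse_revenue * adoption_range[0],
            big_warehouse_revenue * adoption_range[1])

# With a hypothetical $60bn combined market, the two haircuts land at roughly
# $0.9-1.2bn -- the same order of magnitude as Deloitte's $1-1.5bn range.
low, high = deloitte_style_estimate(60)
```

The interesting thing about this method is how sensitive it is to the two haircut assumptions: small changes to the 90% or 80-85% figures swing the final estimate far more than any disagreement about the underlying software revenues.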

IDC, meanwhile, went through a slightly tortuous route of defining the market based on the volume of data collected, OR deployments of ultra-high-speed messaging technology, OR rapidly growing data sets, AND the use of scale-out architecture, AND the use of two or more data types OR high-speed data sources.
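IDC's compound OR/AND definition reads more clearly written out as a predicate. The sketch below is one reading of that definition; the field names, the 100TB volume threshold and the grouping of the clauses are my illustrative assumptions, not IDC's published criteria:

```python
def counts_as_big_data(deployment):
    """One reading of IDC's compound 'big data' definition as a boolean test.

    Field names and the 100TB threshold are illustrative assumptions.
    """
    data_criterion = (deployment["data_volume_tb"] >= 100    # volume of data collected
                      or deployment["high_speed_messaging"]  # ultra-high-speed messaging
                      or deployment["rapidly_growing_data"])  # rapidly growing data sets
    return (data_criterion
            and deployment["scale_out_architecture"]          # required in all cases
            and (deployment["num_data_types"] >= 2            # two or more data types...
                 or deployment["high_speed_data_sources"]))   # ...or high-speed sources
```

Even this small sketch shows the definitional trouble: whether the final OR binds to the last AND clause or to the whole expression changes which deployments qualify, which is exactly the sort of ambiguity that makes the resulting market number hard to trust.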

There is something to be said for each of these definitions. But equally, each can be easily dismissed. We previously described our issues with the all-inclusive nature of the BoA numbers, and while we find Wikibon’s process much more agreeable, some of the individual numbers it has come up with are highly questionable. Deloitte’s methodology is surreal, but defensible. IDC’s simply illustrates the problem.

What this highlights is that the essential problem is the lack of definition for ‘big data’. As we stated in 2010: “The biggest problem with ‘big data’… is that the term has not been – and arguably cannot be – defined in any measurable way. How big is the ‘big data’ market? You may as well ask ‘how long is a piece of string?'”


1 comment so far

#1 Mark on 02.22.12 at 1:56 pm

I tried and failed to come up with an acceptable range for the analytic database market. Doing it as you described for something as nebulous as “big data” would be nearly impossible. The wikibon estimate was based on a poor understanding of the data and database markets, as well as some bad numbers, making it the most suspect of the bunch.