Beyond ‘big data’

Alistair Croll published an interesting post this week entitled ‘there’s no such thing as big data’, in which he argued, prompted by a friend, that “given how much traditional companies put [big data] to work, it might as well not exist.”

Tim O’Reilly continued the theme in his follow-up post, arguing:

“companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue”

There is much to agree with – in fact I have myself argued that when it comes to data, the key issue is not how much you have, but what you do with it. However, there is also a significant shift of emphasis here away from the underlying principles that have driven the interest in ‘big data’ over the last 12-18 months.

Compare Tim O’Reilly’s statement with the following, from Google’s seminal research paper ‘The Unreasonable Effectiveness of Data’:

“invariably, simple models and a lot of data trump more elaborate models based on less data”

While the two statements are not entirely contradictory, they do indicate a change in emphasis related to data. There has been so much emphasis on the ‘big’ in ‘big data’, as if the growing volume, variety and velocity of data would, by itself, deliver improved business insights.

As I have argued in the introduction to our ‘total data’ management concept, and in the numerous presentations given on the subject this year, in order to deliver value from data you have to look beyond the nature of the data itself and consider what it is that the user wants to do with it.

Specifically, we believe that one of the key factors in delivering value is a focus on storing and processing all of a company’s data (or at least as much as is economically feasible), rather than analysing samples and extrapolating the results.
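To make that contrast concrete, here is a minimal sketch (with entirely hypothetical numbers, not figures from this post) comparing a full aggregation against a sample-and-extrapolate estimate:

```python
import random

# Hypothetical dataset: revenue values for one million transactions.
# (Illustrative numbers only; nothing here comes from the post.)
random.seed(42)
transactions = [random.lognormvariate(3, 1) for _ in range(1_000_000)]

# 'Total data' approach: store and process every record.
total_revenue = sum(transactions)

# Sampling approach: analyse 1% of the records and extrapolate.
sample = random.sample(transactions, 10_000)
estimated_revenue = sum(sample) * (len(transactions) / len(sample))

print(f"Processed in full: {total_revenue:,.0f}")
print(f"Extrapolated:      {estimated_revenue:,.0f}")
print(f"Estimation error:  {abs(total_revenue - estimated_revenue) / total_revenue:.2%}")
```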

The other factor is time, and specifically how fast users can get to the results they are looking for. Another way of looking at this is in terms of the rate of query. Again, this is not about the nature of the data, but about what the user wants to do with it.

This focus on the rate of query has implications for the value of the data, as expressed in the following equation:

Value = (Volume ± Variety ± Velocity) × Totality / Time
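The equation can be read as a function: the same underlying data becomes more valuable as more of it is retained (Totality) and as the time to get a result shrinks. The sketch below encodes that reading with made-up, normalised inputs; treating the volume, variety and velocity terms as a plain sum is a simplifying assumption on my part, since the equation leaves their exact combination open:

```python
def data_value(volume, variety, velocity, totality, time_to_result):
    """Illustrative encoding of Value = (Volume ± Variety ± Velocity) × Totality / Time.

    Combining volume, variety and velocity as a simple sum is an assumption
    made purely for illustration; the weighting is not fixed by the post.
    """
    return (volume + variety + velocity) * totality / time_to_result

# Hypothetical, normalised scores (0-1) for the data characteristics.
# Retaining all of the data and answering queries quickly...
print(data_value(volume=0.8, variety=0.5, velocity=0.6, totality=1.0, time_to_result=0.5))

# ...scores far higher than sampling heavily and waiting longer for results.
print(data_value(volume=0.8, variety=0.5, velocity=0.6, totality=0.3, time_to_result=2.0))
```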

The rate of query also has significant implications for which technologies are deployed to store and process the data, and to put that data to use in delivering business insight and value.

Getting back to the points made by Alistair and Tim in relation to ‘The Unreasonable Effectiveness of Data’, it would seem that to date there has been more focus on what Google referred to as “a lot of data”, and less on the “simple models” needed to deliver value from that data.

There is clearly a balance to be struck, and the answer lies not in ‘big data’ but in “more clue”, and in defining and delivering those “simple models”.