‘Big Data’ Survival Guide: A 10-step guide to surviving the ‘big data’ deluge

Earlier today I presented a ‘Big Data’ Survival Guide at our HCTSEU event in London. The presentation was in effect a 10-step guide to surviving the ‘big data’ deluge.

Here’s a taster of what was discussed:

1. There’s no such thing as “big” data.
Or, more to the point: The problem is not “big” data – it’s more data. The increased use of interactive applications and websites – as well as sensors, meters and other data-generating machines – has increased the volume, velocity and variety of data to store and process.

2. ‘Big Data’ has the potential to revolutionize the IT industry.
Here we are talking less about the three Vs of big data and more about ‘big data’ as a concept, which describes the realization of greater business intelligence by storing, processing and analyzing that increased volume, velocity and variety of data. It can be summed up by the statement from Google’s paper ‘The Unreasonable Effectiveness of Data’ that “simple models and a lot of data trump more elaborate models based on less data.”

3. Never use the term ‘big data’ when ‘data’ will do.
‘Big Data’ is nearing, at, or perhaps already over, the peak of the hype cycle, so be cautious about how you use the term. ‘Big Data’ and technologies like Hadoop will eventually be subsumed into the fabric of the IT industry and will simply become part of the way we do business.

4. (It’s not how big it is) It’s what you do with it that counts.
Generating value from data is about more than just its volume, variety and velocity. The adoption of non-traditional data processing technologies is driven not just by the nature of the data, but also by the user’s particular data processing requirements. That is the essence of our Total Data management concept, which builds on the three Vs to also assess Totality, Exploration, Frequency and Dependency, each of which is explained in the points that follow:

5. All data has potential value.
Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
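To make the distinction concrete, here is a minimal Python sketch (the event stream and failure rate are invented, purely for illustration) contrasting a sampled estimate with a computation over the data in its entirety:

```python
import random

# Purely illustrative numbers: a synthetic event stream with a ~2% failure rate.
random.seed(42)
events = [random.random() < 0.02 for _ in range(1_000_000)]

# Sampling a subset and extrapolating the result...
sample = random.sample(events, 1_000)
estimated_rate = sum(sample) / len(sample)

# ...versus analyzing the data in its entirety.
true_rate = sum(events) / len(events)

print(f"sample estimate: {estimated_rate:.4%}, full dataset: {true_rate:.4%}")
```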

6. You may have to search for it.
Exploration: The interest in exploratory analytic approaches, in which the schema is defined in response to the nature of the query, rather than fixed in advance when the data is written.
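As a rough illustration of this ‘schema-on-read’ style (the records and fields below are hypothetical), structure is imposed on the raw data only at the moment a question is asked of it:

```python
import json

# Hypothetical raw event records, stored as-is with no schema imposed at write time.
raw_records = [
    '{"user": "alice", "action": "click", "page": "/home"}',
    '{"user": "bob", "action": "purchase", "amount": 19.99}',
]

# The 'schema' emerges from the query: only the fields this particular
# question needs (purchase events and their amounts) are projected out.
purchases = [
    (rec["user"], rec["amount"])
    for rec in map(json.loads, raw_records)
    if rec.get("action") == "purchase"
]

print(purchases)  # [('bob', 19.99)]
```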

7. Time is of the essence.
Frequency: The desire to increase the rate of analysis to generate more accurate and timely business intelligence.
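One way to picture increased frequency is incremental analysis that refreshes a metric as each event arrives, rather than recomputing it in a periodic batch job. The sketch below (names and figures invented) maintains a rolling average in exactly that event-at-a-time fashion:

```python
from collections import deque

# Hypothetical rolling metric: the average of the last N measurements,
# updated as each event arrives rather than recomputed in a nightly batch.
class RollingAverage:
    def __init__(self, window: int):
        self.values = deque(maxlen=window)

    def update(self, value: float) -> float:
        self.values.append(value)
        return sum(self.values) / len(self.values)

avg = RollingAverage(window=3)
for latency_ms in [120, 95, 210, 80]:
    print(f"average of recent events: {avg.update(latency_ms):.1f} ms")
```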

8. Make the most of what you have.
Dependency: The need to balance investment in existing technologies and skills with the adoption of new techniques.

9. Choose the right tool for the job.
There is no shortcut to determining the best technology to deploy for a particular workload. Several companies have developed their own approaches to solving this problem, however, and those approaches do provide some general guidance.

10. If your data is “big”, the way you manage it should be “total”.
Everything I talked about in the presentation, including examples from eBay, Orbitz, Expedia, Vestas Wind Systems, and Disney (and several others) that I did not have space to address in this post, is included in our Total Data report. It examines the trends behind ‘big data’, explains the new and existing technologies used to store and process data, and outlines a Total Data management approach focused on selecting the most appropriate data storage and processing technology to deliver value from big data.