June 19th, 2012 — Data management
Platfora’s CEO Ben Werther recently wrote a great post explaining the benefits of Apache Hadoop and its potential to play a major role in a modern-day equivalent of the industrial revolution.
Ben highlights one of the important aspects of our Total Data concept: that generating value from data is about more than just the volume, variety and velocity of ‘big data’; it is also about the way in which the user wants to interact with that data.
“What has changed – the heart of the ‘big data’ shift – is only peripherally about the volume of data. Companies are realizing that there is surprising value locked up in their data, but in unanticipated ways that will only emerge down the road.”
He also rightly points out that while Hadoop provides what is fast becoming the platform of choice for storing all of this data, from an industrial revolution perspective we are still reliant on the equivalent of expert blacksmiths to make sense of all that data.
“Since every company of any scale is going to need to leverage big data, as an industry we either need to train up hundreds of thousands of expert blacksmiths (aka data scientists) or find a way into the industrialized world (aka better tools and technology that dramatically lower the bar to harnessing big data).”
This is a point that Cloudera CEO Mike Olson has been making in recent months. As he stated during his presentation at last month’s OSBC: “we need to see a new class of applications that exploit the benefits and architecture of Hadoop.”
There has been a tremendous amount of effort in the past 12-18 months to integrate Hadoop into the existing data management landscape, via the development of uni- and bi-directional connectors and translators that enable the co-existence of Hadoop with existing relational and non-relational databases and SQL analytics and reporting tools.
This is extremely valuable – especially for enterprises with a heavy investment in SQL tools and skills. As Larry Feinsmith, Managing Director, Office of the CIO, JPMorgan Chase pointed out at last year’s Hadoop World: “it is vitally important that new big data tools integrate with existing products and tools”.
This is why ‘dependency’ (on existing tools/skills) is an integral element of the Total Data concept alongside totality, exploration and frequency.
However, this integration of Hadoop into the established data management market only gets the industry so far, and in doing so maintains the SQL-centric view of the world that has dominated for decades.
As Ben suggests, the true start of the ‘industrial revolution’ will begin with the delivery of tools that are specifically designed to take advantage of Hadoop and other technologies and that bring the benefits of big data to the masses.
We are just beginning to see the delivery of these tools, and to think beyond the SQL-centric perspective, with analytics approaches specifically designed to take advantage of MapReduce and/or the Hadoop Distributed File System. Even this, though, signals only the end of the beginning of the revolution.
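To give a concrete sense of what ‘designed to take advantage of MapReduce’ means in practice, the canonical example is a word count split into a map phase and a reduce phase. The following is a minimal, illustrative Hadoop Streaming-style sketch in Python; the script names, sample data and paths are assumptions for illustration, not a reference to any particular product.

```python
#!/usr/bin/env python
# mapper.py -- emit a (word, 1) pair for every word read from stdin
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print("%s\t1" % word.lower())
```

```python
#!/usr/bin/env python
# reducer.py -- Hadoop sorts mapper output by key, so counts for the same
# word arrive together and can be summed in a single pass
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print("%s\t%d" % (current_word, current_count))
        current_word, current_count = word, int(count)

if current_word is not None:
    print("%s\t%d" % (current_word, current_count))
```

Submitted via the Hadoop Streaming jar (with the -mapper and -reducer options pointing at these scripts, and input and output paths in HDFS), the same pair of scripts runs unchanged whether the input is a few megabytes or many terabytes, which is precisely the property the new class of Hadoop-native analytics tools is designed to exploit.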
‘Big data’ describes the realization of greater business intelligence by storing, processing and analyzing data that was previously ignored due to the limitations of traditional data management technologies.
The true impact of ‘big data’ will only be realized once people and companies begin to change their behaviour: using the greater business intelligence gained from tools specifically designed to exploit the benefits and architecture of Hadoop and other emerging data processing technologies to alter business processes and practices.
May 14th, 2012 — Data management
The initial focus of ‘big data’ has been about its increasing volume, velocity and variety — the “three Vs” — with little mention of real world application. Now is the time to get down to business.
On Wednesday, May 30, at 9am PT I’ll be taking part in a webinar with Splunk to discuss real world successes with ‘big data’.
451 Research believes that in order to deliver value from ‘big data’, businesses need to look beyond the nature of the data and re-assess the technologies, processes and policies they use to engage with that data.
I will outline 451 Research’s ‘total data’ concept for delivering business value from ‘big data’, providing examples of how companies are seeking agile new data management technologies, business strategies and analytical approaches to turn the “three Vs” of data into actionable operational intelligence.
I’ll be joined by Sanjay Mehta, Vice President of Product Marketing at Splunk, which was founded specifically to focus on the opportunity of effectively getting value from massive and ever-changing amounts of machine-generated data, one of the fastest growing and most complex segments of ‘big data’.
Sanjay will share ‘big data’ achievements from three Splunk customers: Groupon, Intuit and CenturyLink. Using Splunk, these companies are turning massive volumes of unstructured and semi-structured machine data into powerful insights.
Register here.
April 3rd, 2012 — Data management
Earlier today I presented a ‘Big Data’ Survival Guide at our HCTSEU event in London. The presentation was in effect a 10-step guide to surviving the ‘big data’ deluge.
Here’s a taster of what was discussed:
1. There’s no such thing as “big” data.
Or, more to the point: The problem is not “big” data – it’s more data. The increased use of interactive applications and websites – as well as sensors, meters and other data-generating machines – has increased the volume, velocity and variety of data to store and process.
2. ‘Big Data’ has the potential to revolutionize the IT industry.
Here we are talking less about the three Vs of big data and more about ‘big data’ as a concept, which describes the realization of greater business intelligence by storing, processing and analyzing that increased volume, velocity and variety of data. It can be summed up by the statement from Google’s “The Unreasonable Effectiveness of Data” that “simple models and a lot of data trump more elaborate models based on less data.”
3. Never use the term ‘big data’ when ‘data’ will do.
“Big Data” is nearing/at/over the hype peak. Be cautious about how you use it. “Big Data” and technologies like Hadoop will eventually become subsumed into the fabric of the IT industry and will simply become part of the way we do business.
4. (It’s not how big it is) It’s what you do with it that counts.
Generating value from data is about more than just the volume, variety and velocity of data. The adoption of non-traditional data processing technologies is driven not just by the nature of the data, but also by the user’s particular data processing requirements. That is the essence of our Total Data management concept, which builds on the three Vs to also assess Totality, Exploration, Frequency and Dependency, as explained in the points that follow:
5. All data has potential value.
Totality: The desire to process and analyze data in its entirety, rather than analyzing a sample of data and extrapolating the results.
6. You may have to search for it.
Exploration: The interest in exploratory analytic approaches, in which schema is defined in response to the nature of the query (a short schema-on-read sketch at the end of this post illustrates the idea).
7. Time is of the essence.
Frequency: The desire to increase the rate of analysis to generate more accurate and timely business intelligence.
8. Make the most of what you have.
Dependency: The need to balance investment in existing technologies and skills with the adoption of new techniques.
9. Choose the right tool for the job.
There is no shortcut to determining the best technology to deploy for a particular workload. Several companies have developed their own approaches to solving this problem, however, and those approaches do provide some general guidance.
10. If your data is “big” the way you manage it should be “total”.
Everything I talked about in the presentation, including examples from eBay, Orbitz, Expedia, Vestas Wind Systems, and Disney (and several others) that I did not have space to address in this post, is included in our Total Data report. It examines the trends behind ‘big data’, explains the new and existing technologies used to store, process and deliver value from data, and outlines a Total Data management approach focused on selecting the most appropriate data storage and processing technology to deliver value from big data.
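As a postscript to point 6: ‘schema defined in response to the nature of the query’ is easiest to see with a toy example. In the sketch below, raw JSON events are kept exactly as they arrive and structure is imposed only at the moment a question is asked; the event fields and values are invented purely for illustration.

```python
import json

# Hypothetical raw event log, stored as-is with no upfront schema
raw_events = [
    '{"ts": "2012-04-03T10:00:00", "user": "a", "action": "search", "query": "flights"}',
    '{"ts": "2012-04-03T10:00:05", "user": "b", "action": "click", "item": 42}',
    '{"ts": "2012-04-03T10:00:09", "user": "a", "action": "click", "item": 7}',
]

events = [json.loads(e) for e in raw_events]

# Question 1: how many click events? Only the 'action' field matters here.
clicks = [e for e in events if e.get("action") == "click"]
print(len(clicks))  # 2

# Question 2: who searched, and for what? A different projection of the same raw data.
searches = [(e["user"], e["query"]) for e in events if e.get("action") == "search"]
print(searches)  # [('a', 'flights')]
```

The contrast with the traditional schema-on-write approach, where the questions must be anticipated before the data is loaded, is what makes exploratory analysis of previously ignored data practical.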
March 22nd, 2012 — Data management
Oracle reports Q3. EMC acquires Pivotal Labs. ClearStory launches. And much, much more.
An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* Oracle Reports Q3 GAAP EPS Up 20% to 49 Cents; Q3 Non-GAAP EPS Up 15% to 62 Cents. Database and middleware revenue up 10%.
* EMC Goes Social, Open and Agile With Big Data. EMC acquires Pivotal Labs, plans to release Chorus as an open source project.
* ClearStory Data Launches With Investment From Google Ventures, Andreessen Horowitz and Khosla Ventures
* HP Lead Big Data Exec Chris Lynch Resigns
* Hortonworks Names Ari Zilka Chief Products Officer
* DataStax Enterprise 2.0 Adds Enterprise Search Capabilities to Smart Big Data Platform
* MapR Unveils Most Comprehensive Data Connection Options for Hadoop
* New Web-Based Alpine Illuminator Integrates with EMC Greenplum Chorus, The Social Data Science Platform
* RainStor and IBM InfoSphere BigInsights to Address Growing Big Data Challenges
* IBM Introduces New Predictive Analytics Services and Software to Reduce Fraud, Manage Financial Performance and Deliver Next Best Action
* Datameer Releases Major New Version of Analytics Platform
* Kognitio Announces Formation of “Kognitio Cloud” Business Unit
* HStreaming Announces Free Community Edition of Its Real-Time Analytics Platform for Hadoop
* Talend and MapR Announce Certification of Big Data Integration and Big Data Quality
* Schooner Information Technology Releases Membrain 4.0
* Gazzang Launches Big Data Encryption and Key Management Platform
* Logicworks Solves Big Data Hosting Challenges With New Infrastructure Services for Hadoop
* “Big Data” Among Most Confusing Tech Buzzwords
* For 451 Research clients
# Infochimps launches Chef-based platform for Hadoop deployment Impact Report
# Big-data security, or SIEM buzzword parity? Spotlight report
# DataStax adds enterprise search and elastic reprovisioning to database platform Market Development report
# With a new CEO and IBM as a reseller, Revolution Analytics charts next growth phase Market Development report
# Cray branches out, offering storage and a ‘big data’ appliance Market Development report
# CodeFutures sees a future beyond database sharding Market Development report
# Third time lucky for ScaleOut StateServer 5.0? Market Development report
# Attunity looks to 2012 for turnaround; up to the cloud and ‘big data’ movement Market Development report
# Panorama rides Microsoft’s coattails into in-memory social BI using SQL Server 2012 Market Development report
And that’s the Data Day, today.
February 22nd, 2012 — Data management
In late 2010 I published a post discussing the problems associated with trying to size the ‘big data’ market based on a lack of clarity on the definition of the term and what technologies it applies to.
In that post we discussed a 2010 Bank of America Merrill Lynch report that estimated that ‘big data’ represented a total addressable market worth $64bn. This week Wikibon estimated that the ‘big data’ market stands at just over $5bn in factory revenue, growing to over $50bn by 2017, while Deloitte estimated that industry revenues will likely be in the range of $1-1.5bn this year.
To put that in perspective: Bank of America Merrill Lynch estimated the total addressable market for ‘big data’ in 2010 at $64bn; Wikibon estimates the ‘big data’ market in 2012 at just over $5bn; and Deloitte estimates the ‘big data’ market in 2012 at $1-1.5bn.
UPDATE – IDC has become the first of the big analyst firms to break out its big data abacuses (abaci?). IDC thinks the ‘big data’ market in 2010 was $3.2bn.
Not surprisingly they came to their numbers by different means. BoA added up market estimates for database software, storage and servers for databases, BI and analytics software, data integration, master data management, text analytics, database-related cloud revenue, complex event processing and NoSQL databases.
Wikibon came to its estimate by adding up revenue associated with a select group of technologies and a select group of vendors, while Deloitte added up revenue estimates for database, ERP and BI software, reduced the total by 90% to reflect the proportion of data warehouses with more than five terabytes of data, and reduced that total by 80-85% to reflect the low level of current adoption.
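As a rough, hedged reconstruction of how Deloitte’s arithmetic lands in the $1-1.5bn range: the combined base figure below (roughly $70bn for database, ERP and BI software revenue) is our own assumption for illustration, not a number taken from Deloitte.

```python
# Back-of-the-envelope reconstruction of the reported Deloitte method.
# The base figure is an assumption; only the two reduction steps come from the description above.
base_revenue_bn = 70.0           # assumed combined database + ERP + BI software revenue
large_warehouse_share = 0.10     # keep 10% after the 90% reduction (warehouses over five terabytes)
adoption_share = (0.15, 0.20)    # keep 15-20% after the 80-85% reduction (current adoption)

low = base_revenue_bn * large_warehouse_share * adoption_share[0]
high = base_revenue_bn * large_warehouse_share * adoption_share[1]
print("~$%.1fbn to ~$%.1fbn" % (low, high))  # ~$1.1bn to ~$1.4bn, in line with the $1-1.5bn estimate
```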
IDC, meanwhile, went through a slightly tortuous route of defining the market based on the volume of data collected, OR deployments of ultra-high-speed messaging technology, OR rapidly growing data sets, AND the use of scale-out architecture, AND the use of two or more data types OR high-speed data sources.
There is something to be said for each of these definitions. But equally, each can be easily dismissed. We previously described our issues with the all-inclusive nature of the BoA numbers, and while we find Wikibon’s process much more agreeable, some of the individual numbers it has come up with are highly questionable. Deloitte’s methodology is surreal, but defensible. IDC’s simply illustrates the problem.
What this highlights is that the essential problem is the lack of definition for ‘big data’. As we stated in 2010: “The biggest problem with ‘big data’… is that the term has not been – and arguably cannot be – defined in any measurable way. How big is the ‘big data’ market? You may as well ask ‘how long is a piece of string?'”
January 27th, 2012 — Data management
451 Research yesterday announced that it has published its 2012 Previews report, an all-encompassing report highlighting the most disruptive and significant trends that our analysts expect to dominate and drive the enterprise IT industry agenda over the coming year.
The 93-page report provides an outlook and assessment across all 451 Research technology sectors and practice areas – including software infrastructure, cloud enablement, hosting, security, datacenter technologies, hardware, information management, mobility, networking and eco-efficient IT – with input from our team of 40+ analysts. The 2012 Previews report is available upon request here.
IM research director Simon Robinson has already provided a taster of our predictions as they relate to the information-centric landscape. Below I have outlined some of our core predictions related to the data-centric ecosystem:
The overall trend predicted for 2012 could best be described as the shifting focus from volume, velocity and variety to delivering value. Our concept of Total Data reflects the path from the velocity and variety of information sources to the all-important endgame of deriving value from data. We expect to see increased interest in data integration and analytics technologies and approaches designed specifically to exploit the potential benefits of ‘big data’ and mainstream adoption of Hadoop and other new sources of data.
We also anticipate, and are beginning to see, increased focus on technologies that enable access to data in different storage platforms without requiring data movement. We believe there is an emerging role for what we are calling the ‘data hub’ – an independent platform that is responsible for managing access to data on the various data storage and processing technologies.
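To make the ‘data hub’ idea slightly more concrete, here is a deliberately toy sketch of the routing layer such a platform implies. Every class, backend and dataset name is hypothetical; this is not a description of any vendor’s product.

```python
# Toy illustration of a data hub: one access layer that knows where each logical
# dataset lives and routes requests to it, rather than copying data into a central store.
class WarehouseBackend:
    def execute(self, dataset, request):
        return "SQL query over %s: %s" % (dataset, request)

class HadoopBackend:
    def execute(self, dataset, request):
        return "MapReduce job over %s: %s" % (dataset, request)

class DataHub:
    def __init__(self):
        self.catalog = {}                         # logical dataset name -> backing system

    def register(self, dataset, backend):
        self.catalog[dataset] = backend

    def query(self, dataset, request):
        backend = self.catalog[dataset]           # no data movement: route the request
        return backend.execute(dataset, request)  # to wherever the data already resides

hub = DataHub()
hub.register("orders", WarehouseBackend())
hub.register("clickstream", HadoopBackend())
print(hub.query("orders", "revenue by region"))
print(hub.query("clickstream", "sessions per user"))
```

The point of the pattern is that requests go to wherever the data already resides, rather than the data being copied into yet another central store.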
Increased understanding of the value of analytics will also increase interest in the integration of analytics into operational applications. Embedded analytics is nothing new, but has the potential to achieve mainstream adoption this year as the dominant purveyors of applications used to run operations are increasingly focused on serving up embedded analytics as a key component within their product portfolios. Equally importantly, many of them now have database platforms capable of uniting previously disparate technologies to deliver true embedded analysis.
There has been a growing recognition over the past year or so that any type of data management project – whether focused on master data management (MDM), data or application integration, or data quality – needs to bring real benefits to business processes. Some may see this assertion as obvious and pretty easy to achieve, but that’s not necessarily the case. However, it is likely to become more achievable in the next 12-18 months as companies realize that a process-driven approach to most data management programs makes sense, and as vendors deliver capabilities to meet this demand.
While ‘big data’ presents a number of opportunities, it also poses many challenges, not the least of which is the lack of developers, managers, analysts and scientists with analytics skills. The users and investors placing a bet on the opportunities offered by new data management products are unlikely to be laughing if it turns out that they cannot employ people to deploy, manage and run those products, or analysts to make sense of the data they produce. It is not surprising, therefore, that the vendors supplying those technologies are investing in ensuring that there is a competent workforce to support existing and new projects.
Finally, while cloud computing may be one of the technology industry’s hot topics, it has had relatively little impact on the data management sector to date. That is not to say that databases are not available on cloud computing platforms, but we must make a distinction between databases that are deployed in public clouds, and ‘cloud databases’ that have the potential to fulfil the role of emerging databases in building private and hybrid clouds. The former have been available for many years. The latter are just beginning to come to fruition based on NoSQL databases, as well as a new breed of NewSQL relational databases, designed to meet the performance, scalability and flexibility needs of large-scale data processing.
451 Research clients can get more details of these specific predictions via our 2012 preview – Information Management, Part 2. Non-clients can apply for trial access at the same link, while the entire 2012 Previews report is available here.
Also, mark your diaries for a webinar discussing report highlights on Thursday Feb 9 at noon ET, which will be open for clients and non-clients to attend. Registration details to follow soon…
January 27th, 2012 — Archiving, Collaboration, Content management, Data management, eDiscovery, Search, Text analysis
Every New Year affords us the opportunity to dust down our collective crystal balls and predict what we think will be the key trends and technologies dominating our respective coverage areas over the coming 12 months. We at 451 Research have just published our 2012 Preview report; at almost 100 pages it’s a monster, but it offers some great insights across twelve technology subsectors, spanning from managed hosting and the future of cloud to the emergence of software-defined networking and solid-state storage, and everything in between. The report is available to both 451 Research clients and non-clients (in return for a few details); access the landing page here. There’s a press release of highlights here. Also, mark your diaries for a webinar discussing report highlights on Thursday Feb 9 at noon ET, which will be open for clients and non-clients to attend. Registration details to follow soon…
Here is a selection of key takeaways from the first part of the Information Management preview, which focuses on information governance, ediscovery, search, collaboration and file sharing. (Matt Aslett will be posting highlights of part 2, which focuses more on data management and analytics, shortly.)
- One of the most obvious common themes that will continue to influence technology spending decisions in the coming year is the impact of continued explosive data and information growth. This continues to shape new legal frameworks and technology stacks around information governance and e-discovery, as well as to drive a new breed of applications growing up around what we term the ‘Total Data’ landscape.
- Data volumes and distributed data drive the need for more automation, and auto-classification capabilities will continue to emerge more successfully in the e-discovery, information governance and data protection veins – indeed, we expect to see more intersection between these, as we noted in a recent post.
- The maturing of the cloud model – especially as it relates to file sharing and collaboration, but also from a more structured database perspective – will drive new opportunities and challenges for IT professionals in the coming year. Looks like 2012 may be the year of ‘Dropbox for the enterprise.’
- One of the big emerging issues that rose to the fore in 2011, and is bound to get more attention as the New Year proceeds, is the dearth of IT and business skills in some of these areas, without which the industry at large will struggle to harness and truly exploit the attendant opportunities.
- The changes in information management in recent years have encouraged (or forced) collaboration between IT departments, as well as between IT and other functions. Although this highlights that many of the issues here are as much about people and processes as they are about technology, the organizations able to leap ahead in 2012 will be those that can most effectively manage the interaction of all three.
- We also see more movement of underlying information management infrastructures into the applications arena. This is true with search-based applications, as well as in the Web-experience management vein, which moves beyond pure Web content management. And while Microsoft SharePoint continues to gain adoption as a base layer of content-management infrastructure, there is also growth in the ISV community that can extend SharePoint into different areas at the application-level.
There is a lot more in the report about proposed changes in the e-discovery arena, advances in the cloud, enterprise search, and the impact of mobile devices and bring-your-own-device policies on information management.
January 24th, 2012 — Data management