March 5th, 2013 — Data management
SQL and Hadoop: ascloseasthis. Splunk revenue up 64%. And more.
And that’s the data day, today.
February 28th, 2013 — Data management
Rackspace buys ObjectRocket. Intel delivers Hadoop distro. And more.
And that’s the data day, today.
February 22nd, 2013 — Data management
Aster Discovery. Delphix and SAP. Hadoop use-cases. And more.
And that’s the data day, today.
February 21st, 2013 — Data management
I am very pleased and honoured to have been asked to provide a keynote presentation at the inaugural Hadoop Summit Europe, which will be held in Amsterdam on March 20-21.
The title of my talk is “What is the point of Hadoop?” which isn’t as derogatory as it sounds. Our research suggests there are hundreds of potential workloads that are suitable for Hadoop, but three core roles:
- Big data storage: Hadoop as a system for storing large, unstructured data sets
- Big data processing/integration: Hadoop as a data ingestion/ETL layer (see the sketch after this list)
- Big data analytics: Hadoop as a platform for new exploratory analytic applications
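To make the middle role a little more concrete, here is a minimal sketch – my own illustration, not anything from the Summit material – of Hadoop acting as a data ingestion/ETL layer via Hadoop Streaming. The input layout, paths and field names are hypothetical assumptions; the point is simply that raw, semi-structured records go in one end and cleansed, aggregated records come out the other, ready for downstream analysis.

```python
# Hypothetical sketch of Hadoop in its processing/integration (ETL) role,
# using Hadoop Streaming. Paths, delimiters and field positions are
# illustrative assumptions only.
#
# Submitted with something like:
#   hadoop jar hadoop-streaming-*.jar \
#     -files etl.py \
#     -mapper "python etl.py map" -reducer "python etl.py reduce" \
#     -input /raw/clickstream -output /cleansed/clickstream
import sys


def mapper():
    """Parse raw, tab-delimited log lines and emit user_id<TAB>url pairs."""
    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # drop malformed records during ingestion
        user_id, _timestamp, url = fields[0], fields[1], fields[2]
        print(f"{user_id}\t{url}")


def reducer():
    """Count page views per user; output could feed a downstream warehouse."""
    current_user, count = None, 0
    for line in sys.stdin:
        user_id, _url = line.rstrip("\n").split("\t", 1)
        if user_id != current_user:
            if current_user is not None:
                print(f"{current_user}\t{count}")
            current_user, count = user_id, 0
        count += 1
    if current_user is not None:
        print(f"{current_user}\t{count}")


if __name__ == "__main__":
    # Run with "map" or "reduce" as the first argument.
    role = sys.argv[1] if len(sys.argv) > 1 else "map"
    mapper() if role == "map" else reducer()
```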
The flexibility of Apache Hadoop is one of its biggest assets – enabling businesses to generate value from data that was previously considered too expensive to be stored and processed in traditional databases – but also results in Hadoop meaning different things to different people.
As early adopters press ahead with innovative new analytic applications, many mainstream enterprises are still scratching their heads trying to demonstrate Hadoop’s value. It is tempting to try to run before you can walk when you see others demonstrating the potential of Hadoop-based analytics, but in my view jumping ahead to Hadoop-based analytics without first understanding Hadoop’s storage and integration roles runs the risk of confusion and, potentially, disillusionment.
My keynote presentation at Hadoop Summit Europe will explore the impact that Hadoop is having on the traditional data processing landscape, examining the expanding ecosystem of vendors and their relationships with Apache Hadoop, exploring adoption trends around the world, and highlighting how an understanding of the roles Hadoop can play will be essential to helping Hadoop cross the chasm from early adopters to mainstream adoption.
Anyone interested in attending the event can get a 20% discount, using the registration code 13aslett20.
February 12th, 2013 — Data management
ClearStory sheds light on data analysis service. Illuminating ‘dark data’. More.
And that’s the data day, today.
February 8th, 2013 — Data management
Teradata results. Funding for DataXu. The chemistry of data. And more.
And that’s the data day, today.
December 13th, 2012 — Data management
451 Research’s Information Management practice has published its latest long-format report: Total Data Analytics. Written by Krishna Roy, Analyst, BI and Analytics, along with myself, it examines the impact of ‘big data’ on business intelligence and analytics.
The growing emphasis on ‘big data’ has focused unprecedented attention on the potential of enterprises to gain competitive advantage from their data, helping to drive adoption of BI/analytics beyond the retail, financial services, insurance and telecom sectors.
In 2011 we introduced the concept of ‘Total Data’ to reflect the path from the volume, velocity and variety of big data to the all-important endgame of deriving maximum value from that data. Analytics plays a key role in deriving meaningful insight – and therefore, real-world business benefits – from Total Data.
In short, big data and Total Data are changing the face of the analytics market. Advanced analytics technologies are no longer the preserve of MBAs and ‘stats geeks,’ as line-of-business managers and others increasingly require this type of analysis to do their jobs.
Total Data Analytics outlines the key drivers in the analytics sector today and in the coming years, highlighting the technologies and vendors poised to shape a future of increased reliance on offerings that deliver on the promise of analyzing structured, semi-structured and unstructured data.
The report also takes a look at M&A activity in the analytics sector in 2012, as well as the history of investment funding involving Hadoop, NoSQL and Hadoop-based analytics specialists. It also contains a list of 40 vendors we believe have the greatest potential to shape the market in the coming years.
The report is available now to 451 Research clients, here. Non-clients can get more information and download an executive summary from the same link.
December 7th, 2012 — Data management
Cloudera raises $65m. HP launches Hadoop AppSystem. And more.
And that’s the Data Day, today.
November 30th, 2012 — Data management
Dan Woods recently opined that Apache Hadoop has had a weird beginning thanks to its “Three Headed Open Core” model and warned that there is a danger that it will fragment – à la Unix – thanks to competing commercial forces.
There are a couple of points to address here. The first is the assumption that the vendor community developing Hadoop is in some way ‘weird’. It isn’t, at least not to those of us who have studied the evolution of open source-related business strategies.
In fact, Hadoop’s multi-vendor community is a prime example of the corporate-dominated development communities we saw emerging as the fourth stage of commercial open source back in 2010.
Some people still have trouble understanding, as I wrote two years ago, that
being successful is about sharing your code development with the competition via multi-vendor open source projects in order to benefit from improved code quality and lower research and development costs for non-differentiating features AND beating your competition with proprietary complementary technologies.
This isn’t weird. I firmly believe in the not-too-distant future this will be seen as entirely normal.
Another issue to address is the suggestion that these competing vendors pose a danger to the core project. In the blog post linked above I argued that the contrary is true: the various competing players in collaborative communities have a similar impact on the development of a project as competing factors – climate, habitat, the existence or dearth of predators, and so on – do in Darwin’s evolutionary process: they make it stronger.
I would be much more concerned about the potential fragmentation of Hadoop if we were looking at four or five different competing implementations of Google’s MapReduce and file system research. Instead, the differentiating features that Cloudera, Hortonworks, MapR, IBM and EMC have introduced can be compared to the result of natural selection driven by the need to adapt to particular conditions.
So long as there remains a single core Apache Hadoop project upon which these differentiating features are based I believe Hadoop will not only survive, but will thrive. If I may quote myself again: “As long as they continue to collaborate on the non-differentiating code, the project should benefit from being stretched in multiple directions.”
I believe that, as with Linux, the vendors involved have learned the lessons of the Unix wars and understand that it is in their best interests – let alone everyone else’s – not to repeat them.
Another key point when we look at the Hadoop ecosystem is that we see multiple vendors building on others’ differentiating features and often supporting multiple distributions. It’s not a case of a herd of individually differentiated Hadoops, but more like a stack of Russian Hadoop dolls.
To my mind there are (currently) eight main Hadoop business strategies, each of which has the potential to build on those before it:
- Hadoop distributors (e.g. Cloudera, Hortonworks, MapR, EMC, IBM)
- Hadoop cloud services (e.g. Amazon EMR, Google Compute Engine)
- Hadoop-based deployment services (e.g. Infochimps, Metascale)
- Hadoop-based deployment stacks/appliances (e.g. Zettaset, Oracle BDA, Dell)
- Hadoop-based development services (e.g. Continuuity, Mortar Data)
- Hadoop-based application stacks (e.g. NGDATA, Guavus)
- Hadoop-based database stacks (e.g. Drawn to Scale, Splice Machine)
- Hadoop-based analytic services (e.g. Treasure Data, Qubole)
November 9th, 2012 — Data management
Funding for Neo, Elasticsearch and Hadapt. And more.
And that’s the Data Day, today.