The Data Day: October 12, 2018

MongoDB acquires mLab. And more.

And that’s the Data Day, today.

The Data Day: October 5, 2018

Cloudera and Hortonworks agree acquisition deal. And more.

And that’s the Data Day, today.

The Data Day: October 6, 2017

Why Isn’t the Senate Intel Committee looking into the fake data in OUR country to see why so much of our analytics is just made up-FAKE!

And that’s the data day, today.

The Data Day: September 29, 2017

Even Usain Bolt from Jamaica, one of the greatest runners and athletes of all time, showed RESPECT for our data and analytics!

And that’s the data day, today.

Hadoop: why enterprises need something to aspire to

Merv Adrian wrote a blog boast recently bemoaning the “aspirational marketing” that surrounds Hadoop, in particular the fact that current deployments are a long way from delivering on the vision.

While I completely agree that many enterprises are struggling to translate tactical use-cases into the business use-cases required to drive more strategic adoption beyond the proof of concept stage, I don’t think that aspirational marketing around Hadoop is necessarily a bad thing.

It is certainly true that part of the problem lies in clearly understanding how Hadoop can be used as a complement to traditional relational database technologies deployed as an enterprise data warehouse.

That is why we recently asked Is Hadoop a planet? – comparing confusion around Hadoop’s classification to that of Pluto – while also describing Hadoop as a framework in search of a metaphor.

Given the confusion, however, I believe it is incumbent on Hadoop providers to describe not just the functional use-cases that are driving tactical adoption, but also the bigger vision that will drive more strategic adoption.

The data management industry has become accustomed to thinking about the storage, processing and analysis of data in analytical databases as akin to warehousing, to the extent that the phrase ‘data warehouse’ no longer requires an explanation.

We believe that a good understanding of the potential strategic role of Hadoop, even if it is only aspirational at this stage, will be important in encouraging broader and deeper adoption of Hadoop.

In addition, it is not as if there are no enterprises deploying Hadoop more strategically. Cloudera estimates that about 20% of its 300 subscription customers are already deploying Hadoop as what it calls an Enterprise Data Hub.

I’m not personally convinced that Enterprise Data Hub is really the right term, (not least since we previously used the term Data Hub in a slightly different context). Other potential terms include data lake and data refinery.

Although the latter better describes Hadoop’s role in aggregating and processing data and the industrial-scale processes used to make data more acceptable for different analytic use-cases, it appears to have quickly passed out of fashion compared to the former.

I have begun using the term ‘data treatment plant’ as a combination of the two concepts to describe how Hadoop can be used as a single ‘logical’ unified data platform into which you simply poor data, while industrial-scale processes – the multiple data processing and analytic engines that will be supported by Hadoop 2.0: such as MapReduce, streaming processing, SQL and NoSQL – are used to make data more acceptable for a desired end-use.

451 clients can get more detail on the ‘data treatment plant’ and why we believe a bit of aspirational marketing may not be a bad thing for Hadoop, from our recent report, Hadoop: a framework in search of a metaphor.

Previewing data management and analytics in 2012

451 Research yesterday announced that it has published its 2012 Previews report, an all-encompassing report highlighting the most disruptive and significant trends that our analysts expect to dominate and drive the enterprise IT industry agenda over the coming year.

The 93 page report provides an outlook and assessment across all 451 Research technology sectors and practice areas – including software infrastructure, cloud enablement, hosting, security, datacenter technologies, hardware, information management, mobility, networking and eco-efficient IT – with input from our team of 40+ analysts. The 2012 Previews report is available upon request here.

IM research director Simon Robinson has already provided a taster of our predictions as they relate to the information-centric landscape. Below I have outlined some of our core predictions related to the data-centric ecosystem:

The overall trend predicted for 2012 could best be described as the shifting focus from volume, velocity and velocity, to delivering value. Out concept of Total Data reflects the path from velocity and variety of information sources to the all-important endgame of deriving value from data. We expect to see increased interest in data integration and analytics technologies and approaches designed specifically to exploit the potential benefits of ‘big data’ and mainstream adoption of Hadoop and other new sources of data.

We also anticipate, and are beginning to see, increased focus on technologies that enable access to data in different storage platforms without requiring data movement. We believe there is an emerging role for what we are calling the ‘data hub‘ – an independent platform that is responsible for managing access to data on the various data storage and processing technologies.

Increased understanding of the value of analytics will also increase interest in the integration of analytics into operational applications. Embedded analytics is nothing new, but has the potential to achieve mainstream adoption this year as the dominant purveyors of applications used to run operations are increasingly focused on serving up embedded analytics as a key component within their product portfolios. Equally importantly, many of them now have database platforms capable of uniting previously disparate technologies to deliver true embedded analysis.

There has been a growing recognition over the past year or so that any type of data management project – whether focused on master data management (MDM), data or application integration, or data quality – needs to bring real benefits to business processes. Some may see this assertion as obvious and pretty easy to achieve, but that’s not necessarily the case. However, it is likely to become more so in the next 12-18 months as companies realize a process-driven approach to most data management programs makes sense and vendors deliver capabilities to meet this demand.

While ‘big data’ presents a number of opportunities, it also poses many challenges, not the least of which is the lack of developers, managers, analysts and scientists with analytics skills. The users and investors placing a bet on the opportunities offered by new data management products are unlikely to be laughing if it turns out that they cannot employ people to deploy, manage and run those products, or analysts to make sense of the data they produce. It is not surprising that, therefore, the vendors that supply those technologies are investing in ensuring that there is a competent workforce to support existing and new projects.

Finally, while cloud computing may be one of the technology industry’s hot topics, it has had relatively little impact on the data management sector to date. That is not to say that databases are not available on cloud computing platforms, but we must make a distinction between databases that are deployed in public clouds, and ‘cloud databases‘ that have the potential to fulfil the role of emerging databases in building private and hybrid clouds. The former have been available for many years. The latter are just beginning to come to fruition based on NoSQL databases, as well as a new breed of NewSQL relational databases, designed to meet the performance, scalability and flexibility needs of large-scale data processing.

451 Research clients can get more details of these specific predictions via our 2012 preview – Information Management, Part 2. Non-clients can apply for trial access at the same link, while the entire 2012 Previews report is available here.

Also, mark your diaries for a webinar discussing report highlights on Thursday Feb 9 at noon ET, which will be open for clients and non-clients to attend. Registration details to follow soon…