The Data Day, A few days: November 15-21 2013

Storage and big data: what went wrong? And more

And that’s the data day, today.

7 Hadoop questions. Q7: Hadoop’s role

What is the point of Hadoop? It’s a question we’ve asked a few times on this blog, and one that continues to be asked by users, investors and vendors about Apache Hadoop. That is why it is one of the major questions being asked as part of our 451 Research 2013 Hadoop survey.


As I explained during our keynote presentation at the inaugural Hadoop Summit Europe earlier this year, our research suggests there are hundreds of potential workloads that are suitable for Hadoop, but three core roles:

  • Big data storage: Hadoop as a system for storing large, unstructured data sets
  • Big data processing/integration: Hadoop as a data ingestion/ETL layer
  • Big data analytics: Hadoop as a platform for new exploratory analytic applications

And we’re not the only ones that see it that way. This blog from Cloudera CTO Amr Awadallah outlines three very similar, if differently named, use cases (Transformation, Active Archive, and Exploration).

In fact, as I also explained during the Hadoop Summit keynote, we see these three roles as a process of maturing adoption, starting with low cost storage, moving on to high-performance data aggregation/ingestion, and finally exploratory analytics.
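To make the middle, ingestion/ETL role a little more concrete, below is a minimal, illustrative sketch of a Hadoop Streaming-style mapper (Hadoop Streaming runs any executable that reads records on stdin and writes key/value pairs to stdout). The log layout and field names are hypothetical, used purely for illustration rather than drawn from any surveyed deployment.

```python
#!/usr/bin/env python
# Minimal sketch of Hadoop in its ingestion/ETL role, written as a
# Hadoop Streaming mapper: read raw, semi-structured log lines from
# stdin, clean and restructure them, and emit tab-separated records
# for downstream storage or analytic jobs. The three-field layout
# below is a hypothetical example.
import sys

for line in sys.stdin:
    fields = line.strip().split()
    if len(fields) < 3:
        continue  # discard malformed lines at ingestion time
    timestamp, user_id, url = fields[0], fields[1], fields[2]
    # Normalise values before handing the record on to the analytic layer
    print("\t".join([timestamp, user_id.lower(), url.rstrip("/")]))
```

In practice a script like this would typically be submitted via the hadoop-streaming jar, with both the raw input and the cleaned output living in HDFS, which is how the storage, integration and analytics roles end up layered on the same cluster.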


As such it is interesting to view the current results of our Hadoop survey, which show that the highest proportion of respondents have implemented or plan to implement Hadoop for data analytics (63%), followed by data integration (48%) and data storage (43%).

This would suggest that our respondents include a significant number of early Hadoop adopters. I look forward to properly analysing the results to see what they can tell us, but in the meantime it is interesting to note that the percentage of respondents using Hadoop for analytics is significantly higher among those that adopted Hadoop prior to 2012 (88%) than among those that adopted it in 2012 or 2013 (65%).

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

What is the point of Hadoop?

Among the many calls we have fielded from users, investors and vendors about Apache Hadoop, the most common underlying question we hear could be paraphrased as ‘what is the point of Hadoop?’.

It is a more fundamental question than ‘what analytic workloads is Hadoop used for’ and really gets to the heart of uncovering why businesses are deploying or considering deploying Apache Hadoop. Our research suggests there are three core roles:

– Big data storage: Hadoop as a system for storing large, unstructured data sets
– Big data integration: Hadoop as a data ingestion/ETL layer
– Big data analytics: Hadoop as a platform for new exploratory analytic applications

While much of the attention paid to Apache Hadoop use cases focuses on the innovative new analytic applications it has enabled in this latter role, thanks to its high-profile adoption at Web properties, for more traditional enterprises and later adopters the first two, more mundane, roles are more likely to be the trigger for initial adoption. Indeed, there are some good examples of these three roles representing an adoption continuum.

We also see the multiple roles playing out at a vendor level, with regards to strategies for Hadoop-related products. Oracle’s Big Data Appliance (451 coverage), for example, is focused very specifically on Apache Hadoop as a pre-processing layer for data to be analyzed in Oracle Database.

While Oracle focuses on Hadoop’s ETL role, it is no surprise that the other major incumbent vendors showing interest in Hadoop can be grouped into three main areas that mirror the three roles above:

– Storage vendors
– Existing database/integration vendors
– Business intelligence/analytics vendors

The impact of these roles on vendor and user adoption plans will be reflected in my presentation at Hadoop World in November, ‘The Blind Men and the Elephant’.

You can help shape this presentation, and our ongoing research into Hadoop adoption drivers and trends, by taking our survey into end user attitudes towards the potential benefits of ‘big data’ and new and emerging data management technologies.

Upcoming presentation on virtualization and storage

I’m going to be presenting the introductory session at a BrightTalk virtual conference on March 25 on the role and impact of the virtual server revolution on the storage infrastructure. Although it’s been evident for some time that the emergence of server virtualization has had — and continues to have — a meaningful impact on the storage world, the sheer pace of change here makes this a worthwhile topic to revisit. As the first presenter of the event — the conference runs all day — it’s my job to set the scene; as well as introducing the topic within the context of the challenges that IT and storage managers face, I’ll outline a few issues that will hopefully serve as discussion points throughout the day.

Deciding which issues to focus on is actually a lot harder than it sounds (I only have 45 minutes) because, when you start digging into it, the impact of virtualization on storage is profound on just about every level: performance, capacity (and, more importantly, capacity utilization), data protection and reliability, and management.

I’ll aim to touch on as many of these points as time allows, as well as provide some thoughts on the questions that IT and storage managers should be asking when considering how to improve their storage infrastructure to get the most out of an increasingly virtualized datacenter.

The idea is to make this a thought-provoking and interactive session. Register for the live presentation here: http://www.brighttalk.com/webcast/6907.  After registering you will receive a confirmation email as well as a 24-hour reminder email.  As a live attendee you will be able to interact with me by posing questions which I will be able to answer on air.  If you are unable to watch live, the presentation will remain available via the link above for on-demand participation.

Bridging the “storage-information” gap

When Nick first unveiled this blog last month he rightly noted ‘storage’ as one of the many categories that fall into a capacious bucket we term ‘information management.’ With this in mind he reminded me that it would be appropriate for the 451 Group’s storage research team to contribute to the debate, so here it is!

For the uninitiated, storage can appear to be either a bit of a black hole, or just a lot of spinning rust, so I’m not going to start with a storage 101 (although if you have a 451 password you can peruse our recent research here). Suffice to say that storage is just one element of the information management infrastructure, but its role is certainly evolving.

Storage systems and associated software traditionally have provided applications and users with the data they need, when they need it, along with the required levels of protection. Clearly, storage has had to become smarter (not to mention cheaper) to deal with issues like data growth; technologies such as data deduplication help firms grapple with the “too much” part of information management. But up until now the lines of demarcation between “storage” (and data management) and “information” management have been fairly clear. Even though larger “portfolio” vendors such as EMC and IBM have feet in both camps, the reality is that such products and services are organized, managed and sold separately.

That said, there’s no doubt these worlds are coming together. The issues we as analysts are grappling with relate to where and why this is taking place, how it manifests itself, the role of technology, and the impact of this on vendor, investor and end-user strategies. At the very least there is a demand for technologies that help organizations bridge the gap – and the juxtaposition – between the fairly closeted, back-end storage “silo” and the more, shall we say, liberated, front-end interface where information meets its consumers.

Here, a number of competing forces are challenging, even forcing, organizations to become smarter about understanding what “information” they have in their storage infrastructure: data retention vs. data disposition, regulated vs. unregulated data, and public vs. private data being just three. Armed with such intelligence, firms can, in theory, make better decisions about how (and how long) data is stored, protected, retained and made available to support changing business requirements.

“Hang on a minute,” I hear you cry. “Isn’t this what Information Lifecycle Management (ILM) was supposed to be about?” Well, yes, I’m afraid it was. And one thing that covering the storage industry for almost a decade has taught me is that it moves at a glacial pace. In the case of ILM, the iceberg has probably lapped it by now. The hows and whys of ILM’s failure to capture the imagination of the industry are probably best left for another day, but I believe that at least one aim of ILM – helping organizations better understand their data so it can better support the business – still makes perfect sense.

What we are now seeing is the emergence of some real business drivers that are compelling a variety of stakeholders – from CIOs to General Counsel – to take an active interest in better understanding their data. This, in turn, is driving industry consolidation as larger vendors in particular move to fill out their product portfolios; the latest example of this is the news of HP’s acquisition of Australia-based records management specialist Tower Software. Over the next few weeks I’ll be exploring in more detail three areas where we think this storage-information gap is being bridged: eDiscovery, archiving and security. Stay tuned for our deeper thoughts and perspectives in this fast-moving space.