7 Hadoop questions. Q4: alternative file systems

Which is your preferred Hadoop file system? The obvious answer is likely to be the Hadoop Distributed File System itself, although in recent years we’ve seen an increasing number of vendors pitching their own file system technologies as potential alternatives to HDFS. That’s why the use of alternative file systems is one of the primary questions being asked in the 451 Research 2013 Hadoop survey.


The limitations of HDFS are well publicised, and it is no surprise that many vendors see an opportunity to pitch their existing file system technologies as alternatives to HDFS.

There is now a large number of HDFS alternatives to choose from, including: Cleversafe Dispersed Storage Network, DataStax CassandraFS, EMC Isilon OneFS, IBM GPFS, InkTank Ceph, MapR NFS, Quantcast QFS, Red Hat Storage (GlusterFS), and Symantec Veritas CFS.

Our research indicates that adoption of alternatives to HDFS is limited at this stage, and early efforts, such as Appistry’s CloudIQ Storage Hadoop Edition, have come and gone.

However, as adoption of Hadoop spreads to more mainstream enterprises, we increasingly see interest in some of these HDFS alternatives, particularly in relation to attempts to reduce duplication of effort with regard to file system management and maintenance.


The early responses to our Hadoop survey are therefore interesting: MapR NFS has scored highest in terms of adoption so far, but there is interest across the board (especially Red Hat Storage, CassandraFS, GPFS, OneFS and Ceph). By and large, though, it’s true to say that most respondents have not considered, tested or adopted an alternative file system to date.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

The Data Day, Two days: August 13/14 2012

Datomic calls time on RDBMS. Actian offers $154m for Pervasive. And more

And that’s the Data Day, today.

Symantec gets the M&A ball rolling in 2012

As if to underscore our belief that the cloud is set to play a bigger role in all things Information Management-related in 2012, Symantec announced this week that it had acquired cloud archiving specialist LiveOffice for $115m, its first acquisition in eight months (451 Research clients can read the full deal-analysis report here).

Though the deal was not a huge surprise (some of LiveOffice’s executive team, including the CEO and COO, hail from Symantec, which has for the last year been reselling LiveOffice, rebranded as EnterpriseVault.Cloud), it is a significant endorsement of the cloud archiving market, a sub-sector that we have been following closely for a couple of years (we published a detailed, long-form report on the market in late 2010) but that has yet to really come to life.

Symantec, which of course dominates the on-premise email archiving market, notes that about half of all archive deployments now go to the cloud. In this respect, cloud archiving is a market that it simply has to participate in more directly. Accordingly, LiveOffice provides Symantec with a better means of serving the smaller organizations that tend to opt for the cloud model, which requires far fewer skills and resources to set up and manage than on-prem models. Of course, it also means Symantec doesn’t have to be religious about which model it promotes; whether on-prem, cloud or a hybrid of the two, it now caters to all requirements.

Symantec also made an interesting comment that LiveOffice is at the point in its own development where the application of Symantec’s huge scale can help in growing the business, rather than being a hindrance. This is a refreshingly honest acknowledgement that it hasn’t always got the balance right in the past; buy a company that is too small, and the weight of a giant like Symantec risks starving it of oxygen altogether, rather than fanning the flames that made it successful in the first place.

The question now is whether this move may help spark broader growth of the cloud archiving market. LiveOffice was one of the first cloud providers to archive other data types beyond email, and can now store and index a wide variety of data, including from social media, file servers, SharePoint and even SaaS applications; as more data, workloads and applications move to the cloud, cloud-based archiving will become more relevant. One big factor in the cloud players’ favor is that email is increasingly going the hosted route, especially for SMEs; if you run corporate email as a service, then you aren’t going to deploy an email archive on-premise.

All in all, we think this is a good move by Symantec, and one that could drive interest in the other cloud-archiving pure plays out there.