The Data Day, A few days: October 11 – October 17 2014

Insanely large Strata-Hadoop World edition

And that’s the Data Day, today.

7 Hadoop questions. Q4: alternative file systems

Which is your preferred Hadoop file system? The obvious answer is likely to be the Hadoop Distributed File System itself, although in recent years we’ve seen an increasing number of vendors pitching their own file system technologies as potential alternatives to HDFS. That’s why the use of alternative file systems is one of the primary questions being asked in the 451 Research 2013 Hadoop survey.


The limitations of HDFS are well-publicised, and it is no surprise that many vendors see an opportunity to pitch their existing file system technologies as alternatives to HDFS.

There is now a large number of HDFS alternatives to choose from, including: Cleversafe Dispersed Storage Network, DataStax CassandraFS, EMC Isilon OneFS, IBM GPFS, InkTank Ceph, MapR NFS, Quantcast QFS, Red Hat Storage (GlusterFS), and Symantec Veritas CFS.

Our research indicates that adoption of alternatives to HDFS is limited at this stage, and early efforts, such as Appistry’s CloudIQ Storage Hadoop Edition, have come and gone.

However, as adoption of Hadoop grows into more mainstream enterprises, we increasingly see interest in some of these HDFS alternatives, particularly as a way to reduce duplication of effort with regard to file system management and maintenance.


The early responses to our Hadoop survey are therefore interesting: MapR NFS has scored highest in terms of adoption so far, but there is interest across the board (especially in Red Hat Storage, CassandraFS, GPFS, OneFS and Ceph). By and large, though, it’s true to say that most respondents have not considered, tested or adopted an alternative file system to date.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

The Data Day, Today: Feb 3 2012

New CEO at Revolution. Pentaho goes big data. EMC Hadoop gets Isilon. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Revolution Analytics Names David Rich New CEO

* Pentaho Open Sources Big Data Capabilities to Further Fuel Widespread Adoption

* EMC Isilon is Industry’s First Scale-Out NAS System with Native Hadoop Support

* Actuate Reports Fourth Quarter and Fiscal Year 2011 Financial Results

* Sumo Logic Raises $15M Series B Round for Next Generation Log Management and Analytics

* Announcing Oracle R Enterprise 1.0

* Paul Cormier Joins Hortonworks’ Board of Directors

* DataStax Launches First Complete Solution for Cassandra Development on Windows and Mac

* Latest Release of Kalido Information Engine Eliminates Data Mart Migration and Consolidation Hassles

* Karmasphere Brings More Power, Collaboration, and Faster Insights to Big Data Analytics Teams on Hadoop

* Why Big Data Won’t Make You Smart, Rich, Or Pretty

* SAP HANA – slowly moving out of hype into actual projects

* For 451 Research clients

# Actuate gets ready to go shopping in the ‘big data’ mall (Acquirer IQ)

# Couchbase cites enterprise adoption, clarifies distributed NoSQL database strategy (Impact report)

# SpagoBI illuminates 2012 roadmap, takes open source model to US, Latin America (Impact report)

# Customer data analysis provider nPario combines big data and smart segmentation (Impact report)

# Tableau details 2012 growth strategy, gets semantic for visual analytics (Market development report)

# EMC integrates re-branded Hadoop distribution with Isilon NAS (Market development report)

# Quiterian seeks funding for new customer analytics in the cloud focus (Market development report)

# Hortonworks refines its commercial strategy for Apache Hadoop (Market development report)

# Digital Reasoning pledges to automate the analysis of complex data (Market development report)

And that’s the Data Day, today.

What’s in a name? EMC Greenplum rebrands its Hadoop distros

As expected, EMC has announced that it is integrating its Greenplum HD distribution of Apache Hadoop with its Isilon scale-out NAS technology. The move coincides with a re-branding of the company’s Hadoop distributions that, while slight, could prove significant.

Specifically, EMC has enabled the Hadoop Distributed File System (HDFS) as a native protocol supported on OneFS in addition to Network File System (NFS) and Common Internet File System (CIFS) support, enabling Isilon systems to provide the underlying storage layer for Hadoop processing, as well as a common storage pool for Hadoop and other systems.
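Because OneFS speaks HDFS as a native protocol, a Hadoop client should, in principle, only need its default file system URI pointed at the Isilon cluster rather than at a NameNode. The sketch below illustrates that idea under stated assumptions; it is not EMC’s documented procedure. It parses a core-site.xml fragment with Python’s standard library and reports which endpoint HDFS paths would resolve to. The host names (`namenode.example.com`, `isilon.example.com`) and port 8020 are hypothetical placeholders. `fs.default.name` is the property key used by Hadoop 0.20-era releases such as the 0.20.1 branch mentioned here; later releases use `fs.defaultFS`.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical core-site.xml fragments: one pointing at a conventional
# HDFS NameNode, one pointing at an Isilon cluster serving HDFS natively.
# Host names and port are illustrative placeholders only.
NAMENODE_CONF = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://namenode.example.com:8020</value>
  </property>
</configuration>"""

ISILON_CONF = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://isilon.example.com:8020</value>
  </property>
</configuration>"""


def default_fs(core_site_xml: str) -> str:
    """Return the default file system URI from a core-site.xml string.

    Checks the 0.20-era key first, then the newer fs.defaultFS key.
    """
    root = ET.fromstring(core_site_xml)
    props = {
        p.findtext("name"): p.findtext("value")
        for p in root.findall("property")
    }
    for key in ("fs.default.name", "fs.defaultFS"):
        if key in props:
            return props[key]
    raise KeyError("no default file system configured")


def storage_endpoint(core_site_xml: str) -> tuple:
    """Split the default file system URI into (scheme, host, port)."""
    uri = urlparse(default_fs(core_site_xml))
    return uri.scheme, uri.hostname, uri.port


if __name__ == "__main__":
    for label, conf in (("NameNode", NAMENODE_CONF), ("Isilon", ISILON_CONF)):
        print(label, storage_endpoint(conf))
```

In this sketch the URI scheme stays `hdfs` in both cases and only the endpoint changes, which is one way to picture Isilon acting as the storage layer without Hadoop clients needing a different protocol.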

EMC is talking up the benefits of combining Isilon with Greenplum HD. For the record, that’s the Hadoop distribution previously known as Greenplum HD Community Edition, based on the Apache Hadoop 0.20.1 code branch.

Greenplum HD Enterprise Edition, based on MapR Technologies’ M5 distribution, is now known as Greenplum MR, and is not supported by Isilon because it replaces HDFS with Direct Access NFS.

EMC notes that Greenplum MR is being positioned as a high-performance Hadoop offering for customers who have been unable to achieve the performance they require from other distributions.

While EMC is quick to stress that it is happy with the MapR relationship and committed to Greenplum MR, it’s clear that tight integration with Isilon, particularly in the EMC Greenplum DCA, will result in an expanded role for Greenplum HD.

Additionally, while the company’s Greenplum Command Center provides unified management for the Greenplum Database, Greenplum HD and Greenplum Chorus as part of the recently announced Unified Analytics Platform (UAP), MapR has its own management and monitoring functionality.

Since we expect EMC to pitch the benefits of integrated software in UAP, and of integrated software and hardware in DCA, it is now clear that Greenplum HD, rather than Greenplum MR, is considered the company’s primary Hadoop distribution.

Given Greenplum HD’s starring role in the Unified Analytics Platform (UAP), Data Computing Appliance (DCA) and integration with Isilon, Greenplum MR’s role is likely to become increasingly niche.