7 Hadoop questions. Q4: alternative file systems

Which is your preferred Hadoop file system? The obvious answer is likely to be the Hadoop Distributed File System itself, although in recent years we’ve seen an increasing number of vendors pitching their own file system technologies as potential alternatives to HDFS. That’s why the use of alternative file systems is one of the primary questions being asked in the 451 Research 2013 Hadoop survey.


The limitations of HDFS are well publicised, and it is no surprise that many vendors see an opportunity to pitch their existing file system technologies as alternatives to HDFS.

There is now a large number of HDFS alternatives to choose from, including: Cleversafe Dispersed Storage Network, DataStax CassandraFS, EMC Isilon OneFS, IBM GPFS, InkTank Ceph, MapR NFS, Quantcast QFS, Red Hat Storage (GlusterFS), and Symantec Veritas CFS.

Our research indicates that adoption of alternatives to HDFS is limited at this stage and early efforts, such as Appistry’s CloudIQ Storage Hadoop Edition, have come and gone.

However, as adoption of Hadoop grows into more mainstream enterprises, we increasingly see interest in some of these HDFS alternatives, particularly in relation to attempts to reduce duplication of effort with regard to file system management and maintenance.


The early responses to our Hadoop survey are therefore interesting: MapR NFS has scored highest in terms of adoption so far, but there is interest across the board (especially Red Hat Storage, CassandraFS, GPFS, OneFS and Ceph). By and large though, it’s true to say that most respondents have not considered, tested or adopted an alternative file system to date.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

7 Hadoop questions. Q1: Hadoop and the data warehouse

What is the relationship between Hadoop and the data warehouse? That’s one of the primary questions being asked in the 451 Research 2013 Hadoop survey. Through our conversations with Hadoop users to date we’ve seen that the answer to that question differs from company to company, depending on how far advanced they are in terms of their adoption.


For the most part we see that Hadoop is being used for workloads that were not previously on the data warehouse, as part of a strategy of storing, processing and analyzing data that was previously ignored because it was unsuitable – either in terms of cost or data format – for analysis using a relational data warehouse.

However, we also see some companies taking advantage of the cost advantages of storing data in Hadoop to offload workloads from the data warehouse, either temporarily or permanently.

And at the other end of the spectrum we also see companies in which Hadoop is being used, or at least considered at this stage, as a replacement for the data warehouse.

Which use-cases are most popular? That’s one of the things our survey is designed to find out. The early results indicate a greater preference for Hadoop being used for workloads that were not previously on the data warehouse and also Hadoop being used to permanently migrate some workloads from the data warehouse, but it is still early stages.

While that accounts for the way in which Hadoop is being used today, it doesn’t get to the heart of the long-term potential for Hadoop in relation to the data warehouse. Therefore, the survey also asks about the long-term potential to replace the data warehouse.

Again we see a spectrum of strategies in action, from some companies planning for Hadoop to eventually completely replace the data warehouse, through some moving the majority of workloads to Hadoop, through others moving a minority of workloads to Hadoop, to those that believe Hadoop will never replace the data warehouse.

Again the early survey results are interesting, with ‘a minority of workloads will move to Hadoop’ and ‘Hadoop will never replace the data warehouse’ the most popular answers at this early stage.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

451 Research Hadoop survey is now live

If you’re using or considering using Hadoop, please help shape our understanding of global Hadoop usage by taking our 2013 Hadoop survey, which can be found at http://www.surveymonkey.com/s/451Hadoop

The aim of this survey is to identify trends in Hadoop usage, as well as attitudes to Hadoop as it relates to data warehousing.

There is a minimum of 15 questions to answer and a maximum of 24 (including three optional questions), depending on your organisation’s level of adoption, and the entire survey should take no longer than fifteen minutes to complete.

Some of the specific aspects covered by the survey are:

  • Current and planned Hadoop usage
  • Responsibility for managing Hadoop clusters
  • Preferred infrastructure for Hadoop deployments
  • Hadoop and the data warehouse
  • Potential Hadoop improvements
  • Hadoop-as-a-Service
  • Hadoop hardware
  • Alternative file systems
  • SQL-on/in-Hadoop

All individual responses are of course confidential. The results will be published as part of a major research report due during Q4, which will include market sizing estimates for the analytic database sector, as well as Hadoop. The full report will be available to 451 Research clients, while the results of the survey will also be made freely available.

Thank you in advance for your participation.

http://www.surveymonkey.com/s/451Hadoop

Is MySQL usage really declining?

If you’re a MySQL user, tell us about your adoption plans by taking our current survey.

Back in late 2009, at the height of the concern about Oracle’s imminent acquisition of Sun Microsystems and MySQL, 451 Research conducted a survey of open source software users to assess their database usage and attitudes towards Oracle.

The results provided an interesting snapshot of the potential implications of the acquisition and the concerns of MySQL users and even, so I am told, became part of the European Commission’s hearing into the proposed acquisition (used by both sides, apparently, which says something about both our independence and the malleability of data).

One of the most interesting aspects concerned the apparently imminent decline in the usage of MySQL. Of the 285 MySQL users in our 2009 survey, only 90.2% still expected to be using it two years later, and only 81.8% in 2014.

Other non-MySQL users expected to adopt the open source database after 2009, but the overall prediction was decline. While 82.1% of our sample of 347 open source users were using MySQL in 2009, only 78.7% expected to be using it in 2011, declining to 72.3% in 2014.
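To make those two sets of percentages easier to compare, here is a rough sketch in Python that converts them back into approximate head counts. It assumes the 285 MySQL users are a subset of the 347-person sample (285 out of 347 does reproduce the quoted 82.1%), and because the published percentages are rounded, the derived counts are estimates rather than survey data. The names used here are purely illustrative.

```python
# Figures quoted in the text (2009 survey).
TOTAL_RESPONDENTS = 347   # open source users in the sample
MYSQL_USERS_2009 = 285    # respondents using MySQL in 2009

# Share of the 2009 MySQL users who expected to still be using it.
RETENTION = {2011: 0.902, 2014: 0.818}
# Share of the whole sample that expected to be using MySQL.
OVERALL = {2011: 0.787, 2014: 0.723}


def implied_counts(year):
    """Return (retained, total, new_adopters) as approximate head counts.

    Assumption: counts are recovered by rounding percentage * base,
    so they may be off by one respondent either way.
    """
    retained = round(MYSQL_USERS_2009 * RETENTION[year])
    total = round(TOTAL_RESPONDENTS * OVERALL[year])
    return retained, total, total - retained


# Sanity check: 285 of 347 should reproduce the quoted 2009 share.
share_2009 = round(100 * MYSQL_USERS_2009 / TOTAL_RESPONDENTS, 1)

if __name__ == "__main__":
    print(f"2009 share of sample using MySQL: {share_2009}%")
    for year in (2011, 2014):
        retained, total, new = implied_counts(year)
        print(f"{year}: retained ~{retained}, total ~{total}, new adopters ~{new}")
```

On these assumptions, the expected newcomers (the gap between the total and the retained users in each year) only partly offset the number of existing users who expected to leave, which is why the overall prediction was one of decline.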

This represented an interesting snapshot of sentiment towards MySQL, but the result also had to be taken with a pinch of salt given the significant level of concern regarding MySQL’s future at the time the survey was conducted.

The survey also showed that only 17% of MySQL users thought that Oracle should be allowed to keep MySQL, while 14% of MySQL users were less likely to use MySQL if Oracle completed the acquisition.

That is why we are asking similar questions again, in our recently launched MySQL/NoSQL/NewSQL survey.

More than two years later Oracle has demonstrated that it did not have nefarious plans for MySQL. While its stewardship has not been without controversial moments, Oracle has also invested in the MySQL development process and improved the performance of the core product significantly. There are undoubtedly users that have turned away from MySQL because of Oracle but we also hear of others that have adopted the open source database specifically because of Oracle’s backing.

That is why we are now asking MySQL users to again tell us about their database usage, as well as attitudes to MySQL following its acquisition by Oracle. Since the database landscape has changed considerably since late 2009, we are now also asking about NoSQL and NewSQL adoption plans.

Is MySQL usage really in decline, or was the dip suggested by our 2009 survey the result of a frenzy of uncertainty and doubt given the imminent acquisition? Will our current survey confirm or contradict that result? If you’re a MySQL user, tell us about your adoption plans by taking our current survey.