The Data Day, A few days: July 18-25 2014

HP invests in Hortonworks, Teradata acquires Hadapt and Revelytix. And more

And that’s the data day, today.

The Data Day, A few days: October 12-18 2013

Apache Hadoop 2 goes GA. Teradata cuts guidance. And more

And that’s the data day, today.

7 Hadoop questions. Q5: SQL in Hadoop, SQL on Hadoop, or SQL and Hadoop?

What is your preferred approach to integrating SQL and Hadoop? Until recently that was a straight shoot-out between Hive and Pig, but in 2013 the options for making use of existing SQL skills to analyze data in Hadoop have increased dramatically. That’s why the choice of approach to SQL in/on/and Hadoop is one of the primary questions being asked in the 451 Research 2013 Hadoop survey.

I write "in/on/and" because I believe the distinction is a good way of understanding the various approaches and how they compare at this point.

SQL in Hadoop
Hive’s classic approach of converting SQL queries into MapReduce jobs falls into this category, but lacks the performance that some users are looking for to enable more interactive analysis. Hortonworks has started the Stinger Initiative to align HiveQL more closely with standard SQL, optimize Hive’s query execution plans and introduce a new columnar file format for storing Hive data.
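As a rough illustration of what converting a SQL query into a MapReduce job means, the sketch below expresses a query such as SELECT dept, COUNT(*) FROM employees GROUP BY dept as separate map, shuffle and reduce steps in plain Python. It is a toy model under stated assumptions, not Hive's actual planner or execution code: the table, the column names and the function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows standing in for a table stored in HDFS.
ROWS = [
    {"name": "ann", "dept": "sales"},
    {"name": "bob", "dept": "ops"},
    {"name": "cat", "dept": "sales"},
]

def map_phase(rows):
    """Emit a (key, 1) pair per row, keyed on the GROUP BY column."""
    for row in rows:
        yield row["dept"], 1

def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the values for each key, which plays the role of COUNT(*)."""
    return {key: sum(values) for key, values in grouped.items()}

def count_by_dept(rows):
    """Run the three phases in order for the hypothetical GROUP BY query."""
    return reduce_phase(shuffle(map_phase(rows)))

print(count_by_dept(ROWS))
```

In a real cluster each phase runs as distributed tasks with intermediate data written to disk, which is one reason this style of execution struggles with interactive query latencies.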

SQL on Hadoop
Rather than attempting to improve the performance of SQL-via-MapReduce, several efforts are underway to create SQL engines that enable native SQL-based processing of data in HDFS while avoiding MapReduce. Key efforts include Cloudera’s Impala project and Cloudera Enterprise RTQ product, the MapR-initiated Apache Drill project, Pivotal’s HAWQ and JethroData. IBM’s Big SQL also appears to fit into this category.

SQL and Hadoop
Co-location of relational database technologies and Hadoop enables data to be processed in each platform, using SQL in the RDBMS and MapReduce in HDFS. Hadapt pioneered this approach, while RainStor launched RainStor Big Data Analytics on Hadoop in early 2012, combining its column-based database software with Hadoop. Microsoft has been previewing PolyBase, which will offer the ability to join tables from SQL Server PDW with data from HDFS to return a combined result. SQL and Hadoop is a broader category in which we would also include Citus Data, which takes advantage of PostgreSQL’s foreign data wrapper technology to query data in HDFS via local query execution, as well as Teradata’s SQL-H, which enables SQL analysts to invoke MapReduce and SQL-MapReduce jobs against Hadoop from Teradata’s databases. We would absolutely concede that there are distinct differences between the approaches in this category.

It is naturally early days for most of these approaches, given that most of them only appeared in 2013 and some are still in development and testing. So far, the responses to our Hadoop survey suggest higher levels of interest in Cloudera Impala, Cloudera RTQ and Apache Drill, followed by IBM Big SQL, Hadapt and Pivotal HAWQ.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

The Data Day, A few days: September 20-30 2013

Three reasons why Nirvanix failed. And more

And that’s the data day, today.

The Data Day, A few days: March 11-14 2013

SAP’s predictive analytics plans. Dell’s Boomi MDM. And more

And that’s the data day, today.

The Data Day, Today: November 14 2012

Funding for Continuuity and 10gen. Wibi Data launches the Kiji. And more.

And that’s the Data Day, today.

The Data Day, Two days: November 8/9 2012

Funding for Neo, Elasticsearch and Hadapt. And more

And that’s the Data Day, today.

The Data Day, Two days: October 15/16 2012

NGDATA searches for consumer intelligence. Sparsity looks for social analytics partners.

And that’s the Data Day, today.

The Data Day, Three days: September 3/4/5 2012

Basho joins CloudStack. Pentaho rides Hadoop wave. And more.

And that’s the Data Day, today.

The Data Day, Two days: August 6/7 2012

Hadapt goes GA (quietly). Birst delivers Distributed Business Analytics

And that’s the Data Day, today.