PolyBase — Too much information

7 Hadoop questions. Q5: SQL in Hadoop, SQL on Hadoop, or SQL and Hadoop?

What is your preferred approach to integrating SQL and Hadoop? Until recently that was a straight shoot-out between Hive and Pig, but in 2013 the options for making use of existing SQL skills to analyze data in Hadoop have increased dramatically. That’s why the choice of approach to SQL in/on/and Hadoop is one of the primary questions being asked in the 451 Research 2013 Hadoop survey.

I write "in/on/and" because I believe it is a good way to understand the various approaches and how they compare at this point.

SQL in Hadoop
Hive’s classic approach of converting SQL queries into MapReduce jobs falls into this category, but lacks the performance that some users are looking for to enable more interactive analysis. Hortonworks has started the Stinger Initiative to align HiveQL more closely with standard SQL, optimize Hive’s query execution plans and introduce a new columnar file format for storing Hive data.
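
To make the SQL-via-MapReduce idea concrete, here is a minimal sketch in plain Python of how a query such as SELECT page, COUNT(*) FROM visits GROUP BY page can be broken into map, shuffle and reduce steps. It is a toy, single-process illustration only and not how Hive's planner is actually implemented; the sample rows and the function names (map_phase, shuffle, reduce_phase, run_query) are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data standing in for a "visits" table stored in HDFS.
ROWS = [
    {"page": "/home", "user": "a"},
    {"page": "/docs", "user": "b"},
    {"page": "/home", "user": "c"},
]


def map_phase(rows):
    """Emit one (group key, 1) pair per row, as a mapper would for COUNT(*)."""
    for row in rows:
        yield row["page"], 1


def shuffle(pairs):
    """Group emitted values by key, mimicking the shuffle/sort step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Sum the values for each key, as a reducer would for COUNT(*)."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(rows):
    """Toy equivalent of: SELECT page, COUNT(*) FROM visits GROUP BY page."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(run_query(ROWS))
```

In a real cluster each phase runs as distributed tasks and intermediate results are written to disk between stages, which is part of the reason this route struggles to deliver interactive response times.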

SQL on Hadoop
Rather than attempting to improve the performance of SQL-via-MapReduce, several efforts are underway to create a SQL engine that enables native SQL-based processing of data in HDFS while avoiding MapReduce. Key efforts include Cloudera’s Impala project and Cloudera Enterprise RTQ product, the MapR-initiated Apache Drill project, Pivotal’s HAWQ and JethroData. IBM’s Big SQL also appears to fit into this category.

SQL and Hadoop
Co-location of relational database technologies and Hadoop enables data to be processed in each platform, using SQL in the RDBMS and MapReduce over data in HDFS. Hadapt pioneered this approach. RainStor launched RainStor Big Data Analytics on Hadoop, which incorporates its column-based database software, in early 2012, and Microsoft has been previewing PolyBase, which will offer the ability to join tables from SQL Server PDW with data from HDFS to return a combined result. SQL and Hadoop is a broader category in which we would also include Citus Data, which takes advantage of PostgreSQL’s foreign data wrapper technology to query data in HDFS via local query execution, as well as Teradata’s SQL-H, which enables SQL analysts to invoke MapReduce and SQL-MapReduce jobs against Hadoop from Teradata’s databases. We readily concede that there are distinct differences between the approaches in this category.

It is naturally early days for most of these approaches, given that most of them only appeared in 2013 and some are still in development and testing. So far, the responses to our Hadoop survey suggest higher levels of interest in Cloudera Impala, Cloudera RTQ and Apache Drill, followed by IBM Big SQL, Hadapt and Pivotal HAWQ.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

The Data Day, Two days: November 12/13 2012

November 13th, 2012 — Data management

Platfora raises $20m. IBM trumpets ‘integration anywhere’. And more

For 451 Research clients: Microsoft previews SQL Server in-memory data processing and Hadoop coexistence bit.ly/TG55su

— Matt Aslett (@maslett) November 13, 2012

For 451 Research clients: IBM trumpets ‘integration anywhere,’ moves into reference data management bit.ly/RShjOM By Krishna Roy

— Matt Aslett (@maslett) November 12, 2012

Platfora raises $20m series B to fund its in-memory BI platform for Hadoop. mwne.ws/SjpmT4

— Matt Aslett (@maslett) November 13, 2012

DataSift Raises $15M To Help Businesses Mine And Analyze Social Data ow.ly/ffzaD via @techcrunch

— DataSift (@DataSift) November 13, 2012

We are introducing Jaspersoft 5 today. Read more about our next-gen platform for data exploration ow.ly/feEtn. #bigdata #BI

— Jaspersoft Corp. (@Jaspersoft) November 13, 2012

RethinkDB reemerges with distributed document database. bit.ly/RShoSK

— Matt Aslett (@maslett) November 12, 2012

Big data visualization startup Zoomdata launches with $1.1m of seed funding. prn.to/ZCE6Sm

— Matt Aslett (@maslett) November 13, 2012

Oracle has made a strategic minority investment in Engine Yard. bit.ly/ZC49cj

— Matt Aslett (@maslett) November 13, 2012

FairCom claims SQL-NoSQL bridge with updated c-treeACE. bit.ly/Zuwa5C

— Matt Aslett (@maslett) November 12, 2012

McObject launches eXtremeDB Financial Edition mwne.ws/ZuvP2Q

— Matt Aslett (@maslett) November 12, 2012

And that’s the Data Day, today.

The Data Day, Two days: November 6/7 2012

November 7th, 2012 — Data management

Microsoft launches Hekaton, PolyBase. Appcelerator acquires Nodeable. And more

For 451 Research clients: Total data analytics: predicting the future bit.ly/SN3yCu The third extract from our Total Data Analytics

— Matt Aslett (@maslett) November 7, 2012

For 451 Research clients: SQLstream preps real-time SQL analysis of log and file data streams bit.ly/SN3L8w

— Matt Aslett (@maslett) November 7, 2012

Microsoft unveils in-memory transaction processing for SQL Server and ability to execute Hadoop queries from PDW. bit.ly/SN5kDz

— Matt Aslett (@maslett) November 7, 2012

Appcelerator has acquired Nodeable, will make StreamReduce open source. mwne.ws/RIN9NX

— Matt Aslett (@maslett) November 7, 2012

Pentaho adds instant big data discovery and mobile analysis to Pentaho Business Analytics Enterprise Edition. bit.ly/RR0DW8

— Matt Aslett (@maslett) November 6, 2012

Precog brings its Data Science Platform to MongoDB. mwne.ws/RWyv3T

— Matt Aslett (@maslett) November 7, 2012

Attunity launches Attunity Managed File Transfer (MFT) for Hadoop. prn.to/RR1KoO

— Matt Aslett (@maslett) November 6, 2012

And that’s the Data Day, today.
