The Data Day, A few days: June 1-12, 2015

Teradata supports Presto. And more

And that’s the data day, today.

The Data Day, A few days: December 13-19 2014

Teradata acquires RainStor, MongoDB acquires WiredTiger. And more

And that’s the data day, today.

The Data Day, A few days: July 10-17 2014

Introducing the Total Data Warehouse. And more

And that’s the data day, today.

7 Hadoop questions. Q5: SQL in Hadoop, SQL on Hadoop, or SQL and Hadoop?

What is your preferred approach to integrating SQL and Hadoop? Until recently that was a straight shoot-out between Hive and Pig, but in 2013 the options for making use of existing SQL skills to analyze data in Hadoop have increased dramatically. That’s why the choice of approach to SQL in/on/and Hadoop is one of the primary questions being asked in the 451 Research 2013 Hadoop survey.


I write in/on/and as I believe that is a good way of understanding the various approaches and how they compare at this point.

SQL in Hadoop
Hive’s classic approach of converting SQL queries into MapReduce jobs falls into this category, but lacks the performance that some users are looking for to enable more interactive analysis. Hortonworks has started the Stinger Initiative to align HiveQL more closely with standard SQL, optimize Hive’s query execution plans and introduce a new columnar file format for storing Hive data.
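
To make the SQL-to-MapReduce idea concrete, here is a minimal conceptual sketch in Python of how a simple GROUP BY query can be expressed as a map phase, a shuffle and a reduce phase. It is an illustration only, not Hive's actual query compiler: the sample rows, the `employees` table named in the comment, and the function names `map_phase`, `shuffle`, `reduce_phase` and `count_by_dept` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a table stored in HDFS.
ROWS = [
    {"name": "ann", "dept": "sales"},
    {"name": "bob", "dept": "ops"},
    {"name": "cat", "dept": "sales"},
]


def map_phase(rows):
    """Emit a (key, 1) pair per row, keyed by the GROUP BY column."""
    for row in rows:
        yield row["dept"], 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, which gives COUNT(*) per group."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_dept(rows):
    # Equivalent in spirit to:
    #   SELECT dept, COUNT(*) FROM employees GROUP BY dept
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_dept(ROWS))
```

In a real cluster each phase runs as a batch job across many nodes, with intermediate results written to disk, which helps explain why this route can feel slow for interactive analysis.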

SQL on Hadoop
Rather than attempting to improve the performance of SQL-via-MapReduce, several efforts are underway to create a SQL engine that enables native SQL-based processing of data in HDFS while avoiding MapReduce. Key efforts include Cloudera’s Impala project and Cloudera Enterprise RTQ product, the MapR-initiated Apache Drill project, Pivotal’s HAWQ and JethroData. IBM’s Big SQL also appears to fit into this category.

SQL and Hadoop
Co-location of relational database technologies and Hadoop enables data to be processed in each platform, using SQL in the RDBMS and MapReduce in HDFS. Hadapt pioneered this approach, while RainStor launched RainStor Big Data Analytics on Hadoop in early 2012, built on its column-based database software. Microsoft has been previewing PolyBase, which will offer the ability to join tables from SQL Server PDW with data from HDFS to return a combined result. SQL and Hadoop is a broader category in which we would also include Citus Data, which takes advantage of PostgreSQL’s foreign data wrapper technology to query data in HDFS via local query execution, as well as Teradata’s SQL-H, which enables SQL analysts to invoke MapReduce and SQL-MapReduce jobs against Hadoop from Teradata’s databases. We would absolutely concede that there are distinct differences between the approaches in this category.


These are naturally early days for most of these approaches, given that most of them only appeared in 2013 and some are still in development and testing. So far, the responses to our Hadoop survey suggest higher levels of interest in Cloudera Impala, Cloudera RTQ, and Apache Drill, followed by IBM Big SQL, Hadapt and Pivotal HAWQ.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

The Data Day, A few days: June 26-28 2013

Hortonworks raises $50m, previews next-generation Hadoop. And more

And that’s the data day, today.

The Data Day, A few days: June 11-25 2013

A bumper round-up of the past 14 days’ data-related news

* Cisco announced its intention to acquire Composite Software.

* Software AG acquired Apama.

* TIBCO Software acquired StreamBase Systems.

* Cloudera appointed Tom Reilly as Chief Executive Officer and Mike Olson as Chief Strategy Officer and Chairman of the Board.

* Sears Holdings named Jeff Balagna Chief Executive Officer of MetaScale.

* Ex-Yahoo CTO launched Altiscale, hardcore Hadoop as a service.

* SpaceCurve raised a $10M Series B round of financing.

* Sqrrl announced general availability of Sqrrl Enterprise.

* GE launched Predictivity services, supported by Proficy Historian HD.

* Datameer announced Datameer 3.0.

* Oracle announced the general availability of MySQL Cluster 7.3.

* MemSQL announced the upcoming availability of MemSQL 2.1.

* Continuuity announced the release of Weave, a new open source project that enables Java developers to rapidly build scalable, distributed applications on YARN.

* RainStor added security and text search features to its database complement for Hadoop.

* Composite Software introduced version 6.2 SP3 of its Composite Data Virtualization Platform.

* Tokutek launched TokuMX.

* Terracotta announced the immediate availability of Terracotta Universal Messaging.

* HP united its data management assets under HAVEn brand.

* Hortonworks and Red Hat announced an engineering collaboration around Hadoop.

* Rackspace Hosting’s ObjectRocket Database as a Service entered into a strategic agreement with 10gen.

* Simon Phipps posted State Of The Sea Lion – June 2013.

* Netflix announced that its Genie Hadoop-aaS management software is now open source.

* Storm-YARN released as open source.

* Big Data arrived at the Oxford English Dictionary.

And that’s the data day, today.

The Data Day, The week that was: October 22-26 2012

Cloudera launches Impala. Actuate snags Quiterian. Microsoft previews HDInsight.

And the rest:
– Microsoft previewed its Windows Azure HDInsight Service and Microsoft HDInsight Server for Windows.

– SAP launched a new “big data” bundle and go-to-market strategy.

– Informatica introduced Informatica PowerCenter Big Data Edition and reported its third quarter results.

– Also announcing financial results last week were QlikTech and Pervasive.

– Teradata updated its Unity suite with the addition of Unity Loader, and introduced its Unified Data Environment and the Unified Data Architecture.

– Splunk confirmed the release of Splunk Hadoop Connect and the Splunk App for HadoopOps.

– 10gen added five vice presidents to its management team.

– Rackspace partnered with Hortonworks to create OpenStack and Hadoop-based offerings for public and private cloud.

– Talend added support for Cassandra, HBase and MongoDB, and introduced big data profiling for Apache Hadoop to its integration platform.

– MarkLogic announced support for HDFS and expanded its relationship with Hortonworks.

– Kognitio adopted a free licensing model.

– Calpont launched InfiniDB 3.5.

– MetaMarkets announced that it is open sourcing its Druid streaming, real-time data store.

– YarcData updated its uRiKA Big Data appliance for graph analytics.

– Alpine Data Labs announced a global OEM partnership with QlikTech.

– Actian and Attunity announced Attunity Replicate for Actian Vectorwise.

And that’s the Data Day, today.

The Data Day, Today: October 4 2012

SkySQL goes cloud. RainStor raises $12m. And more

And that’s the Data Day, today.

The Data Day, Today: July 24 2012

Adaptive Planning moves into visual discovery. New CEO for Citrusleaf. And more.

And that’s the Data Day, today.

The Data Day, Today: Mar 22 2012

Oracle reports Q3. EMC acquires Pivotal Labs. ClearStory launches. And much, much more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Oracle Reports Q3 GAAP EPS Up 20% to 49 Cents; Q3 Non-GAAP EPS Up 15% to 62 Cents. Database and middleware revenue up 10%.

* EMC Goes Social, Open and Agile With Big Data. EMC acquires Pivotal Labs, plans to release Chorus as an open source project.

* ClearStory Data Launches With Investment From Google Ventures, Andreessen Horowitz and Khosla Ventures

* HP Lead Big Data Exec Chris Lynch Resigns

* Hortonworks Names Ari Zilka Chief Products Officer

* DataStax Enterprise 2.0 Adds Enterprise Search Capabilities to Smart Big Data Platform

* MapR Unveils Most Comprehensive Data Connection Options for Hadoop

* New Web-Based Alpine Illuminator Integrates with EMC Greenplum Chorus, The Social Data Science Platform

* RainStor and IBM InfoSphere BigInsights to Address Growing Big Data Challenges

* IBM Introduces New Predictive Analytics Services and Software to Reduce Fraud, Manage Financial Performance and Deliver Next Best Action

* Datameer Releases Major New Version of Analytics Platform

* Kognitio Announces Formation of “Kognitio Cloud” Business Unit

* HStreaming Announces Free Community Edition of Its Real-Time Analytics Platform for Hadoop

* Talend and MapR Announce Certification of Big Data Integration and Big Data Quality

* Schooner Information Technology Releases Membrain 4.0

* Gazzang Launches Big Data Encryption and Key Management Platform

* Logicworks Solves Big Data Hosting Challenges With New Infrastructure Services for Hadoop

* “Big Data” Among Most Confusing Tech Buzzwords

* For 451 Research clients

# Infochimps launches Chef-based platform for Hadoop deployment (Impact Report)

# Big-data security, or SIEM buzzword parity? (Spotlight report)

# DataStax adds enterprise search and elastic reprovisioning to database platform (Market Development report)

# With a new CEO and IBM as a reseller, Revolution Analytics charts next growth phase (Market Development report)

# Cray branches out, offering storage and a ‘big data’ appliance (Market Development report)

# CodeFutures sees a future beyond database sharding (Market Development report)

# Third time lucky for ScaleOut StateServer 5.0? (Market Development report)

# Attunity looks to 2012 for turnaround; up to the cloud and ‘big data’ movement (Market Development report)

# Panorama rides Microsoft’s coattails into in-memory social BI using SQL Server 2012 (Market Development report)

And that’s the Data Day, today.