June 12th, 2015 — Data management
Teradata supports Presto. And more
And that’s the data day, today.
December 19th, 2014 — Data management
Teradata acquires RainStor, MongoDB acquires WiredTiger. And more
And that’s the data day, today.
July 17th, 2014 — Data management
Introducing the Total Data Warehouse. And more
And that’s the data day, today.
October 9th, 2013 — Data management
What is your preferred approach to integrating SQL and Hadoop? Until recently that was a straight shoot-out between Hive and Pig, but in 2013 the options for making use of existing SQL skills to analyze data in Hadoop have increased dramatically. That’s why the choice of approach to SQL in/on/and Hadoop is one of the primary questions being asked in the 451 Research 2013 Hadoop survey.
I write “in/on/and” because I believe it is a good way of understanding the various approaches and how they compare at this point.
SQL in Hadoop
Hive’s classic approach of converting SQL queries into MapReduce jobs falls into this category, but lacks the performance that some users are looking for to enable more interactive analysis. Hortonworks has started the Stinger Initiative to align HiveQL more closely with standard SQL, optimize Hive’s query execution plans and introduce a new columnar file format for storing Hive data.
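To make the SQL-to-MapReduce idea concrete, here is a minimal sketch in plain Python of how a simple aggregation such as SELECT page, SUM(bytes) FROM logs GROUP BY page can be broken into map, shuffle and reduce steps. This is an illustration only, not Hive's actual planner or execution code: the table name, column names, sample rows and function names are all hypothetical, and a real job would run distributed across a cluster rather than in one process.

```python
from collections import defaultdict

# Hypothetical sample rows standing in for a "logs" table stored in HDFS.
rows = [
    {"page": "/home", "bytes": 120},
    {"page": "/about", "bytes": 80},
    {"page": "/home", "bytes": 200},
]


def map_phase(records):
    """Emit one (key, value) pair per row: the GROUP BY column and the summed column."""
    for record in records:
        yield record["page"], record["bytes"]


def shuffle_phase(pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Apply the aggregate (SUM) to each group of values."""
    return {key: sum(values) for key, values in groups.items()}


def run_query(records):
    """Chain the three steps, mimicking a single MapReduce job for the query."""
    return reduce_phase(shuffle_phase(map_phase(records)))


result = run_query(rows)
print(result)
```

Each stage here is a full pass over the data, which hints at why queries compiled to batch jobs of this shape can struggle to deliver the interactive response times some users want.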
SQL on Hadoop
Rather than attempting to improve the performance of SQL-via-MapReduce, several efforts are underway to create a SQL engine that enables native SQL-based processing of data in HDFS while avoiding MapReduce. Key efforts include Cloudera’s Impala project and Cloudera Enterprise RTQ product, the MapR-initiated Apache Drill project, Pivotal’s HAWQ and JethroData. IBM’s Big SQL also appears to fit into this category.
SQL and Hadoop
Co-location of relational database technologies and Hadoop enables data to be processed in each platform, using SQL in the RDBMS and MapReduce in HDFS. Hadapt pioneered this approach, while RainStor launched RainStor Big Data Analytics on Hadoop in early 2012, combining its column-based database software with Hadoop. Microsoft has been previewing PolyBase, which will offer the ability to join tables from SQL Server PDW with data from HDFS to return a combined result. SQL and Hadoop is a broader category in which we would also include Citus Data, which takes advantage of PostgreSQL’s foreign data wrapper technology to query data in HDFS via local query execution, as well as Teradata’s SQL-H, which enables SQL analysts to invoke MapReduce and SQL-MapReduce jobs against Hadoop from Teradata’s databases. We would absolutely concede that there are distinct differences between the approaches in this category.
It is naturally early stages for most of these approaches given that most of them only appeared in 2013 and some are still in development and testing. So far the responses to our Hadoop survey suggest higher levels of interest in Cloudera Impala, Cloudera RTQ, and Apache Drill, followed by IBM Big SQL, Hadapt and Pivotal HAWQ.
To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.
June 28th, 2013 — Data management
Hortonworks raises $50m, previews next-generation Hadoop. And more
And that’s the data day, today.
June 25th, 2013 — Data management
A bumper round-up of the past 14 days’ data-related news
* Cisco announced its intention to acquire Composite Software.
* Software AG acquired Apama.
* TIBCO Software acquired StreamBase Systems.
* Cloudera appointed Tom Reilly as Chief Executive Officer and Mike Olson as Chief Strategy Officer and Chairman of the Board.
* Sears Holdings named Jeff Balagna Chief Executive Officer of MetaScale.
* Ex-Yahoo CTO launched Altiscale, hardcore Hadoop as a service.
* SpaceCurve raised a $10M Series B round of financing.
* Sqrrl announced general availability of Sqrrl Enterprise.
* GE launched Predictivity services, supported by Proficy Historian HD.
* Datameer announced Datameer 3.0.
* Oracle announced the general availability of MySQL Cluster 7.3.
* MemSQL announced the upcoming availability of MemSQL 2.1.
* Continuuity announced the release of Weave, a new open source project that enables Java developers to rapidly build scalable, distributed applications on YARN.
* RainStor adds security, text search features to database complement for Hadoop.
* Composite Software introduced version 6.2 SP3 of its Composite Data Virtualization Platform.
* TokuDB launched TokuMX.
* Terracotta announced the immediate availability of Terracotta Universal Messaging.
* HP united its data management assets under HAVEn brand.
* Hortonworks and Red Hat announced an engineering collaboration around Hadoop.
* Rackspace Hosting’s ObjectRocket Database as a Service entered into a strategic agreement with 10gen.
* Simon Phipps posted State Of The Sea Lion – June 2013.
* Netflix announced that its Genie Hadoop-aaS management software is now open source.
* Storm-YARN released as open source.
* Big Data arrived at the Oxford English Dictionary.
And that’s the data day, today.
October 29th, 2012 — Data management
Cloudera launches Impala. Actuate snags Quiterian. Microsoft previews HDInsight.
And the rest:
– Microsoft previewed its Windows Azure HDInsight Service and Microsoft HDInsight Server for Windows.
– SAP launched a new “big data” bundle and go-to-market strategy.
– Informatica introduced Informatica PowerCenter Big Data Edition and reported its third quarter results.
– Also announcing financial results last week were QlikTech and Pervasive.
– Teradata updated its Unity suite with the addition of Unity Loader, and introduced its Unified Data Environment and the Unified Data Architecture.
– Splunk confirmed the release of Splunk Hadoop Connect and the Splunk App for HadoopOps.
– 10gen added five vice presidents to its management team.
– Rackspace partnered with Hortonworks to create OpenStack and Hadoop-based offerings for public and private cloud.
– Talend added support for Cassandra, HBase and MongoDB, and introduced big data profiling for Apache Hadoop to its integration platform.
– MarkLogic announced support for HDFS and expanded its relationship with Hortonworks.
– Kognitio adopted a free licensing model.
– Calpont launched InfiniDB 3.5.
– MetaMarkets announced that it is open sourcing its Druid streaming, real-time data store.
– YarcData updated its uRiKA Big Data appliance for graph analytics.
– Alpine Data Labs announced a global OEM partnership with QlikTech.
– Actian and Attunity announced Attunity Replicate for Actian Vectorwise.
And that’s the Data Day, today.
October 4th, 2012 — Data management
SkySQL goes cloud. RainStor raises $12m. And more
And that’s the Data Day, today.
July 24th, 2012 — Data management
Adaptive Planning moves into visual discovery. New CEO for Citrusleaf. And more.
And that’s the Data Day, today.
March 22nd, 2012 — Data management
Oracle reports Q3. EMC acquires Pivotal Labs. ClearStory launches. And much, much more.
An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* Oracle Reports Q3 GAAP EPS Up 20% to 49 Cents; Q3 Non-GAAP EPS Up 15% to 62 Cents Database and middleware revenue up 10%.
* EMC Goes Social, Open and Agile With Big Data EMC acquires Pivotal Labs, plans to release Chorus as an open source project.
* ClearStory Data Launches With Investment From Google Ventures, Andreessen Horowitz and Khosla Ventures
* HP Lead Big Data Exec Chris Lynch Resigns
* Hortonworks Names Ari Zilka Chief Products Officer
* DataStax Enterprise 2.0 Adds Enterprise Search Capabilities to Smart Big Data Platform
* MapR Unveils Most Comprehensive Data Connection Options for Hadoop
* New Web-Based Alpine Illuminator Integrates with EMC Greenplum Chorus, The Social Data Science Platform
* RainStor and IBM InfoSphere BigInsights to Address Growing Big Data Challenges
* IBM Introduces New Predictive Analytics Services and Software to Reduce Fraud, Manage Financial Performance and Deliver Next Best Action
* Datameer Releases Major New Version of Analytics Platform
* Kognitio Announces Formation of “Kognitio Cloud” Business Unit
* HStreaming Announces Free Community Edition of Its Real-Time Analytics Platform for Hadoop
* Talend and MapR Announce Certification of Big Data Integration and Big Data Quality
* Schooner Information Technology Releases Membrain 4.0
* Gazzang Launches Big Data Encryption and Key Management Platform
* Logicworks Solves Big Data Hosting Challenges With New Infrastructure Services for Hadoop
* “Big Data” Among Most Confusing Tech Buzzwords
* For 451 Research clients
# Infochimps launches Chef-based platform for Hadoop deployment Impact Report
# Big-data security, or SIEM buzzword parity? Spotlight report
# DataStax adds enterprise search and elastic reprovisioning to database platform Market Development report
# With a new CEO and IBM as a reseller, Revolution Analytics charts next growth phase Market Development report
# Cray branches out, offering storage and a ‘big data’ appliance Market Development report
# CodeFutures sees a future beyond database sharding Market Development report
# Third time lucky for ScaleOut StateServer 5.0? Market Development report
# Attunity looks to 2012 for turnaround; up to the cloud and ‘big data’ movement Market Development report
# Panorama rides Microsoft’s coattails into in-memory social BI using SQL Server 2012 Market Development report
And that’s the Data Day, today.