Matthew Aslett — Too much information

The Data Day, A few days: October 26-November 1 2013

November 1st, 2013 — Data management

Cloudera launches Enterprise Data Hub. And more

For 451 Research clients: Safe HAVEn? Has HP found an identity with which to fulfill its big-data ambitions? http://t.co/nXje0zQvyp

— Matt Aslett (@maslett) October 30, 2013

For 451 Research clients: ClearStory shares its SaaSy tale of multi-source discovery and analysis http://t.co/fg3HJLOfLt By Krishna Roy

— Matt Aslett (@maslett) October 29, 2013

For 451 Research clients: Rackspace launches cloud and managed Hadoop-as-a-service offerings http://t.co/wHxlvWuG7a

— Matt Aslett (@maslett) October 29, 2013

For 451 Research clients: Calpont retries open source analytic database strategy with Hadoop-friendly twist http://t.co/rOUp6JoNVf

— Matt Aslett (@maslett) November 1, 2013

For 451 Research clients: IBM seeks to better serve the business of BI, planning and statistics http://t.co/Re17fOD6fk By Krishna Roy

— Matt Aslett (@maslett) October 28, 2013

For 451 Research clients: Infobright illuminates its plans for the analysis of machine data http://t.co/RNgwSyvawu

— Matt Aslett (@maslett) November 1, 2013

For 451 clients: Adaptive Planning plots growth strategy, bags series D extension, salesforce backing http://t.co/En9mnl5v8q By Krishna Roy

— Matt Aslett (@maslett) November 1, 2013

For 451 Research clients: ScaleOut delivers hServer V2, in-memory MapReduce engine for Hadoop. http://t.co/FOhjoPoPd0

— Matt Aslett (@maslett) October 29, 2013

Teradata reports net income of $98m on Q3 revenue up 3% to $666m. http://t.co/brgymsB86m

— Matt Aslett (@maslett) November 1, 2013

Teradata CEO says 4% to 8% of the total workload on Teradata data warehouses could potentially move to Hadoop. http://t.co/lYh5gFXziP

— Matt Aslett (@maslett) November 1, 2013

MicroStrategy reports net income of $17.1m on Q3 revenue up 4% to $141.9m. http://t.co/pLQ0nFmvcV

— Matt Aslett (@maslett) October 29, 2013

Cloudera launches beta of Cloudera Enterprise 5, positioned as an Enterprise Data Hub. http://t.co/c1nsux1Vqg

— Matt Aslett (@maslett) October 30, 2013

Clustrix has launched ClustrixDB as a software download for real-time analytics on live operational data. http://t.co/eANohc3iWP

— Matt Aslett (@maslett) November 1, 2013

Basho announces the technical preview of Riak 2.0. http://t.co/yasX2cx1T3

— Matt Aslett (@maslett) October 30, 2013

AWS updates Elastic MapReduce to support Hadoop 2.2, including YARN, as well as MapR M7. http://t.co/q6oI5ImxQD

— Matt Aslett (@maslett) October 30, 2013

Pivotal's Spring for Apache Hadoop is certified with Pivotal HD, Cloudera http://t.co/IyTWmA2RWu and Hortonworks http://t.co/4b6VF9LfaK

— Matt Aslett (@maslett) October 30, 2013

HP to resell Hortonworks Data Platform. http://t.co/XnCqbm23DP

— Matt Aslett (@maslett) October 30, 2013

Talend has announced version 5.4 of its data integration platform, with support for YARN. http://t.co/RDCUqM26OM

— Matt Aslett (@maslett) October 29, 2013

Rackspace launches cloud and managed Hadoop-as-a-Service offerings. http://t.co/bMQKEIFL0d

— Matt Aslett (@maslett) October 28, 2013

IBM's SoftLayer teams up with Cloudera to offer Hadoop on bare metal infrastructure as a service. http://t.co/mPuFe1ETeG

— Matt Aslett (@maslett) October 29, 2013

ClearStory Data has introduced its Data Intelligence offering. http://t.co/bE8wQy4sMS

— Matt Aslett (@maslett) October 28, 2013

Informatica and Cloudera launch reference architecture for Hadoop-based Data Warehouse Optimization. http://t.co/rXdR9axZ1r

— Matt Aslett (@maslett) October 30, 2013

SAS Institute introduces SAS/ACCESS Interface to Cloudera's Impala SQL-on-Hadoop offering. http://t.co/gUDacBLJp8

— Matt Aslett (@maslett) October 29, 2013

MapR has added native security authentication and authorization to the MapR Distribution for Apache Hadoop. http://t.co/S9lC2CG1Mq

— Matt Aslett (@maslett) October 28, 2013

Microsoft's Windows Azure HDInsight Hadoop service is now generally available. http://t.co/ZBFJeoCkqy

— Matt Aslett (@maslett) October 28, 2013

Calpont announces InfiniDB for Apache Hadoop. http://t.co/4XqOGEg0Sq

— Matt Aslett (@maslett) October 30, 2013

Elasticsearch has hired the creator of Logstash to create an open source search and log management offering. http://t.co/oflInpJVkB

— Matt Aslett (@maslett) November 1, 2013

Google adds support for the MySQL Wire Protocol to Google Cloud SQL. http://t.co/Ynt58fCMAn

— Matt Aslett (@maslett) November 1, 2013

Paxata launches self-service data preparation platform, $8m series B funding. http://t.co/26VM9i8808 (PDF)

— Matt Aslett (@maslett) October 28, 2013

Julian Hyde joins Hortonworks to integrate Optiq’s cost based optimizer with Apache Hive. http://t.co/0bp89zgQIg

— Matt Aslett (@maslett) October 29, 2013

Cloudera launches cloud-focused partnership program, involving Verizon, Savvis, SoftLayer and T-Systems. http://t.co/avWKIJVpRZ

— Matt Aslett (@maslett) October 28, 2013

Cloudera announces support for Apache Spark, Cloudera Connect: Innovators partnership with Databricks. http://t.co/hRP7xbMTxH

— Matt Aslett (@maslett) October 28, 2013

DataRPM previews natural language search for embedded business intelligence. http://t.co/fYodlJeCZQ

— Matt Aslett (@maslett) October 29, 2013

Doug Cutting discusses the future of data. http://t.co/27ZwleFDaR

— Matt Aslett (@maslett) November 1, 2013

And that’s the data day, today.


The Data Day, A few days: October 19-25 2013

October 25th, 2013 — Data management

Hadoop and Teradata go to the cloud. And more.

For 451 Research clients: Savvis offers new Hadoop-based big-data services http://t.co/Vhyd2VU99Z

— Matt Aslett (@maslett) October 23, 2013

For 451 clients: Glassbeam unveils SaaS analysis stack for the Internet of Things, bags seed money http://t.co/3hjbCH81LM By Krishna Roy

— Matt Aslett (@maslett) October 22, 2013

For 451 clients: Automated Insights heads into narrative-led analytic services for the enterprise http://t.co/lneDn0Qaff By Krishna Roy

— Matt Aslett (@maslett) October 23, 2013

For 451 Research clients: Acunu focuses on self-service analytics for streaming and historical data http://t.co/F19yAZYjfG By Krishna Roy

— Matt Aslett (@maslett) October 24, 2013

For 451 Research clients: Percona aims to match Oracle MySQL with Percona Server 5.6 http://t.co/DPQbC4iryh

— Matt Aslett (@maslett) October 21, 2013

For 451 Research clients: ParStream raises $8m series B to fuel real-time analytic database growth http://t.co/B7vBqdKzJ4

— Matt Aslett (@maslett) October 22, 2013

For 451 Research clients: Hazelcast steps up its open source in-memory data grid ambitions with series A funding http://t.co/noJ1hRxuud

— Matt Aslett (@maslett) October 24, 2013

For 451 Research clients: ScaleArc drives toward expanded market for database traffic management software http://t.co/3sdmj0UyBw

— Matt Aslett (@maslett) October 25, 2013

For 451 clients: Orchestra expands into MDM-related terrain as it plays to a bigger US audience http://t.co/Ns610DejP2 By Krishna Roy

— Matt Aslett (@maslett) October 21, 2013

SAP claims Q3 HANA revenue of €149 million, and over 2,100 HANA customers. http://t.co/nXB2As5vhh

— Matt Aslett (@maslett) October 21, 2013

Informatica reports Q3 net income of $10.4m, on revenue up 24% to $235.4m. http://t.co/JcxJZEpZdd

— Matt Aslett (@maslett) October 25, 2013

Software AG reports Q3 net income of €31.1m on revenue down 7% to €238.5m http://t.co/mxLuQ2WOfE Includes €10m+ from 'big data' products

— Matt Aslett (@maslett) October 25, 2013

QlikTech reports Q3 net income of $3.0m on revenue up 21% to $104.1m. http://t.co/cMwloPZsSt

— Matt Aslett (@maslett) October 25, 2013

Attunity reports Q3 net income of $0.7m on revenue up 11% to $6.6m. http://t.co/ogYIvFxkty

— Matt Aslett (@maslett) October 25, 2013

Intel Capital leads $20m investment in SkySQL to grow MariaDB. http://t.co/2kywf3iJDW

— Matt Aslett (@maslett) October 23, 2013

CoolaData closes $7.5m series A round. http://t.co/7X8SGmTFxw

— Matt Aslett (@maslett) October 22, 2013

Sqrrl Data closes $5.2m series A round, releases version 1.2 of Sqrrl Enterprise. http://t.co/hPwIMmlQd2

— Matt Aslett (@maslett) October 21, 2013

Nutonian raises $4M to 'uncover truth from chaos' in big data http://t.co/rhPKfAFMB5

— Matt Aslett (@maslett) October 23, 2013

Hortonworks Data Platform 2.0 now generally available. http://t.co/6Wdymi6u6D

— Matt Aslett (@maslett) October 23, 2013

Pivotal launches Pivotal HD 1.1 and Pivotal GemFire XD http://t.co/YK2715ezJb Pivotal Data Dispatch http://t.co/TZaL8WkLxH

— Matt Aslett (@maslett) October 23, 2013

Savvis introduces Big Data Solutions – based on managed services for Cloudera and MapR. http://t.co/uiOm6zMM4U

— Matt Aslett (@maslett) October 22, 2013

Virtustream launches HANA-Hadoop Managed Service based on SAP HANA and Intel Distribution for Apache Hadoop. http://t.co/FxlR27l1Tn

— Matt Aslett (@maslett) October 23, 2013

Teradata introduces Teradata Cloud including Teradata Database, Aster Discovery Platform and Hadoop as a Service. http://t.co/gvMqCnD7Zk

— Matt Aslett (@maslett) October 21, 2013

Teradata will add JSON support to Teradata Database during the second quarter of 2014, targeting Internet of Things. http://t.co/pvih4HuSXB

— Matt Aslett (@maslett) October 21, 2013

Teradata introduces Data Warehouse Appliance 2750 http://t.co/CBQuHhvIpu and Teradata Extreme Data Platform 1700. http://t.co/RMSUUi3N3L

— Matt Aslett (@maslett) October 21, 2013

Teradata introduces new Teradata Data Stream Architecture for backup and recovery. http://t.co/9gCUtGgjQm

— Matt Aslett (@maslett) October 22, 2013

Glassbeam has launched Glassbeam SCALAR, a cloud-based platform for machine data analytics. http://t.co/6GJSRry7IN

— Matt Aslett (@maslett) October 23, 2013

Platfora delivers Platfora Big Data Analytics Platform 3.0, including event-stream analytics. http://t.co/TfjJajfB6y

— Matt Aslett (@maslett) October 23, 2013

SAP unveils service pack 7 for SAP HANA http://t.co/ymTj3acLyd strategic partnership with SAS. http://t.co/YCUe37VcXv

— Matt Aslett (@maslett) October 22, 2013

Capgemini forms global Hadoop partnership with Cloudera. http://t.co/HIZ9HgfRJJ

— Matt Aslett (@maslett) October 25, 2013

Basho and Seagate partner on Riak-based scale-out cloud storage. http://t.co/r1C9hUdi3C

— Matt Aslett (@maslett) October 22, 2013

WhereScape and Teradata sign global reseller agreement. http://t.co/sXprtI3J0z

— Matt Aslett (@maslett) October 21, 2013

Facebook explains how it scales MySQL with MySQL Pool Scanner. http://t.co/h1hs05CrDs

— Matt Aslett (@maslett) October 24, 2013

Splice Machine seeks evaluators for Hadoop-based transactional SQL database. http://t.co/Vg4yZOSy6e

— Matt Aslett (@maslett) October 23, 2013

Distributed Caching is Dead – Long Live… http://t.co/Q2oHSweAfb

— Matt Aslett (@maslett) October 21, 2013

And that’s the data day, today.


7 Hadoop questions. Q7: Hadoop’s role

October 23rd, 2013 — Data management

What is the point of Hadoop? It’s a question we’ve asked a few times on this blog, and continues to be a significant question asked by users, investors and vendors about Apache Hadoop. That is why it is one of the major questions being asked as part of our 451 Research 2013 Hadoop survey.

As I explained during our keynote presentation at the inaugural Hadoop Summit Europe earlier this year, our research suggests there are hundreds of potential workloads that are suitable for Hadoop, but three core roles:

Big data storage: Hadoop as a system for storing large, unstructured, data sets
Big data processing/integration: Hadoop as a data ingestion/ETL layer
Big data analytics: Hadoop as a platform for new exploratory analytic applications

And we’re not the only ones that see it that way. This blog from Cloudera CTO Amr Awadallah outlines three very similar, if differently-named use-cases (Transformation, Active Archive, and Exploration).

In fact, as I also explained during the Hadoop Summit keynote, we see these three roles as a process of maturing adoption, starting with low cost storage, moving on to high-performance data aggregation/ingestion, and finally exploratory analytics.

As such it is interesting to view the current results of our Hadoop survey, which show that the highest proportion of respondents that have implemented or plan to implement Hadoop (63%) are doing so for data analytics, followed by 48% for data integration and 43% for data storage.

This would suggest that our respondents include some significantly early Hadoop adopters. I look forward to properly analysing the results to see what they can tell us, but in the meantime it is interesting to note that the percentage of respondents using Hadoop for analytics is significantly higher among those that adopted Hadoop prior to 2012 (88%) compared to those that adopted in 2012 or 2013 (65%).

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.


The Data Day, A few days: October 12-18 2013

October 18th, 2013 — Data management

Apache Hadoop 2 goes GA. Teradata cuts guidance. And more

For 451 Research clients: Teradata adds graph analytics and file store to Aster Discovery Platform http://t.co/rPpnuvfhhc

— Matt Aslett (@maslett) October 14, 2013

For 451 Research clients: Couchbase goes mobile with JSON Anywhere NoSQL database strategy http://t.co/IXKoNmh1Ia

— Matt Aslett (@maslett) October 17, 2013

For 451 Research clients: Hadapt launches 'schemaless SQL' to cover the spectrum of SQL-NoSQL analytics http://t.co/nZI9jlODBa

— Matt Aslett (@maslett) October 16, 2013

For 451 Research clients: SQLstream launches visualization offering for streaming analytics http://t.co/FZXwgxh3zM

— Matt Aslett (@maslett) October 18, 2013

For 451 Research clients: It's good to share: 1010data discovers the key to increasing its customer count http://t.co/3QlYvPp0gN

— Matt Aslett (@maslett) October 15, 2013

The Apache Software Foundation Announces Apache Hadoop 2. http://t.co/tTk82zG73K

— Matt Aslett (@maslett) October 16, 2013

Teradata Lowers Guidance for 2013 http://t.co/LE4jbmwfOh Q3 revenue declined 21% in Asia Pac, 19% in Middle East and Africa.

— Matt Aslett (@maslett) October 15, 2013

SAS Institute and Hortonworks expand strategic alliance for increased joint marketing, R&D and customer support. http://t.co/0SVamNtxg4

— Matt Aslett (@maslett) October 18, 2013

Great interview with Doug Cutting about the past, present and future of Hadoop. http://t.co/cF6xM3tmQD

— Matt Aslett (@maslett) October 18, 2013

Syncsort's Data Protection Business Acquired by Executive Management, Bedford Venture Partners and Windcrest Partners http://t.co/QsD7bAbyqJ

— Matt Aslett (@maslett) October 16, 2013

Talend will transition its Talend Open Studio product family to the Apache License. http://t.co/MXiF8mn68W

— Matt Aslett (@maslett) October 15, 2013

NuoDB has launched version 2.0 of its distributed database. http://t.co/y0vmO9HO7q

— Matt Aslett (@maslett) October 16, 2013

Acunu Launches Acunu Analytics 5.0 for Cassandra http://t.co/pc3PdwUJ2D

— Matt Aslett (@maslett) October 17, 2013

Hortonworks will include Apache Storm in the Hortonworks Data Platform in Q1 of 2014. http://t.co/s7aeYYK9Mb Preview coming in Q4 2013.

— Matt Aslett (@maslett) October 15, 2013

DataStax introduces DevCenter – a free visual query tool for creating and running Cassandra Query Language queries. http://t.co/2r3FSog67f

— Matt Aslett (@maslett) October 17, 2013

Calpont Launches InfiniDB 4 and InfiniDB for the Cloud http://t.co/HUx8Pg1fv8 adopts GPLv2 http://t.co/VF7dasfwws

— Matt Aslett (@maslett) October 15, 2013

Calpont announces Jack McDonnell as new CEO. http://t.co/32gLS1G36V

— Matt Aslett (@maslett) October 15, 2013

Kapow Software is working with Oracle on data integration for Oracle Endeca Information Discovery. http://t.co/4M0t5UzfUC

— Matt Aslett (@maslett) October 18, 2013

Alteryx and Revolution Analytics integrate for R-based predictive analytics. http://t.co/jYGLGy3gEy

— Matt Aslett (@maslett) October 15, 2013

Zettaset files trade secret misappropriation lawsuit against Intel related to Hadoop management software. http://t.co/dQ11TRVYfX

— Matt Aslett (@maslett) October 14, 2013

Tibco assembles its data management portfolio into a 'big data architecture'. http://t.co/Jh19px4gSx

— Matt Aslett (@maslett) October 15, 2013

AquaFold releases Aqua Data Studio 14, adding support for MongoDB and Cassandra, Hive and Microsoft’s SQL Azure. http://t.co/hAKiB9sPgi

— Matt Aslett (@maslett) October 15, 2013

Facebook’s vs Twitter’s Approach to Real-Time Analytics http://t.co/aZAAyeVvR7

— Matt Aslett (@maslett) October 14, 2013

Adding ACID to Apache Hive http://t.co/kzvNjzn1pt

— Matt Aslett (@maslett) October 14, 2013

And that’s the data day, today.


7 Hadoop questions. Q6: Hadoop’s shortcomings

October 16th, 2013 — Data management

What are the major shortcomings of Hadoop? The answer to that question looks set to shape the future development roadmap for the open source data processing framework, which is why it is one of the major questions being asked as part of our 451 Research 2013 Hadoop survey.

The limitations of Hadoop have been widely reported over the years, but as the Apache Hadoop community and related vendors have responded to issues such as reliability and high availability – not least via the now generally available Apache Hadoop 2 – so attention turns to other areas such as security, administration and performance, as well as more advanced functionality requirements, including graph processing, stream processing, improved SQL support and virtualization support.

The list of potential improvements is therefore fairly long, and as we near the end of our survey it is interesting to see that the list of key advances respondents are looking for in order to increase adoption of Hadoop is fairly widespread.

So far the responses to our Hadoop survey suggest administration tooling and performance top the list, followed by reliability, SQL support and backup and recovery, but development tools and authentication and access control are not far behind.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.


The Data Day, A few days: October 5-11 2013

October 11th, 2013 — Data management

TransLattice acquires StormDB. Funding for Cirro and TempoDB. And more.

For 451 Research clients: $150m funding highlights the scale of MongoDB's ambition, and the scale of its challenge http://t.co/nnwxr0wiWi

— Matt Aslett (@maslett) October 9, 2013

For 451 Research clients: TransLattice acquires StormDB to boost PostgreSQL-based elastic database http://t.co/6hgUq1zBEX

— Matt Aslett (@maslett) October 10, 2013

For 451 Research clients: Looker scores $16m series A funding for in-database-analysis play http://t.co/zeJbKlredF By Krishna Roy

— Matt Aslett (@maslett) October 10, 2013

For 451 Research clients: Altiscale aims to take Hadoop as a service to another level http://t.co/pW6Gg0y5Zj

— Matt Aslett (@maslett) October 10, 2013

For 451 clients: Data quality, governance and MDM: the other tenants in Total Data integration http://t.co/B5yWJOJhUC By Krishna Roy

— Matt Aslett (@maslett) October 8, 2013

For 451 Research clients: Datical emerges from stealth with a database change management tool http://t.co/X3XqNNCvIL By @cote

— Matt Aslett (@maslett) October 11, 2013

TransLattice Acquires StormDB to Enhance TransLattice Elastic Database. http://t.co/iIGc26rNVM

— Matt Aslett (@maslett) October 9, 2013

Cirro has closed $8 million in Series A funding led by Toba Capital. http://t.co/XBzzBGMKMf

— Matt Aslett (@maslett) October 9, 2013

TempoDB raises $3.2m for database service for the Internet of Things. http://t.co/N0hrD9pmPl

— Matt Aslett (@maslett) October 9, 2013

Teradata updates Aster Discovery Platform with graph engine, new file store and SNAP Framework. http://t.co/JtrUoRSpEb

— Matt Aslett (@maslett) October 8, 2013

MarkLogic 7 introduces native HDFS and AWS S3 integration and MarkLogic Semantics. http://t.co/Mm3tIEjMf0

— Matt Aslett (@maslett) October 10, 2013

MapR CEO John Schroeder claims company has over 500 (software license purchasing) paying customers. http://t.co/XHWutcAZlx

— Matt Aslett (@maslett) October 8, 2013

Actian launches ParAccel Dataflow Operators to integrate the ParAccel Big Data Analytics Platform with Hadoop. http://t.co/rDsR6ugjoh

— Matt Aslett (@maslett) October 11, 2013

Percona Server 5.6 is now generally available. http://t.co/eJA8HTdXQm

— Matt Aslett (@maslett) October 8, 2013

SQLstream announces visualisation platform for real-time, streaming analytics. http://t.co/NfkCR44y3Q

— Matt Aslett (@maslett) October 10, 2013

Basho is working with the NHS to develop the new Spine2 database, supported by Riak http://t.co/aTbpiBXXhA

— Matt Aslett (@maslett) October 8, 2013

MongoDB has introduced a free usage tier to MMS Backup, its online backup service http://t.co/DaoN5fWKqm #gatewayeffect

— Matt Aslett (@maslett) October 8, 2013

What economic lessons can #cloud providers learn from Nirvanix? http://t.co/zwXnFhav9x by @eekygeeky, @cloudscompared (FREE REPORT)

— simon robinson (@simonrob451) October 7, 2013

And that’s the data day, today.


7 Hadoop questions. Q5: SQL in Hadoop, SQL on Hadoop, or SQL and Hadoop?

October 9th, 2013 — Data management

What is your preferred approach to integrating SQL and Hadoop? Until recently that was a straight shoot-out between Hive and Pig, but in 2013 the options for making use of existing SQL skills to analyze data in Hadoop have increased dramatically. That’s why the choice of approach to SQL in/on/and Hadoop is one of the primary questions being asked in the 451 Research 2013 Hadoop survey.

I write in/on/and as I believe that is a good way of understanding the various approaches and how they compare at this point.

SQL in Hadoop
Hive’s classic approach of converting SQL queries into MapReduce jobs falls into this category, but lacks the performance that some users are looking for to enable more interactive analysis. Hortonworks has started the Stinger Initiative to align HiveQL more closely with standard SQL, optimize Hive’s query execution plans and introduce a new columnar file format for storing Hive data.
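
To make the "SQL in Hadoop" idea concrete, here is a minimal sketch of how a query such as SELECT category, COUNT(*) ... GROUP BY category breaks down into map, shuffle and reduce steps. It is plain Python rather than Hive or Hadoop code; the sample rows and all function names are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical rows standing in for a Hive table: (vendor, category).
ROWS = [
    ("Cloudera", "hadoop"),
    ("Hortonworks", "hadoop"),
    ("MongoDB", "nosql"),
    ("MapR", "hadoop"),
    ("Basho", "nosql"),
]


def map_phase(rows):
    """Emit (category, 1) for each row, as a mapper would for COUNT(*) ... GROUP BY."""
    for _vendor, category in rows:
        yield category, 1


def shuffle(pairs):
    """Group emitted values by key, mimicking the shuffle step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values for each key, as a reducer would."""
    return {key: sum(values) for key, values in grouped.items()}


def count_by_category(rows):
    """Run the three steps in sequence and return the per-category counts."""
    return reduce_phase(shuffle(map_phase(rows)))


if __name__ == "__main__":
    print(count_by_category(ROWS))
```

On a real cluster each step runs as distributed tasks with intermediate data written to disk, which is one reason the classic approach can struggle with interactive analysis.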

SQL on Hadoop
Rather than attempting to improve the performance of SQL-via-MapReduce, several efforts are underway to create a SQL engine that enables native SQL-based processing of data in HDFS while avoiding MapReduce. Key efforts include Cloudera’s Impala project and Cloudera Enterprise RTQ product, the MapR-initiated Apache Drill project, Pivotal’s HAWQ and JethroData. IBM’s Big SQL also appears to fit into this category.

SQL and Hadoop
Co-location of relational database technologies and Hadoop enables data to be processed in each platform, using SQL in the RDBMS and MapReduce in HDFS. Hadapt pioneered this approach, while RainStor launched RainStor Big Data Analytics on Hadoop in early 2012, combining its column-based database software, and Microsoft has been previewing PolyBase, which will offer the ability to join tables from SQL Server PDW with data from HDFS to return a combined result. SQL and Hadoop is a broader category in which we would also include Citus Data, which takes advantage of PostgreSQL’s foreign data wrapper technology to query data in HDFS via the local query execution, as well as Teradata’s SQL-H, which enables SQL analysts to invoke MapReduce and SQL-MapReduce jobs against Hadoop from Teradata’s databases. We would absolutely concede that there are distinct differences between the approaches in this category.
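
The "combined result" idea can be sketched in a few lines. This is not PolyBase, SQL-H or any vendor's implementation: it uses Python's built-in sqlite3 module as a stand-in for the relational database and a hard-coded list of tab-separated lines as a stand-in for a file in HDFS. The table names, column names and data are all hypothetical.

```python
import sqlite3

# Hypothetical stand-in for tab-separated records stored in HDFS:
# customer_id <TAB> clicks
HDFS_LINES = [
    "1\t120",
    "2\t45",
    "1\t30",
]


def combined_result():
    """Join a relational table with externally stored records and return one result set."""
    conn = sqlite3.connect(":memory:")

    # Relational side: a small customers table.
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [(1, "Acme"), (2, "Globex")],
    )

    # External side: parse the file-like records and expose them to the SQL engine.
    conn.execute("CREATE TEMP TABLE clicks (customer_id INTEGER, clicks INTEGER)")
    parsed = [tuple(int(field) for field in line.split("\t")) for line in HDFS_LINES]
    conn.executemany("INSERT INTO clicks VALUES (?, ?)", parsed)

    # A single SQL statement now spans both data sources.
    rows = conn.execute(
        "SELECT c.name, SUM(k.clicks) FROM customers c "
        "JOIN clicks k ON k.customer_id = c.id "
        "GROUP BY c.name ORDER BY c.name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(combined_result())
```

The products in this category differ mainly in where the external data is read and where the join executes, which this sketch deliberately glosses over.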

It is naturally early stages for most of these approaches given that most of them only appeared in 2013 and some are still in development and testing. So far the responses to our Hadoop survey suggest higher levels of interest in Cloudera Impala, Cloudera RTQ, and Apache Drill, followed by IBM Big SQL, Hadapt and Pivotal HAWQ.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.


The Data Day, A few days: October 1-4 2013

October 4th, 2013 — Data management

MongoDB raises $150m. And more.

For 451 Research clients: Treasure Data shines as an on-ramp to rapid adoption of Hadoop-based analytics http://t.co/a5fx3y650n

— Matt Aslett (@maslett) October 2, 2013

For 451 Research clients: Facebook friends NoSQL for broad 'big data' DCIM http://t.co/JEG7hRFHQn By @rascierto

— Matt Aslett (@maslett) October 1, 2013

For 451 Research clients: Syncsort boards the M&A train, snares Circle Computer http://t.co/6scg9bViMk By Krishna Roy and @ScottDenne

— Matt Aslett (@maslett) October 1, 2013

Holy ****ing **** MongoDB Raises $150 Million http://t.co/5IYbSgRpWY

— Matt Aslett (@maslett) October 4, 2013

Splunk launches Splunk Enterprise 6 http://t.co/JZ4cZUTJ94 and the General Availability of Splunk Cloud. http://t.co/oMAvvurxqb

— Matt Aslett (@maslett) October 1, 2013

IBM to acquire analytics software company The Now Factory. http://t.co/tuMGjEtR20

— Matt Aslett (@maslett) October 1, 2013

The Cloudera Model, explained by @mikeolson http://t.co/Isxa16lbZo

— Matt Aslett (@maslett) October 4, 2013

Former NetApp Executive Manish Goel Joins Guavus as CEO http://t.co/9bFKg886PP

— Matt Aslett (@maslett) October 3, 2013

Logentries raises $10m and names Andrew Burton as CEO. http://t.co/TYHAtJfgIL (PDF)

— Matt Aslett (@maslett) October 1, 2013

WibiData updates Kiji SDK for building big data applications. http://t.co/tpC9JIuA1S

— Matt Aslett (@maslett) October 4, 2013

Classifying the SQL-on-Hadoop Solutions http://t.co/dBdQmBkVC6

— Matt Aslett (@maslett) October 2, 2013

BMC launches Control-M for Hadoop workload automation offering. http://t.co/AlGQ1TPIPR

— Matt Aslett (@maslett) October 2, 2013

Monty Widenius provides an update on MariaDB foundation progress. http://t.co/NDaW1yiPP4

— Matt Aslett (@maslett) October 2, 2013

SkySQL confirms plans for MariaDB Enterprise. http://t.co/J5n40EeZN8

— Matt Aslett (@maslett) October 2, 2013

And that’s the data day, today.


7 Hadoop questions. Q4: alternative file systems

October 2nd, 2013 — Data management

Which is your preferred Hadoop file system? The obvious answer is likely to be the Hadoop Distributed File System itself, although in recent years we’ve seen an increasing number of vendors pitching their own file system technologies as potential alternatives to HDFS. That’s why the use of alternative file systems is one of the primary questions being asked in the 451 Research 2013 Hadoop survey.

The limitations of HDFS are well-publicised, and it is no surprise that many vendors see an opportunity to pitch their existing file system technologies as alternatives to HDFS.

There is now a large number of HDFS alternatives to choose from, including: Cleversafe Dispersed Storage Network, DataStax CassandraFS, EMC Isilon OneFS, IBM GPFS, InkTank Ceph, MapR NFS, Quantcast QFS, Red Hat Storage (GlusterFS), and Symantec Veritas CFS.

Our research indicates that adoption of alternatives to HDFS is limited at this stage and early efforts, such as Appistry’s CloudIQ Storage Hadoop Edition, have come and gone.

However, as adoption of Hadoop grows into more mainstream enterprises, we increasingly see interest in some of these HDFS alternatives, particularly in relation to attempts to reduce duplication of effort with regards to file system management and maintenance.

The early responses to our Hadoop survey are therefore interesting: MapR NFS has scored highest in terms of adoption so far, but there is interest across the board (especially Red Hat Storage, CassandraFS, GPFS, OneFS and Ceph). By and large though, it's true to say that most respondents have not considered, tested or adopted an alternative file system to date.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.


NoSQL LinkedIn Skills Index – September 2013

October 1st, 2013 — Data management

With our rebooted NoSQL LinkedIn Skills Index, based on the number of LinkedIn member profiles mentioning each of the NoSQL projects, now into its second year, I thought it was a good time to add some newer projects to the list; specifically: ArangoDB, FoundationDB, RethinkDB, and Titan.

It shouldn’t surprise anyone to find that those four new additions failed to make a dent in the top ten list of the NoSQL databases most often cited in LinkedIn profiles. However, there is still some interesting activity this quarter, with Riak leapfrogging MarkLogic (as predicted).

Outside the top ten, Apache Accumulo overtook Voldemort, and saw the second fastest growth in mentions in Q3, behind only DynamoDB and ahead of Neo4j, MongoDB, and Cassandra.

That growth saw MongoDB extend its lead as the most popular NoSQL database, according to LinkedIn profile mentions. As the chart below illustrates, it now accounts for 49% of all mentions of NoSQL technologies in LinkedIn profiles, according to our sample, compared with 47% in June.

Incidentally, adding the four new NoSQL databases to the analysis did not have a significant impact on MongoDB’s share. Without them it still registered 49%. Expect MongoDB to pass the 50% threshold in Q4, however, and Couchbase to overtake MarkLogic.

Of course, we would also note that this is not meant to be a comprehensive analysis, but rather a snapshot of one particular data source.

