hadoop — Too much information

Hadoop: why enterprises need something to aspire to

January 22nd, 2014 — Data management

Merv Adrian wrote a blog post recently bemoaning the “aspirational marketing” that surrounds Hadoop, in particular the fact that current deployments are a long way from delivering on the vision.

While I completely agree that many enterprises are struggling to translate tactical use-cases into the business use-cases required to drive more strategic adoption beyond the proof of concept stage, I don’t think that aspirational marketing around Hadoop is necessarily a bad thing.

It is certainly true that part of the problem lies in clearly understanding how Hadoop can be used as a complement to traditional relational database technologies deployed as an enterprise data warehouse.

That is why we recently asked Is Hadoop a planet? – comparing confusion around Hadoop’s classification to that of Pluto – while also describing Hadoop as a framework in search of a metaphor.

Given the confusion, however, I believe it is incumbent on Hadoop providers to describe not just the functional use-cases that are driving tactical adoption, but also the bigger vision that will drive more strategic adoption.

The data management industry has become accustomed to thinking about the storage, processing and analysis of data in analytical databases as akin to warehousing, to the extent that the phrase ‘data warehouse’ no longer requires an explanation.

We believe that a good understanding of the potential strategic role of Hadoop, even if it is only aspirational at this stage, will be important in encouraging broader and deeper adoption of Hadoop.

In addition, it is not as if there are no enterprises deploying Hadoop more strategically. Cloudera estimates that about 20% of its 300 subscription customers are already deploying Hadoop as what it calls an Enterprise Data Hub.

I’m not personally convinced that Enterprise Data Hub is really the right term (not least since we previously used the term Data Hub in a slightly different context). Other potential terms include data lake and data refinery.

Although the latter better describes Hadoop’s role in aggregating and processing data and the industrial-scale processes used to make data more acceptable for different analytic use-cases, it appears to have quickly passed out of fashion compared to the former.

I have begun using the term ‘data treatment plant’ as a combination of the two concepts to describe how Hadoop can be used as a single ‘logical’ unified data platform into which you simply pour data, while industrial-scale processes – the multiple data processing and analytic engines that will be supported by Hadoop 2.0, such as MapReduce, stream processing, SQL and NoSQL – are used to make data more acceptable for a desired end-use.

451 clients can get more detail on the ‘data treatment plant’ and why we believe a bit of aspirational marketing may not be a bad thing for Hadoop, from our recent report, Hadoop: a framework in search of a metaphor.


The Data Day, A few days: January 11-17 2014

January 17th, 2014 — Data management

Neo updates Neo4j. Cloudera sharpens focus on Accumulo. And more

For 451 clients: IBM steps up commitment to Watson with BU, $1bn investment, $100m startup fund http://t.co/ptHrWCY7Io By Krishna Roy

— Matt Aslett (@maslett) January 17, 2014

For 451 clients: Diyotta aims to bring data integration to the yottabyte generation using ELT approach http://t.co/oFXyZiJDO4 By Krishna Roy

— Matt Aslett (@maslett) January 13, 2014

Neo Technology has announced the general availability of Neo4j 2.0 and the launch of its online training program. http://t.co/FasHkFW3pE

— Matt Aslett (@maslett) January 14, 2014

Basho introduces commercial support for open source Riak with Riak Starter and Riak Basic. http://t.co/ONGXaYs13d

— Matt Aslett (@maslett) January 14, 2014

SGI announces plans to develop an in-memory appliance based on SAP HANA. http://t.co/AFZCLFYBbI

— Matt Aslett (@maslett) January 14, 2014

Cloudera steps up focus on Apache Accumulo via joint development partnership with Koverse. http://t.co/y3Y30TBfzf

— Matt Aslett (@maslett) January 16, 2014

Verizon introduces Per-Hour Billing for Oracle Database and Oracle Fusion Middleware on Verizon Cloud http://t.co/PtIUq5z02O

— Matt Aslett (@maslett) January 13, 2014

Veristorm Launches vStorm Enterprise, a commercial Hadoop distribution for Linux on mainframe. http://t.co/Zf1GXHbhZd

— Matt Aslett (@maslett) January 14, 2014

Google launches a preview release of the Google Cloud Storage connector for Hadoop. http://t.co/dI3HQ6oOGJ

— Matt Aslett (@maslett) January 14, 2014

Garantia Data's Redis Cloud service is now available on IBM's SoftLayer IaaS platform. http://t.co/ZfUtciJr4M

— Matt Aslett (@maslett) January 15, 2014

And that’s the data day, today.


The Data Day, A few days: December 13-19 2013

December 19th, 2013 — Data management

VC funding for Hadoop and NoSQL tops $1bn. And more

For 451 Research clients: Venture funding for Hadoop and NoSQL vendors tops $1bn http://t.co/plDXwoFnVK When will revenue catch up?

— Matt Aslett (@maslett) December 17, 2013

For 451 Research: AWS responds to demand for real-time data with Kinesis stream-processing service http://t.co/tfA7YiVj2b

— Matt Aslett (@maslett) December 18, 2013

For 451 Research clients: With $4m series A funding, Nutonian elucidates machine learning strategy http://t.co/iJfXwS2gvW By Krishna Roy

— Matt Aslett (@maslett) December 16, 2013

For 451 Research clients: 451 Live – IT professionals discuss modernizing the storage infrastructure http://t.co/zYqge8nQ0Z

— Matt Aslett (@maslett) December 17, 2013

Datameer raises $19m series D round led by Next World Capital and involving Workday, Citi Ventures, and Software AG. http://t.co/vkISja6bt2

— Matt Aslett (@maslett) December 18, 2013

RethinkDB raises $8m series A funding round. http://t.co/YZTJ7F8VMx

— Matt Aslett (@maslett) December 16, 2013

Amazon Web Services makes Amazon Kinesis generally available http://t.co/Csc4jYOvqo That was quick

— Matt Aslett (@maslett) December 17, 2013

Amazon Web Services adds Global Secondary Indexes to Amazon Dynamo DB. http://t.co/AzR9OMya89

— Matt Aslett (@maslett) December 16, 2013

Amazon adds support for Impala to Amazon Elastic MapReduce. http://t.co/gsR6nenXTS

— Matt Aslett (@maslett) December 16, 2013

Qubole's Data Service is now available on Google Compute Engine. http://t.co/cPJZbx5QBf

— Matt Aslett (@maslett) December 16, 2013

Intel launches Graph Builder for Apache Hadoop v2.0, and Intel Distribution for Apache Hadoop 3.0 http://t.co/lMddv3w9Jn

— Matt Aslett (@maslett) December 17, 2013

Splice Machine brings transactional SQL-on-Hadoop database to MapR. http://t.co/jzXAwzviNQ

— Matt Aslett (@maslett) December 17, 2013

"One of the additions to Fedora 20 is the inclusion of Apache Hadoop" http://t.co/oDFE6X4gDi

— Matt Aslett (@maslett) December 17, 2013

EnterpriseDB has announced the general availability of EDB Failover Manager for Postgres. http://t.co/kBhGA2uvzY

— Matt Aslett (@maslett) December 19, 2013

CouchDB creator Damien Katz is joining http://t.co/td8DAC08WS to work on "ridiculously cool" infrastructure project. http://t.co/EpzodYwDAK

— Matt Aslett (@maslett) December 17, 2013

And that’s the data day, today.


Visualizing the $1bn+ VC investment in Hadoop and NoSQL

December 17th, 2013 — Data management, M&A

Cumulative VC funding for Hadoop and NoSQL vendors broke through the $1bn barrier in 2013, according to a Spotlight report published by 451 Research, based on data provided by The 451 M&A KnowledgeBase.

The data indicates that there was a substantial increase in funding in 2013 ($530.5m, not including RethinkDB’s $8m announced yesterday) compared to 2012 ($190.9m), thanks to major rounds for the likes of MongoDB, Pivotal, Hortonworks and DataStax.
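To put the size of that increase in perspective, here is a minimal sketch of the arithmetic in Python. It uses only the two annual totals quoted above (in $m); the variable names are illustrative, and RethinkDB's $8m is left out, as it is in the quoted 2013 figure.

```python
# Annual VC funding for Hadoop and NoSQL vendors, in $m, as quoted in the text.
funding_2012_m = 190.9
funding_2013_m = 530.5  # excludes RethinkDB's $8m round

# How many times larger 2013 was than 2012, and the percentage increase.
ratio = funding_2013_m / funding_2012_m
increase_pct = (ratio - 1) * 100

print(f"2013 funding was {ratio:.2f}x the 2012 total (+{increase_pct:.0f}%)")
```

In other words, 2013 funding was a little under three times the 2012 total.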

The report includes a visualization created by 451’s Director of Data Strategy and Solutions, Barbara Peng, that illustrates the connections between the various investors and the NoSQL and Hadoop vendors in which they have invested.

A snapshot of the visualization is shown below but the original is interactive, enabling 451 Research clients to drag the various elements around for greater emphasis, as well as isolate the NoSQL or Hadoop categories.

451 Research clients can also scroll over the blue circles to see the total amount of funding raised by the individual Hadoop and NoSQL vendors, and scroll over the smaller orange circles to see which investors have backed which companies.

The sample set was limited to 16 vendors for visual clarity, but the six Hadoop and 10 NoSQL providers cited account for more than 87% of funding to date (with Pivotal representing the vast majority of the remaining 13%).

This visualization illustrates that investment in Hadoop and NoSQL providers comes from a relatively small group of VC firms (52 to be specific, excluding individual seed investors), resulting in a relatively tightly clustered graph.

However, the visualization also enables us to put to the test the recent blog post by MarkLogic’s Adam Fowler in which he stated:

“Just look at the number of investors who are investing in multiple NoSQL companies. They’re hedging their bets because they’re not sure themselves which businesses will survive.”

In fact, investment in multiple Hadoop and NoSQL vendors is relatively rare. Only 11 of the 52 VC firms have invested in more than one Hadoop and/or NoSQL vendor, with seven of those picking one Hadoop vendor and one NoSQL provider: less hedging their bets than picking a winner in each category.

Of the remaining four investment shops, two have invested in one Hadoop distributor, one NoSQL specialist and one Hadoop-as-a-service provider (MapR, DataStax and Qubole for Lightspeed Venture Partners; Cloudera, Couchbase and Altiscale for Accel Partners), while In-Q-Tel has invested in one Hadoop supplier, one NoSQL vendor and one NoSQL-as-a-service provider (Cloudera, MongoDB and Cloudant).

Only Sequoia Capital has invested in multiple NoSQL vendors (as well as Hadoop-as-a-service provider Altiscale) having invested in MongoDB, DataStax and – hold onto your hats, irony fans – MarkLogic. It should be noted however that Sequoia has not invested in DataStax since its series A round in late 2010.
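The 'hedging' test can be expressed as a simple check over the investor-to-vendor graph. The sketch below is illustrative only: it covers just the four firms whose portfolios are spelled out above, not the full 52-investor data set behind the visualization, and the names `PORTFOLIOS`, `CATEGORY` and `hedgers` are invented for this example. It treats an investor as hedging if it backs more than one vendor in the same category.

```python
from collections import Counter

# Investor -> vendors backed, limited to the four firms detailed in the text.
PORTFOLIOS = {
    "Lightspeed Venture Partners": ["MapR", "DataStax", "Qubole"],
    "Accel Partners": ["Cloudera", "Couchbase", "Altiscale"],
    "In-Q-Tel": ["Cloudera", "MongoDB", "Cloudant"],
    "Sequoia Capital": ["MongoDB", "DataStax", "MarkLogic", "Altiscale"],
}

# Vendor -> category, following the descriptions in the text.
CATEGORY = {
    "MapR": "Hadoop",
    "Cloudera": "Hadoop",
    "DataStax": "NoSQL",
    "Couchbase": "NoSQL",
    "MongoDB": "NoSQL",
    "MarkLogic": "NoSQL",
    "Qubole": "Hadoop-as-a-service",
    "Altiscale": "Hadoop-as-a-service",
    "Cloudant": "NoSQL-as-a-service",
}


def vendors_per_category(vendors):
    """Count how many of an investor's vendors fall into each category."""
    return Counter(CATEGORY[vendor] for vendor in vendors)


def hedgers(portfolios):
    """Return investors that back more than one vendor in a single category."""
    return sorted(
        investor
        for investor, vendors in portfolios.items()
        if any(count > 1 for count in vendors_per_category(vendors).values())
    )


# Share of the 52 VC firms that have backed more than one vendor of any kind.
multi_vendor_share = 11 / 52

print(hedgers(PORTFOLIOS))
print(f"{multi_vendor_share:.0%} of firms back more than one vendor")
```

On this subset, only Sequoia Capital meets the hedging definition, with three NoSQL vendors in its portfolio; the other three firms hold one vendor per category.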

The full report, Venture funding for Hadoop and NoSQL vendors tops $1bn is available now to 451 Research clients and also includes our perspective on when combined Hadoop and NoSQL revenue might begin to exceed combined Hadoop and NoSQL VC funding, as well as the potential for M&A and IPO activity in 2014.


The Data Day, A few days: December 6-12 2013

December 12th, 2013 — Data management

Talend raises $40m. GridGain names new CEO. And more

For 451 Research clients: Talend steps further down 'big data' management path, anoints new CEO http://t.co/AduYuAMVe3 By Krishna Roy

— Matt Aslett (@maslett) December 11, 2013

For 451 Research clients: MarkLogic continues NoSQL push with support for RDF triples in MarkLogic 7 http://t.co/ZShcclkqkA

— Matt Aslett (@maslett) December 12, 2013

For 451 clients: 0xdata seeking to serve the need for speed in predictive analytics using R, Hadoop http://t.co/yLIuqSUg87 By Krishna Roy

— Matt Aslett (@maslett) December 9, 2013

For 451 clients QuantCell harnesses love of spreadsheets to create polyglot IDE for advanced analytics http://t.co/iKQEEx4f0Z By Krishna Roy

— Matt Aslett (@maslett) December 10, 2013

Talend closes $40m strategic funding from Bpifrance, Iris Capital and existing investors http://t.co/7InPMRehZp

— Matt Aslett (@maslett) December 11, 2013

GridGain Names Abe Kleinfeld as CEO http://t.co/aceprFRsDS

— Matt Aslett (@maslett) December 9, 2013

Oracle launches Oracle Exadata Database Machine X4 http://t.co/62X1J39Qrj

— Matt Aslett (@maslett) December 12, 2013

Enterprise Hadoop Market in 2013: Reflections and Directions http://t.co/DqvWODmeGc In which @shaunconnolly outlines Hortonworks' strategy

— Matt Aslett (@maslett) December 11, 2013

SAP launches open development environment, Node.js Connector, open source UI framework for SAP HANA. http://t.co/hKpy3eu8wK

— Matt Aslett (@maslett) December 11, 2013

WANdisco’s Non-Stop Hadoop technology is now certified to run on Cloudera’s Distribution for Hadoop version 4. http://t.co/s9AllrlPGD

— Matt Aslett (@maslett) December 11, 2013

BitYota delivers cloud-based data analytics for JSON data from MongoDB http://t.co/vNzQy0n6ih

— Matt Aslett (@maslett) December 11, 2013

Never, ever do this to Hadoop http://t.co/yQiQtNjVi9 Hadoop on the SAN?

— Matt Aslett (@maslett) December 9, 2013

#InfiniSQL license is now GPL, changed from AGPL. Community input compelled this decision. https://t.co/bzIzEjeZAO

— InfiniSQL (@InfiniSQL) December 12, 2013

And that’s the data day, today.


The Data Day, A few days: October 26-November 1 2013

November 1st, 2013 — Data management

Cloudera launches Enterprise Data Hub. And more

For 451 Research clients: Safe HAVEn? Has HP found an identity with which to fulfill its big-data ambitions? http://t.co/nXje0zQvyp

— Matt Aslett (@maslett) October 30, 2013

For 451 Research clients: ClearStory shares its SaaSy tale of multi-source discovery and analysis http://t.co/fg3HJLOfLt By Krishna Roy

— Matt Aslett (@maslett) October 29, 2013

For 451 Research clients: Rackspace launches cloud and managed Hadoop-as-a-service offerings http://t.co/wHxlvWuG7a

— Matt Aslett (@maslett) October 29, 2013

For 451 Research clients: Calpont retries open source analytic database strategy with Hadoop-friendly twist http://t.co/rOUp6JoNVf

— Matt Aslett (@maslett) November 1, 2013

For 451 Research clients: IBM seeks to better serve the business of BI, planning and statistics http://t.co/Re17fOD6fk By Krishna Roy

— Matt Aslett (@maslett) October 28, 2013

For 451 Research clients: Infobright illuminates its plans for the analysis of machine data http://t.co/RNgwSyvawu

— Matt Aslett (@maslett) November 1, 2013

For 451 clients: Adaptive Planning plots growth strategy, bags series D extension, salesforce backing http://t.co/En9mnl5v8q By Krishna Roy

— Matt Aslett (@maslett) November 1, 2013

For 451 Research clients: ScaleOut delivers hServer V2, in-memory MapReduce engine for Hadoop. http://t.co/FOhjoPoPd0

— Matt Aslett (@maslett) October 29, 2013

Teradata reports net income of $98m on Q3 revenue up 3% to $666m. http://t.co/brgymsB86m

— Matt Aslett (@maslett) November 1, 2013

Teradata CEO says 4% to 8% of the total workload on Teradata data warehouses could potentially move to Hadoop. http://t.co/lYh5gFXziP

— Matt Aslett (@maslett) November 1, 2013

MicroStrategy reports net income of $17.1m on Q3 revenue up 4% to $141.9m. http://t.co/pLQ0nFmvcV

— Matt Aslett (@maslett) October 29, 2013

Cloudera launches beta of Cloudera Enterprise 5, positioned as an Enterprise Data Hub. http://t.co/c1nsux1Vqg

— Matt Aslett (@maslett) October 30, 2013

Clustrix has launched ClustrixDB as a software download for real-time analytics on live operational data. http://t.co/eANohc3iWP

— Matt Aslett (@maslett) November 1, 2013

Basho announces the technical preview of Riak 2.0. http://t.co/yasX2cx1T3

— Matt Aslett (@maslett) October 30, 2013

AWS updates Elastic MapReduce to support Hadoop 2.2, including YARN, as well as MapR M7. http://t.co/q6oI5ImxQD

— Matt Aslett (@maslett) October 30, 2013

Pivotal's Spring for Apache Hadoop is certified with Pivotal HD, Cloudera http://t.co/IyTWmA2RWu and Hortonworks http://t.co/4b6VF9LfaK

— Matt Aslett (@maslett) October 30, 2013

HP to resell Hortonworks Data Platform. http://t.co/XnCqbm23DP

— Matt Aslett (@maslett) October 30, 2013

Talend has announced version 5.4 of its data integration platform, with support for YARN. http://t.co/RDCUqM26OM

— Matt Aslett (@maslett) October 29, 2013

Rackspace launches cloud and managed Hadoop-as-a-Service offerings. http://t.co/bMQKEIFL0d

— Matt Aslett (@maslett) October 28, 2013

IBM's SoftLayer teams up with Cloudera to offer Hadoop on bare metal infrastructure as a service. http://t.co/mPuFe1ETeG

— Matt Aslett (@maslett) October 29, 2013

ClearStory Data has introduced its Data Intelligence offering. http://t.co/bE8wQy4sMS

— Matt Aslett (@maslett) October 28, 2013

Informatica and Cloudera launch reference architecture for Hadoop-based Data Warehouse Optimization. http://t.co/rXdR9axZ1r

— Matt Aslett (@maslett) October 30, 2013

SAS Institute introduces SAS/ACCESS Interface to Cloudera's Impala SQL-on-Hadoop offering. http://t.co/gUDacBLJp8

— Matt Aslett (@maslett) October 29, 2013

MapR has added native security authentication and authorization to the MapR Distribution for Apache Hadoop. http://t.co/S9lC2CG1Mq

— Matt Aslett (@maslett) October 28, 2013

Microsoft's Windows Azure HDInsight Hadoop service is now generally available. http://t.co/ZBFJeoCkqy

— Matt Aslett (@maslett) October 28, 2013

Calpont announces InfiniDB for Apache Hadoop. http://t.co/4XqOGEg0Sq

— Matt Aslett (@maslett) October 30, 2013

Elasticsearch has hired the creator of Logstash to create an open source search and log management offering. http://t.co/oflInpJVkB

— Matt Aslett (@maslett) November 1, 2013

Google adds support for the MySQL Wire Protocol to Google Cloud SQL. http://t.co/Ynt58fCMAn

— Matt Aslett (@maslett) November 1, 2013

Paxata launches self-service data preparation platform, $8m series B funding. http://t.co/26VM9i8808 (PDF)

— Matt Aslett (@maslett) October 28, 2013

Julian Hyde joins Hortonworks to integrate Optiq’s cost based optimizer with Apache Hive. http://t.co/0bp89zgQIg

— Matt Aslett (@maslett) October 29, 2013

Cloudera launches cloud-focused partnership program, involving Verizon, Savvis, SoftLayer and T-Systems. http://t.co/avWKIJVpRZ

— Matt Aslett (@maslett) October 28, 2013

Cloudera announces support for Apache Spark, Cloudera Connect: Innovators partnership with Databricks. http://t.co/hRP7xbMTxH

— Matt Aslett (@maslett) October 28, 2013

DataRPM previews natural language search for embedded business intelligence. http://t.co/fYodlJeCZQ

— Matt Aslett (@maslett) October 29, 2013

Doug Cutting discusses the future of data. http://t.co/27ZwleFDaR

— Matt Aslett (@maslett) November 1, 2013

And that’s the data day, today.


The Data Day, A few days: October 19-25 2013

October 25th, 2013 — Data management

Hadoop and Teradata go to the cloud. And more.

For 451 Research clients: Savvis offers new Hadoop-based big-data services http://t.co/Vhyd2VU99Z

— Matt Aslett (@maslett) October 23, 2013

For 451 clients: Glassbeam unveils SaaS analysis stack for the Internet of Things, bags seed money http://t.co/3hjbCH81LM By Krishna Roy

— Matt Aslett (@maslett) October 22, 2013

For 451 clients: Automated Insights heads into narrative-led analytic services for the enterprise http://t.co/lneDn0Qaff By Krishna Roy

— Matt Aslett (@maslett) October 23, 2013

For 451 Research clients: Acunu focuses on self-service analytics for streaming and historical data http://t.co/F19yAZYjfG By Krishna Roy

— Matt Aslett (@maslett) October 24, 2013

For 451 Research clients: Percona aims to match Oracle MySQL with Percona Server 5.6 http://t.co/DPQbC4iryh

— Matt Aslett (@maslett) October 21, 2013

For 451 Research clients: ParStream raises $8m series B to fuel real-time analytic database growth http://t.co/B7vBqdKzJ4

— Matt Aslett (@maslett) October 22, 2013

For 451 Research clients: Hazelcast steps up its open source in-memory data grid ambitions with series A funding http://t.co/noJ1hRxuud

— Matt Aslett (@maslett) October 24, 2013

For 451 Research clients: ScaleArc drives toward expanded market for database traffic management software http://t.co/3sdmj0UyBw

— Matt Aslett (@maslett) October 25, 2013

For 451 clients: Orchestra expands into MDM-related terrain as it plays to a bigger US audience http://t.co/Ns610DejP2 By Krishna Roy

— Matt Aslett (@maslett) October 21, 2013

SAP claims Q3 HANA revenue of €149 million, and over 2,100 HANA customers. http://t.co/nXB2As5vhh

— Matt Aslett (@maslett) October 21, 2013

Informatica reports Q3 net income of $10.4m, on revenue up 24% to $235.4m. http://t.co/JcxJZEpZdd

— Matt Aslett (@maslett) October 25, 2013

Software AG reports Q3 net income of €31.1m on revenue down 7% to €238.5m http://t.co/mxLuQ2WOfE Includes €10m+ from 'big data' products

— Matt Aslett (@maslett) October 25, 2013

QlikTech reports Q3 net income of $3.0m on revenue up 21% to $104.1m. http://t.co/cMwloPZsSt

— Matt Aslett (@maslett) October 25, 2013

Attunity reports Q3 net income of $0.7m on revenue up 11% to $6.6m. http://t.co/ogYIvFxkty

— Matt Aslett (@maslett) October 25, 2013

Intel Capital leads $20m investment in SkySQL to grow MariaDB. http://t.co/2kywf3iJDW

— Matt Aslett (@maslett) October 23, 2013

CoolaData closes $7.5m series A round. http://t.co/7X8SGmTFxw

— Matt Aslett (@maslett) October 22, 2013

Sqrrl Data closes $5.2m series A round, releases version 1.2 of Sqrrl Enterprise. http://t.co/hPwIMmlQd2

— Matt Aslett (@maslett) October 21, 2013

Nutonian raises $4M to 'uncover truth from chaos' in big data http://t.co/rhPKfAFMB5

— Matt Aslett (@maslett) October 23, 2013

Hortonworks Data Platform 2.0 now generally available. http://t.co/6Wdymi6u6D

— Matt Aslett (@maslett) October 23, 2013

Pivotal launches Pivotal HD 1.1 and Pivotal GemFire XD http://t.co/YK2715ezJb Pivotal Data Dispatch http://t.co/TZaL8WkLxH

— Matt Aslett (@maslett) October 23, 2013

Savvis introduces Big Data Solutions – based on managed services for Cloudera and MapR. http://t.co/uiOm6zMM4U

— Matt Aslett (@maslett) October 22, 2013

Virtustream launches HANA-Hadoop Managed Service based on SAP HANA and Intel Distribution for Apache Hadoop. http://t.co/FxlR27l1Tn

— Matt Aslett (@maslett) October 23, 2013

Teradata introduces Teradata Cloud including Teradata Database, Aster Discovery Platform and Hadoop as a Service. http://t.co/gvMqCnD7Zk

— Matt Aslett (@maslett) October 21, 2013

Teradata will add JSON support to Teradata Database during the second quarter of 2014, targeting Internet of Things. http://t.co/pvih4HuSXB

— Matt Aslett (@maslett) October 21, 2013

Teradata introduces Data Warehouse Appliance 2750 http://t.co/CBQuHhvIpu and Teradata Extreme Data Platform 1700. http://t.co/RMSUUi3N3L

— Matt Aslett (@maslett) October 21, 2013

Teradata introduces new Teradata Data Stream Architecture for backup and recovery. http://t.co/9gCUtGgjQm

— Matt Aslett (@maslett) October 22, 2013

Glassbeam has launched Glassbeam SCALAR, a cloud-based platform for machine data analytics. http://t.co/6GJSRry7IN

— Matt Aslett (@maslett) October 23, 2013

Platfora delivers Platfora Big Data Analytics Platform 3.0, including event-stream analytics. http://t.co/TfjJajfB6y

— Matt Aslett (@maslett) October 23, 2013

SAP unveils service pack 7 for SAP HANA http://t.co/ymTj3acLyd strategic partnership with SAS. http://t.co/YCUe37VcXv

— Matt Aslett (@maslett) October 22, 2013

Capgemini forms global Hadoop partnership with Cloudera. http://t.co/HIZ9HgfRJJ

— Matt Aslett (@maslett) October 25, 2013

Basho and Seagate partner on Riak-based scale-out cloud storage. http://t.co/r1C9hUdi3C

— Matt Aslett (@maslett) October 22, 2013

WhereScape and Teradata sign global reseller agreement. http://t.co/sXprtI3J0z

— Matt Aslett (@maslett) October 21, 2013

Facebook explains how it scales MySQL with MySQL Pool Scanner. http://t.co/h1hs05CrDs

— Matt Aslett (@maslett) October 24, 2013

Splice Machine seeks evaluators for Hadoop-based transactional SQL database. http://t.co/Vg4yZOSy6e

— Matt Aslett (@maslett) October 23, 2013

Distributed Caching is Dead – Long Live… http://t.co/Q2oHSweAfb

— Matt Aslett (@maslett) October 21, 2013

And that’s the data day, today.


7 Hadoop questions. Q7: Hadoop’s role

October 23rd, 2013 — Data management

What is the point of Hadoop? It’s a question we’ve asked a few times on this blog, and it continues to be a significant question asked by users, investors and vendors about Apache Hadoop. That is why it is one of the major questions being asked as part of our 451 Research 2013 Hadoop survey.

As I explained during our keynote presentation at the inaugural Hadoop Summit Europe earlier this year, our research suggests there are hundreds of potential workloads that are suitable for Hadoop, but three core roles:

Big data storage: Hadoop as a system for storing large, unstructured, data sets
Big data processing/integration: Hadoop as a data ingestion/ETL layer
Big data analytics: Hadoop as a platform for new exploratory analytic applications

And we’re not the only ones that see it that way. This blog from Cloudera CTO Amr Awadallah outlines three very similar, if differently-named use-cases (Transformation, Active Archive, and Exploration).

In fact, as I also explained during the Hadoop Summit keynote, we see these three roles as a process of maturing adoption, starting with low cost storage, moving on to high-performance data aggregation/ingestion, and finally exploratory analytics.

As such it is interesting to view the current results of our Hadoop survey, which show that the highest proportion of respondents (63%) have implemented or plan to implement Hadoop for data analytics, followed by 48% for data integration and 43% for data storage.

This would suggest that our respondents include some significantly early Hadoop adopters. I look forward to properly analysing the results to see what they can tell us, but in the meantime it is interesting to note that the percentage of respondents using Hadoop for analytics is significantly higher among those that adopted Hadoop prior to 2012 (88%) compared to those that adopted in 2012 or 2013 (65%).

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.


The Data Day, A few days: October 12-18 2013

October 18th, 2013 — Data management

Apache Hadoop 2 goes GA. Teradata cuts guidance. And more

For 451 Research clients: Teradata adds graph analytics and file store to Aster Discovery Platform http://t.co/rPpnuvfhhc

— Matt Aslett (@maslett) October 14, 2013

For 451 Research clients: Couchbase goes mobile with JSON Anywhere NoSQL database strategy http://t.co/IXKoNmh1Ia

— Matt Aslett (@maslett) October 17, 2013

For 451 Research clients: Hadapt launches 'schemaless SQL' to cover the spectrum of SQL-NoSQL analytics http://t.co/nZI9jlODBa

— Matt Aslett (@maslett) October 16, 2013

For 451 Research clients: SQLstream launches visualization offering for streaming analytics http://t.co/FZXwgxh3zM

— Matt Aslett (@maslett) October 18, 2013

For 451 Research clients: It's good to share: 1010data discovers the key to increasing its customer count http://t.co/3QlYvPp0gN

— Matt Aslett (@maslett) October 15, 2013

The Apache Software Foundation Announces Apache Hadoop 2. http://t.co/tTk82zG73K

— Matt Aslett (@maslett) October 16, 2013

Teradata Lowers Guidance for 2013 http://t.co/LE4jbmwfOh Q3 revenue declined 21% in Asia Pac, 19% in Middle East and Africa.

— Matt Aslett (@maslett) October 15, 2013

SAS Institute and Hortonworks expand strategic alliance for increased joint marketing, R&D and customer support. http://t.co/0SVamNtxg4

— Matt Aslett (@maslett) October 18, 2013

Great interview with Doug Cutting about the past, present and future of Hadoop. http://t.co/cF6xM3tmQD

— Matt Aslett (@maslett) October 18, 2013

Syncsort's Data Protection Business Acquired by Executive Management, Bedford Venture Partners and Windcrest Partners http://t.co/QsD7bAbyqJ

— Matt Aslett (@maslett) October 16, 2013

Talend will transition its Talend Open Studio product family to the Apache License. http://t.co/MXiF8mn68W

— Matt Aslett (@maslett) October 15, 2013

NuoDB has launched version 2.0 of its distributed database. http://t.co/y0vmO9HO7q

— Matt Aslett (@maslett) October 16, 2013

Acunu Launches Acunu Analytics 5.0 for Cassandra http://t.co/pc3PdwUJ2D

— Matt Aslett (@maslett) October 17, 2013

Hortonworks will include Apache Storm in the Hortonworks Data Platform in Q1 of 2014. http://t.co/s7aeYYK9Mb Preview coming in Q4 2013.

— Matt Aslett (@maslett) October 15, 2013

DataStax introduces DevCenter – a free visual query tool for creating and running Cassandra Query Language queries. http://t.co/2r3FSog67f

— Matt Aslett (@maslett) October 17, 2013

Calpont Launches InfiniDB 4 and InfiniDB for the Cloud http://t.co/HUx8Pg1fv8 adopts GPLv2 http://t.co/VF7dasfwws

— Matt Aslett (@maslett) October 15, 2013

Calpont announces Jack McDonnell as new CEO. http://t.co/32gLS1G36V

— Matt Aslett (@maslett) October 15, 2013

Kapow Software is working with Oracle on data integration for Oracle Endeca Information Discovery. http://t.co/4M0t5UzfUC

— Matt Aslett (@maslett) October 18, 2013

Alteryx and Revolution Analytics integrate for R-based predictive analytics. http://t.co/jYGLGy3gEy

— Matt Aslett (@maslett) October 15, 2013

Zettaset files trade secret misappropriation lawsuit against Intel related to Hadoop management software. http://t.co/dQ11TRVYfX

— Matt Aslett (@maslett) October 14, 2013

Tibco assembles its data management portfolio into a 'big data architecture'. http://t.co/Jh19px4gSx

— Matt Aslett (@maslett) October 15, 2013

AquaFold releases Aqua Data Studio 14, adding support for MongoDB and Cassandra, Hive and Microsoft’s SQL Azure. http://t.co/hAKiB9sPgi

— Matt Aslett (@maslett) October 15, 2013

Facebook’s vs Twitter’s Approach to Real-Time Analytics http://t.co/aZAAyeVvR7

— Matt Aslett (@maslett) October 14, 2013

Adding ACID to Apache Hive http://t.co/kzvNjzn1pt

— Matt Aslett (@maslett) October 14, 2013

And that’s the data day, today.


7 Hadoop questions. Q6: Hadoop’s shortcomings

October 16th, 2013 — Data management

What are the major shortcomings of Hadoop? The answer to that question looks set to shape the future development roadmap for the open source data processing framework, which is why it is one of the major questions being asked as part of our 451 Research 2013 Hadoop survey.

The limitations of Hadoop have been widely reported over the years, but as the Apache Hadoop community and related vendors have responded to issues such as reliability and high availability – not least via the now generally available Apache Hadoop 2 – so attention turns to other areas such as security, administration and performance, as well as more advanced functionality requirements, including graph processing, stream processing, improved SQL support and virtualization support.

The list of potential improvements is therefore fairly long, and as we near the end of our survey it is interesting to see that the list of key advances respondents are looking for in order to increase adoption of Hadoop is fairly widespread.

So far the responses to our Hadoop survey suggest administration tooling and performance top the list, followed by reliability, SQL support and backup and recovery, but development tools and authentication and access control are not far behind.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

