hadoop — Too much information

Forthcoming Webinar: Get Down to Serious Business with Hadoop

July 9th, 2013 — Data management

On Wednesday, July 17, at 11:00am ET / 8:00am PT, I’ll be taking part in a webinar in association with MarkLogic on the subject of Hadoop.

As we’ve stated a few times, we believe that the flexibility of Apache Hadoop is one of its biggest assets – enabling organizations to generate value from data that was previously considered too expensive to be stored and processed in traditional databases – but it also results in “Hadoop” meaning different things to different people.

The result is that organizations still struggle over which Hadoop ecosystem components to adopt in order to obtain the greatest value, which application workloads might be suitable for deployment on Hadoop, and how to deploy Hadoop in conjunction with existing relational and non-relational databases.

On the webinar I’ll be providing an overview of the current state of the Hadoop ecosystem, geographic adoption, use cases, while MarkLogic’s Director of Product Management Justin Makeig to will provide an introduction to complementary technology from MarkLogic that can help your organization achieve real-time analysis, transactional data updates, integrity, granular security, and full-text search.

For full details, and to register, click here.

Comments Off on Forthcoming Webinar: Get Down to Serious Business with Hadoop

The Data Day, A few days: June 26-28 2013

June 28th, 2013 — Data management

Hortonworks raises $50m, previews next-generation Hadoop. And more

For 451 Research clients: Hortonworks raises $50m, previews next-generation Hadoop http://t.co/Ps5TNmca5H

— Matt Aslett (@maslett) June 28, 2013

For 451 Research clients: Splunk launches exploratory analytics platform for Hadoop http://t.co/d59uW67k7D

— Matt Aslett (@maslett) June 27, 2013

For 451 Research clients: RainStor turns attention to Hadoop security for analytic database update http://t.co/KTk2uK1EV7

— Matt Aslett (@maslett) June 26, 2013

For 451 Research clients: Teradata launches Hadoop portfolio via Hortonworks reseller deal http://t.co/MnIkAO4pIH Analyst note.

— Matt Aslett (@maslett) June 28, 2013

For 451 clients: ADVIZOR eyes growth in OEM business, revamps visual discovery and analysis wares http://t.co/9OODSupm7o By Krishna Roy

— Matt Aslett (@maslett) June 26, 2013

Hortonworks raises $50m from Tenaya Capital and Dragoneer Investment Group, along with Benchmark, Index and Yahoo http://t.co/gKSPhWLLBS

— Matt Aslett (@maslett) June 26, 2013

Rackspace launches Rackspace Big Data service powered by Hortonworks, Pentaho and Karmasphere. http://t.co/7m73fVEvzm

— Matt Aslett (@maslett) June 28, 2013

Hortonworks announces community preview of HDP 2.0 http://t.co/MihEPdv32l and reseller agreement with NetApp. http://t.co/L6J2dFDW6t

— Matt Aslett (@maslett) June 27, 2013

VMware announces vSphere Big Data Extensions Beta to provision, manage, and monitor Hadoop clusters on vSphere. http://t.co/ye9Vf1b7gi

— Matt Aslett (@maslett) June 28, 2013

WANdisco announces preview of support for Spark and Shark as an add-on to WANdisco Distro 3.6. http://t.co/WZ2TSIorS1

— Matt Aslett (@maslett) June 26, 2013

WANdisco delivers S3-enabled HDFS http://t.co/N86Y2d0y5B and Non-Stop NameNode WAN Edition http://t.co/3jowZsQHgA

— Matt Aslett (@maslett) June 26, 2013

Kognitio launches version 8 of the Kognitio Analytical Platform. http://t.co/EKsMSVMmDB

— Matt Aslett (@maslett) June 28, 2013

Pentaho adds Big Data Integration layer and launches Pentaho Labs. http://t.co/433w7VpiIF

— Matt Aslett (@maslett) June 26, 2013

Splunk announces beta version of Hunk: Splunk Analytics for Hadoop. http://t.co/gdiD5F0eOZ

— Matt Aslett (@maslett) June 26, 2013

Teradata is now reselling Hortonworks Dat Platform as part of the Teradata Portfolio for Hadoop. http://t.co/M27WwXeQ0b

— Matt Aslett (@maslett) June 26, 2013

EnterpriseDB introduces RemoteDBA services for PostgreSQL. http://t.co/jQjiUmsyUR

— Matt Aslett (@maslett) June 26, 2013

Couchbase has announced the general availability of Couchbase Server 2.1. http://t.co/rvC1sN1u0U

— Matt Aslett (@maslett) June 26, 2013

Zettaset embeds Informatica PowerCenter Big Data Edition within Zettaset Orchestrator Hadoop cluster management. http://t.co/AOyZaqHtxe

— Matt Aslett (@maslett) June 26, 2013

DataTorrent raises $8m Series A funding http://t.co/xFuaxHcuto launches Hadoop-based streaming platform. http://t.co/i75p3pFk2E

— Matt Aslett (@maslett) June 26, 2013

Cloudera announces high-speed Teradata Connector. http://t.co/v2u6ySZovk

— Matt Aslett (@maslett) June 28, 2013

Revelytix announced early access availability of Loom Dataset Management for Hadoop. http://t.co/G7u7ZrSayv

— Matt Aslett (@maslett) June 28, 2013

And that’s the data day, today.

Comments Off on The Data Day, A few days: June 26-28 2013

The Data Day, A few days: June 6-10 2013

June 10th, 2013 — Data management

Beyond Hadoop. IBM embraces MongoDB. And more.

For 451 Research clients: Beyond Hadoop: distributors prepare to do battle for next-generation data platforms bit.ly/13RaLVA

— Matt Aslett (@maslett) June 10, 2013

For 451 clients: Skytree lands $18m series A to spread machine-learning roots for advanced analysis bit.ly/110KD8M By Krishna Roy

— Matt Aslett (@maslett) June 6, 2013

For 451 clients: Informatica invigorates brand with Vibe and PowerCenter Express bit.ly/110Kx0T By @carllehmann1 and Krishna Roy

— Matt Aslett (@maslett) June 6, 2013

For 451 Research clients: Amazon RDS reaches general availability with 99.95% availability SLA bit.ly/13RaIcD Analyst note.

— Matt Aslett (@maslett) June 10, 2013

For 451 Research clients: IBM embraces MongoDB to tap into next-gen Web and mobile apps bit.ly/194ssqx Analyst note.

— Matt Aslett (@maslett) June 7, 2013

Updated Database Landscape map – June 2013 bit.ly/15Qzgnb Spot the difference

— Matt Aslett (@maslett) June 10, 2013

IBM and 10gen confirm their collaboration prn.to/194svCJ

— Matt Aslett (@maslett) June 7, 2013

Talend’s Platform for Big Data and Open Studio for Big Data are now certified for use with Couchbase Server. bit.ly/15QtWAn

— Matt Aslett (@maslett) June 10, 2013

Amazon RDS is now generally available amzn.to/194sBub with an SLA amzn.to/194sJdd and everything.

— Matt Aslett (@maslett) June 7, 2013

And that’s the data day, today.

Comments Off on The Data Day, A few days: June 6-10 2013

Updated Database Landscape map – June 2013

June 10th, 2013 — Data management

I am planning on doing a major overhaul of this during the second half of the year, with a specific focus on the Hadoop sector, but in the interim, here’s the latest June 2013 update to our Database Landscape map.

Note: the latest update to the map is available here.

Comments Off on Updated Database Landscape map – June 2013

The Data Day, A few days: April 22-26 2013

April 26th, 2013 — Data management

Pivotal launches. SkySQL and Mony Program merge. And much, much more

Our report on the changes in the MySQL ecosystem is now available for 451 clients and non-clients alike at bit.ly/451mysql

— Matt Aslett (@maslett) April 25, 2013

For 451 Research clients: VMware expands Serengeti’s horizons with updated Hadoop virtualization project bit.ly/17muQFI

— Matt Aslett (@maslett) April 26, 2013

For 451 Research clients: SkySQL, Monty Program merge to support MariaDB following formation of MariaDB Foundation bit.ly/10dsdjf

— Matt Aslett (@maslett) April 24, 2013

For 451 Research clients: With two funding rounds and an acquisition, Guavus expands its addressable market bit.ly/11Fd4LH

— Matt Aslett (@maslett) April 25, 2013

For 451 Research clients: Pentaho snags Webdetails to get more visual and business user-friendly bit.ly/13VDxIW

— Matt Aslett (@maslett) April 23, 2013

For 451 Research clients: Digital Infrastructure: what it is and why you need a DI strategy bit.ly/YFI9M2 By the 451 collective

— Matt Aslett (@maslett) April 22, 2013

Pivotal announced that GE plans to make a $105m strategic investment, representing a 10% equity stake. bit.ly/10dstP6

— Matt Aslett (@maslett) April 24, 2013

Actian acquires ParAccel. bit.ly/11Fchu8

— Matt Aslett (@maslett) April 25, 2013

SAP HANA contributed €86 million to SAP’s software revenue in Q1. bit.ly/12BljcG

— Matt Aslett (@maslett) April 22, 2013

Informatica reports Q1 revenue up 9% to $214.3m. bit.ly/ZQ0o1B

— Matt Aslett (@maslett) April 25, 2013

QlikTech reports Q1 revenue up 22% to $96.5m. bit.ly/15WgOfU

— Matt Aslett (@maslett) April 26, 2013

SkySQL today announced that it has signed a merger agreement with Monty Program Ab bit.ly/17g6ygy

— Matt Aslett (@maslett) April 23, 2013

Tokutek goes open source, making the source code for TokuDB v7 freely available under the GPLv2. bit.ly/12BkOPP

— Matt Aslett (@maslett) April 22, 2013

Continuent Tungsten Replicator Is Now 100% Open Source bit.ly/ZI9OMy

— Matt Aslett (@maslett) April 22, 2013

Press Release: @qubole Closes Series A Funding, Reaches Half Petabyte of Data Processed. Read more here: qubole.com/press-releases

— qubole (@qubole) April 23, 2013

GenieDB has announced GenieDB Enterprise 2.0, including compatibility With MySQL 5.6. mwne.ws/10dsyST

— Matt Aslett (@maslett) April 24, 2013

Continuent Announces New Continuent Tungsten 2.0 bit.ly/ZKoNp3

— Matt Aslett (@maslett) April 23, 2013

MemSQL has announced the GA of the distributed version of its in-memory database and real-time analytics platform bit.ly/ZKnZAI

— Matt Aslett (@maslett) April 23, 2013

Pentaho Acquires Dashboard and UI Specialist Partner Webdetails bit.ly/12BknoW

— Matt Aslett (@maslett) April 22, 2013

Announcing the MySQL Applier for Apache Hadoop bit.ly/ZMUm0K

— Matt Aslett (@maslett) April 23, 2013

Revolution Analytics launches Revolution R Enterprise 6.2. bit.ly/10dsmTN

— Matt Aslett (@maslett) April 24, 2013

Clustrix launches Clustrix 5.0 on Amazon Web Services. mwne.ws/10dsgLX

— Matt Aslett (@maslett) April 24, 2013

Comments Off on The Data Day, A few days: April 22-26 2013

The Data Day, A few days: April 15-19 2013

April 19th, 2013 — Data management

‘Information governance’ in the era of big data. MariaDB Foundation takes next steps. And more.

Great report for 451 clients: ‘Information governance’ in the era of the cloud and big data By @carllehmann1 @davidhorrigan @alanpelzsharpe

— Matt Aslett (@maslett) April 19, 2013

For 451 Research clients: Deep Information Sciences raises $10m for general-purpose ‘big data’ database bit.ly/17FQ0zN

— Matt Aslett (@maslett) April 17, 2013

For 451 clients: Javlin elucidates CloverETL strategy as it continues to take aim at data integration bit.ly/119rEIn By Krishna Roy

— Matt Aslett (@maslett) April 16, 2013

ANNOUNCEMENT: MariaDB Foundation Takes Next Steps To Community Governance j.mp/YwKRU1

— MariaDB (@mariadb) April 18, 2013

Guavus Raises Another $9m, Brings Total Funding to $87m. bit.ly/XUBvGt

— Matt Aslett (@maslett) April 19, 2013

Teradata updates Active EDW, adds Mellanox InfiniBand bit.ly/10WHNBZ and Hadoop bit.ly/10WHUxi to Unified Data Architecture

— Matt Aslett (@maslett) April 15, 2013

Hortonworks, Mirantis and Red Hat Team to Simplify Deployment and Management of Apache Hadoop on OpenStack bit.ly/ZocZfk

— Matt Aslett (@maslett) April 16, 2013

ScaleOut Software launched ScaleOut hServer, enabling Hadoop analysis of grid-based data. mwne.ws/Zodg1z

— Matt Aslett (@maslett) April 16, 2013

Announcing Local Secondary Indices – Expanding the Cloud: Faster, More Flexible Queries with #DynamoDB #AWS wv.ly/17IXlys

— Werner Vogels (@Werner) April 18, 2013

And that’s the data day, today.

Comments Off on The Data Day, A few days: April 15-19 2013

Forthcoming webinar on ‘big data’ and the ‘single version of the truth’

April 17th, 2013 — Data management

Many enterprises were persuaded to adopt enterprise data warehousing (EDW) technology to achieve a ‘single version of the truth’ for enterprise data.

In reality, the promises were rarely fulfilled with many stories of failed, lengthy and over budget projects. Even if an EDW project reached deployment, the warehouse schema is designed to answer a specific set of queries and is inflexible to change and accommodate growing variety of data.

On April 30 at 1pm ET I’ll be taking part in a webinar with NGDATA to discuss whether ‘big data’ technologies such as Hadoop, HBase and Solr can deliver on the promise of “single version of truth” by providing a real-time, 360° view of customers and products.

In this webinar, you will learn:

Why the inflexibility of EDWs failed to deliver 360° view
How big data technologies can finally make 360° view a reality
Overview of an interactive Big Data management solution
Best practices and success stories from leading companies

For more details, and to register, click here.

Comments Off on Forthcoming webinar on ‘big data’ and the ‘single version of the truth’

The Data Day, A few days: April 9-12 2013

April 12th, 2013 — Data management

Funding for MarkLogic and ParElastic. And more

For 451 Research clients: IBM accelerates ‘big data’ portfolio, launches Hadoop appliance bit.ly/YN4Xgc

— Matt Aslett (@maslett) April 11, 2013

For 451 clients: Quid applies machine learning and visualization to Hadoop-based analysis service bit.ly/XvQd4z By Krishna Roy

— Matt Aslett (@maslett) April 12, 2013

MarkLogic raises $25m led by Sequoia Capital and Tenaya Capital, with Northgate Capital and CEO Gary Bloom. mwne.ws/12HvG15

— Matt Aslett (@maslett) April 10, 2013

ParElastic Raises $5.7M in Series A Round bit.ly/10JO041

— Matt Aslett (@maslett) April 9, 2013

Oracle launches Big Data Appliance X3-2 Starter Rack and In-Rack Expansion. mwne.ws/XCs0LY

— Matt Aslett (@maslett) April 10, 2013

Oracle has launched Endeca Information Discovery 3.0 and Business Intelligence Foundation Suite Release 11.1.1.7 mwne.ws/10JPZph

— Matt Aslett (@maslett) April 9, 2013

Predixion Software has launched Predixion Enterprise Insight 3.0. bit.ly/YN52k7

— Matt Aslett (@maslett) April 11, 2013

And that’s the data day, today.

Comments Off on The Data Day, A few days: April 9-12 2013

The Data Day, A few days: March 11-14 2013

March 14th, 2013 — Data management

SAP’s predictive analytics plans. Dell’s Boomi MDM. And more

For 451 Research clients: SAP sheds light on predictive-analytics business bit.ly/ZEc5eu By Krishna Roy

— Matt Aslett (@maslett) March 12, 2013

For 451 Research clients: Dell Boomi launches into MDM with cloud service for the midmarket bit.ly/ZEcjSK By Krishna Roy

— Matt Aslett (@maslett) March 12, 2013

Teradata delivers Teradata Data Warehouse Appliance 2700. prn.to/Yre3ug

— Matt Aslett (@maslett) March 13, 2013

MIT researchers are developing a system called DBSeer to improve the efficiency of databases in cloud environments. bit.ly/10GAVcx

— Matt Aslett (@maslett) March 13, 2013

Twitter and Cloudera are open-sourcing Parquet: columnar storage format for Hadoop. Take a look! parquet.github.com

— Dmitriy Ryaboy (@squarecog) March 12, 2013

Scality integrates its RING storage software with Hadoop. bit.ly/YbfUaB

— Matt Aslett (@maslett) March 14, 2013

Hadapt adds Netezza co-founder Jit Saxena to Board of Directors. prn.to/Xardwd

— Matt Aslett (@maslett) March 14, 2013

What it means to be “all in” on Hadoop bit.ly/Y4Svre This isn’t about which vendor has the most Hadoop committers.

— Matt Aslett (@maslett) March 11, 2013

And that’s the data day, today.

Comments Off on The Data Day, A few days: March 11-14 2013

What it means to be “all in” on Hadoop

March 11th, 2013 — Data management

Pivotal HD is not Hadoop
Neither is Cloudera’s Distribution, including Apache Hadoop.
Nor the Hortonworks Data Platform.
Nor the MapR Distribution.
Nor IBM’s InfoSphere BigInsights.
Nor the WANdisco Distro.
Nor Intel’s Distribution for Apache Hadoop.

Apache Hadoop is Hadoop. And Hadoop is Apache Hadoop.

I don’t write that to be pedantic, or controversial, but because it is the only logical conclusion you can reach after reading Defining Apache Hadoop from the Apache Hadoop Wiki.

“The key point is that the only products that may be called Apache Hadoop or Hadoop are the official releases by the Apache Hadoop project as managed by that Project Management Committee (PMC)… Products that are derivative works of Apache Hadoop are not Apache Hadoop, and may not call themselves versions of Apache Hadoop, nor Distributions of Apache Hadoop.”

It is with this in mind that one should view the reaction to EMC Greenplum’s recent launch of of Pivotal HD; and in particular this statement from Scott Yara, EMC Greenplum senior Vice President, Products and Co-Founder:

“We’re all in on Hadoop, period.”

What does it mean to be “all in on Hadoop”? Based on a strict reading of Defining Apache Hadoop (a document that demands by its own words to be read strictly), being “all in” on Hadoop means only one thing: being “all in” on Apache Hadoop.

I have no doubt that EMC Greenplum is “all in” on Pivotal HD, but that’s not the same thing at all.

Not a purity debate

There is nothing wrong with offering additional functionality beyond the scope of Apache Hadoop – the licensing terms clearly encourage it.

As my fellow analyst Merv Adrian notes:

“Having some components of your solution stack provided by the open source community is a fact of life and a benefit for all. So are roads, but nobody accuses Fedex or your pizza delivery guy of being evil for using them without contributing some asphalt.”

That is true. However, to continue the analogy, you would expect any company that claimed to be “all in on roads” to be getting involved in laying and maintaining them, rather than just driving on top of them.

Despite what some people may think this isn’t a matter of arguing about which vendor has the most Hadoop committers. It is a matter of defining what users understand Hadoop to be, and what they understand it not to be. It is a matter of drawing a line between Hadoop – Apache Hadoop – and additional, proprietary, functionality beyond the scope of the project.

User preference

Whether users will choose to go with a pure approach to Hadoop-based products and services is another matter. Dan Woods, for one, clearly believes that products like Pivotal HD will drive further mainstream adoption beyond “the limits of open source.”

The idea is that most enterprises don’t care if it meets the Apache definition of Hadoop or not, as long as it works.

While I have no doubt that some companies will be drawn to the additional features and confidence that vendors such as EMC and Intel can provide, I have also spoken to multiple enterprises – including one very large enterprise just last week – for which the preference is to default to open in order to avoid any potential for lock-in and vendor-specific architecture choices.

There are many vendors that do very much care whether what they are adopting meets the Apache definition of Hadoop.

Which of these attitudes will dominate? I’m not going to pretend I know the answer to that question at this point, but our previous coverage of open source adoption suggests that once the door to openness has been unlocked its very hard to force it shut again.

Dan Woods responded to my (sarcastic) comment about this as follows:

@maslett Linux is an enterprise product. The use-value players (IBM, HP, Intel) took it over, invested, and adapted it to enterprise needs.

— Dan Woods (@danwoodscito) March 5, 2013

I would dispute that players like IBM, HP, and Intel “took Linux over” but in any case it is undeniable that they had a significant role to play – alongside Red Hat, Novell et al, and individual developers – in turning Linux into an enterprise-grade operating system.

The point is though that they did so by engaging with the Linux project, not by launching their own differentiated versions of Linux.

1 Comment

Forthcoming Webinar: Get Down to Serious Business with Hadoop

The Data Day, A few days: June 26-28 2013

The Data Day, A few days: June 6-10 2013

Updated Database Landscape map – June 2013

The Data Day, A few days: April 22-26 2013

The Data Day, A few days: April 15-19 2013

Forthcoming webinar on ‘big data’ and the ‘single version of the truth’

The Data Day, A few days: April 9-12 2013

The Data Day, A few days: March 11-14 2013

What it means to be “all in” on Hadoop

Search

Twitter: maslett

Categories

451 Group blogroll

Recent Posts

Subscribe via Email

Archives

Search

Tags

Twitter: maslett

Categories

451 Group blogroll

Recent Posts

Subscribe via Email

Archives