Kudu — Too much information

The Data Day: February 3, 2017

February 3rd, 2017 — Data management

The Data Platforms and Analytics massacre

For @451Research clients: @CaskData completes evolution to data integration platform with CDAP 4 https://t.co/DGuUm7f2NP

— Matt Aslett (@maslett) February 3, 2017

For @451Research clients: @Cloudera runs with @ApacheKudu, integrates it with Enterprise Data Hub https://t.co/3JtkHGKy2K By @jmscrts

— Matt Aslett (@maslett) January 30, 2017

For @451Research clients: @Datameer bolsters management to boost enterprise Hadoop analysis appeal https://t.co/fy8r8NCAzy By Krishna Roy

— Matt Aslett (@maslett) February 1, 2017

For @451Research clients: With $25m series B, @PeriscopeData targets data scientists https://t.co/wRfJHmXTOJ By Krishna Roy

— Matt Aslett (@maslett) January 31, 2017

For @451Research clients: @HVR_Software looks to make a name for itself with real-time data replication services https://t.co/xYDryKUT3a

— Matt Aslett (@maslett) January 30, 2017

MicroStrategy’s Q4 net inc was $31.1m on rev down 2% to $140.1m, FY net inc – $90.9m – on rev down 3.3% to $512.2m https://t.co/ZLzgHSZP7r

— Matt Aslett (@maslett) February 1, 2017

Cloudera announces general availability of Apache Kudu with Cloudera Enterprise 5.10 https://t.co/1JAAn3UWtN

— Matt Aslett (@maslett) January 31, 2017

For @451Research clients: @CenturyLink launches managed bare metal big data as a service with @Cloudera https://t.co/YF6WT0lUt9 By @jmscrts

— Matt Aslett (@maslett) February 2, 2017

VoltDB launches v7 of its in-memory SQL database https://t.co/PbuhqXBLOV

— Matt Aslett (@maslett) February 2, 2017

Tableau reported a Q4 net loss of $21.1m on rev up 24% to $250.7m, FY net loss of $144.4m on rev up 27% to $826.9m https://t.co/cCdNxwbrFO

— Matt Aslett (@maslett) February 3, 2017

Oracle has effectively doubled the price of its software running on AWS https://t.co/c2NIX1FaTl

— Matt Aslett (@maslett) January 31, 2017

Transceptor Technology open sources SiriDB time series database https://t.co/XBOX94Pyyq

— Matt Aslett (@maslett) February 2, 2017

Amazon Web Services’ MXNet deep learning project has been accepted as an Apache Incubator project. https://t.co/15vXbxxNP8

— Matt Aslett (@maslett) January 31, 2017

And that’s the data day, today.

The Data Day: September 23, 2016

September 23rd, 2016 — Data management

What happened in data platforms and analytics this week will blow your mind

For @451Research clients: @Salesforce goes all-in with machine learning with Project Einstein https://t.co/OrFAdXswvP By @nickpatience

— Matt Aslett (@maslett) September 19, 2016

For @451Research clients: @ContinuumIO aims to support comprehensive open-source-based data science https://t.co/S1wQFmjhRJ By Krishna Roy

— Matt Aslett (@maslett) September 23, 2016

For @451Research clients: @Cisco prepares to expand data management strategy via virtualization https://t.co/Nit5OX5lgg

— Matt Aslett (@maslett) September 19, 2016

For @451Research clients: Post pivot, @Basho picks up momentum while focusing on time-series data https://t.co/WFlc6YNmCN By @jmscrts

— Matt Aslett (@maslett) September 23, 2016

For @451Research clients: @ScaleOut_Inc aims its in-memory data grid at operational intelligence https://t.co/eY5m4AJAxh By @jasonstamper

— Matt Aslett (@maslett) September 23, 2016

For @451Research clients: @aloomainc beams in cloud-based spin on ETL-based data integration https://t.co/QLCWlBMnCq

— Matt Aslett (@maslett) September 23, 2016

InfluxData raised $16m series B financing led by Battery Ventures https://t.co/3VSKkwqELZ

— Matt Aslett (@maslett) September 21, 2016

Greenwave Systems acquires Predixion Software, adding edge analytics to its IoT platform https://t.co/pdRLSrPixK

— Matt Aslett (@maslett) September 21, 2016

Salesforce introduces Salesforce Einstein artificial intelligence capabilities https://t.co/uGvqkUur4v

— Matt Aslett (@maslett) September 19, 2016

Oracle unveils Database 12c Release 2 (amongst other things) https://t.co/RCvLG1pCSx

— Matt Aslett (@maslett) September 19, 2016

Oracle launches Analytics Cloud https://t.co/hOo11tMFQ7 among a variety of new data-related cloud services https://t.co/OHoghBF7yf

— Matt Aslett (@maslett) September 19, 2016

Informatica expands Data Lake Management including Big Data Management, Enterprise Information Catalog https://t.co/1SsvAsI8wJ

— Matt Aslett (@maslett) September 21, 2016

Trifacta releases Trifacta v4, targeting more users, more diverse data sources and more cloud environments. https://t.co/8eJJCfah8f

— Matt Aslett (@maslett) September 20, 2016

IBM and Hortonworks collaborate on Hortonworks Data Platform (HDP) for IBM Power Systems https://t.co/v26stzIOMj

— Matt Aslett (@maslett) September 20, 2016

Kinetica unveils GPU-accelerated database for analyzing streaming data https://t.co/DOlXfvwxgU

— Matt Aslett (@maslett) September 21, 2016

Qubole’s Data Service (QDS) will be available on Oracle Cloud Platform, and Oracle Cloud at Customer https://t.co/a3EX0b9kXX

— Matt Aslett (@maslett) September 20, 2016

Cask pitches forthcoming CDAP 4 as “unified integration platform” for big data. https://t.co/ZLMZKyUSao

— Matt Aslett (@maslett) September 19, 2016

Pepperdata delivers performance monitoring for Amazon EMR, previews adaptive scaling capability https://t.co/9OTwPOn0Tg

— Matt Aslett (@maslett) September 22, 2016

Syncsort delivers open metadata management capabilities in its DMX-h integration software for Hadoop. https://t.co/PqIpwc1KXY

— Matt Aslett (@maslett) September 22, 2016

Apache Software Foundation announces Apache CouchDB v2.0 https://t.co/CKac2Lpnqp

— Matt Aslett (@maslett) September 20, 2016

The Apache Software Foundation announces Apache Kudu v1.0 https://t.co/5mj0J2ZAd4

— Matt Aslett (@maslett) September 20, 2016

And that’s the data day, today.

The Data Day, A few days: November 10-23, 2015

November 23rd, 2015 — Data management

Cloudera herds Impala and Kudu to the ASF. And more

For @451Research clients: @cloudera herds Impala and Kudu to Apache Software Foundation https://t.co/uWO2iD1WYi

— Matt Aslett (@maslett) November 17, 2015

For @451Research clients: @H2Oai raises $20m series B to capitalize on rapid open source machine-learning growth https://t.co/NM6wheahS4

— Matt Aslett (@maslett) November 13, 2015

For @451Research clients: @Google uses open source to encourage TensorFlow take-up for deep-learning https://t.co/GRRBqIR995 By Krishna Roy

— Matt Aslett (@maslett) November 10, 2015

For @451Research clients: @SAP adds spark to Spark with HANA Vora in-memory query engine https://t.co/3vLa1Peg2f By Jim Curtis

— Matt Aslett (@maslett) November 23, 2015

For @451Research clients: Cloud or on-premises? @OracleDatabase says both, serves up hybrid strategy https://t.co/kC4qClHzcx By Jim Curtis

— Matt Aslett (@maslett) November 11, 2015

For @451research clients: @Pivotal (almost) completes open source data products makeover with @Greenplum Database https://t.co/8CAS1oyNJE

— Matt Aslett (@maslett) November 11, 2015

For @451Research clients: @Teradata: Enabling analytics in all the right places https://t.co/2AnEAKG2VP By Jim Curtis

— Matt Aslett (@maslett) November 16, 2015

For @451Research clients: @Oracle doubles down on @MySQL investment with 5.7 release https://t.co/oZd9mFyLpO By @jasonstamper

— Matt Aslett (@maslett) November 17, 2015

For @451Research clients: @InsightSquared finds 50 ways to connect SMBs with new data sources https://t.co/M71jWzSooP By Jim Curtis

— Matt Aslett (@maslett) November 16, 2015

For @451Research clients: @Informatica completes v10 release with Big Data Management launch https://t.co/qo5oKEi7aI

— Matt Aslett (@maslett) November 10, 2015

For @451Research clients: @DundasData focuses on expanded market opportunity with BI platform play https://t.co/Nnqxd59nqB By Krishna Roy

— Matt Aslett (@maslett) November 10, 2015

For @451Research clients: @venasolutions courts Excel devotees with perf management cloud service https://t.co/JdLqa9YJ0Y By Krishna Roy

— Matt Aslett (@maslett) November 13, 2015

For @451Research clients: @Cloud9Charts emerges with reporting and visualization service https://t.co/wP9l5Jnml2 By Krishna Roy

— Matt Aslett (@maslett) November 17, 2015

Splunk reports net loss of $73m on Q3 revenue up 50% to $174.4m https://t.co/qXeDDHoHoh

— Matt Aslett (@maslett) November 20, 2015

Cloudera proposes to donate Impala and Kudu to The Apache Software Foundation https://t.co/6o0GySa1Pq

— Matt Aslett (@maslett) November 17, 2015

Fidelity has reportedly written down its investment in MongoDB by 54% https://t.co/441YaMTSIz

— Matt Aslett (@maslett) November 16, 2015

Godfrey Sullivan retires as Splunk CEO, succeeded by Doug Merritt https://t.co/bVj1KykNqT

— Matt Aslett (@maslett) November 20, 2015

Cloudera adds optimization guidance to latest version of Cloudera Enterprise with Cloudera Navigator Optimizer https://t.co/dK5SoitR1y

— Matt Aslett (@maslett) November 19, 2015

Informatica launches Informatica Big Data Management combining integration, data quality, governance, and security https://t.co/zydZWUo8Px

— Matt Aslett (@maslett) November 10, 2015

The Apache Software Foundation announces Apache Cassandra v3.0 https://t.co/pVttFV0V9H

— Matt Aslett (@maslett) November 10, 2015

Glassbeam updates machine data analytics platform to version 4.8 https://t.co/vVCyg3s9Ph

— Matt Aslett (@maslett) November 11, 2015

Tesora Database as a Service adds MySQL Enterprise, enhances support for Oracle 12c https://t.co/9xJOp0i1Qj

— Matt Aslett (@maslett) November 11, 2015

Podium Data launches Podium 2.0 data lake management platform https://t.co/G2PpuMPnC8

— Matt Aslett (@maslett) November 10, 2015

Splice Machine announces version 2.0 of its RDBMS, powered by Hadoop and Spark https://t.co/LmHUnj8LFO

— Matt Aslett (@maslett) November 17, 2015

WANdisco updates Fusion Platform for active-active Hadoop replication and continuous availability https://t.co/z9MqbItigO

— Matt Aslett (@maslett) November 19, 2015

And that’s the data day, today.

The Data Day, A few days: September 26-October 2, 2015

October 2nd, 2015 — Data management

Strata+Hadoop World special

For @451Research clients: @Cloudera introduces #Kudu, a new #Hadoop storage engine for fast analytics http://t.co/lbsRPCM027

— Matt Aslett (@maslett) September 28, 2015

For @451Research clients: @MapR adds JSON document support to its Hadoop-native NoSQL database http://t.co/AiZN3tXiRp

— Matt Aslett (@maslett) September 29, 2015

Hortonworks has announced the availability of the Hortonworks DataFlow (HDF) support subscription for Apache NiFI. http://t.co/FgZHFp9N8f

— Matt Aslett (@maslett) September 28, 2015

For @451Research clients: @Talend adds Spark to data integration platform with version 6.0 http://t.co/U3Ljr28I6i

— Matt Aslett (@maslett) September 30, 2015

For @451Research clients: @impetustech tempts data-streaming developers with free @StreamAnalytix http://t.co/3nqThykTe8 By @jasonstamper

— Matt Aslett (@maslett) October 1, 2015

For @451Research clients: In race to Hadoop, @VeristormInc aims no mainframe data gets left behind http://t.co/Ljsg8JpP27 By Jim Curtis

— Matt Aslett (@maslett) September 29, 2015

MapR has announced the developer preview of MapR-DB with native support for JSON http://t.co/ObnGgLmETF

— Matt Aslett (@maslett) September 29, 2015

The ODPi (aka Open Data Platform initiative) becomes a collaborative project of The Linux Foundation http://t.co/F0IjeSweVO

— Matt Aslett (@maslett) September 28, 2015

Microsoft expands Azure Data Lake with Azure Data Lake Analytics http://t.co/PwD14zAlrn

— Matt Aslett (@maslett) September 28, 2015

Apache Drill gurus at Dremio raise more than $10M from Redpoint and Lightspeed http://t.co/flOx5AcoUT

— Matt Aslett (@maslett) September 28, 2015

Infoworks raises $5m series A http://t.co/OZgDKFD6yw launches Dynamic Data Warehousing Platform for Hadoop http://t.co/dJMA7uy6TS

— Matt Aslett (@maslett) September 30, 2015

Talend launches version 6 of its data integration platform, with native support for Apache Spark and Spark Streaming. http://t.co/FxWovxrHml

— Matt Aslett (@maslett) September 30, 2015

Pivotal contributes HAWQ analytics engine and MADlib data science tools to the Apache Software Foundation. http://t.co/Vy3uHLadfu

— Matt Aslett (@maslett) September 29, 2015

Pentaho (a Hitachi Data Systems company) previews version 6.0 of its big data integration and analytics platform. http://t.co/YmHJTvNSWC

— Matt Aslett (@maslett) September 29, 2015

Aerospike releases version 3.6 of its NoSQL database with enhanced support for Spark and Hadoop. http://t.co/stVSrgVrV9

— Matt Aslett (@maslett) September 29, 2015

Amazon Web Services launches Amazon Elasticsearch Service http://t.co/lWQFXinv11

— Matt Aslett (@maslett) October 2, 2015

SAP updates Adaptive Server Enterprise http://t.co/asjaU6wxWA

— Matt Aslett (@maslett) September 29, 2015

RapidMiner launches Radoop v2.6, with support for incorporating SparkR and PySpark scripts. http://t.co/al0jjziI2f

— Matt Aslett (@maslett) September 29, 2015

Syncsort’s DMX-h data integration software integrates with Apache Kafka and Apache Spark http://t.co/XGWCiuq2IK

— Matt Aslett (@maslett) September 29, 2015

Dataiku integrates its Data Science Studio with Apache Spark. http://t.co/n5IcEx2sHP

— Matt Aslett (@maslett) September 29, 2015

Attunity launches Visibility 7.0 http://t.co/3Uci157tH0 Attunity Replicate Express http://t.co/YEF7K3zd9v

— Matt Aslett (@maslett) September 29, 2015

Bigstep launches Full Metal Data Lake, a data lake as a service offering http://t.co/hRUm4OUYND

— Matt Aslett (@maslett) October 1, 2015

And that’s the data day, today.

Hadoop (disambiguation)

September 30th, 2015 — Data management

What is Hadoop?

It should be fairly simple: in the beginning there was the Hadoop Distributed File System, Hadoop MapReduce, and the Hadoop Common set of utilities. Even with the addition of Apache YARN in 2013, just four projects officially form the core of Apache Hadoop.

However, this is not what most people mean when they use the term ‘Hadoop’. Instead, most people are referring to the combination of Hadoop-related projects that are packaged with the Hadoop core to create Hadoop distributions.

As 451 Research’s Periodic Table of Hadoop illustrates, there are at least 40 projects that could be considered part of the Hadoop ecosystem (our table comprises Hadoop-related Apache Software Foundation projects, as well as other open source projects included in more than one Hadoop distribution). So ‘Hadoop’ can represent pretty much any combination of those 40 or more projects.

Hadoop’s creator Doug Cutting has asserted that Hadoop will evolve over time from a batch-processing engine into a set of replaceable components within a wider distributed data-processing ecosystem. At the same time, the word ‘Hadoop’ has evolved to become a catch-all brand for that wider distributed data-processing ecosystem.

That is potentially confusing, especially for later mainstream adopters as they seek to get their heads around what Hadoop is and what it is for. However, that’s not what this blog post is about. I’m less interested in defining what Hadoop is than I am in identifying what isn’t Hadoop.

When is Hadoop not Hadoop?

Recent announcements from the original Hadoop commercial supporter, Cloudera, have highlighted the significance of this question. First it anointed Spark as the successor to MapReduce, then it launched Kudu, a new storage engine and potential alternative to the Hadoop Distributed File System (HDFS).

If the company’s plans for Spark and Kudu play out, pretty soon we could see a whole lot of ‘Hadoop deployments’ that make use of neither MapReduce nor HDFS, the primary original Hadoop core projects. This isn’t just a potential future outcome: it is already perfectly plausible today that a ‘Hadoop deployment’ might involve neither MapReduce nor HDFS. It could, for example, involve Spark accessing data in AWS S3.

Both Spark and Kudu are open source and are clearly part of the wider Hadoop ecosystem, but where do you draw the line in terms of what is and isn’t ‘Hadoop’?

Vendors are increasingly layering additional proprietary components on top of this Hadoop ecosystem for differentiation. MapR has most obviously blurred the lines between Hadoop and not-Hadoop, but Cloudera Enterprise could also arguably be put in a ‘Hadoop+’ category, along with the likes of Pivotal Big Data Suite and IBM BigInsights.

Then there are things that aren’t even claimed to be Hadoop but, on closer inspection, bear a close resemblance to it as ‘Hadoop’ evolves beyond its core. For example, the Stratio Platform is based on Apache Spark and other Apache projects including Flume and Kafka. It isn’t claimed to be Hadoop, but it enables data to be stored in the Hadoop Distributed File System (as well as AWS S3, Elasticsearch, MongoDB, Apache Cassandra, Redis, and relational databases), so it is surely part of the same wider family of data platforms.

If not Hadoop, then what?

So what should we call this wider family of data platforms, including both Hadoop+ and ‘other’? Given the pick-and-mix nature of the Hadoop ecosystem, there is no easy way to answer that in terms of technology or use cases. These products and services are designed to deliver a mix of data processing and storage capabilities, including MapReduce, SQL engines and stream processing, as well as HDFS, HBase, S3 and Kudu, and much more besides, both proprietary and open source.

Indeed, it is probably easier to think about this not in terms of technologies but in terms of the symbols that represent them. If Hadoop was originally symbolised by an elephant, then what symbol best conveys the category of data platforms based on the wider Hadoop ecosystem and beyond?

Given the veritable menagerie of animals (and inanimate objects) that represent the various Hadoop ecosystem projects – elephant, pig, bee, tortoise, falcon, giraffe, orca, squirrel, hippopotamus, antelope, phoenix, kylin, roadrunner, hummingbird – there is surely only one choice: the Chimera.

Source: Wikimedia

For those not acquainted with Greek mythology, the Chimera was a fire-breathing, multi-headed hybrid creature composed of the parts of more than one animal. While the Chimera was classically composed of the features of a lion, a snake and a goat, the term chimera can be used to describe any creature with parts taken from various animals.

As such, it is a perfect symbol for the multi-headed hybrid Hadoop-based data platforms we see evolving. We are therefore tempted to use the term Chimeric Data Platform to describe this wider category of data platforms that are building on and expanding from Hadoop.

The fact that Merriam-Webster further defines chimera as “something that exists only in the imagination and is not possible in reality” is an added bonus that appeals to our sense of humour.
