The Data Day, Two days: August 27/28 2012

Citrusleaf. Aerospike. AlchemyDB. Sqrrl. Percolator. Dremel. Pregel. And more.

And that’s the Data Day, today.

Hadoop is dead. Long live Hadoop.

GigaOM published an interesting article over the weekend, written by Cloudant’s Mike Miller, about why the days are numbered for Hadoop as we know it.

Miller argues that while Google’s MapReduce and file system research inspired the rise of the Apache Hadoop project, Google’s subsequent research into areas such as incremental indexing, ad hoc analytics and graph analysis is likely to inspire the next generation of data management technologies.

We’ve made similar observations ourselves but would caution against assuming, as some people appear to have done, that implementations of Google’s Percolator, Dremel and Pregel projects are likely to lead to Hadoop’s demise. Hadoop’s days are not numbered. Just Hadoop as we know it.

Miller makes this point himself when he writes “it is my opinion that it will require new, non-MapReduce-based architectures that leverage the Hadoop core (HDFS and Zookeeper) to truly compete with Google’s technology.”

As we noted in our 2011 Total Data report:

“it may be that we see more success for distributed data processing technologies that extend beyond Hadoop’s batch processing focus… Advances in the next generation of Hadoop delivered in the 0.23 release will actually enable some of these frameworks to run on the HDFS, alongside or in place of MapReduce.”

With the ongoing development of that 0.23 release (now known as Apache Hadoop 2.0) we are beginning to see that process in action. Hadoop 2.0 includes the delivery of the much-anticipated MapReduce 2.0 (also known as YARN, or NextGen MapReduce). Whatever you choose to call it, it is a new architecture that splits the JobTracker into its two major functions: resource management and application lifecycle management. The result is that multiple versions of MapReduce can run in the same cluster, and that MapReduce becomes one of several frameworks that can run on the Hadoop Distributed File System.
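To make that split concrete, here is a minimal configuration sketch showing how a Hadoop 2.0 cluster points at YARN’s central ResourceManager and tells MapReduce to run as just one framework among others on YARN; the hostname is a placeholder and exact property names and defaults may vary across releases and distributions:

```xml
<!-- yarn-site.xml: the ResourceManager takes over cluster-wide
     resource management, one of the JobTracker's former roles -->
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>rm.example.com</value> <!-- placeholder hostname -->
  </property>
</configuration>

<!-- mapred-site.xml: MapReduce becomes one pluggable framework on YARN -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
```

The JobTracker’s other former role, per-job lifecycle management, moves into a per-application ApplicationMaster that YARN launches for each job, which is what allows frameworks other than MapReduce to share the same cluster and the same data in HDFS.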

The first of these is Apache HAMA, a bulk synchronous parallel computing framework for scientific computations, but we will also see other frameworks supported by Hadoop (thanks to Arun C Murthy for pointing to two of them), and we fully expect the likes of incremental indexing, ad hoc analytics and graph analysis to be among them.

As we added in Total Data:

“This supports the concept recently raised by Apache Hadoop creator Doug Cutting that what we currently call ‘Hadoop’ could perhaps be thought of as a set of replaceable components in a wider distributed data processing ecosystem… the definition of Hadoop might therefore evolve over time to encompass some of the technologies that could currently be seen as potential alternatives…”

The future of Hadoop is… Hadoop.