Everything you ever wanted to know about big data in 180 minutes, with coffee

On the afternoon of Tuesday May 14 at Enterprise Search Europe in London I will be running a mini-workshop on the Challenges of Big Data.

The format is a little different to the presentations and webinars we usually get involved with, but the three-hour timescale gives us the opportunity to do a really deep dive into the trend of big data, its associated technologies, and the potential impact on the data management landscape. As such the session could effectively be titled ‘everything you ever wanted to know about big data in 180 minutes’.

Attendees will have exclusive access to our latest thoughts and concepts in this fast-moving space. The workshop is aimed at anyone with an interest in the concept of big data, its related technologies and how it will affect their business, and will give you:

  • A thorough, hype-free introduction to the concept of ‘big data’
  • An overview of the technical, business and cultural trends that are changing the way enterprises store, process and analyse data
  • An understanding of how ‘big data’ technologies complement and/or compete with traditional data warehousing and analytics technologies
  • Examples of enterprise deployments of Hadoop and related best practices
  • An overview of how search and machine-learning technologies relate to ‘big data’

There will also be coffee 🙂

I will also be providing An Overview of Big Data during the main Enterprise Search Europe Conference Programme on Thursday at 9am, but those who attend the workshop on Tuesday will get the really good stuff.

For full details of Enterprise Search Europe, and to register for the workshops, please click here.

Forthcoming webinar: Real-time Big Data Analytics, with MemSQL

Next Tuesday, May 7, I’ll be taking part in a webinar with MemSQL at 10am Pacific time to discuss real-time big data analytics.

Although much of the focus of big data relates to volume, the velocity of data – combined with increased frequency of analysis – is driving requirements for real-time analytics so enterprises can drive new business opportunities and create a competitive advantage.

I’ll be discussing the forces driving the emergence of ‘operational intelligence’ as a means of generating real-time insight into how a business is performing, and will be joined by MemSQL CEO Eric Frenkiel.

Between us we will discuss how to:

  • Scale an in-memory analytics solution for sub-second responses on terabytes of real-time and recent historical data.
  • Reduce time-to-value by using existing SQL skills for real-time analytics.
  • Solve real-time Big Data analytics challenges such as network security, operational analytics, risk management and marketing campaign optimization.
  • Leverage real-time operational intelligence to make data-driven decisions.
  • Minimize costs with horizontal scale-out on commodity hardware.

For full details, and to register, click here.

Big Data and the Cloud: A Perfect Storm? – expanded edition

451 Research will be hosting its annual HCTS EU event in London on April 9/10. The event includes presentations from 451 Research, Uptime Institute and Yankee Group analysts, as well as representatives from vendors and enterprises – such as HCBS, The BBC, Google, Morgan Stanley, Greenpeace, News International, BNP Paribas, and ING.

We also have a guest speaker in Tim Harford, author of “The Undercover Economist”.

As if that wasn’t enough, I’ll be presenting the expanded version of my “Big Data and the Cloud: A Perfect Storm?” presentation.

As I previously wrote ahead of presenting the shortened version at Cloud Expo Europe, many people seem to believe that cloud computing and big data have the potential to create a perfect storm of disruption.

However, 451 Research has been tracking the adoption of data management technologies on the cloud – and the lack of it – since relational databases became available on AWS in 2008, and the effect of the confluence of big data and the cloud would perhaps better be described as dead calm, rather than a perfect storm. Other than development and test environments, adoption has been limited.

In my presentation I’ll take a look at the factors that have restricted adoption of databases in the cloud to date – including some exclusive results from our recent database survey – explain why we see the potential for cloud database growth in the coming years, and examine how the strategies of emerging Hadoop- and database-as-a-service providers are evolving to ensure that big data and the cloud combine to fulfil their potential to disrupt the IT landscape as we know it.

For full details of the event, and to register, click here.

Forthcoming webinar: Searching for Value in Big Data

On March 14th at 10:00am PT I’ll be taking part in a webinar in association with LucidWorks on the role search has to play in big data.

I’ll be joined by Grant Ingersoll, Chief Technology Officer for LucidWorks, who will provide a brief overview of LucidWorks Big Data, a development platform designed specifically for building big data applications.

Also presenting will be Tony Jewitt, Vice President of Big Data Solutions for Avalon Consulting LLC, who will demonstrate how Avalon’s Unified Search and Analytics platform leverages the LucidWorks Big Data platform to discover and analyze data maintained in Hadoop.

Between us we’ll also be discussing:

  • How Big Data is changing the database landscape
  • Total data – new approaches to accessing and analyzing data
  • Search and analytics – two sides of the same coin
  • The role that search plays in discovering new insights and generating value

For more details, and to register, go to http://programs.lucidworks.com/451Group032013_signuppage.html

Big Data and the Cloud: A Perfect Storm?

Next week at Cloud Expo Europe in London I’ll be giving a presentation – at 12.05 on January 29 to be precise – on the potential confluence of big data and cloud computing.

Cloud computing is all about enabling frictionless adoption of low-cost, flexible compute and storage, while big data technologies such as Apache Hadoop enable low-cost, flexible data storage and processing. Hence many people seem to believe that cloud computing and big data have the potential to create a perfect storm of disruption.

However, 451 Research has been tracking the adoption of data management technologies on the cloud – and the lack of it – since relational databases became available on AWS in 2008, and the effect of the confluence of big data and the cloud would perhaps better be described as dead calm, rather than a perfect storm. Other than development and test environments, adoption has been limited.

In our presentation “Big Data and the Cloud: A Perfect Storm?” we will take a look at the factors that have restricted adoption of databases in the cloud to date, explain why we see the potential for cloud database growth in the coming years, and examine how the strategies of emerging Hadoop- and database-as-a-service providers are evolving to ensure that big data and the cloud combine to fulfil their potential to disrupt the IT landscape as we know it.

The Data Day, Two days: January 7/8, 2013

SAP’s HANA – a floor wax *and* a dessert topping?

And that’s the Data Day, today.

New 451 Research report: Total Data Analytics

451 Research’s Information Management practice has published its latest long-format report: Total Data Analytics. Written by Krishna Roy, Analyst, BI and Analytics, and myself, it examines the impact of ‘big data’ on business intelligence and analytics.

The growing emphasis on ‘big data’ has focused unprecedented attention on the potential of enterprises to gain competitive advantage from their data, helping to drive adoption of BI/analytics beyond the retail, financial services, insurance and telecom sectors.

In 2011 we introduced the concept of ‘Total Data’ to reflect the path from the volume, velocity and variety of big data to the all-important endgame of deriving maximum value from that data. Analytics plays a key role in deriving meaningful insight – and therefore, real-world business benefits – from Total Data.

In short, big data and Total Data are changing the face of the analytics market. Advanced analytics technologies are no longer the preserve of MBAs and ‘stats geeks,’ as line-of-business managers and others increasingly require this type of analysis to do their jobs.

Total Data Analytics outlines the key drivers in the analytics sector today and in the coming years, highlighting the technologies and vendors poised to shape a future of increased reliance on offerings that deliver on the promise of analyzing structured, semi-structured and unstructured data.

The report also takes a look at M&A activity in the analytics sector in 2012, as well as the history of investment funding involving Hadoop, NoSQL and Hadoop-based analytics specialists. It also contains a list of 40 vendors we believe have the greatest potential to shape the market in the coming years.

The report is available now to 451 Research clients, here. Non-clients can get more information and download an executive summary from the same link.

Weird Science – Darwinian theory and emerging Hadoop vendor business strategies

Dan Woods recently opined that Apache Hadoop has had a weird beginning thanks to its “Three Headed Open Core” model and warned that there is a danger that it will fragment – à la Unix – thanks to competing commercial forces.

There are a couple of points to address here. The first is the assumption that the vendor community developing Hadoop is in some way ‘weird’. It isn’t – at least not to those of us who have studied the evolution of open source-related business strategies.

In fact, Hadoop’s multi-vendor community is a prime example of the corporate-dominated development communities we saw emerging as the fourth stage of commercial open source back in 2010.

Some people still have trouble understanding, as I wrote two years ago, that

being successful is about sharing your code development with the competition via multi-vendor open source projects in order to benefit from improved code quality and lower research and development costs for non-differentiating features AND beating your competition with proprietary complementary technologies.

This isn’t weird. I firmly believe in the not-too-distant future this will be seen as entirely normal.

Another issue to address is the suggestion that these competing vendors pose a danger to the core project. In the blog post linked above I argued that the contrary is true: the various competing players in collaborative communities have a similar impact on the development of a project as the competing factors in Darwin’s evolutionary process – climate, habitat, the existence or dearth of predators, and so on – do on a species: they make it stronger.

I would be much more concerned about the potential fragmentation of Hadoop if we were looking at four or five different competing implementations of Google’s MapReduce and file system research. Instead, you could compare the differentiating features that Cloudera, Hortonworks, MapR, IBM and EMC have introduced to the result of natural selection based on a need to evolve to certain conditions.

So long as there remains a single core Apache Hadoop project upon which these differentiating features are based I believe Hadoop will not only survive, but will thrive. If I may quote myself again: “As long as they continue to collaborate on the non-differentiating code, the project should benefit from being stretched in multiple directions.”

I believe that, as with Linux, the vendors involved have learned the lessons of the Unix wars and understand that it is in their best interests – let alone everyone else’s – not to repeat them.

Another key point when we look at the Hadoop ecosystem is that we see multiple vendors building on others’ differentiating features and often supporting multiple distributions. It’s not a case of a herd of individually differentiated Hadoops, but more like a stack of Russian Hadoop dolls.

To my mind there are (currently) eight main Hadoop business strategies, each of which has the potential to build on those before it:

  • Hadoop distributors – e.g. Cloudera, Hortonworks, MapR, EMC, IBM
  • Hadoop cloud services – e.g. Amazon EMR, Google Compute Engine
  • Hadoop-based deployment services – e.g. Infochimps, Metascale
  • Hadoop-based deployment stacks/appliances – e.g. Zettaset, Oracle BDA, Dell
  • Hadoop-based development services – e.g. Continuuity, Mortar Data
  • Hadoop-based application stacks – e.g. NGDATA, Guavus
  • Hadoop-based database stacks – e.g. Drawn to Scale, Splice Machine
  • Hadoop-based analytic services – e.g. Treasure Data, Qubole

The Data Day, Two days: November 28/29 2012

Amazon and BitYota launch DWaaSes (DWaaSi?) Continuuity’s funding and plans. And more.

And that’s the Data Day, today.

Forthcoming webinar: Big Data Best Practices with NGDATA

On December 13 at 1pm EST/10am PST I’ll be taking part in a webinar to discuss Big Data Best Practices – Realizing True Business Value from Your Big Data.

Big Data has rapidly become a transformational business trend. Most business leaders understand that not being able to tap into the power of their Big Data could mean losing business to the competition. However, most organizations are not fully aware of how to embrace it.

I’ll discuss how you can overcome these hurdles and tap into your Big Data to transform your business, while Naren Patil, SVP of Product Marketing at NGDATA, will provide some real-life examples of successful deployment projects.

To register, click here.