Entries from June 2011 ↓

The cathedral in the bazaar: the future of the EDW

Kalido’s Winston Chen published an interesting post this week comparing Enterprise Data Warehouse projects to the building of a cathedral, notably Gaudi’s Sagrada Familia: “A timeless edifice cast in stone, for a client who’s not in a hurry.”

The requirement of many data warehousing projects to deliver on an immutable vision is one of their most likely failings. As we noted in our 2009 report on considerations for building a data warehouse:

“One of the most significant inefficiencies of data warehousing is that users have traditionally had to design their data-warehouse models to match their planned queries, and it can often be difficult to change schema once the data warehouse has been created. This approach is too rigid in a world of rapidly changing business requirements and real-time decision-making.”

Not only is it too rigid, as we added in our 2010 data warehousing market sizing report:

“It is also self-defeating, since a business analyst or executive that is unable to get the answers to queries they require from the EDW is likely to find their own ways to answer these queries – resulting in data silos and the exact redundancy and duplication issues the EDW was apparently designed to avoid.”

Given my dual focus on open source software, whenever I hear the term ‘cathedral’ used in the context of software, I can’t help but think of Eric Raymond’s seminal essay The Cathedral and the Bazaar, in which he made the case for collaborative open source development – the bazaar model – as an alternative to proprietary software development approaches, where software is carefully crafted, like cathedrals, to an immutable plan.

Mixing metaphors, I realised that the comparison between the cathedral and the bazaar can also be used to explain the previously discussed changing role of the enterprise data warehouse.

Whereas traditional approaches to analytics focused on building the EDW as the single source of the truth, the timeless data cathedral Winston describes, companies today are focused more on taking advantage of multiple data storage and processing technologies in what would better be described as a data bazaar.

However, it is not a matter of choosing between the cathedral and the bazaar. What we are seeing is the EDW becoming part of a broader data analytics architecture, retaining the data-quality and security rules and schema applied to core enterprise data while other technologies such as Hadoop, specialist analytic appliances, and online repositories are deployed for more flexible ad hoc analytic use cases and analyzing alternative sources of data – including log and other machine-generated data.

The cathedral, in this instance, is part of the bazaar.

Managing this data bazaar is essentially what our total data concept is all about: selecting the most appropriate data storage/processing technology for a particular use case, while enabling access to any data that might be applicable to the query at hand, whether that data is structured or unstructured, whether it resides in the data warehouse, or Hadoop, or archived systems, or any operational data source – SQL or NoSQL – and whether it is on-premises or in the cloud.

It is also essentially what IBM’s recently disclosed Smart Consolidation approach is all about: providing multiple technologies for operational analytics, ad hoc analytics, stream and batch processing, queryable archives, all connected by an “enterprise data hub”, and choosing the most appropriate query processing technology for the specific workload (so after polyglot programming and polyglot persistence comes polyglot analytics).

Two of my fellow database analysts, Curt Monash and Jim Kobielus, have recently been kicking around the question of what will be the “nucleus of the next-generation cloud EDW”.

While the data bazaar will rely on a core data integration/virtualization/federation hub, it seems to me that the idea that future data management architectures require a nucleus is a remnant of ‘cathedral thinking’.

Like Curt, I think it is most likely that there will be no nucleus – or to put it another way, that each user will have a different perspective of the nucleus based on their role. For some, Hadoop will be that nucleus; for others, it will be the regional or departmental data mart. For others, it will be an ad hoc analytics database. For some, it will remain the enterprise data warehouse.

I will be presenting more details about our total data concept, and the various elements of the data bazaar, at The 451 Group’s client event in London on June 27.

Two upcoming 451 Group conferences in London

The 451 Group is holding two conferences in London later this month.

The first is our European client event, which is being held on Monday June 27 and features three analyst presentations and one from Steve O’Connor, Director of Technology for Parliamentary ICT at the UK Houses of Parliament. The full agenda is here.

Two of the presentations are specifically focused on information management. Matt Aslett is presenting on Total Data, our take on the increasing volume and variety of data, combined with a greater understanding about its potential value. I’ll be preceding Matt with an overview of information risk management as we see it, focusing on how the increase in information volume and variety heightens the risk environment and what some companies have done to tackle it. Clients of 451 Group can come to the conference at no extra charge, as it is included in the price of their annual relationships with us. Non-clients can also come for a fee; please email me for details.

The following two days, Tuesday June 28 and Wednesday June 29, are focused on hosting and cloud issues with our Hosting and Cloud Transformation Summit (HCTS). The agenda features a wide variety of speakers, both from 451 Group and from numerous end users in markets including financial services, government, media, telecommunications and transportation.

The highlight for many will likely be listening to, and asking questions of, Professor Brian Cox, the Professor of Particle Physics at Manchester University and one of the leaders on the ATLAS experiment at the Large Hadron Collider at CERN in Geneva. In the UK he is well known for two massively popular science programmes, Wonders of the Solar System and, in 2011, Wonders of the Universe, the first of which is also now on in the US on the Science Channel. He will be talking about all the things that interest him, and there will be ample time for Q&A.

Seats are selling fast, and 451 clients who attend the client event get a discount to HCTS. We also have some discount codes available, so if you’re not a client and would like to attend, please get in touch via email or Twitter (@NickPatience).

I look forward to seeing some of you there!