
Data cloud, datastructure, and the end of the EDW

There has been a spate of reports and blog posts recently speculating about the potential demise of the enterprise data warehouse (EDW) in light of big data and evolving approaches to data management.

There are a number of connected themes that have led the likes of Colin White and Barry Devlin to ponder the future of the EDW, and as it happens I’ll be talking about these during our 451 Client event in San Francisco on Wednesday.

While my presentation doesn’t speak directly to the future of the EDW, it does cover the trends that are driving the reconsideration of the assumption that the EDW is, and should be, the central source of business intelligence in the enterprise.

As Colin points out, this is an assumption based on historical deficiencies with alternative data sources that evolved into best practices. “Although BI and BPM applications typically process data in a data warehouse, this is only because of… issues… concerning direct access [to] business transaction data. If these issues could be resolved then there would be no need for a data warehouse.”

The massive improvements in processing performance since the advent of data warehousing mean that it is now more practical to process data where it resides or is generated, rather than forcing it to be held in a central data warehouse.

For example, while distributed caching was initially adopted to improve the performance of Web and financial applications, it also provides an opportunity to perform real-time analytics on application performance and user behaviour (enabling targeted ads for example) long before the data get anywhere near the data warehouse.
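To make that concrete, here is a minimal sketch of what cache-level analytics might look like (Python, with a plain dictionary standing in for a distributed cache such as memcached or Coherence; the event fields and the targeting threshold are purely illustrative):

```python
from collections import defaultdict
from time import time

# Stand-in for a distributed cache; a plain dict keeps the sketch self-contained.
cache = defaultdict(lambda: {"views": 0, "last_seen": 0.0})

def record_page_view(user_id, page, ts=None):
    """Update per-user counters as events arrive, long before anything
    is batch-loaded into the data warehouse."""
    entry = cache[(user_id, page)]
    entry["views"] += 1
    entry["last_seen"] = ts or time()

def frequent_visitors(page, threshold=5):
    """Real-time query against the cache: users who have viewed a page
    more than `threshold` times -- e.g., candidates for a targeted ad."""
    return [user for (user, p), entry in cache.items()
            if p == page and entry["views"] > threshold]

for _ in range(6):
    record_page_view("u42", "/pricing")
print(frequent_visitors("/pricing"))  # ['u42']
```

The point is not the specific code but the pattern: the analytics run against the data where it already lives, in memory, rather than waiting for a batch load into the warehouse.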

While the central EDW approach has some advantages for data control, security and reliability, those advantages have always been more theoretical than practical: regional and departmental data marts are still needed, and users continue to work from local copies of data.

As we put it in last year’s Data Warehousing 2009-2013 report:

“The approach of many users now is not to stop those distributed systems from being created, but rather to ensure that they can be managed according to the same data-quality and security rules as the EDW.

With the application of cloud computing capabilities to on-premises infrastructure, users now have the promise of distributed pools of enterprise data that marry central management with distributed use and control, empowering business users to create elastic and temporary data marts without the risk of data-mart proliferation.”

The concept of the “data cloud” is nascent, but companies such as eBay are pushing in that direction, while also making use of data storage and processing technologies above and beyond traditional databases.

Hadoop is a prime example, but so too are the infrastructure components that are generating vast amounts of data that can be used by the enterprise to better understand how the infrastructure is helping or hindering the business in responding to changing demands.

For the 451 client event we have come up with the term ‘datastructure’ to describe these infrastructure elements. What is ‘datastructure’? It’s the machines that are responsible for generating machine-generated data.

While that may sound like we’ve just slapped a new label on existing technology, we believe that those data-generating machines will evolve over time to take advantage of improved processing power by embedding data analytics capabilities.

Just as in-database analytics has enabled users to reduce data processing latency by taking the analytics to the data in the database, it seems likely that users will look to do the same for machine-generated data by taking the analytics to the data in the ‘datastructure’.
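As a rough illustration of the difference, here is a hedged sketch (Python, using the standard-library sqlite3 module as a stand-in for a warehouse; the table and column names are made up) contrasting pulling rows out to the application with pushing the aggregation into the database:

```python
import sqlite3

# Toy "warehouse" table; the schema and data are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("east", 100.0), ("east", 250.0), ("west", 75.0)])

# 1. Taking the data to the analytics: ship every row to the client,
#    then aggregate in application code.
totals = {}
for region, amount in conn.execute("SELECT region, amount FROM sales"):
    totals[region] = totals.get(region, 0.0) + amount

# 2. Taking the analytics to the data: push the aggregation into the
#    database so only the summarised result crosses the wire.
in_db = dict(conn.execute("SELECT region, SUM(amount) FROM sales GROUP BY region"))

assert totals == in_db  # same answer, far less data movement in the second case
```

The same principle applied to ‘datastructure’ would simply move that second style of query down into the devices generating the data.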

This ‘datastructure’ with embedded database and analytics capabilities therefore becomes part of the wider ‘data cloud’, alongside regional and departmental data marts and the central business application data warehouse, as well as the ability to spin up and provision virtual data marts.

As Barry Devlin puts it: “A single logical storehouse is required with both a well-defined, consistent and integrated physical core and a loose federation of data whose diversity, timeliness and even inconsistency is valued.”

Making this work will require new data cloud management capabilities, as well as an approach to data management that we have called “total data”. As we previously explained:

“Total data is about more than data volumes. It’s about taking a broad view of available data sources and processing and analytic technologies, bringing in data from multiple sources, and having the flexibility to respond to changing business requirements…

Total data involves processing any data that might be applicable to the query at hand, whether that data is structured or unstructured, and whether it resides in the data warehouse, or a distributed Hadoop file system, or archived systems, or any operational data source – SQL or NoSQL – and whether it is on-premises or in the cloud.”

As for the end of the EDW, both Colin and Barry argue, and I agree, that what we are seeing does not portend the end of the EDW, but rather a recognition that the EDW is a component of business intelligence, not the source of all business intelligence itself.

The SharePoint ecosystem and the cloud

Looking further into the growing ecosystem of vendors that extend and support Microsoft SharePoint, we get to the question of where ISVs fit when SharePoint is in the cloud. The short answer, really, is that they don’t. OK, that’s an oversimplification of course, but there is currently a far more limited role for third parties looking to extend SharePoint if it is run in a shared cloud environment. And this points to some contradictions in Microsoft’s strategy. On the one hand, we see a big push around SharePoint as a platform and this growing ecosystem of third parties. On the other hand, Microsoft is touting SharePoint Online as part of the upcoming Office 365 cloud-based service (to replace the existing Business Productivity Online Suite, aka BPOS), which really has very little support for third parties.

BPOS, which bundles SharePoint Online along with Exchange and a few other services, is currently offered in both Standard and Dedicated versions. In the Standard version, customers share multi-tenant infrastructure with other customers. With the Dedicated version (or BPOS D), they have (obviously) dedicated infrastructure, which is pretty much traditional application hosting; in this BPOS D configuration, Microsoft is the hosting provider, though the scenario would really not be much different from having another hosting provider run your SharePoint deployment on dedicated servers. Office 365 will also be made available on either shared or dedicated infrastructure.

There is currently no support for trusted third-party code in the Standard version of BPOS (aka BPOS S), nor will there be in the Office 365 Standard version. Customers that want to extend their SharePoint deployment with, say, workflow tools from Nintex or imaging capabilities from KnowledgeLake (or any of their own custom code) will have to run their SharePoint deployments on-premises or in a dedicated environment, hosted by either Microsoft or another hosting provider.

That isn’t to say that integration with BPOS / Office 365 is impossible — web services-based integration that requires no server-side installs on the SharePoint servers isn’t an issue.  So, for example, Metavis Technologies has migration tools that can move data to / from BPOS without installing anything on the SharePoint servers and so can work with SharePoint as part of BPOS S (and Office 365 presumably).  Similarly, on the Exchange side of BPOS, email archiving to a cloud provider like LiveOffice works via a data export function that doesn’t touch the cloud-based Exchange servers.
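For a sense of what that kind of integration looks like, here is a hedged sketch of a client-side call to SharePoint’s Lists.asmx SOAP service, which is the sort of interface such tools rely on; the site URL and list name are placeholders, and authentication (NTLM on-premises, the hosted sign-in for BPOS / Office 365) is deliberately omitted:

```python
import urllib.request

SITE = "https://example-tenant.sharepoint-host.com/sites/team"  # placeholder URL
LIST_NAME = "Shared Documents"  # placeholder list

# SOAP envelope for the Lists web service's GetListItems call.
soap_body = f"""<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <GetListItems xmlns="http://schemas.microsoft.com/sharepoint/soap/">
      <listName>{LIST_NAME}</listName>
    </GetListItems>
  </soap:Body>
</soap:Envelope>"""

req = urllib.request.Request(
    SITE + "/_vti_bin/Lists.asmx",
    data=soap_body.encode("utf-8"),
    headers={
        "Content-Type": "text/xml; charset=utf-8",
        "SOAPAction": "http://schemas.microsoft.com/sharepoint/soap/GetListItems",
    },
)
# Credentials are omitted here; a real client would add NTLM or the
# hosted service's sign-in before making the request.
with urllib.request.urlopen(req) as resp:
    print(resp.read()[:500])  # raw XML describing the list items
```

Everything here runs on the client side; nothing is installed on the SharePoint servers, which is exactly why this style of integration survives in the multi-tenant Standard environment.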

Maybe the argument is that orgs don’t want to run more sophisticated content management apps in pure cloud environments. That may be an OK way to segment the market today, but it will be limiting in the future. One of the advantages Microsoft has today over an upstart cloud player, like Box.net for example, is the growing ecosystem of extensions that can help fit SharePoint into a broad array of use cases. But these aren’t there in the cloud. If Box (or another player) could grow and support an ecosystem in the cloud (and support custom code and in-house developers), it might gain some advantage; this is the strategy SpringCM has been attempting, with some limited success, with its platform approach to ECM in the cloud. Salesforce has also been more aggressively building out its social software offering, Chatter (see the recent acquisition of Dimdim as a case in point). This doesn’t meet a plethora of content management requirements yet, but it is potentially competitive with SharePoint as a social software service for internal use.

There are clear limitations to the approach Microsoft is currently taking with the SharePoint ecosystem and BPOS / Office 365, and it seems this is something Microsoft will ultimately have to address if it wants to be serious about offering SharePoint as a cloud service. This isn’t the only issue that might keep organizations away from the Standard version of Office 365 (e.g., how much SharePoint functionality will it include and how often will it rev?), but it could be a big one.

SharePoint at center of growing ecosystem in content management

Over the past few quarters, I’ve fielded a number of inquiries from IT, investor and vendor clients about an emerging “SharePoint ecosystem.”  Questions range from “We want to extend our SharePoint deployment to support a transactional app.  What third-party tools should we look at?” to “What are the gaps in SharePoint where there are opportunities for investment?” In response to some of these queries, I’ve put together a new report for 451 Group clients that shares a title with this blog post.

It’s hardly a secret that SharePoint has had and will continue to have a tremendous impact on the content management market.  Organizations really started taking SharePoint seriously as a content management platform after the release of Microsoft Office SharePoint Server in 2007 (affectionately known as MOSS 2007).  We’ve seen a few trends since that time that affect this idea of a SharePoint ecosystem:

  • With many organizations investing significant amounts of time, money and effort in their SharePoint deployments, there is a good deal of interest in expanding SharePoint’s use beyond some of the more basic content sharing uses and intranet apps where it mostly started.
  • Along with that however is a better understanding by many in IT and in business units of where SharePoint works well and where it falls short.  This isn’t true across the board, as there is still a great deal of variation in terms of the sophistication of SharePoint deployments (i.e., the more an org uses SharePoint, the more they are likely to see its limitations).
  • Microsoft’s own attitude towards SharePoint seems to have shifted to some degree since the 2007 MOSS launch.  At that point, Microsoft positioned SharePoint more as the end-all, be-all of content management.  That positioning seemed to fade pretty quickly in the face of the realities of content management encountered by Microsoft’s field organizations and partners.  Today there is more subtlety in how Microsoft defines its own content management capabilities (foundational) vs. the areas it leaves to partners (supplemental).  I’m not claiming this is new; Microsoft has been positioning SharePoint as a development platform for ISVs for some time, but it is worth highlighting as an ongoing trend as it relates to the ISV ecosystem.

So those three trends, taken together or separately, point to significant opportunities for ISVs that are extending SharePoint.  There is also really not much in the new SharePoint 2010 release to derail many of these players; there is still lots of room for extension and complementary capabilities.

Some of these are smaller players dedicating their businesses to SharePoint (e.g., Nintex, KnowledgeLake) and some are much larger businesses that have invested heavily in SharePoint tools (e.g., Quest Software).  Existing large ECM vendors fit into this ecosystem as well, as they have adjusted their strategies to both coexist and compete with SharePoint (e.g., Open Text, EMC).  We cover most of these vendors in some depth in our regular Market Insight Service and look at them together, along with some of the competitive dynamics in the segment, in this Spotlight report.

NoSQL – consolidating and proliferating in 2011

Among the numerous prediction pieces doing the rounds at the moment, Bradford Stephens, founder of Drawn to Scale, suggested we could be in for continued proliferation of NoSQL database technologies in 2011, while Redmonk’s Stephen O’Grady predicted consolidation. I agree with both of them.

To understand how NoSQL could both proliferate and consolidate in 2011 it’s important to look at the small print. Bradford was talking specifically about open source tools, while Stephen was writing about commercially successful projects.

Given the levels of interest in NoSQL database technologies, the vast array of use cases, and the various interfaces and development languages – most of which are open source – I predict we’ll continue to see cross-pollination and the emergence of new projects as developers (corporate and individual) continue to scratch their own data-based itches.

However, I think we are also beginning to see a narrowing of the commercial focus onto those projects and companies that have enough traction to generate significant business opportunities and revenue, and that a few clear leaders will emerge in the various NoSQL sub-categories (key-value stores, document stores, graph databases and distributed column stores).

We can see previous evidence of the dual impact of proliferation and consolidation in the Linux market. While commercial opportunities are dominated by Red Hat, Novell and Canonical, that has not stopped the continued proliferation of Linux distributions.

The main difference between the NoSQL and Linux markets, of course, is that the various Linux distributions all share a common core, and the diversity in the NoSQL space means that we are unlikely to see proliferation on the scale of Linux.

However, I think we’ll see a similar two-tier market emerge with a large number of technically interesting and differentiated open source projects, and a small number of commercially-viable general-purpose category leaders.