Entries from November 2010

Autonomy’s ‘will it, won’t it’ M&A dance

I knew that as soon as I wrote my updated look at which company Autonomy might buy next, something would come along to diminish the value of all my hard work. 😉 I had expected it to be an acquisition, but instead Autonomy surprised me yesterday afternoon with an announcement that, after it had been in talks regarding an acquisition for “several months”:

Recent developments within these talks have given rise to an additional opportunity that warrants further examination which could give rise to an acquisition process that exceeds our original planned time scale.

In other words, it’s not going to happen any time soon.

As an announcement it’s a bit opaque and strangely structured, coming as the second part of a statement that also announced its Capital Markets Day would be held on Monday, November 29.

It could mean:

  • Autonomy is attempting to buy part of a business and is instead now looking to expand into another part, or perhaps all, of the business. For that to be the case, the first part must have been quite a small deal, as Autonomy’s kitty is limited to about $1bn in total.
  • Autonomy has identified a different company that it wants to buy.
  • Autonomy itself has had an offer to be acquired.

This is all of course speculation. But one thing’s a bit more certain: it will probably make a dent in the company’s Q4 results, which were expected to include a boost from an acquisition that now looks unlikely to happen. Whether that means the company will miss the revised expectations it set in the Q3 call, we’ll have to wait and see.

In our report we wondered whether the company’s health care announcement, including a new product called Auminence, may be used as some sort of alternative kicker in the quarter, in lieu of an acquisition. As we said, the product came out of the blue, as do many Autonomy products, and appears to us to be a repackaging of IDOL with some added diagnosis checklists on top.

The shares fell 6% yesterday after the announcement came out, all of it late in the day, as the announcement was made at 3.51pm London time. Again, that’s slightly strange timing. The shares were creeping back up this morning.

UPDATE: Autonomy clarified its statement with this Q&A:

Would it be right to interpret from today’s announcement regarding the acquisition timetable (Update on Acquisition) that the deal you are negotiating has got larger?

No, the deal remains the same size. The statement clearly refers to the timescale.

So it’s the same company, it will just take longer, which makes the most sense, I guess.

Google Trends: Hadoop versus Big Data versus MapReduce

I was just looking through some slides handed to me by Cloudera’s John Kreisa and Mike Olson during our meeting last week, and one of them jumped out at me.

It contains a graphic showing a Google Trends result for searches for Hadoop and “Big Data”. See for yourself why the graphic stood out:

In case you hadn’t guessed, Hadoop is in blue and “Big Data” is in red.

Even taken with a pinch of salt, it’s a huge validation of the level of interest in Hadoop. Removing the quotes to search for Big Data (red) doesn’t change the overall picture.

See also Hadoop (blue) versus MapReduce (red).
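For anyone who’d rather poke at the numbers than squint at a slide, the comparison is easy to reproduce programmatically. Here’s a minimal sketch using the unofficial pytrends Python library – the library and everything in the snippet are my own illustration, not anything from Cloudera’s slides, which presumably came straight from the Google Trends site:

```python
# Minimal sketch: reproducing the Google Trends comparisons with the
# unofficial pytrends library (pip install pytrends). Illustrative only.
from pytrends.request import TrendReq

pytrends = TrendReq(hl='en-US', tz=0)

# Hadoop versus the quoted phrase "Big Data"
pytrends.build_payload(['hadoop', '"big data"'], timeframe='all')
print(pytrends.interest_over_time().tail())  # relative interest, 0-100

# Hadoop versus MapReduce
pytrends.build_payload(['hadoop', 'mapreduce'], timeframe='all')
print(pytrends.interest_over_time().tail())
```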

UPDATE: eBay’s Oliver Ratzesberger puts the comparisons above in perspective somewhat by comparing Joomla vs Hadoop.

Data as a natural energy source

A number of analogies have arisen in recent years to describe the importance of data and its role in shaping new business models and business strategies. Among these is the concept of the “data factory”, recently highlighted by Abhi Mehta of Bank of America to describe businesses that have realized that their greatest asset is data.

WalMart, Google and Facebook are good examples of data factories, according to Mehta, who is working to ensure that BofA joins the list as the first data factory for financial services.

Mehta lists three key concepts that are central to building a data factory:

  • Believe that your core asset is data
  • Be able to automate the data pipeline
  • Know how to monetize your data assets

The idea of the data factory is useful in describing the organizations that we see driving the adoption of new data management and analytics concepts (Mehta has also referred to this as the “birth of the next industrial revolution”) but it has some less useful connotations.

In particular, the focus on data as something that is produced or manufactured encourages the obsession with data volume and capacity that has put the Big in Big Data.

Size isn’t everything, and the ability to store vast amounts of data is only really impressive if you also have the ability to process and analyze that data and gain valuable business insight from it.

While the focus in 2010 has been on Big Data, we expect the focus to shift in 2011 towards big data analytics. While the data factory concept describes what these organizations are, it does not describe what it is that they do to gain analytical insight from their data.

Another analogy that has been kicking around for a few years is the idea of data as the new oil. There are a number of parallels that can be drawn between oil and gas companies exploring the landscape in search of pockets of crude, and businesses exploring their data landscape in search of pockets of usable data.

A good example of this is eBay’s Singularity platform for deep contextual analysis, one use of which was to combine transactional data from the company’s data warehouse with behavioural data on its buyers and sellers, enabling the identification of top sellers and driving increased revenue from those sellers.

By exploring information from multiple sources in a single platform the company was able to gain a better perspective over its data than would be possible using data sampling techniques, revealing a pocket of data that could be used to improve business performance.
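As an aside, and purely as an illustration (nothing is public about Singularity’s actual schemas or interfaces, so every name below is invented), the shape of that analysis – joining warehouse transactions to behavioural data to surface top sellers – might look something like this in pandas:

```python
# Hypothetical sketch of combining transactional and behavioural data
# to identify top sellers. All names are invented for illustration;
# this is not eBay's Singularity platform.
import pandas as pd

transactions = pd.DataFrame({
    'seller_id': [1, 1, 2, 3, 3, 3],
    'sale_usd':  [120.0, 80.0, 35.0, 210.0, 95.0, 150.0],
})
behaviour = pd.DataFrame({
    'seller_id':     [1, 2, 3],
    'repeat_buyers': [12, 3, 41],
})

# Aggregate revenue per seller, then enrich with behavioural signals
revenue = transactions.groupby('seller_id', as_index=False)['sale_usd'].sum()
enriched = revenue.merge(behaviour, on='seller_id')

# Rank sellers by revenue weighted by buyer loyalty
enriched['score'] = enriched['sale_usd'] * (1 + enriched['repeat_buyers'] / 100)
print(enriched.sort_values('score', ascending=False))
```

The point is less the arithmetic than the join: neither data set reveals the top sellers on its own.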

However, exploring data within the organization is only scratching the surface of what eBay has achieved. The real secret to eBay’s success has been in harnessing that retail data in the first place.

This is a concept I have begun to explore recently in the context of turning data into products. It occurs to me that the most successful companies in this regard are not those producing data, but those harnessing naturally occurring information streams to capture the raw data that can be turned into usable data via analytics.

There is perhaps no greater example of this than Facebook, now home to over 500 million people using it to communicate, share information and photos, and join groups. While Facebook is often cited as an example of new data production, that description is inaccurate.

Consider what these 500 million people did before Facebook. The answer, of course, is that they communicated, shared information and photos, and joined groups. The real genius of Facebook is that it harnesses a naturally occurring information stream and accelerates it.

Natural sources of data are everywhere, from the retail data that has been harnessed by the likes of eBay and Amazon, to the Internet search data that has been harnessed by Google, to the data being produced by the millions of sensors in manufacturing facilities, data centres and office buildings around the world.

Harnessing that data is the first problem to solve; applying data analytics techniques to it, automating the data pipeline, and knowing how to monetize the data assets completes the picture.

Mike Loukides of O’Reilly recently noted: “the future belongs to companies and people that turn data into products.” The companies and people that stand to gain the most are not those who focus on data as something to be produced and stockpiled, but as a natural energy source to be processed and analyzed.

Keeping up with the regulators: where is the e-discovery news you can use?

The regulatory climate has been a hot topic, even eclipsing litigation for some companies as a driver for e-discovery and litigation readiness. In this post I’ll round up some recent and upcoming coverage and events for those looking to stay on top of the rising regulatory tide (or at least avoid falling victim to it).

  • Those attending Nick’s webinar last week with ZyLAB will note that regulatory investigations are a global problem, leading companies worldwide to adopt new standards of readiness in order to comply with potential data retrieval requirements under legislation like the UK Bribery Act, among others. Failure to meet these standards can result in penalties, in some cases even if the company is innocent of malfeasance.
  • As business becomes increasingly global, different national data privacy and security standards come into play as well, often bedeviling efforts to collect and transfer data between countries.  I moderated BrightTalk‘s webinar last week on Cross-Border E-discovery, which offers a good idea of the issues and best practices for this problem.  In the marketplace, we’ve recently seen vendors like FTI Technology marketing more on-the-ground services with their on-site Investigate offering to fill this need, particularly around investigations stemming from the Foreign Corrupt Practices Act (FCPA).
  • In the US, we’ve seen a number of regulators given new autonomy in pursuing investigations in the wake of the financial crisis – a recent law.com article detailed new developments at the FTC, SEC, Commodity Futures Trading Commission, and new Bureau of Consumer Financial Protection, taken from a panel I attended this fall organized by Recommind, featuring David Shonka, Principal Deputy General Counsel of the FTC, John Davis, partner at Pillsbury Winthrop Shaw Pittman LLP, and Mark Racanelli, partner at O’Melveny & Myers.
  • Mr. Shonka also appeared at last month’s Masters Conference, speaking on the subject of how investigations are evolving in response to the cloud and data outsourcing. He is scheduled to present at IQPC’s upcoming 4th E-Discovery for Financial Services Conference in February as well – my write-up of the 2010 event is here. The 2011 event will highlight the Dodd-Frank Act, plus the challenges of social media and exposure from the cloud, including best practices in developing an action plan as the landscape of e-discovery continues to change. As a special offer, Too Much Information readers can receive 20% off the standard all-access price to the 2011 event by entering the code EDFTMINFO when registering online. For more info, call 212-885-2738.
  • Lastly, I’ll plug our latest reports on cloud e-discovery and cloud archiving, both of which touch on how enterprise customers are meeting compliance and regulatory demands proactively with cloud offerings.  They are available for download now, or as part of the 451’s Information Management and CloudScape subscriptions.

e-Disclosure – cooperation, questionnaires and cloud

Yesterday I attended the 6th Annual e-Disclosure Forum at Canary Wharf in London, organized by the globe-trotting triumvirate of Chris Dale, Browning Marean and George Socha. It was a good program, with an audience comprising a mix of lawyers, litigation support professionals, IT practitioners, tech software and service providers and other assorted folks, like myself. It’s the second year I’ve attended and these were the key themes I picked up on:

  • Practice Direction 31B – not surprisingly this was a major issue throughout the day, considering many of those present were instrumental in drafting it, including Chris Dale and Senior Master Steven Whitaker (among others), and it only passed into the rules on October 1. For those that don’t know, 31B amended the rules of civil procedure in the UK (the rough equivalent of the Federal Rules of Civil Procedure in the US) as they pertain to the disclosure of electronic documents (which can of course include email and other forms of communications). One aspect of the changes is a questionnaire to be used in more complex cases that involve a large number of documents. Not only does it sound to us like a sensible way of helping to contain costs and get parties prepared for the case management conference (meet and confer in US parlance), but quite frankly it could be a useful starting point for organizations simply looking to get their house in order in preparation for future litigation.
  • Another key theme was the effect of recent UK cases on the way parties are now cooperating in case management meetings. One speaker, Jeremy Marshall, head of commercial litigation at Irwin Mitchell, said that in his experience there’s a vast difference between what happened before landmark cases such as Earles vs Barclays Bank in 2009 and Digicel vs Cable & Wireless in 2008, and what happens now. Companies know that if they don’t cooperate to make sure the necessary documents are disclosed, they could be penalized by the court, even if they win the case. For more on the Earles case and what it means regarding the destruction of documents, see Chris Dale here.
  • Cloud. I had a lot of conversations with IT and legal people at the conference and they’re still not seeing the necessary granularity in service level agreements (SLAs) from cloud service providers. If you need to search your data for the purposes of e-Disclosure, it’s not clear in what format the data will come back to you, or even if such a search is possible. That’s a bit of a deal-breaker, over and above any trepidation firms might feel about using cloud because of perceived security issues.
  • In general I detected a much clearer understanding on the part of US attendees of the issues in the UK market. Gone are the days, it seems, of assuming that the exhaustive e-Discovery process in the US is suitable without any alteration in the UK. The two countries obviously share a common law tradition, but like so many other things, there are distinct differences in the way litigation is done, and that – aided in part by Chris Dale et al’s work – is now getting through to US vendors, which, after all, dominate the market from the technology point of view.
  • Tips for next year to the organizers?
    • come up with a hashtag so we don’t write out ‘6th annual #eDisclosure conference’ in our tweets 😉
    • make the sessions a tad shorter
    • get a couple of additional panelists to mix it up a bit

But overall it’s the best way I know for taking the pulse of the UK e-Disclosure market in a single day.

We’ve also been active in this area ourselves recently with webinars on litigation readiness with ZyLAB and Katey’s participation in a BrightTalk webinar on cross-border eDiscovery. But most importantly, we have new e-Discovery research out in the shape of our cloud e-discovery [PDF] and cloud archiving [PDF] reports.

The beginning of the end of NoSQL

CouchOne has become the first of the major NoSQL database vendors to publicly distance itself from the term NoSQL, something we have been expecting for some time.

While the term NoSQL enabled the likes of 10gen, Basho, CouchOne, Membase, Neo Technology and Riptano to generate significant attention for their various database projects/products, it was always something of a flag of convenience.

Somewhat less convenient is the fact that grouping the key-value, document, graph and column family data stores together under the NoSQL banner masked their differentiating features and potential use cases.
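To make that concrete, here’s a hypothetical sketch of how the same simple fact might be represented in three of those models – the structures are my own illustration, not any particular vendor’s API:

```python
# The same fact ("alice follows bob") in three of the data models
# lumped together as "NoSQL". Illustrative structures only.

# Key-value store: opaque value, retrieved by key lookup only
kv_store = {'user:alice:follows': 'bob'}

# Document store: self-describing nested record, queryable by content
document = {
    '_id': 'alice',
    'name': 'Alice',
    'follows': [{'user': 'bob', 'since': '2010-11-01'}],
}

# Graph store: nodes plus typed edges, optimized for traversals
nodes = {'alice': {'name': 'Alice'}, 'bob': {'name': 'Bob'}}
edges = [('alice', 'FOLLOWS', 'bob')]
```

Which shape is “right” depends entirely on the query pattern – point lookups, content queries or traversals – which is exactly the differentiation the NoSQL banner obscures.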

As Mikael notes in the post: “The term ‘NoSQL’ continues to lump all the companies together and drowns out the real differences in the problems we try to tackle and the challenges we face.”

It was inevitable, therefore, that as the products and vendors matured the focus would shift towards specific use cases and the NoSQL movement would fragment.

CouchOne is by no means the only vendor thinking about distancing itself from NoSQL, especially since some of them are working on SQL interfaces. Again, we would see this fragmentation as a sign of maturity, rather than crisis.

The ongoing differentiation is something we plan to cover in depth with a report looking at the specific use cases of the “database alternatives” early in 2011.

It is also interesting that CouchOne is distancing itself from NoSQL in part due to the conflation of the term with Big Data. We have observed this ourselves and would agree that it is a mistake.

While some of the use cases for some of the NoSQL databases do involve large distributed data sets, not all of them do, and we had noted that the launch of the CouchOne Mobile development environment was designed to play to the specific strengths of Apache CouchDB: peer-based bidirectional replication, including disconnected mode, and a crash-only design.
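For what it’s worth, that replication is exposed through CouchDB’s HTTP _replicate endpoint. Here’s a minimal sketch – host and database names are placeholders – of the two one-way calls that make up a bidirectional sync:

```python
# Minimal sketch of CouchDB replication via its HTTP _replicate
# endpoint. Host and database names are placeholders.
import json
import urllib.request

def replicate(couch_url, source, target):
    """Ask a CouchDB server to replicate source into target (one way)."""
    body = json.dumps({'source': source, 'target': target}).encode()
    req = urllib.request.Request(
        couch_url + '/_replicate',
        data=body,
        headers={'Content-Type': 'application/json'},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Bidirectional sync is just two one-way replications; CouchDB
# reconciles changes made while a peer was disconnected.
replicate('http://localhost:5984', 'notes', 'http://server:5984/notes')
replicate('http://localhost:5984', 'http://server:5984/notes', 'notes')
```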

Incidentally, Big Data is another term we expect to diminish in usage in 2011, since Bigdata is a trademark of a company called SYSTAP.

Witness the fact that the Data Analytics Summit, which I’ll be attending next week, was previously the Big Data Summit. We assume that is also the reason Big Data News has been upgraded to Massive Data News.

The focus on big data sets and solving big data problems will continue, of course, but expect much less use of Big Data as a brand.

Similarly, while we expect many of the “NoSQL” databases to have a bright future, expect much less focus on the term NoSQL.

Data Analytics Summit NYC – Nov 18

Next week I will be attending and giving a presentation at the Data Analytics Summit in New York. The event has been put together by Aster Data to discuss advancements in big data management and analytic processing.

I’ll be providing an introductory overview with a “motivational” guide to big data analytics, having some fun with some of the clichés involved with big data, and also presenting our view of the main trends driving business and technological innovation in data analytics.

I’ll also be introducing some thoughts about the emergence of new business models based on turning data into new product opportunities, examining the idea of the data factory, “data as the new oil”, as well as data as a renewable energy source.

The event will also include presentations from Aster Data, as well as Barnes & Noble, comScore, Amazon Web Services, Sungard, MicroStrategy and Dell, with tracks focusing on financial services and Internet and retail.

For more details about the event, and to register, visit http://www.dataanalyticssummit.com/2010/nyc/.

Cloud e-discovery – examining the evidence

This week we publish a new long-form report, Cloud e-discovery: litigation comes down to earth – download an executive summary here.

In cloud e-discovery we see two major market shifts: corporations in-sourcing e-discovery to lower costs, while outsourcing IT infrastructure and services around it through hosting.  Still in early adoption, it is a leap of faith on some level, and carries both risks and benefits.  While most users in our 2010 e-discovery survey were bringing the e-discovery process in-house, only 16% were using cloud to do it, for a variety of reasons including security, data loss, regulatory concerns, and ease of retrieval.

But consider that hosted e-discovery has actually been around for over 20 years. What’s more, while some enterprises are resisting the cloud, their law firms, service providers, and other outsourcers entrusted with their data are not.

Witness this month’s 2010 Am Law tech survey – 80% of law firms are using hosted technology, 60% of those for e-discovery.  In fact, e-discovery tops all hosted software usage, far surpassing HR (21%), spam filter/email (21%), storage (6%) or document management (5%).  And while 79% report a positive experience, 30% said the savings were not what they expected.  Limited customization, diminished data control and security were even greater concerns.

And what of the bigger-picture risks?  Cloud topped the agenda last month at the Masters Conference as well: the growth of public and private cloud data from mobile use and social media, potential regulatory pitfalls, the benefits and risks of hosted e-discovery, and growing cross-border issues.  No blue-sky thinking here, just hard truths on the cloud from those on the front lines.

From e-discovery lawyers and consultants:

  • “[Public] cloud providers can’t meet the needs [of e-discovery] today.”
  • “Your data, your problem.”
  • “Data privacy in the EU is like free speech or freedom of religion in the US. . . they will give up the cloud before they give this up.”

From Microsoft’s General Counsel, speaking on cloud regulation:

  • “Things will move quickly, and if something bad happens, things will move faster still.”

From an enterprise buyer on procurement:

  • “It will take 19 months to work out e-discovery issues once you start talking about it.”
  • “Every dollar they save on cloud will be three dollars in legal.”
  • “I hate when people say ‘it’s not gonna stop – it’s already there.’ It makes customers think there is no choice but to comply.  But maybe ‘cloud’ will go away?”

And for the last word, a characteristically common-sense admonition from UK expert Chris Dale (speaking on ECA):

So, how to navigate it all?  For a succinct analysis of the cloud e-discovery market, our report is available to 451 CloudScape or Information Management subscribers, or get an executive summary here.  It offers a market overview, benefits and risks of cloud e-discovery, adoption trends and inhibitors, market drivers, current vendor and service-provider offerings, and the future direction of the market, particularly for enterprise customers.

Also note a complementary report, Cloud archiving: a new model for enterprise data retention, by Simon Robinson and Kathleen Reidy.  They estimate the market will generate around $193m in revenues in 2010, growing at a CAGR of 36% to reach $664m by 2014.  This report covers growth drivers, the competitive landscape and the outlook for consolidation, featuring detailed vendor profiles and end-user case studies.
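As a quick sanity check on that projection, the implied growth arithmetic holds up:

```python
# Sanity check: $193m in 2010 growing to $664m by 2014 implies
# a compound annual growth rate of roughly the stated 36%.
start, end, years = 193e6, 664e6, 4

cagr = (end / start) ** (1 / years) - 1
print(f'Implied CAGR: {cagr:.1%}')  # ~36.2%

# Forward check: compounding $193m at a flat 36% for four years
print(f'2014 revenue: ${start * 1.36 ** years / 1e6:.0f}m')  # ~$660m
```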

Webinar: navigating the changing landscape of open source databases

When we published our 2008 report on the impact of open source on the database market, the overall conclusion was that adoption had been widespread but shallow.

Since then we’ve seen increased adoption of open source software, as well as the acquisition of MySQL by Oracle. Perhaps the most significant shift in the market since early 2008 has been the explosion in the number of open source database and data management projects, including the various NoSQL data stores, and of course Hadoop and its associated projects.

On Tuesday, November 9, 2010 at 11:00 am EST, I’ll be joining Robin Schumacher, Director of Product Strategy at EnterpriseDB, to present a webinar on navigating the changing landscape of open source databases.

Among the topics to be discussed are:

  • the needs of organizations with hybrid mixed-workload environments
  • how to choose the right tool for the job
  • the involvement of user corporations (for better or for worse) in open source projects today.

You can find further details about the event and register here.