Autonomy buys Interwoven

Release is here. Autonomy is paying $775m cash, including a new loan.

Main drivers as we see them right now, having just listened to the call:

  • eDiscovery and an increasingly regulated environment.
  • Access to Interwoven’s rich customer base in the legal sector.
  • Adding automation to the content management process – think auto-tagging rather than manual tagging.
  • FRCP changes in 2006 forced companies to consider all their data, and you can’t manage all your data manually.
  • Autonomy has changed its mind about content management for the reasons above.
  • Reward for Interwoven’s turnaround and refocusing efforts including in eDiscovery via the Discovery Mining acquisition.
  • Leaves other standalone content management players in an even worse position (e.g. Vignette).
  • Autonomy acquisition engine gets some more fuel; it’s looking more & more like a mini-Oracle every day, in all senses of that phrase.

More considered and deep analysis coming to 451 customers later today.

451 Group client event last week

Later than I intended, I wanted to give you a quick update on last week’s client event and information management’s presence at it. Kathleen, Simon, Henry, Matt and I were engaged in many 1:1s – I had 15 over the two days, which were very useful for me and, more importantly, from feedback we’ve had, useful to the other person as well. Some of our analysts were booked back to back, doing 20+ meetings; that level of engagement is one of the main values we deliver at our conferences.

On the presentation and panels front, Kathleen did a great job of laying out her vision of how collaboration and social software are finally impacting content technologies, moving beyond tools that simply enable you to create content to ones that enable organizations to better handle the risks that content can create. Some people who weren’t able to hear her live have asked to hear it by way of a followup – if you would like to as well, please get in touch.

My panel was great, comprising Sid Probstein, CTO of Attivio, Stephen Whetstone of Iron Mountain-Stratify and Nicole Eagan, CMO of Autonomy. We were in the after-lunch slot, but given we were talking mainly about eDiscovery, the future of search and the effects of the credit crunch on information management, we still got people’s attention.

Anyway, don’t take my word for it; listen to what Sid says about it, plus his thoughts on other aspects of the event here and here. I couldn’t have put it better myself!

See you in Boston next year, I hope.

Autonomy and eDiscovery

Amidst the usual explanations of margins, day sales outstanding, average deal sizes, organic growth rates and other financial minutiae (which we like, btw), Autonomy used the following slide during its Q3 earnings call yesterday, ramming home the importance to it and other software companies like it of eDiscovery and the Electronic Discovery Reference Model (EDRM), from which this is adapted:

And these ducks in a row sat there while management took questions from the financial analysts, berating a few of them in the process for questioning its organic growth model, which Autonomy laid out for all to see. Our quick take on Autonomy’s earnings is here for 451 clients.

EDRM is also, in part, the basis of our upcoming eDiscovery report, which will take a thorough look at the current and future states of the eDiscovery and eDisclosure (as it’s known in the UK) software and services market.

Please get in touch with me if you would like to know more about that.

Google’s enterprise search: in the cloud & in a box

Google has changed the name and the scope of the Website search it offers to Website owners that want a little more than simply to know that their site is being indexed by Google, but don’t want to go as far as buying one of its blue or yellow search appliances. 451 clients can read what we thought of it here.

Google has three levels of Website search to offer organizations: Custom Search Edition/AdSense for Search (CSE/AFS), which is completely free but gives no control over which parts of your website are indexed and when; the newly rebranded Google Site Search; and the Google search appliances, which it sells in Mini and Search Appliance form factors and which can be used for external-facing Website search as well as intranet search.

Google stopped issuing customer numbers for its appliances in October 2007. At that point it had sold to about 10,000 organizations. I suspect that number is around 11,500 now, though I don’t have any great methodology to back that up; I’m just extrapolating from previously-issued growth figures. That’s an extraordinary number of organizations with a Google box.

To give some perspective, Autonomy has ~17,000 customers now. But the vast majority came from Verity. When Autonomy bought Verity in November 2005, Verity had about 15,000 customers (and Autonomy had about 1,000). But Verity got about 8,000 of those customers via its acquisition of Cardiff Software in February 2004. So in about 2.5 years Autonomy has added about 1,000 customers, but of course it has done a lot of up-selling to its base and doesn’t play in the low-cost search business anymore (mainly because of Google).

The actual number of Google appliances sold is higher, of course, as many organizations have multiple appliances. I’ll never forget standing in a room at a top 3 Wall Street investment bank 18 months or so ago, with its top ~25 technologists gathered, and seeing about 6 of them put up their hands when asked who had a Google appliance – most of those appliances weren’t known about by their boss or by each other.

But Google appliance proliferation is commonplace in large organizations. The things are so cheap and so relatively easy to install that they are often bought under the radar of IT. The problem comes when times get tough (as they are in investment banking IT, that’s for sure) and the organization wants to wring more out of the assets it has – even if it didn’t know it had those assets until relatively recently.

That’s why we strongly expect Google to come out with some sort of management layer this year to handle this sort of unintended (by the customer that is) proliferation. Watch this space.

Text analysis + content management = insight

We have long wondered why more content management vendors don’t fully embrace text analysis (or even enterprise search for that matter).

These guardians of most organizations’ unstructured data were beaten to the punch in terms of exploiting text by business intelligence companies, which are more accustomed to manipulating structured data. It’s great that the BI companies are starting (slowly) to embrace the idea of unlocking the value locked within unstructured text, but it’s somewhat bizarre that content management vendors didn’t get there first.

We said this many years ago, most coherently in mid-2005 in our report called Text-aware applications: the endgame for unstructured data (the clue’s in the title).

In that report we said:

“…while the penetration of content management systems is relatively high when compared with other ways of managing unstructured data, these systems do little at present to help analyze that unstructured data.”

and somewhat optimistically:

“Indeed, despite the CMS’s [content management systems] ability to organize, most implementations rarely attempt to push into anything that could be considered a semantic understanding of the content. This may be set to change, however, with some vendors, such as EMC, making headway in automatically parsing documents at a deeper level than just file-level metadata.”

That was a tad premature on our part.

Think about the main players and what they do to understand what resides in the documents they ‘manage.’

EMC Documentum – it has its content intelligence services classification engine, sure, and it bought a federated search product many moons ago, but neither is exactly front and center in its product strategy. And ILM (try searching on that now at EMC and see what you get) only dealt with file-level metadata, not semantic metadata. However, the X-Hive acquisition was an interesting one from this standpoint (see below for more on XML databases).

Vignette – bar an OEM relationship with Autonomy (which most vendors have), nothing much doing here, despite the need for Web content management to increase its understanding of the text it’s managing to make websites more attractive to advertisers (think of using text analysis to build links to other content automatically to keep visitors on the site longer).

Interwoven – Metatagger isn’t exactly at the bleeding edge any more, although the idea is sound.

IBM Filenet – here there is hope. IBM has taken a classifier it got from its iPhrase acquisition and used it to do initial classification to help determine what should or should not be deemed a record. IBM has all sorts of text analysis toys to play with and we expect more from it in the future.

Open Text – it once had five search engines, and was a pioneer in that space. But I’m not aware of anything it does to extract meaning from the content it manages.

Autonomy – Its tagline is ‘Meaning-based computing.’ It owns a powerful classification engine but now also owns records management and a bunch of other stuff. It’s the one company that checks most of the boxes here (but isn’t a document or Web content management vendor). But as the company currently refuses to talk to us, we’re in the dark as to which bit fits where and are unable to tell our clients what benefits Autonomy could bring them as a result. If the company cares to get in touch with me, I’m here.

This post was prompted partly by a recent conversation I had with Nstein. It is morphing from being a struggling text analysis vendor laden with debt (it’s publicly traded in Canada, so the numbers don’t lie) to a fast-growing combination of Web content management, digital asset management (via acquisitions in 2006 and 2007) and text analysis, built atop an XML database licensed from IxiaSoft. It’s focusing exclusively on the largest publishing companies, using the text analysis to automatically create links between new and archived content (thus pushing it up Google rankings). It competes with Mark Logic and Interwoven, mainly.

Any Gmail user who looks in their spam folder and sees ads for “Spam Swiss Pie – Bake 45-55 minutes or until eggs are set” can appreciate that crude keyword matching against content is next to useless.
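
To make the point concrete, here is a minimal Python sketch of the sort of crude keyword matching I mean. To be clear, this is not how Gmail or any real ad system works; the function name keyword_match, the ad inventory and the sample text are all made up for illustration.

```python
import re

def keyword_match(text: str, ad_keywords: dict[str, str]) -> list[str]:
    """Return every ad whose trigger keyword appears as a word in the text.

    Deliberately naive: no notion of context, meaning or sentiment.
    """
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [ad for keyword, ad in ad_keywords.items() if keyword in words]

# Hypothetical ad inventory, keyed by trigger keyword.
ADS = {
    "spam": "Spam Swiss Pie recipe",
    "mortgage": "Low-rate mortgage offers",
}

# A folder full of junk email is matched to a tinned-meat recipe
# simply because the word "spam" is present.
folder_label = "Spam (12 unread messages)"
print(keyword_match(folder_label, ADS))
```

The matcher has no way of knowing that “spam” here means junk email rather than lunch, which is exactly the gap text analysis is meant to fill.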

There’s so much more that can be done here and so much insight being left on the table, whether it be in better website management to attract readers, voice of the customer analysis tied to BI, or government intelligence.

Tools that manage content need to understand that content – its language, its meaning, its sentiment. Otherwise, they are missing a trick.

Our take on M&A in enterprise search

I’ve gathered all my current thinking on potential M&A in enterprise search in a SectorIQ that we published earlier this week to our customers. In it, I look at four main potential targets plus a few other small ones and look at a few of the likely acquirers. (This is the way we write all our Sector IQs, btw, and they’re a great way of getting a quick grasp on what might be coming down the pike in any particular sector of the IT industry.)

Fortunately those of you that are not our customers (yet!) are able to read it via our arrangement with the New York Times DealBook section. Click here to see the NY Times posting or go here to go straight to the report – and while you’re there, sign up for a trial of our M&A KnowledgeBase, where we’ve been collecting details of every IT, internet and telecoms deal since the start of 2002!

Finally, a quick word about the headline. We like to have some fun here at 451 with these things and while I appreciate that this one might have been pushing things a little in terms of clearly explaining what the report was about, when else would I be able to use it? 😉