Entries Tagged 'Archiving'

Autonomy pops up to pronounce an RDBMS revolution is afoot

In one of those Autonomy announcements that seemingly appear out of nowhere, the company has declared its intention to “transform” the relational database market by applying its text analysis technology to content stored within databases. The tool is called IDOL Structured Probabilistic Engine (SPE), as it uses the same Bayesian-based probabilistic inferencing technology that IDOL uses on unstructured information.

The quote from CEO Mike Lynch grandly proclaims this to be Autonomy’s “second fundamental technology” – IDOL itself being the first. That’s quite a claim and we’re endeavoring to find out more and will report back as to exactly how it works and what it can do.

Overall, though, this is part of a push by companies such as Autonomy, Attivio, Endeca, Exalead and some others into the search-based application market. The underlying premise of that market is database offloading: the idea of using a search engine rather than a relational database to sort and query information. It holds great promise, partly because it bridges enterprise search and business intelligence, but also because of the prospect of cost savings for customers, who can freeze their investments in relational database licenses, reduce them, or even eliminate them.

Of course if the enterprise search licenses then get so expensive as to nullify the cost benefit, then customers will reject the idea, which is something of which search vendors need to be wary.
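
To make the database-offloading idea a little more concrete, here is a minimal sketch of answering a filter-and-sort query from an inverted index rather than from a relational table. It is purely illustrative: the `TinySearchIndex` class, its methods and the sample rows are all hypothetical, and it says nothing about how IDOL SPE or any other vendor's product actually works.

```python
from collections import defaultdict


class TinySearchIndex:
    """Toy inverted index over table-like rows (hypothetical, for illustration)."""

    def __init__(self):
        self._postings = defaultdict(set)  # token -> ids of rows containing it
        self._rows = {}                    # row id -> original row

    def add(self, row_id, row):
        self._rows[row_id] = row
        for field, value in row.items():
            for token in str(value).lower().split():
                # Index each token both free-text and scoped to its field.
                self._postings[token].add(row_id)
                self._postings[f"{field.lower()}:{token}"].add(row_id)

    def query(self, *terms, sort_by=None):
        """Return rows matching ALL terms, optionally sorted by a field."""
        ids = None
        for term in terms:
            matches = self._postings.get(term.lower(), set())
            ids = matches if ids is None else ids & matches
        rows = [self._rows[i] for i in (ids or set())]
        if sort_by:
            rows.sort(key=lambda r: r[sort_by])
        return rows


index = TinySearchIndex()
index.add(1, {"customer": "Acme Corp", "region": "EMEA", "revenue": 120})
index.add(2, {"customer": "Globex", "region": "EMEA", "revenue": 80})
index.add(3, {"customer": "Acme Labs", "region": "APAC", "revenue": 200})

# Roughly "SELECT * WHERE region = 'EMEA' ORDER BY revenue", without a database.
emea = index.query("region:emea", sort_by="revenue")
print([r["customer"] for r in emea])  # ['Globex', 'Acme Corp']

# Free-text and fielded terms can be mixed in one query.
print(index.query("acme", "region:apac"))
```

The point of the sketch is only that the "query" is a set intersection over postings lists, which is the kind of work a search engine is built for.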

Users can apply to join the beta program at a very non-Autonomy-looking website.

The rise of information governance

Our lengthy report that shares a title with this blog post hit the wire yesterday (a high-level exec overview is available here for all). I’ve blogged before about our efforts on this. It has been quite a project, with several months of listening, reading and talking with lots of IT managers, attorneys, integrators, consultants and vendors. Oh, and writing: the final doc weighs in at 57 pages…

I noted before that I wasn’t sure “information governance” was a specific or real enough sector to warrant this kind of market analysis.  Aren’t we really just talking about archiving?  Or e-discovery?  Or ECM?  In the end, I found we’re talking about all these things, but what is different is that we’re talking about them all together. How do we ensure consistent retention policy across different stores?  How do we safely pursue more aggressive disposition?  How do we include all that “in-the-wild” content in centrally managed policies?

Is “information governance” really the right tag for this? I don’t know, but I never came across anything better (I did toy with “information retention management” for a while). We might be calling it something else in a couple of years, but the underlying issues are very real.

From the report intro:

What is information governance? There’s no single answer to that question. At a high level, information governance encompasses the policies and technologies meant to dictate and manage what corporate information is retained, where and for how long, and also how it is retained (e.g., protected, replicated and secured). Information governance spans retention, security and lifecycle management issues. For the purposes of this report, we’re focusing specifically on unstructured (or semi-structured, like email) information and governance as it relates primarily to litigation readiness.

In the report, we look at why organizations are investigating more holistic information governance practices:

  • to be better prepared for litigation
  • to ensure compliance
  • to reduce risks and costs of unmanaged or inconsistently managed information

Then we go into the market with analysis of:

  • the rise of email (and broader) archiving for litigation readiness
  • the relationship of the ECM and records management market
  • Autonomy and other vendors advocating “in-place” approaches to governance

There are also sections on adoption issues, market consolidation and areas for technology innovation.  And profiles of 15 vendors (each with a SWOT analysis) active in this market.

Expect lots more on this topic moving forward.

Autonomy & three phases of eDiscovery/information governance

451 clients will have seen my report of Autonomy’s Q2 results last night, so I’m not talking too much out of school here, but one of the more interesting things for the longer term from its conference call was the identification of three phases of evolution from basic eDiscovery through information governance.

The spot in the call where this was examined was given over to COO Andrew Kanter, who is a lawyer. He didn’t elaborate on it, as he was clearly reading from a script (so much so that he said “click” at the end of each slide ;)), but nevertheless I thought it was interesting to note and pass on.

The three phases, which the company believes will encompass roughly five years at most large organizations, are:

  1. Archiving and basic e-discovery as companies deal with litigation or are not in compliance
  2. Legal hold and early case assessment – part of what it calls advanced e-discovery – when companies come to the conclusion that manual methods of legal hold – sending emails out to the employees saying not to delete things – don’t work.
  3. The third phase is information governance, i.e. the policies and technologies meant to dictate and manage what corporate information is retained, where and for how long. 

At the moment, the company is seeing ongoing work in phase one and the start of work in phase two. It has one unnamed client doing phase-two work – a Wall Street institution – with 70,000 desktops and 490TB of data to manage across six geographies. Autonomy says the number of potential deals in its pipeline for phase two has increased in the last quarter, but its timelines are still a bit fuzzy. But it seems like Autonomy is not seeing any phase three, i.e. full-on, enterprise-wide information governance work at the moment.

We have seen this movement from e-discovery to information governance in our own research, but we’ve also noticed how early we are in that process. In fact Kathleen Reidy is about to publish our report on information governance that picks up directly from where our December 2008 report on e-Discovery and e-Disclosure left off. In this new report we will examine various approaches to information governance and how it will impact the market for archiving, content management, search and e-Discovery going forward. Kathleen or I can provide more detail should you require it.

A report on information governance – is that what we call it?

As something of a follow-up to the special report we did last fall on the market for eDiscovery tools and technologies, we’ve begun work on a similar report meant to look more deeply at that first process phase in the EDRM — Information Management.

Information management sounds like a nice manageable topic, doesn’t it?

We’re looking specifically at the market for technologies meant to help organizations manage unstructured info (often ad-hoc, like email and unmanaged docs) more effectively so that eDiscovery won’t be such a firedrill if and when it occurs.

eDiscovery isn’t the only reason to get a better handle on this ad-hoc, unstructured info — there are compliance-related reasons in some cases and the costs and risks associated with storing lots of stuff for long periods of time when it should have been culled or deleted.  Conversely, not retaining information or at least having a documented retention and disposition plan is also risky.

As we’ve noted before, some are calling this “information governance.”  So is this a report on the information governance market?  Is there such a thing?

Here are some of the things we’re learning so far with our research:

  • There’s no question that governance is a hot issue with many organizations.  Getting a better handle on email is the biggest pain point.  Check out this recent AIIM survey for some interesting data on this.
  • Better preparedness for eDiscovery is the biggest driver, followed by the complexity of compliance, the need to reduce costs, and security concerns (security-related governance is really a separate market and not one we’re looking at here).
  • One of the fundamental questions seems to come down to whether organizations want to take an archive-based approach to governance or one that is tied to an ECM platform.
  • Since email is the big problem, email archives are a big part of the solution for many companies.
  • Email archives are expanding to handle more diverse content types with more sophisticated retention, classification, legal holds and eDisco tools.
  • The disconnect with this approach seems to be when emails or other content actually are records and need to be managed as such.  How data moves from one system (e.g., archive to records management system) or is managed in-place in an archive by an RM system seems to be mostly an unexplored issue for most organizations at this point.
  • Because of this, ECM vendors paint archive-only vendors as “point tools.”  ECM vendors see governance as an ECM problem and come at it with platforms that generally include both archiving and records management.  But the archives from ECM vendors are generally newer or not traditionally as competitive in pure archiving scenarios.

All of the above makes for quite an interesting, if difficult to label, market.  We’re not really writing a report on the ECM market, since the archives are so critical to handling email especially, the major problem area, and most of the leading email archiving vendors are not full ECM vendors.  But there is definitely an ECM and records management component to this so we’re not just profiling the email archiving market.  In fact, we’re trying to only profile those vendors that can manage multiple content types and, ideally, do so across repositories.

Which I think leaves us talking about the information governance market.  This concerns me a little bit, as I worry that “information governance” is a vague tag and not really an identifiable sector.  But I see no other easy way to describe the intersection of vendors and technologies we see coming at this problem from different areas of strength.

I’d love any comments on what others think about this – is information governance a market?

Microsoft sheds more light on Office 14

Microsoft has begun to share information on what it calls the “waves” of Office 14 products set to hit the market this year and next. Most of the information at this point is on Microsoft Exchange 2010, which has entered public beta. General availability is expected in the second half of this year.

There’s also some info for SharePoint, though little detail. Microsoft SharePoint Server 2010 will go into technical preview in Q3 2009 and be generally available in the first half of 2010.  Beyond that, we still don’t know what will and won’t be in SharePoint.next (though we don’t have to call it that anymore).

The part of the Exchange 2010 announcement that caught my attention is the reference to an integrated e-mail archive.  Did Microsoft just enter the email archiving market?  That would certainly be noteworthy, given that much of the hot email archiving market involves archiving Exchange email.  Since Microsoft hasn’t had a horse in this race, this has been the realm of third-party providers like Symantec and Mimosa Systems to date.

On the analyst telebriefing held today by Microsoft on this announcement, I asked about this and the role for Microsoft’s email archiving partners going forward.  Michael Atalla, Group Product Manager for Exchange at Microsoft, told me that Microsoft is out to meet the needs of the 80% of its customers that don’t yet have any email archiving technology, and that existing email archiving products serve a “niche” at the high end of the market, for customers that have to meet regulatory requirements for email archiving.

While I agree there is still a lot of opportunity in the email archiving space, describing existing adoption as limited to those in regulated industries isn’t exactly accurate.

I’ve tried to dig deeper into what this integrated archive includes.  Not easy, as there is no mention of archiving at all in the TechNet docs on Exchange 2010 (though there’s quite a bit of interesting detail on records and retention management).

Best I can tell, Exchange 2010 lets you create individual or “personal archives.”  This page from Microsoft explains that a personal archive is:

an additional mailbox associated with a user’s primary mailbox.  It appears alongside the primary mailbox folders in Outlook. In this way, the user has direct access to e-mail within the archive just as they would their primary mailbox. Users can drag and drop PST files into the Personal Archive, for easier online access – and more efficient discovery by the organization. Mail items from the primary archive can also be offloaded to the Personal Archive automatically, using Retention Polices…

So it moves the PST file from the desktop to the server, which makes it more available for online searching and discovery purposes.  But is that really email archiving?  I can see how that would be attractive to end users that want an easier way to access archived emails, but it seems like it would increase the load on the mail server and not handle things like de-duping, which archiving is generally meant to address.
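
For anyone wondering what de-duping means in this context, here is a minimal sketch of single-instance storage: the archive keeps one copy of an identical message no matter how many mailboxes hold it. This is an assumption-laden toy, not a description of Exchange 2010 or of any archiving product; the `SingleInstanceArchive` class and the choice to hash the raw message text are hypothetical.

```python
import hashlib


class SingleInstanceArchive:
    """Toy archive that stores each distinct message once (hypothetical)."""

    def __init__(self):
        self._store = {}      # content hash -> message text, stored once
        self._mailboxes = {}  # mailbox name -> list of content hashes

    def add(self, mailbox, message):
        # Identical text yields an identical digest, so duplicates collapse.
        digest = hashlib.sha256(message.encode("utf-8")).hexdigest()
        self._store.setdefault(digest, message)
        self._mailboxes.setdefault(mailbox, []).append(digest)
        return digest

    def stored_copies(self):
        return len(self._store)

    def references(self):
        return sum(len(refs) for refs in self._mailboxes.values())


archive = SingleInstanceArchive()
memo = "From: ceo@example.com\nSubject: Q2 results\n\nNumbers attached."
for user in ("alice", "bob", "carol"):
    archive.add(user, memo)  # same memo sitting in three mailboxes
archive.add("alice", "From: bob@example.com\nSubject: lunch?\n\nNoon?")

print(archive.stored_copies(), archive.references())  # 2 4
```

Four mailbox entries end up as two stored messages. A per-user personal archive on the mail server, by contrast, would presumably keep every copy.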

I’m not an expert on email archiving though.  I’d love to hear from anyone who has comments.

Autonomy buys Interwoven

Release is here. Autonomy is paying $775m cash, including a new loan.

Main drivers as we see it right now having just listened to the call:

  • eDiscovery and increasingly regulated environment.
  • Access to Interwoven’s rich customer base in the legal sector.
  • Adding automation to the content management process – think auto-tagging rather than manual tagging.
  • FRCP changes in 2006 forced companies to consider all their data and you can’t manage all your data manually.
  • Autonomy has changed its mind about content management for the reasons above.
  • Reward for Interwoven’s turnaround and refocusing efforts including in eDiscovery via the Discovery Mining acquisition.
  • Leaves other standalone content management players in an even worse position (e.g. Vignette).
  • Autonomy acquisition engine gets some more fuel; it’s looking more & more like a mini-Oracle every day, in all senses of that phrase.

More considered and deep analysis coming to 451 customers later today.

ECM deconstructs to TCM, IG and WCM?

We wouldn’t want to be left out of the new year preview craze and we do publish fairly lengthy end-of-year reviews and year-ahead previews, along with an M&A Outlook, for 451 clients — the full text of the information management reports are here and here and the M&A Outlook for Software starts here (451 Group client log in required for these).

One of my thoughts in our 2009 preview on information management is the title of this post.

I don’t think ECM (enterprise content management) has ever been a particularly well defined market.  It started out earlier in this decade as an idea, a way to talk about the need to rationalize repositories and content apps.  Then it became a market category, a way to talk about content management vendors (mostly those focused on document management really) whether there was really an “enterprise” component to deployments or not.

I think the “ECM” moniker may be nearing the end of its usefulness now (if it was ever apt or useful in the first place).  WCM (web content management) has already splintered off as it became clear that web content is really not just another type of content to be managed by a central repository.  Today WCM is more about online marketing and often ties at least as much to marketing automation and CRM products as it does to other document management apps in an enterprise.

Other “ECM” vendors are focused on TCM (transactional content management), the business process apps (claims processing, loan origination and so forth) that have been the bread and butter for ECM vendors like EMC Documentum and IBM FileNet for years.  We’re seeing more sophistication here, more ties to enterprise business apps (e.g., HR, financial) and more attempts at end-to-end offerings that include capture and document output/presentment.

The other, perhaps bigger, trend for the year ahead is the focus on ‘information governance’ (the IG in the title above), the term many vendors are applying to efforts and product lines aimed at proactive information management for compliance and eDiscovery purposes.  Information governance from a product perspective generally includes archiving (mostly email), records/retention management and eDiscovery tools.  Here we find ECM vendors like EMC, IBM and Open Text, as well as CA, Symantec, Autonomy and others that have no stake in “ECM” of the TCM variety at all.

What do we mean when we say “ECM” these days?  Vendors like Autonomy and Symantec don’t generally claim to be in the ECM business, yet they will be increasingly competing with the likes of IBM FileNet, EMC and Open Text for ‘information governance’ business.  It will be interesting to watch how the competitive dynamics (and nomenclature) shake out in the year ahead.

What we are learning about eDiscovery

My posting here has been light because we’re head-down writing a major report on eDiscovery, which will arrive in November, followed by a webinar. Here are a few of the things we’ve learned along the way, some of which we suspected in advance, some of which were totally new to us:

  • This is a highly fragmented market – there is no clear leader.
  • The market has been shaped as much by US civil procedure rules and US privacy laws – or lack thereof – as by any technology innovation.
  • However, technology innovation still has a big part to play in this market’s future direction.
  • End users are growing tired of paying by the gigabyte – new models will emerge.
  • Purchasing influence is shifting rapidly from law firms to the corporate legal departments (those large bills have focused their mind in a hurry).
  • End users are very reluctant to talk publicly about what they’re doing (but boy, are we trying to persuade them to!)
  • Some (but not all) of the large information management vendors that should have a strategy in this area don’t have anything of the sort (see first point).

Anyway there will be more where that came from when the report is out, and we’ll make sure the webinar details are posted here ahead of time. Plus we’ll be talking about this at our annual client event, which is November 10-11 in Boston, MA. See you there!

E-discovery discovery

We’ve been covering the e-discovery big guns and usual suspects here at The 451 Group in one way or another for about five years now. But we’re looking to get more systematic about it, in part in preparation for a long-form market overview of this sector to come this fall. There is certainly no shortage of vendors targeting this market, as anyone attending the LegalTech conference this year would tell you.

We currently have several analysts looking at this market from different angles: Nick and Katey cover the search and text analytics vendors, Simon and Henry keep track of storage and archiving, and Kathleen looks after records management and content management aspects.

But with this approach, we wonder who we’re missing. Where are the up-and-comers? Are there any start-ups or new emerging companies you’ve had your eye on? Let us know in the comments or via email so we can make sure our e-discovery coverage is more comprehensive.

Text Analytics 2008 Redux

You’ve had Nick’s take, now here’s mine, with a little overlap – great minds think alike, right? 😉 We were not expecting the 40 attendees for the pre-conference workshops during prime Sunday TV viewing time. Seth Grimes laid out “Text Analytics for Dummies,” while Nick gave a market overview. But the attendance (and the long Q&A sessions) were good indicators of user enthusiasm and the desire for real, practicable advice about the field.

Some of the other memorable moments:

  • Best of the vendor panel: Seth Grimes’s challenge to say something nice about a fellow vendor’s offerings. And the vendors’ response to an audience question about incorporating UIMA, which was uniformly that it wasn’t necessary or in demand.
  • The Facebook presentation on trend-tracking through users’ “Wall” posts was brought back for an encore by popular demand. The crowd in my session was a little confrontational about the amount of analysis being done on the available information (never enough!), but as far as quick and dirty zeitgeist goes, it was unbeatable, and a lot of fun.
  • The Clarabridge 1-hour deployment was good sport, with at least one customer’s testimony that once the system is learned, it can actually be configured with speed approaching that of CTO Justin Langseth. You have to hand it to Clarabridge: they make it look easy.

Some thoughts on the users’ takes:

  • In presentations and in private chats, frequently recurring themes among vendors were eDiscovery and social media – some of the drivers for the market. The user questions I heard were mostly about sentiment analysis, deployment time and ROI. Specifically, information on how to judge all of the offerings – is sentiment analysis accurate enough? What is the expected deployment time, what is the ROI?
  • Precision and recall went back and forth again, but the hard truth is that the edge depends on the application. For patents or PubMed searches or eDiscovery, you need recall. For other applications, precision is paramount. Some users I spoke with mistook this as a lack of accuracy – it’s more of a sliding scale of usefulness.
  • Accuracy was a recurring issue, both because text analytics is an emerging technology, and, of course, text is messy and imprecise. Partly it’s a matter of maturation. But the “fast / cheap / or good – pick any two” truism about software development is equally true here. Even with built in taxonomies and dictionaries or domain-specific knowledge, any text analytics software needs configuration to increase accuracy for its application and user, which takes time.
  • “Win fast and win often” – great words from Tony Bodoh of Gaylord Hotels, on the user panel. Because of the financial investment, the fact that text analysis software can automate (obsolete) some employee work, the time it takes to configure, and general resistance to change, it is important to gain both executive and user buy-in early in the process. Chris Jones of Intuit echoed the sentiment, adding that it’s not advisable to go after your largest (and most time-consuming) problem first – come up with a number of smaller successes to prove the concept to users and higher-ups. Incidentally, both of these are Clarabridge users.
  • Jones also noted that one of his “lessons learned” was to avoid over-configuring or too much tinkering with the analytics. He advised after a prudent amount of configuration to treat it more or less like a black box, and not worry about what is going on under the hood, just let it do its job and leave it to the professionals.
  • Some more wisdom from the user panel: you can’t go into a text analytics deployment expecting quantifiable ROI. “You don’t know what you don’t know” – which is what the tool is there to solve. In many cases, the real potential isn’t obvious until you can see how it works with your business. At that point it’s possible to come up with applications that not even its creators could have thought up.
  • Lastly (and this is not a new sentiment, but it meant more coming from school Superintendent Chris Bowman, who looked like he had my parents on speed-dial): the text analytics field is emerging, and will become integrated with larger applications. This will eventually render a conference like this obsolete, but it also means a great chance to get a leg up as an early adopter.
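
On the precision and recall point above, a small worked example may help show why it is a sliding scale rather than a single accuracy score. This is a minimal sketch using the standard definitions (precision is the share of retrieved items that are relevant; recall is the share of relevant items that were retrieved); the `precision_recall` helper and the document ids are made up for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved vs. relevant items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical collection in which four documents are truly relevant.
relevant = {"d1", "d2", "d3", "d4"}

# A broad, eDiscovery-style query: finds everything, plus a lot of noise.
broad = ["d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"]
print(precision_recall(broad, relevant))   # (0.5, 1.0)

# A narrow query: everything returned is relevant, but half is missed.
narrow = ["d1", "d2"]
print(precision_recall(narrow, relevant))  # (1.0, 0.5)
```

Neither query is "more accurate" in the abstract. The broad one suits patent or eDiscovery work where missing a document is costly, and the narrow one suits applications where noise is the bigger problem.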

Looking forward to next year!