Entries Tagged 'Text analysis'

Enterprise search & text analysis market sizing report

I’m pleased to announce that the first market sizing report from our Information Management practice here at 451 has been published. It covers the enterprise search and text analysis markets, providing revenue figures for 2009 through 2013 and our growth expectations for those years.

We look at the reasons for that projected growth, identifying 10 drivers overall, one of which is the rise of search-based applications. At some point in the future we’d like to try and size that market, although it’s too nascent to put a number on it just yet.

You can download an executive summary or find out more about the report here.

Suffice to say I’m very excited about this new addition to our coverage, adding the quantitative element to our many years of analyzing the market on a qualitative basis.

This report will be updated every six months with new figures and every 12 months with new analysis and figures. We provide analysis of the industry throughout the year through our Market Insight Service in shorter, more regular form.

This is not only the first in a series of reports on the enterprise search business, but also the first in a series of market sizing reports within information management. The next will be on the data warehousing business, due in early 2010, written by Matt Aslett.

Autonomy pops up to pronounce an RDBMS revolution is afoot

In one of those Autonomy announcements that seemingly appear out of nowhere, the company has declared its intention to “transform” the relational database market by applying its text analysis technology to content stored within databases. The tool is called IDOL Structured Probabilistic Engine (SPE), as it uses the same Bayesian-based probabilistic inferencing technology that IDOL uses on unstructured information.

The quote from CEO Mike Lynch grandly proclaims this to be Autonomy’s “second fundamental technology” – IDOL itself being the first. That’s quite a claim and we’re endeavoring to find out more and will report back as to exactly how it works and what it can do.

Overall though this is part of a push by companies like Autonomy, but also Attivio, Endeca, Exalead and some others into the search-based application market. The underlying premise of that market is database offloading; the idea of using a search engine rather than a relational database to sort and query information. It holds great promise, partly because it is the bridge between enterprise search and business intelligence but also because of the prospect of cost savings for customers as they can either freeze their investments in relational database licenses, reduce them, or even eliminate them.

Of course if the enterprise search licenses then get so expensive as to nullify the cost benefit, then customers will reject the idea, which is something of which search vendors need to be wary.

Users can apply to join the beta program at a very non-Autonomy-looking website.

Quick thoughts on IBM-SPSS

Quick thoughts on the deal. We will have a full report for clients tonight. This is mainly thoughts about the text analytics part and I haven’t had a chance to speak with either company at the time of writing, so bear that in mind.

  • This is long-predicted, by us and many others. I recall a chat with SAS founder and CEO Jim Goodnight a couple of years ago and he said to me – and I’m slightly paraphrasing – in so many words, “why doesn’t IBM just buy them, I don’t understand why they haven’t already?” Well IBM finally has, or at least has made the initial move. And for $50 per share or almost $1.2bn.
  • Of course, like almost every IBM deal in recent years, the two are partners: IBM signed an OEM deal for SPSS’ PASW statistics software in Q2 and has had other deals with it going back many years.
  • IBM has text analytics tools, of course, but they really are just that: tools. It is not a major player in text analytics applications at this juncture. The vast majority of its engagements tend to be very large, custom-based ones and are still few and far between, as far as we can gather, mostly in financial services and telecommunications.
  • SPSS, on the other hand, has tools, workbenches and applications and has found some hot spots in this area, including analyzing customer feedback surveys, in particular the open-ended questions that can provide some of the richest material in such surveys but are often ignored because they’re too labor-intensive to analyze by hand.
  • SAS Institute now has a much bigger analytics competitor. Goodnight didn’t rate SPSS much as a competitor, but IBM? That’s a bit different.
  • SAP-Business Objects must be thinking of making a move too.

More considered thoughts from myself and my fellow 451 analysts later on today.

Text Analytics Summit 2009

The 2009 Text Analytics Summit was a great time; congratulations to the organizers for once again putting on a terrific event. I heard from one of them that attendance was down 20% from last year, which sounds about right given the economic situation and travel budgets right now, but it didn’t put a damper on the festivities.

Voice of the customer was once again the application that got the most play, from vendors and speakers. However, reputation analysis/opinion mining/buzz monitoring – or what was sometimes called social media analysis – was a close second this year, with an eye to the lower-cost offerings springing up in this area to mine blogs and internet forums. Some related points:

  • Twitter came up several times (it’s everywhere this year of course), but prevailing opinion was that it’s not a great resource for text mining – too many misspellings, abbreviations, and just plain not enough text per tweet to be able to get a good read on the content.
  • Facebook’s Roddy Lindsay was back to offer an overview of some of the projects underway to mine popular topics on the site for insight on its users and how their age, gender and regional demographics affect their views. Unfortunately as data on Facebook is private to its users and their network of friends, this was kind of a tease for those of us who would love a bigger peek at it.
  • In non-social media, another sentiment analysis-focused site, the Financial Times’ recently launched meaning-based news search Newssift, also got some mentions (in part because two of the vendors present, Lexalytics and Endeca, were involved in the project along with NStein and Reel2).

End users were well-represented this year, and I was even fortunate enough to get to moderate the end user panel, featuring former school superintendent Chris Bowman, Mike House of Maritz Research, Bryan Jeppsen of JetBlue, John Lehto of Monster and Rick Lewis of AOL. The gentlemen weighed in on everything from technical problems (they overwhelmingly chose SaaS to avoid issues) to variations on the inevitable ROI question, and provided some much-needed perspective on what end users expect from the vendors and their products. Response has been good, and for anyone wanting more, be aware that the ever-quotable Mr. Bowman is now on Twitter and may very well be watching your every move.

Text Analytics Summit 2009

With the 5th annual Text Analytics Summit now in the bag, here are my thoughts on the event.

My talk on which vendor options to choose on Sunday night was, I think at least, well received. Probably only about 30 people in the room but all bar about 5 of them were end users, which is good. The slides are available to anyone who drops me a note, and for those that were there on Sunday, I will get them to you very soon.

That end-user theme carried on to the main conference, where there was without a doubt a higher proportion of end users this year than last. The overall attendance was down slightly and when I saw the list on Monday morning I was concerned, but more than a third of the attendees were users, which was much better than last year, when there was often a feeling of vendors pitching to other vendors, which doesn’t help anybody.

A fair few of the end users present were at a very early stage of their assessment, too. Many were merely aware that text analytics can do something for them, but hadn’t engaged properly with any of the vendors. I will be following up with those and the other users I met during the conference as we look to help them evaluate their vendor options.

The end-user panel, moderated well by our own Katey Wood, was interesting as ever. Jon Lehto of Monster.com had some rich insight and Bryan Jeppsen at JetBlue, now two years into its use of Attensity, explained how it had changed its customer surveys from 1 open-ended question in 40 (and 39 structured questions) to mostly open-ended, as it now has the power to analyze that text and get insight it would never have received had it had to work out in advance what sort of answers it wanted. Both AOL and JetBlue were able to bypass their IT departments and go with the SaaS versions of their vendors’ products.

The analyst panel, if I’m being honest, was probably a bit flat from the audience’s perspective as we were agreeing too much. I tried to disagree at one point but then didn’t quite clarify what I meant, so I did it in an earlier post. We had a question from the audience from someone at Whirlpool about ROI which we all struggled with a bit. ROI on text analytics apps is tricky because:

  • quite often you’re doing something completely new that you’ve never been able to think of doing before, such as automatically parsing customers’ comments on blogs
  • many text analytics apps are quite small and thus don’t often require such an ROI measure
  • they’re often part of some sort of competitive or customer intelligence effort that’s much larger and thus the text analytics element itself isn’t subject to ROI.

But clearly for a company with the size of investment Whirlpool has made with text analytics, it’s a valid question and made us all ponder the ROI question a bit more deeply.

Things I thought I’d hear more about but didn’t: cloud and eDiscovery. There were SaaS-based representatives there in the shape of Clarabridge and Attensity for sure, and Clarabridge in particular has some great reference customers willing to speak on its behalf, notably AOL and Intuit. But in terms of true cloud-based text analytics, it’s still too early, and may even be so next year.

I was more surprised not to hear much about eDiscovery. What little I did hear (apart from listening to the sound of my own voice, of course) was from Ernst & Young and its proactive fraud detection work, some of which has been parlayed from previous successful eDiscovery work with clients, which is exactly what we thought would be happening (always good to hear end user validations of predictions made in research).

Things I thought I’d hear about and did: sentiment analysis. Last year it was the undercurrent of the conference. This year it came very much to the surface. There wasn’t too much difference between a lot of the offerings and some of the presentations (but by no means all) were a bit too down in the weeds. But there are tons of interesting implementations out there now, although a fair amount of work is still to be done.

Anyway overall it was well worth it and I recommend the conference next year to anyone interested in how to leverage text for insight into customers, competitors, risk exposure or all sorts of other business and organizational issues.

Text Analytics startups

I made a comment on the analyst panel at the end of day 1 about the emergence of startups in this space that I wanted to qualify, as it’s caused a bit of confusion here at the Text Analytics Summit. The other three panel members said they are seeing startups while I said I’m not and nor are VC customers asking about them in the way they did a few years back. I said that partly to shake up the panel a bit as we were agreeing on everything until then, which isn’t that interesting for the audience ;), but I meant it in a specific way.

The main area where text analytics-based startups have emerged in the last few years is in sentiment analysis, in areas such as opinion mining, buzz, product/service reviews and advertising targeting. Many of these apps are being used by enterprises for sure.

But what I was referring to is that I’m not seeing companies offering text analytics tools (whether on-premise or on a SaaS or cloud basis) that can be used as the basis of text-aware or search-based applications. I am seeing a lot of demand and interest in those apps from enterprises (our main focus here at 451) but the tools to build them are not coming from startups.

Instead they’re coming mainly from more established search, content management and eDiscovery-focused companies (with one or two notable exceptions, such as Attivio and Digital Reef in the past two years). There is probably room for more startups in this space, that’s for sure.

More on what has been a great conference so far later.

Upcoming Enterprise Search & Text Analytics summits

We have two ‘summits’ coming up in the next few weeks on the east coast that we’ll be attending.

We’ll be at the Enterprise Search Summit in New York May 12-13 at the Hilton on 6th Avenue. We have a bunch of meetings already but still have room for more, so if you’re attending and would like to meet (end users in particular, but vendors too), please get in touch with myself or Katey.

And just a few weeks later we’ll be in Boston where I’ll be at the 5th annual Text Analytics Summit. I’m doing the Sunday night graveyard slot once again on May 31, laying out my assessment of vendors, as I did last year (it’s called “Top Tips on Vendor Choices” in the agenda). I recall it was enjoyable and we ended up taking the conversation to the bar afterward; a tradition I intend to continue this year. I’m also on a panel at the end of Day 1 (June 1), right before cocktails (I’m seeing a trend here). Likewise, please get in touch if you want to meet up. I’m staying in Boston June 3 to meet clients, then back to London.

Brief thoughts on Attensity Group

451 clients will be getting our full report on this deal today, so I won’t be spilling all our thoughts on the deal (or all the details of the structure) here. But here are a few initial thoughts on the news:

  • The new entity will pose a bigger threat to the many text analysis competitors recently acquired by much larger companies, notably Teragram by SAS, ClearForest by Reuters and Inxight by Business Objects (and subsequently by SAP), as well as SPSS, which got into this business via acquisition back in 2003.
  • Such deals aren’t usually done from a position of dominance and it’s fair to say that Attensity wasn’t growing as fast as it used to be. They’re driven in part by investors who either want a payout now or see the potential for one by adding heft to a company and thus getting some economies of scale.
  • Attensity remains in the voice of the customer business, but adds a few more, including customer self-service.
  • CEO Ian Bonner and CTO Ian Hersey are back together almost two years since selling Inxight.
  • Hersey now has a major software integration job on his hands for the next couple of years.

Autonomy buys Interwoven

Release is here. Autonomy is paying $775m cash, including a new loan.

Main drivers as we see it right now having just listened to the call:

  • eDiscovery and increasingly regulated environment.
  • Access to Interwoven’s rich customer base in the legal sector.
  • Adding automation to the content management process – think auto-tagging rather than manual tagging.
  • FRCP changes in 2006 forced companies to consider all their data, and you can’t manage all your data manually.
  • Autonomy has changed its mind about content management for the reasons above.
  • Reward for Interwoven’s turnaround and refocusing efforts including in eDiscovery via the Discovery Mining acquisition.
  • Leaves other standalone content management players in an even worse position (e.g. Vignette).
  • Autonomy acquisition engine gets some more fuel; it’s looking more & more like a mini-Oracle every day, in all senses of that phrase.

More considered and deep analysis coming to 451 customers later today.

451 Group client event last week

Later than I intended, I wanted to give you a quick update on last week’s client event and information management’s presence at it. Kathleen, Simon, Henry, Matt and I were engaged in many 1:1s – I had 15 over the two days, which were very useful for me and more importantly, from feedback we’ve had, useful to the other person as well. Some of our analysts were booked back to back, doing 20+ meetings; that level of engagement is one of the main values we deliver at our conferences.

On the presentation and panels front, Kathleen did a great job of laying out her vision of how collaboration and social software are finally impacting content technologies, moving beyond just things that enable you to create content, to enabling organizations to better handle the risks that content can create. Some people who weren’t able to hear her live have asked to hear it by way of a followup – if you do, please get in touch.

My panel was great, comprising Sid Probstein, CTO of Attivio, Stephen Whetstone of Iron Mountain-Stratify and Nicole Eagan, CMO of Autonomy. We were in the after lunch slot but given we were talking mainly about eDiscovery, the future of search and the effects of the credit crunch on information management, we still got people’s attention.

Anyway, don’t take my word for it, listen to what Sid says about it, plus his thoughts on other aspects of the event here and here. I couldn’t have put it better myself!

See you in Boston next year, I hope.