June 8th, 2009 — Text analysis
The 2009 Text Analytics Conference was a great time; congratulations to the organizers for once again putting on a terrific event. I heard from one of them that attendance was down 20% from last year, which sounds about right given the economic situation and travel budgets right now, but it didn’t put a damper on the festivities.
Voice of the customer was once again the application that got the most play, from vendors and speakers. However, reputation analysis/opinion mining/buzz monitoring – or what was sometimes called social media analysis – was a close second this year, with an eye to the lower-cost offerings springing up in this area to mine blogs and internet forums. Some related points:
- Twitter came up several times (it’s everywhere this year of course), but prevailing opinion was that it’s not a great resource for text mining – too many misspellings, abbreviations, and just plain not enough text per tweet to be able to get a good read on the content.
- Facebook’s Roddy Lindsay was back to offer an overview of some of the projects underway to mine popular topics on the site for insight on its users and how their age, gender and regional demographics affect their views. Unfortunately as data on Facebook is private to its users and their network of friends, this was kind of a tease for those of us who would love a bigger peek at it.
- In non-social media, another sentiment analysis-focused site, the Financial Times’ recently launched meaning-based news search Newssift, also got some mentions (in part because two of the vendors present, Lexalytics and Endeca, were involved in the project along with NStein and Reel2).
End users were well-represented this year, and I was even fortunate enough to get to moderate the end user panel, featuring former school superintendent Chris Bowman, Mike House of Maritz Research, Bryan Jeppsen of JetBlue, John Lehto of Monster and Rick Lewis of AOL. The gentlemen weighed in on everything from technical problems (they overwhelmingly chose SaaS to avoid issues) to variations on the inevitable ROI question, and provided some much-needed perspective to what end users expect out of the vendors and their products. Response has been good, and for anyone wanting more, be aware that the ever-quotable Mr. Bowman is now on Twitter and may very well be watching your every move.
June 4th, 2009 — Text analysis
With the 5th annual Text Analytics Summit now in the bag, here are my thoughts on the event.
My talk on which vendor options to choose on Sunday night was, I think at least, well received. Probably only about 30 people in the room but all bar about 5 of them were end users, which is good. The slides are available to anyone who drops me a note, and for those that were there on Sunday, I will get them to you very soon.
That end-user theme carried on to the main conference, where there was without a doubt a higher proportion of end users this year than last. The overall attendance was down slightly and when I saw the list on Monday morning I was concerned, but more than a third of the attendees were users, which was much better than last year, when there was often a feeling of vendors pitching to other vendors, which doesn’t help anybody.
A fair few of the end users present were at a very early stage of their assessment, too. Many were merely aware that text analytics can do something for them, but hadn’t engaged properly with any of the vendors. I will be following up with those and the other users I met during the conference as we look to help them evaluate their vendor options.
The end-user panel, moderated well by our own Katey Wood, was interesting as ever. Jon Lehto of Monster.com had some rich insight, and Bryan Jeppsen at JetBlue, now two years into its use of Attensity, explained how it had changed its customer surveys from 1 open-ended question in 40 (and 39 structured questions) to mostly open-ended, as it now has the power to analyze that text and get insight it would never have received had it had to work out in advance what sort of answer it wanted. Both AOL and JetBlue were able to bypass their IT departments and go with the SaaS versions of their vendors’ products.
The analyst panel, if I’m being honest, was probably a bit flat from the audience’s perspective as we were agreeing too much. I tried to disagree at one point but then didn’t quite clarify what I meant, so I did it in an earlier post. We had a question from the audience from someone at Whirlpool about ROI which we all struggled with a bit. ROI on text analytics apps is tricky because:
- quite often you’re doing something completely new that you’ve never been able to think of doing before, such as automatically parsing customers’ comments on blogs
- many text analytics apps are quite small and thus don’t often require such an ROI measure
- they’re often part of some sort of competitive or customer intelligence effort that’s much larger and thus the text analytics element itself isn’t subject to ROI.
But clearly for a company with the size of investment Whirlpool has made with text analytics, it’s a valid question and made us all ponder the ROI question a bit more deeply.
Things I thought I’d hear more about but didn’t: cloud and eDiscovery. There were SaaS-based representatives there in the shape of Clarabridge and Attensity for sure, and Clarabridge in particular has some great reference customers willing to speak on its behalf, notably AOL and Intuit. But in terms of true cloud-based text analytics, it’s still too early, and may even be so next year.
I was more surprised not to hear much about eDiscovery. What little I did hear (apart from listening to the sound of my own voice, of course) was from Ernst & Young and its proactive fraud detection work, some of which has been parlayed from previous successful eDiscovery work with clients, which is exactly what we thought would be happening (always good to hear end user validations of predictions made in research).
Things I thought I’d hear about and did: sentiment analysis. Last year it was the undercurrent of the conference. This year it came very much to the surface. There wasn’t too much difference between a lot of the offerings and some of the presentations (but by no means all) were a bit too down in the weeds. But there are tons of interesting implementations out there now, although a fair amount of work still to be done.
Anyway, overall it was well worth it and I recommend the conference next year to anyone interested in how to leverage text for insight into customers, competitors, risk exposure or all sorts of other business and organizational issues.
June 2nd, 2009 — Text analysis
I made a comment on the analyst panel at the end of day 1 about the emergence of startups in this space that I wanted to qualify, as it’s caused a bit of confusion here at the Text Analytics Summit. The other three panel members said they are seeing startups while I said I’m not and nor are VC customers asking about them in the way they did a few years back. I said that partly to shake up the panel a bit as we were agreeing on everything until then, which isn’t that interesting for the audience ;), but I meant it in a specific way.
The main area where text analytics-based startups have emerged in the last few years is in sentiment analysis, in areas such as opinion mining, buzz, product/service reviews and advertising targeting. Many of these apps are being used by enterprises for sure.
But what I was referring to is that I’m not seeing companies offering text analytics tools (whether on-premise or on a SaaS or cloud basis) that can be used as the basis of text-aware or search-based applications. I am seeing a lot of demand and interest in those apps from enterprises (our main focus here at 451) but the tools to build them are not coming from startups.
Instead they’re coming mainly from more established search, content management and eDiscovery-focused companies (with one or two notable exceptions, such as Attivio and Digital Reef in the past two years). There is probably room for more startups in this space, that’s for sure.
More on what has been a great conference so far later.
May 5th, 2009 — Search, Text analysis
We have two ‘summits’ coming up in the next few weeks on the east coast that we’ll be attending.
We’ll be at the Enterprise Search Summit in New York May 12-13 at the Hilton on 6th Avenue. We have a bunch of meetings already but still have room for more, so if you’re attending and would like to meet (end users in particular, but vendors too), please get in touch with me or Katey.
And just a few weeks later we’ll be in Boston where I’ll be at the 5th annual Text Analytics Summit. I’m doing the Sunday night graveyard slot once again on May 31, laying out my assessment of vendors fo but last year (it’s called “Top Tips on Vendor Choices” in the agenda). I recall it was enjoyable and we ended up taking the conversation to the bar afterward, a tradition I intend to continue this year. I’m also on a panel at the end of Day 1 (June 1), right before cocktails (I’m seeing a trend here). Likewise, please get in touch if you want to meet up. I’m staying in Boston June 3 to meet clients, then back to London.
June 20th, 2008 — Archiving, Search, Text analysis
You’ve had Nick’s take, now here’s mine, with a little overlap – great minds think alike, right? 😉 We were not expecting the 40 attendees for the pre-conference workshops during prime Sunday TV viewing time. Seth Grimes laid out “Text Analytics for Dummies,” while Nick gave a market overview. But the attendance (and the long Q&A sessions) were good indicators of user enthusiasm and the desire for real, practicable advice about the field.
Some of the other memorable moments:
- Best of the vendor panel: Seth Grimes’s challenge to say something nice about a fellow vendor’s offerings. And the vendors’ response to an audience question about incorporating UIMA, which was uniformly that it wasn’t necessary or in demand.
- The Facebook presentation on trend-tracking through users’ “Wall” posts was brought back for an encore by popular demand. The crowd in my session was a little confrontational about the amount of analysis being done on the available information (never enough!), but as far as quick and dirty zeitgeist goes, it was unbeatable, and a lot of fun.
- The Clarabridge 1-hour deployment was good sport, with at least one customer’s testimony that once the system is learned, it can actually be configured with speed approaching that of CTO Justin Langseth. You have to hand it to Clarabridge: they make it look easy.
Some thoughts on the users’ takes:
- In presentations and in private chats, frequently recurring themes among vendors were eDiscovery and social media – some of the drivers for the market. The user questions I heard were mostly about sentiment analysis, deployment time and ROI. Specifically, users wanted information on how to judge all of the offerings – is sentiment analysis accurate enough? What is the expected deployment time, what is the ROI?
- Precision and recall went back and forth again, but the hard truth is that the edge depends on the application. For patents or PubMed searches or eDiscovery, you need recall. For other applications, precision is paramount. Some users I spoke with mistook this as a lack of accuracy – it’s more of a sliding scale of usefulness.
- Accuracy was a recurring issue, both because text analytics is an emerging technology, and, of course, text is messy and imprecise. Partly it’s a matter of maturation. But the “fast / cheap / or good – pick any two” truism about software development is equally true here. Even with built-in taxonomies and dictionaries or domain-specific knowledge, any text analytics software needs configuration to increase accuracy for its application and user, which takes time.
- “Win fast and win often” – great words from Tony Bodoh of Gaylord Hotels, on the user panel. Because of the financial investment, the fact that text analysis software can automate (obsolete) some employee work, the time it takes to configure, and general resistance to change, it is important to gain both executive and user buy-in early in the process. Chris Jones of Intuit echoed the sentiment, adding that it’s not advisable to go after your largest (and most time-consuming) problem first – come up with a number of smaller successes to prove the concept to users and higher-ups. Incidentally, both of these are Clarabridge users.
- Jones also noted that one of his “lessons learned” was to avoid over-configuring or too much tinkering with the analytics. He advised after a prudent amount of configuration to treat it more or less like a black box, and not worry about what is going on under the hood, just let it do its job and leave it to the professionals.
- Some more wisdom from the user panel: you can’t go into a text analytics deployment expecting quantifiable ROI. “You don’t know what you don’t know” – which is what the tool is there to solve. In many cases, the real potential isn’t obvious until you can see how it works with your business. At that point it’s possible to come up with applications that not even its creators could have thought up.
- Lastly (and this is not a new sentiment, but it meant more coming from school Superintendent Chris Bowman, who looked like he had my parents on speed-dial): the text analytics field is emerging, and will become integrated with larger applications. This will eventually render a conference like this obsolete, but it also means a great chance to get a leg up as an early adopter.
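For anyone who wants the precision/recall point above in concrete terms, here is a minimal sketch in Python. The document IDs and the two result sets are made up purely for illustration; they do not come from any vendor or any session at the conference. Precision is the share of returned results that are relevant, recall is the share of relevant documents that were returned.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical data: four documents in the collection are truly relevant.
relevant_docs = {"d1", "d2", "d3", "d4"}

# A broad query (eDiscovery-style) casts a wide net: nothing relevant is missed,
# but half of what comes back is noise.
broad = {"d1", "d2", "d3", "d4", "d5", "d6", "d7", "d8"}

# A narrow query returns only clean results, but misses half the relevant documents.
narrow = {"d1", "d2"}

print("broad: ", precision_recall(broad, relevant_docs))
print("narrow:", precision_recall(narrow, relevant_docs))
```

Neither result set is "inaccurate"; they sit at different points on the sliding scale, and which one is more useful depends on whether missing a document or wading through noise costs you more.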
Looking forward to next year!
June 20th, 2008 — Text analysis
Overall, the conference was very interesting and well worth attending, from our perspective. Good attendance – 190 versus 140 last year – and a better mix of users and vendors, though not clear how many of those users came because of vendor sponsorships. Nevertheless it made for better discussions rather than vendors talking to themselves.
- There wasn’t a lot to set one vendor apart from another. A lot of the vendor presentations were quite similar. This was summed up by a presentation from Ernst & Young at the end of Day 2: although the presenter was ostensibly there as an Autonomy customer, the presentation actually showed tools from Megaputer and Seagate’s Metalincs as well as Autonomy, and he said ‘any of those guys next door [in the vendor exhibit space] could do this.’ Quite.
- SaaS will be one area where vendors can differentiate themselves from the pack, at least in the short term – Clarabridge and Attensity are leading the way there.
- The thing most people wanted to know about was sentiment analysis. There are a lot of vendors out there, as we’ve noted before, and a fair amount of confusion on the part of prospective users as to what it is, why it’s useful and what it might do for them. There will definitely be vendor consolidation over the next year in this space.
- The pre-conference workshop where I presented my overview of the vendor landscape was much more fun than I’d anticipated. I think my vendor overview went well, though I wasn’t clear in what way it was a workshop: it was really just a couple of presentations from Seth Grimes and myself with questions afterwards, although we did some great follow-up in the bar, which I guess counts towards it! We had about 40 people there and over the next two days I met a few who said they didn’t show up because it was Father’s Day (word to the wise when organizing conferences ;)). For anyone waiting for my slides, I will send them to you, just drop me a line.
- Everyone wanted to know what Facebook was doing and although it was interesting, I found it a little underwhelming, even if it was quite fun. Still, they clearly have some very smart people there. What a corpus on which to experiment – the writing on everyone’s walls. I’m sure they’ll come up with more interesting applications of text analysis over time; indeed the presenter acknowledged that things like term disambiguation and sentiment analysis are on the roadmap, which is where things will get interesting.
- The EU government intelligence market may prove lucrative for some of these vendors; we intend to investigate that further ourselves.
- As with any relatively obscure technology area, those implementing text analysis need to get some quick wins under their belt rather than go for the hardest problem first – Gaylord Hotels and Intuit (both Clarabridge customers, Clarabridge was the main sponsor) both emphasized that, as did others.
- There was very little talk of semantic technologies, despite my best efforts to drum some up. I think that will change as text analysis and semantic tech are much more closely related than the players therein seem to want to admit.
- There was perhaps too much content, a lot of presentations which fed the problem I mentioned earlier, of many vendors sounding very similar to one another.
- There was not enough to be heard from the large vendors that have built or bought their way into this market – notably SAP-Business Objects (there was one person there from the Inxight team, but not presenting); SAS Institute had a lot of people on the attendee list but most of them didn’t show for some reason, and although IBM had one presentation, I would have liked to have seen more. Microsoft’s presence was Matthew Hurst, who is clearly thinking pretty far ahead in terms of social media analysis and got a lot of people’s attention, including mine.
I’ll definitely be back next year.
June 2nd, 2008 — Text analysis
I will be speaking twice at the forthcoming Text Analytics Summit in Boston June 15-17.
I am presenting a market overview of all the text analytics vendors on the night before the conference proper starts. Yes, that’ll be a Sunday, at 7pm [UPDATE: it’s been moved to 6pm] for those of you at a loss for something to do in Boston’s Back Bay area on a Sunday evening! I’m hoping there will be a lot of interesting customers in the audience to critique my analysis and provoke some discussion. I’ll be covering a dozen or so vendors, following on immediately from a presentation given by conference chair Seth Grimes that aims to answer the ‘what can text analytics do for me?’ type question.
And on the afternoon of the first day (June 16) I’m one of five – count ’em! – analysts on a panel moderated by Olivier Jouve of SPSS. The others are Sue Feldman of IDC, Fern Halper of Hurwitz, Lyndsay Wise of Wise Analytics and Seth, again. It should be interesting and we’ll try and say something original, despite there being five of us all talking about roughly the same subject!
It’s the fourth year of the conference (this’ll be my second year, I did the analyst panel last year) and it’s the main US gathering of those selling, using (or looking to use) or investing in text analytics. If you read this and attend the conference, please introduce yourself, I’d love to meet our readers.