Entries Tagged 'Text analysis'
October 20th, 2008 — Content management, Text analysis
Our survey partner ChangeWave Research released its latest corporate software survey late last week with new data from its alliance group on software purchasing. The ChangeWave Alliance Research Network is a group of about 20,000 business, technology, and medical professionals that participate in ChangeWave’s surveys as part of the company’s primary research efforts.
The title of this month’s report is Corporate Software Spending: 90 Day Outlook, Sharpest Decline for Software Purchasing on Record. Sounds cheery, doesn’t it? The report notes that “the spending decline is now hitting all software categories – the first time this has occurred in a ChangeWave survey.”
You’ll have to contact ChangeWave for all the data, but I wanted to pull out the stats on ECM specifically. 31% of those surveyed expect to decrease spending on document and enterprise content management software in the next 90 days, versus only 5% expecting an increase, for a net decrease of 26%. This was the largest decrease of any software category included in the survey.
This surprises me somewhat as ECM generally includes a good deal of compliance, governance and risk management-related technologies, along with core business process enablement, not things that can be easily cut or postponed. But perhaps other content management areas, like a customer website overhaul or intranet / internal collab platform do-over, are being seen as non-critical and put off.
October 16th, 2008 — Search, Text analysis
Amidst the usual explanations of margins, day sales outstanding, average deal sizes, organic growth rates and other financial minutiae (which we like, btw), Autonomy used the following slide during its Q3 earnings call yesterday, ramming home the importance to it and other software companies like it of eDiscovery and the Electronic Discovery Reference Model (EDRM), from which this is adapted:
And these ducks in a row sat there as management took questions from the financial analysts, berating a few of them in the process for questioning its organic growth model, which Autonomy laid out for all to see. Our quick take on Autonomy’s earnings is here for 451 clients.
EDRM is also, in part, the basis of our upcoming eDiscovery report, which will take a thorough look at the current and future states of the eDiscovery and eDisclosure (as it’s known in the UK) software and services market.
Please get in touch with me if you would like to know more about that.
September 30th, 2008 — Archiving, Content management, Search, Text analysis
My posting here has been light because we’re head-down writing a major report on eDiscovery, which will arrive in November, followed by a webinar. Here are a few of the things we’ve learned along the way, some of which we suspected in advance, some of which were totally new to us:
- This is a highly fragmented market – there is no clear leader.
- The market has been shaped as much by US civil procedure rules and US privacy laws – or lack thereof – as by any technology innovation.
- However, technology innovation still has a big part to play in this market’s future direction.
- End users are growing tired of paying by the gigabyte – new models will emerge.
- Purchasing influence is shifting rapidly from law firms to corporate legal departments (those large bills have focused their minds in a hurry).
- End users are very reluctant to talk publicly about what they’re doing (but boy, are we trying to persuade them to!)
- Some (but not all) of the large information management vendors that should have a strategy in this area don’t have anything of the sort (see first point).
Anyway there will be more where that came from when the report is out, and we’ll make sure the webinar details are posted here ahead of time. Plus we’ll be talking about this at our annual client event, which is November 10-11 in Boston, MA. See you there!
July 29th, 2008 — eDiscovery, Search, Text analysis
I’ll be relocating from New York City to London as of August 1, but continuing in more or less the same role here at The 451 Group.
After more than a dozen years in Manhattan, I’m moving back to the old country (those of you that know me will know I more or less managed to retain my accent) and as such, I’ll obviously be closer to the European scene. I hope to unearth more European vendors and customers than we’re already covering, although we will continue to actively cover all the US vendors we already follow and continue to dig for more there as well.
I expect to hear more about semantic technology than I do in the US (although that’s already quite a lot) and I expect the drivers for using text technologies of various types to be slightly different in Europe.
August will be a bit hectic sorting things out, but I’ll be fully up to speed by the start of September, that’s for sure.
The only contact that will change is my office number, which I’ll be sending around to key contacts.
Katey Wood, my research associate, remains in The 451 Group’s New York City HQ, helping me cover this fascinating market.
July 17th, 2008 — Text analysis
We’ve just produced a mid-year look at the state of the text analysis market, available to 451 customers here. In it we look at the major drivers now and in the future while giving assessments of the current state of the vendors we cover in this market.
The vendors discussed include Attensity, Autonomy Corp, Basis Technology, Business Objects (including Inxight), Clarabridge, IBM (including Cognos), Infonic, Intelligenxia, IxReveal, Lexalytics, Megaputer Intelligence, Microsoft (including FAST), Nstein Technologies, SAP, SAS Institute (including Teragram), SPSS, Temis, Teradata, Thomson Reuters (including ClearForest) and Viziant.
For those of you not yet customers (and why on earth not?!), its title reflects our belief that technically speaking, there’s not an enormous gulf between what the government intelligence folks have been doing to gather and understand text for years and the opportunities that text analysis can now address in all sorts of other, non-government markets.
In fact, if we’d had room, we could have gone on to add ‘voice of the plaintiff, voice of the defendant, voice of the mechanic, voice of the researcher, voice of the doctor,’ and so on.
But that wouldn’t have made for such a snappy title now, would it?
June 20th, 2008 — Archiving, Search, Text analysis
You’ve had Nick’s take, now here’s mine, with a little overlap – great minds think alike, right? 😉 We were not expecting 40 attendees for the pre-conference workshops during prime Sunday TV viewing time. Seth Grimes laid out “Text Analytics for Dummies,” while Nick gave a market overview. But the attendance (and the long Q&A sessions) was a good indicator of user enthusiasm and the desire for real, practicable advice about the field.
Some of the other memorable moments:
- Best of the vendor panel: Seth Grimes’s challenge to say something nice about a fellow vendor’s offerings. And the vendors’ response to an audience question about incorporating UIMA, which was uniformly that it wasn’t necessary or in demand.
- The Facebook presentation on trend-tracking through users’ “Wall” posts was brought back for an encore by popular demand. The crowd in my session was a little confrontational about the amount of analysis being done on the available information (never enough!), but as far as quick and dirty zeitgeist goes, it was unbeatable, and a lot of fun.
- The Clarabridge 1-hour deployment was good sport, with at least one customer’s testimony that once the system is learned, it can actually be configured with speed approaching that of CTO Justin Langseth. You have to hand it to Clarabridge: they make it look easy.
Some thoughts on the users’ takes:
- In presentations and in private chats, the frequently recurring themes among vendors were eDiscovery and social media – some of the drivers for the market. The user questions I heard were mostly about sentiment analysis, deployment time and ROI – specifically, how to judge all of the offerings: is sentiment analysis accurate enough? What is the expected deployment time? What is the ROI?
- Precision and recall went back and forth again, but the hard truth is that the edge depends on the application. For patent or PubMed searches or eDiscovery, you need recall. For other applications, precision is paramount. Some users I spoke with mistook this for a lack of accuracy – it’s more of a sliding scale of usefulness.
- Accuracy was a recurring issue, both because text analytics is an emerging technology and, of course, because text is messy and imprecise. Partly it’s a matter of maturation. But the “fast, cheap or good – pick any two” truism about software development is equally true here. Even with built-in taxonomies and dictionaries or domain-specific knowledge, any text analytics software needs configuration to increase accuracy for its application and user, which takes time.
- “Win fast and win often” – great words from Tony Bodoh of Gaylord Hotels on the user panel. Because of the financial investment, the fact that text analysis software can automate (and thus obsolete) some employee work, the time it takes to configure, and general resistance to change, it is important to gain both executive and user buy-in early in the process. Chris Jones of Intuit echoed the sentiment, adding that it’s not advisable to go after your largest (and most time-consuming) problem first – come up with a number of smaller successes to prove the concept to users and higher-ups. Incidentally, both of these are Clarabridge users.
- Jones also noted that one of his “lessons learned” was to avoid over-configuring or tinkering too much with the analytics. He advised treating it, after a prudent amount of configuration, more or less like a black box: don’t worry about what is going on under the hood – just let it do its job and leave it to the professionals.
- Some more wisdom from the user panel: you can’t go into a text analytics deployment expecting quantifiable ROI. “You don’t know what you don’t know” – which is what the tool is there to solve. In many cases, the real potential isn’t obvious until you can see how it works with your business. At that point it’s possible to come up with applications that not even its creators could have thought up.
- Lastly (and this is not a new sentiment, but it meant more coming from school Superintendent Chris Bowman, who looked like he had my parents on speed-dial): the text analytics field is emerging, and will become integrated with larger applications. This will eventually render a conference like this obsolete, but it also means a great chance to get a leg up as an early adopter.
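The precision/recall tradeoff from the panel discussion above can be made concrete with a small sketch (purely illustrative – the document sets and relevance judgments here are made up):

```python
# Illustrative precision/recall computation for a document retrieval task.
# "relevant" is the ground-truth set of document IDs; each system returns
# a "retrieved" set.

def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved document IDs."""
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall

relevant = {1, 2, 3, 4, 5}

# A recall-oriented system (think eDiscovery or patent search) casts a
# wide net: it finds everything relevant, but drags in noise.
wide_net = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

# A precision-oriented system returns less, but almost all of it is on target.
narrow_net = {1, 2, 3}

print(precision_recall(wide_net, relevant))    # (0.5, 1.0)
print(precision_recall(narrow_net, relevant))  # (1.0, 0.6)
```

Which of the two systems has "the edge" is exactly the application-dependent question the panelists were circling: the eDiscovery buyer takes the first, the consumer search user takes the second.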
Looking forward to next year!
June 20th, 2008 — Text analysis
Overall, the conference was very interesting and well worth attending, from our perspective. Good attendance – 190 versus 140 last year – and a better mix of users and vendors, though it’s not clear how many of those users came because of vendor sponsorships. Nevertheless, it made for better discussions, rather than vendors talking to themselves.
- There wasn’t a lot to set one vendor apart from another; a lot of the vendor presentations were quite similar. This was summed up by a presentation from Ernst & Young at the end of Day 2: though ostensibly there as an Autonomy customer, the presenter actually showed tools from Megaputer and Seagate’s Metalincs as well as Autonomy, and said “any of those guys next door [in the vendor exhibit space] could do this.” Quite.
- SaaS will be one area where vendors can differentiate themselves from the pack, at least in the short term – Clarabridge and Attensity are leading the way there.
- The thing most people wanted to know about was sentiment analysis. There are a lot of vendors out there, as we’ve noted before, and a fair amount of confusion on the part of prospective users as to what it is, why it’s useful and what it might do for them. There will definitely be vendor consolidation in this space over the next year.
- The pre-conference workshop where I presented my overview of the vendor landscape was much more fun than I’d anticipated – I think it went well, though I wasn’t clear in what way it was a workshop; really it was just a couple of presentations from Seth Grimes and myself with questions afterwards, although we did some great follow-up in the bar – I guess that counts! We had about 40 people there, and over the next two days I met a few who said they didn’t show up because it was Father’s Day (word to the wise when organizing conferences ;)). For anyone waiting for my slides, just drop me a line and I will send them to you.
- Everyone wanted to know what Facebook was doing, and although it was interesting, I found it a little underwhelming, even if it was quite fun. Still, they clearly have some very smart people there. What a corpus on which to experiment – the writing on everyone’s walls. I’m sure they’ll come up with more interesting applications of text analysis over time; indeed, the presenter acknowledged that things like term disambiguation and sentiment analysis are on the roadmap, which is where things will get interesting.
- The EU government intelligence market may prove lucrative for some of these vendors; we intend to investigate that further ourselves.
- As with any relatively obscure technology area, those implementing text analysis need to get some quick wins under their belt rather than go for the hardest problem first – Gaylord Hotels and Intuit (both Clarabridge customers, Clarabridge was the main sponsor) both emphasized that, as did others.
- There was very little talk of semantic technologies, despite my best efforts to drum some up. I think that will change as text analysis and semantic tech are much more closely related than the players therein seem to want to admit.
- There was perhaps too much content – a lot of presentations, which fed the problem I mentioned earlier of many vendors sounding very similar to one another.
- There was not enough to be heard from the large vendors that have built or bought their way into this market – notably SAP-Business Objects (there was one person there from the Inxight team, but not presenting); SAS Institute had a lot of people on the attendee list, but most of them didn’t show for some reason; and although IBM had one presentation, I would have liked to have seen more. Microsoft’s presence was Matthew Hurst, who is clearly thinking pretty far ahead in terms of social media analysis and got a lot of people’s attention, including mine.
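Since sentiment analysis drew the most questions, a deliberately naive sketch of the lexicon-based approach many vendors start from may help (the word lists are toy examples I’ve invented; real systems add negation handling, disambiguation and domain-specific lexicons – which is precisely where the user confusion about accuracy comes from):

```python
# Toy lexicon-based sentiment scorer: counts positive vs. negative words.
# Real commercial systems are far more sophisticated (negation, sarcasm,
# part-of-speech tagging, domain lexicons); this only shows the core idea.

POSITIVE = {"great", "love", "excellent", "good", "happy"}
NEGATIVE = {"bad", "terrible", "hate", "poor", "awful"}

def sentiment_score(text):
    """Return a score: >0 positive, <0 negative, 0 neutral/unknown."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

print(sentiment_score("Great hotel, the staff were excellent"))  # 2
print(sentiment_score("Terrible service and poor food"))         # -2
```

A scorer this simple fails on "not bad at all", which is a one-line illustration of why prospective users keep asking whether sentiment analysis is accurate enough.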
I’ll definitely be back next year.
June 2nd, 2008 — Text analysis
I will be speaking twice at the forthcoming Text Analytics Summit in Boston June 15-17.
I am presenting a market overview of all the text analytics vendors on the night before the conference proper starts. Yes, that’ll be a Sunday, at 7pm [UPDATE: it’s been moved to 6pm], for those of you at a loss for something to do in Boston’s Back Bay area on a Sunday evening! I’m hoping there will be a lot of interesting customers in the audience to critique my analysis and provoke some discussion. I’ll be covering a dozen or so vendors, following on immediately from a presentation given by conference chair Seth Grimes that aims to answer the ‘what can text analytics do for me?’ type of question.
And on the afternoon of the first day (June 16), I’m one of five – count ’em! – analysts on a panel moderated by Olivier Jouve of SPSS. The others are Sue Feldman of IDC, Fern Halper of Hurwitz, Lyndsay Wise of Wise Analytics and Seth, again. It should be interesting, and we’ll try to say something original, despite there being five of us all talking about roughly the same subject!
It’s the fourth year of the conference (this’ll be my second year; I did the analyst panel last year) and it’s the main US gathering of those selling, using (or looking to use) or investing in text analytics. If you read this and attend the conference, please introduce yourself – I’d love to meet our readers.
May 21st, 2008 — Text analysis
This blog’s title alludes to the well-documented problem of people not being able to find things, mainly within an enterprise environment. Semantic technology has the potential to change that. And having just spent a day and a half at the Semantic Technology conference in San Jose (it goes on for another three days), I can verify that there are plenty of people who think it’s on the verge of doing so.
The promise of semantic technology, as Jeff Pollock of Oracle put it at the end of his presentation, is that “finding stuff just got easier.”
I spoke to a lot of people and will be talking to numerous vendors, customers and investors in the coming months, but here’s my initial take:
- Semantic technology will succeed if it’s led by the consumer market, with the enterprise following. This is despite the widespread skepticism about the Semantic Web – you know, the bit about it being an untenable, top-down approach to applying meaning to web pages – which I think is itself misunderstood. There are a couple of major attempts at bringing sem tech to the consumer right now: Powerset, a web search engine that launched this month, initially searching Wikipedia articles; and Twine, a social network based on shared interests built by Radar Networks, which is in private beta now and will launch fully in the fall. Should either of those prove itself in terms of usage and Web-scalability, VC funding would follow.
- Semantic technology isn’t a market, it’s an enabling technology. Pollock asserted as much and I’d agree with him; it’s much like text analysis in that regard (see below).
- Standards are baked – RDF and OWL, on top of XML. There may be some fiddling around the edges, but judging by the number of standards bodies endorsing or adopting them (OASIS, ISO, W3C and OMG), the job seems to be largely done for now.
- Sem tech vendors and users need to understand that text analysis using NLP or statistical methods isn’t the enemy here, and if you can fix search rather than scoff at it, you might have a winner. I saw too much berating of Google as ‘not getting it’ and text analysis as being ‘shallow’ for my liking.
- Finally, 1,000 people is a lot of people to attract to a conference about semantic technology. It was 600 last year, and although I wasn’t there then, those who were told me it was the first year that people started to move beyond the theoretical (‘semantic technology has the potential to do this!’) to the actual. And given that there were a very large number of Europeans there, a similar conference on that continent would seem to make sense to me – attendees I spoke with didn’t know of such a thing, but if you do, please let me know in the comments.
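The data model those RDF/OWL standards define is, at bottom, just subject-predicate-object triples. A minimal in-memory sketch (no real RDF library, and the triples themselves are made up for illustration) shows the idea:

```python
# Minimal illustration of RDF's subject-predicate-object data model.
# A real system would use an RDF library and standard serializations
# (Turtle, RDF/XML) plus URIs for identifiers, but the underlying
# model is just a set of triples that can be pattern-matched.

triples = [
    ("Powerset", "isA", "SearchEngine"),
    ("Powerset", "searches", "Wikipedia"),
    ("Twine", "isA", "SocialNetwork"),
    ("Twine", "builtBy", "RadarNetworks"),
]

def query(s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

print(query(p="isA"))    # both 'isA' triples
print(query(s="Twine"))  # everything asserted about Twine
```

"Finding stuff just got easier" amounts to this: once facts are triples rather than prose, a wildcard query replaces a keyword guess.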
April 30th, 2008 — Collaboration, Content management, Text analysis
This blog post led us to GroupSwim, a company we met with the other day. I found GroupSwim to be a particularly interesting example of the value text analysis can lend to content management, something Nick wrote about the other day.
GroupSwim isn’t selling content management software in the classic sense. Its SaaS offering is for collaboration, either for internal teams or for externally facing communities. It actually reminds me most of Koral, which Salesforce.com acquired a year ago and which has since become Salesforce Content.
There’s a bit more meat to what GroupSwim offers, though, as it uses natural language processing to recommend tags, auto-tag content added to the system and recommend related content. We spoke to an early GroupSwim customer yesterday who just raved about the system’s ability to auto-categorize emails and other docs, making it easier to get content into the system in an organized way and to find content on a particular topic or customer account (this customer is using the service as a collaboration tool for sales and marketing).
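As a rough illustration of the simplest form of tag recommendation – frequency-based keyword extraction – here is a sketch. To be clear, this is not GroupSwim’s method; their NLP is certainly richer than counting words, and the stopword list and sample text are invented:

```python
# Naive tag recommendation: suggest the most frequent non-stopword terms.
# Real NLP-based systems use linguistic analysis (entities, phrases,
# disambiguation); this only illustrates the basic auto-tagging idea.

from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "or", "to", "of", "in", "is", "for", "on"}

def suggest_tags(text, n=3):
    """Return the n most common non-stopword terms as candidate tags."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return [word for word, _ in counts.most_common(n)]

doc = ("The sales team met the customer to discuss the renewal. "
       "The customer wants the renewal priced before the sales call.")
print(suggest_tags(doc))  # the three terms appearing twice: sales, customer, renewal
```

Even this crude version hints at the payoff the customer described: content arrives pre-organized instead of depending on users to tag it by hand.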
Applying this sort of text analysis in a group collaboration / social software tool isn’t something I’ve heard much about lately, and it’s this sort of thing that will differentiate vendors from the increasingly large pack moving forward. GroupSwim is still tiny and, with its service not generally available until this past December, it’s perhaps a little late to this party. It will need to ramp up its own sales and marketing efforts significantly. 451 Group clients can expect a full write-up on GroupSwim in the coming days.