Entries Tagged 'Search' ↓

Moving to London

I’ll be relocating from New York City to London as of August 1, but continuing in more or less the same role here at The 451 Group.

After more than a dozen years in Manhattan, I’m moving back to the old country (those of you that know me will know I more or less managed to retain my accent) and as such, I’ll obviously be closer to the European scene. I hope to unearth more European vendors and customers than we’re already covering, although we will continue to actively cover all the US vendors we already follow and continue to dig for more there as well.

I expect to hear more about semantic technology than I do in the US (although that’s already quite a lot) and I expect the drivers for using text technologies of various types to be slightly different in Europe.

August will be a bit hectic sorting things out, but I’ll be fully up to speed by the start of September, that’s for sure.

The only contact that will change is my office number, which I’ll be sending around to key contacts.

Katey Wood, my research associate, remains in The 451 Group’s New York City HQ, helping me cover this fascinating market.

E-discovery discovery

We’ve been covering the e-discovery big guns and usual suspects here at The 451 Group in one way or another for about five years now. But we’re looking to get more systematic about it, in part in preparation for a long-form market overview of this sector to come this fall. There is certainly no shortage of vendors targeting this market, as anyone who attended the LegalTech conference this year would tell you.

We currently have several analysts looking at this market from different angles: Nick and Katey cover the search and text analytics vendors, Simon and Henry keep track of storage and archiving, and Kathleen looks after records management and content management aspects.

But with this approach, we wonder who we’re missing. Where are the up-and-comers? Are there any start-ups or new emerging companies you’ve had your eye on? Let us know in the comments or via email so we can make sure our e-discovery coverage is more comprehensive.

Microsoft-Powerset

Quick thoughts on Microsoft-Powerset:

  • This is about scaling the original Powerset vision and by extension, the vision of the Xerox PARC engineers that developed the technology on which Powerset is built.

  • This isn’t an alternative to buying Yahoo – that’s overly-simplistic apples vs oranges stuff.

  • But this is about improving Live Search and as such, the Powerset guys probably have a greater chance of influencing Microsoft’s search direction than the semantic-technology focused people within Yahoo would have had within MicroHoo.

  • The semantic technology crowd just lost its poster child; will the next one please step forward?

I told you they were quick! More tomorrow to 451 clients via TechDealmaker.

Text Analytics 2008 Redux

You’ve had Nick’s take, now here’s mine, with a little overlap – great minds think alike, right? 😉 We were not expecting the 40 attendees for the pre-conference workshops during prime Sunday TV viewing time. Seth Grimes laid out “Text Analytics for Dummies,” while Nick gave a market overview. But the attendance (and the long Q&A sessions) were good indicators of user enthusiasm and the desire for real, practicable advice about the field.

Some of the other memorable moments:

  • Best of the vendor panel: Seth Grimes’s challenge to say something nice about a fellow vendor’s offerings. And the vendors’ response to an audience question about incorporating UIMA, which was uniformly that it wasn’t necessary or in demand.
  • The Facebook presentation on trend-tracking through users’ “Wall” posts was brought back for an encore by popular demand. The crowd in my session was a little confrontational about the amount of analysis being done on the available information (never enough!), but as far as quick and dirty zeitgeist goes, it was unbeatable, and a lot of fun.
  • The Clarabridge 1-hour deployment was good sport, with at least one customer’s testimony that once the system is learned, it can actually be configured with speed approaching that of CTO Justin Langseth. You have to hand it to Clarabridge: they make it look easy.

Some thoughts on the users’ takes:

  • In presentations and in private chats, the frequently recurring themes among vendors were eDiscovery and social media – some of the drivers for the market. The user questions I heard were mostly about sentiment analysis, deployment time and ROI. Specifically, users wanted information on how to judge all of the offerings – is sentiment analysis accurate enough? What is the expected deployment time? What is the ROI?
  • Precision and recall went back and forth again, but the hard truth is that the edge depends on the application. For patents or PubMed searches or eDiscovery, you need recall. For other applications, precision is paramount. Some users I spoke with mistook this as a lack of accuracy – it’s more of a sliding scale of usefulness.
  • Accuracy was a recurring issue, both because text analytics is an emerging technology and because, of course, text is messy and imprecise. Partly it’s a matter of maturation. But the “fast / cheap / or good – pick any two” truism about software development is equally true here. Even with built-in taxonomies and dictionaries or domain-specific knowledge, any text analytics software needs configuration to increase accuracy for its application and user, which takes time.
  • “Win fast and win often” – great words from Tony Bodoh of Gaylord Hotels, on the user panel. Because of the financial investment, the fact that text analysis software can automate (obsolete) some employee work, the time it takes to configure, and general resistance to change, it is important to gain both executive and user buy-in early in the process. Chris Jones of Intuit echoed the sentiment, adding that it’s not advisable to go after your largest (and most time-consuming) problem first – come up with a number of smaller successes to prove the concept to users and higher-ups. Incidentally, both of these are Clarabridge users.
  • Jones also noted that one of his “lessons learned” was to avoid over-configuring or too much tinkering with the analytics. He advised that, after a prudent amount of configuration, you treat it more or less like a black box: don’t worry about what is going on under the hood, just let it do its job and leave it to the professionals.
  • Some more wisdom from the user panel: you can’t go into a text analytics deployment expecting quantifiable ROI. “You don’t know what you don’t know” – which is what the tool is there to solve. In many cases, the real potential isn’t obvious until you can see how it works with your business. At that point it’s possible to come up with applications that not even its creators could have thought up.
  • Lastly (and this is not a new sentiment, but it meant more coming from school Superintendent Chris Bowman, who looked like he had my parents on speed-dial): the text analytics field is emerging, and will become integrated with larger applications. This will eventually render a conference like this obsolete, but it also means a great chance to get a leg up as an early adopter.
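
For anyone fuzzy on the precision and recall trade-off mentioned in the list above, here is a minimal sketch in Python. The document IDs and the two queries are made up purely for illustration; they are not from any vendor’s product or from the conference. Precision is the share of returned documents that are relevant; recall is the share of relevant documents that were returned.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a collection of retrieved document IDs."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents actually returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical corpus: documents 1-10 are the ones that really matter.
relevant = set(range(1, 11))

# A broad, eDiscovery-style query: finds 9 of the 10, plus 21 irrelevant hits.
broad = set(range(1, 10)) | set(range(100, 121))

# A narrow query: returns only 3 documents, all of them relevant.
narrow = {1, 2, 3}

for name, result in (("broad", broad), ("narrow", narrow)):
    p, r = precision_recall(result, relevant)
    print(f"{name}: precision={p:.2f} recall={r:.2f}")
# broad: precision=0.30 recall=0.90
# narrow: precision=1.00 recall=0.30
```

Neither query is “inaccurate”; which one is more useful depends on whether missing a document or wading through noise costs you more.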

Looking forward to next year!

Google’s enterprise search: in the cloud & in a box

Google has changed the name and the scope of the Website search it offers to Website owners that want a little more than simply to know that their site is being indexed by Google, but don’t want to go as far as buying one of its blue or yellow search appliances. 451 clients can read what we thought of it here.

Google has three levels of Website search to offer organizations: a completely free option with no control over which parts of your website are indexed and when, known as Custom Search Edition/AdSense for Search (CSE/AFS); the newly rebranded Google Site Search; and the Google search appliances, which it sells in Mini and Search Appliance form factors and which can be used for external-facing Website search as well as intranet search.

Google stopped issuing customer numbers for its appliances in October 2007, at which point it had sold to about 10,000 organizations. I suspect that number is around 11,500 now, though I don’t have any great methodology to back that up; I’m just extrapolating from previously issued growth figures. That’s an extraordinary number of organizations with a Google box.

To give some perspective, Autonomy has ~17,000 customers now. But the vast majority came from Verity. When Autonomy bought Verity in November 2005, Verity had about 15,000 customers (and Autonomy had about 1,000). But Verity got about 8,000 of those customers via its acquisition of Cardiff Software in February 2004. So in about 2.5 years Autonomy has added about 1,000 customers, but of course it has done a lot of up-selling to its base and doesn’t play in the low-cost search business anymore (mainly because of Google).
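
For those who like to check the sums, the back-of-the-envelope arithmetic above can be sketched in a few lines of Python. All the figures are the approximate ones quoted in this post; the variable names are mine and purely illustrative.

```python
# Approximate customer counts quoted in the post (not official figures).
verity_at_acquisition = 15_000    # Verity customers, November 2005
autonomy_at_acquisition = 1_000   # Autonomy's own customers at that time
cardiff_via_verity = 8_000        # customers Verity gained with Cardiff Software
combined_now = 17_000             # Autonomy's rough customer count today

# Combined base on the day of the Verity deal.
combined_at_acquisition = verity_at_acquisition + autonomy_at_acquisition

# Net customers added since the acquisition.
net_added = combined_now - combined_at_acquisition

# Share of Verity's base that originally came from Cardiff.
cardiff_share = cardiff_via_verity / verity_at_acquisition

print(f"Combined at acquisition: {combined_at_acquisition:,}")
print(f"Net added since: {net_added:,}")
print(f"Cardiff share of Verity base: {cardiff_share:.0%}")
```

In other words, roughly half of the base Autonomy inherited from Verity had itself been acquired rather than won.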

The actual number of Google appliances sold is higher, of course, as many organizations have multiple appliances. I’ll never forget standing in a room at a top-3 Wall Street investment bank 18 months or so ago, with its top ~25 technologists gathered, and seeing about 6 of them put up their hands when asked who had a Google appliance – most of those weren’t known about to their boss or to each other.

But Google appliance proliferation is commonplace in large organizations. The things are so cheap and so relatively easy to install that they are often bought under the radar of IT. The problem comes when times get tough (as they are in investment banking IT, that’s for sure) and the organization wants to wring more out of the assets it has – even if it didn’t know it had those assets until relatively recently.

That’s why we strongly expect Google to come out with some sort of management layer this year to handle this sort of unintended (by the customer that is) proliferation. Watch this space.

FAST-Stellent – what might have been

The combination of search, text analysis and content management is turning into one of the central memes of this blog. This wasn’t deliberate, although it’s something we’ve deliberated internally for a couple of years.

There were plenty of partnerships between search and content management vendors around, but they seemed to us to be either at the press release level, i.e. little more than marketing, or to be the result of a small handful of one-off projects in the field.

But it turns out others within the industry were thinking about much deeper integrations even if they weren’t saying so publicly.

About a year after Stellent and FAST (both then independent, of course) announced a partnership that resulted in Stellent OEMing FAST’s engine, FAST seriously considered buying Stellent.

I’ve heard from a couple of reliable sources that this was discussed at the highest level within FAST, but it chose not to pursue the deal. Instead it veered way off its core business and ended up distracting itself to such an extent that it got itself tied up in knots. This ended with it being forced to incur about $55m in charges in 2007, which sent its share price plummeting and meant it ended up costing Microsoft a lot less than it otherwise would have.

Incidentally, one of those sidebars, Ezmo, a music community site (presented to analysts in February 2007 as a “customer” of FAST, when in fact the phrase that should’ve been used was “wholly-owned subsidiary”), was shut down in March.

Of course Stellent went on to be acquired by Oracle in 2007 and we’ve been impressed by the way the database giant has integrated the company so far.

But FAST and Stellent could have made for an interesting combination of the ability to manage and analyze unstructured content, and who knows, FAST-Stellent might’ve been a force to be reckoned with. Now we look to see what Microsoft – something of a toe-dipper when it comes to content management – and Oracle, armed with a pretty decent search engine, do to prolong this meme.

Microsoft & FAST to exchange rings this week

Well, that’s one of those pesky search acquisitions sorted out anyway.

Microsoft and Fast Search & Transfer (FAST) will consummate (their words, not mine) the acquisition on Thursday (April 24) now that the conditions of the acquisition have been met, according to this. FAST has had the requisite number of shares tendered since February. The time since then has been spent clearing the regulatory hurdles.

I’d grown quite attached to those Oslo Stock Exchange announcements as they provided FAST-watchers like me with a running commentary on FAST’s progress, listing each major customer win as it happened, along with a whole lot of other stuff, including last year’s major stumble.

The new chapter of Microsoft’s enterprise search business starts this week, which is good timing for us, as I’m speaking with them next week.

The 451 taxonomy

We’ll occasionally use this blog to discuss our own internal taxonomy work. We ask enough vendors if they eat their own dog food, as it were, so it’s only fair we turn the spotlight on ourselves occasionally.

We here at 451, like all industry analyst companies that publish research, have our own issues with categorizing our reports and making them easy to find. We started in an ad-hoc fashion when we launched back in 2000 with eight broad sections and some basic metadata but without any real plan. We gradually added to the categories till we came to a point a few years ago when I realized that unless we took a more coordinated approach to a taxonomy and categorizing reports across all our products and services, we were heading for trouble.

So, with experience gleaned from talking to numerous vendors and users over the years I embarked on developing a single taxonomy for the whole company. Our IT team built our own taxonomy editor and another tool to sort out reconciliations between old and new taxonomies.

Until Kathleen joined in 2006, however, it was still something of a side project. She used her experience of doing a similar project at Giga to propel it along, and I’m pleased to be able to say we’re now satisfied that we have all the bases covered. That said, we know a taxonomy is never finished and we are constantly making small revisions as the industry shifts.

How we use the taxonomy still varies from product to product, however. Our M&A KnowledgeBase has always used a taxonomy as the basis of helping customers find deals, and we have just ported that product over to the new taxonomy, meaning it has gone from 300 or so categories to more than 600 (and I know more is not always better, but you’ll have to trust me on this one, or better still sign up for a trial!). It’s a much more balanced representation of the tech, internet and telecoms industry than before.

This means our Market Insight Service, TechDealmaker and M&A KnowledgeBase are all using the same taxonomy, although only the last of those currently exposes it. We use additional themes to group our research in key areas, such as open source, enterprise security or our work with the European Union. We also use the taxonomy to drive internal tools to help us manage our coverage areas and output.

In future posts I’ll talk about specific elements of the taxonomy and also how we’re planning to roll it out across all our research, improve our search engine and overall make it easier for customers to find the research they need.

Our take on M&A in enterprise search

I’ve gathered all my current thinking on potential M&A in enterprise search in a SectorIQ that we published earlier this week to our customers. In it, I look at four main potential targets plus a few other small ones and look at a few of the likely acquirers. (This is the way we write all our SectorIQs, btw, and they’re a great way of getting a quick grasp on what might be coming down the pike in any particular sector of the IT industry.)

Fortunately those of you that are not our customers (yet!) are able to read it via our arrangement with the New York Times DealBook section. Click here to see the NY Times posting or go here to go straight to the report – and while you’re there, sign up for a trial of our M&A KnowledgeBase, where we’ve been collecting details of every IT, internet and telecoms deal since the start of 2002!

Finally, a quick word about the headline. We like to have some fun here at 451 with these things and while I appreciate that this one might have been pushing things a little in terms of clearly explaining what the report was about, when else would I be able to use it? 😉

Welcome to Too Much Information

Welcome to the new 451 Group blog about information management. What’s information management, you may ask?

It’s the confluence of a variety of strategies organizations employ to get their arms around and exploit the myriad sources of data and information at their disposal. Specifically this means 451’s coverage of the following areas:

  • Search
  • Collaboration
  • Content management
  • Text analysis
  • eDiscovery
  • Archiving
  • Storage
  • Databases (relational & otherwise)
  • Business intelligence
  • Master & metadata management

It is written mainly by Kathleen Reidy and myself, and both of us will be at the AIIM Expo this week in Boston where we will be taking the temperature of the content management market & talking with a bunch of vendors and end users.

More on that and Drupalcon this week.