The Data Day, Today: Apr 25 2012

Splunk soars on IPO. VMware acquires Cetas. Vertica retains autonomy. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* For 451 Research clients

# Splunk IPO: $3bn and counting M&A Insight

# VMware snaps up Cetas Software for ‘big data’ analytics Deal Analysis

# HP’s Vertica retains its autonomy, continues integration with Autonomy Impact Report

# SAP makes long-awaited predictive analytics move of its own Impact Report

# Sanbolic pitches data management platform for server, desktop and database consolidation Impact Report

* Splunk IPO kills, lives up to expectations

* VMware acquires Cetas Software for Cloud and Big Data Analytics

* Opera Solutions Acquires Procurement Analytics Tools and Services from BIQ and Lexington Analytics

* Terascala Announces $14M Series B Funding Round Led by Strategic Partner Consortium

* Ravel Acquired by W2O Group To Expand Big Data Client Services And Enrich In-House Analytics and Insights Technology

* Teradata Active Data Warehouses Provide Private Cloud Benefits

* Pentaho Introduces New Interactive Visualization and Expanded Big Data Analytics

* Teradata Unveils New Purpose-Built Appliance for SAS High-Performance Analytics

* SAP Establishes Global Managing Board to Lead Company

* Oracle to Hadoop Under OneAppliance: GridIron Introduces First All-Flash Appliance Line With Unprecedented Performance to Tackle Unified Big Data Processing

* Lucid Imagination Technology Integration with SugarCRM Lets Customers Enjoy Improved Global Search Capabilities with Apache Lucene/Solr

* The Apache Software Foundation Announces Apache Cassandra v1.1

* Miso project: how it will help you make your own Guardian-style infographics and data visualisations

And that’s the Data Day, today.

The Data Day, Today: Apr 2 2012

Basho launches cloud storage play. Opera acquisitions. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Basho Unveils Riak CS, Multi-Tenant Cloud Storage Software for Public and Private Clouds

* InsightsOne Secures $4.3 Million in Series A Round of Funding Led by Norwest Venture Partners

* Opera buys Commendo to create predictive analytics powerhouse

* Opera Solutions Increases Procurement Capabilities with Acquisition of Lexington Analytics

* How federal money will spur a new breed of big data

* Another HP org change: Vertica no longer under the purview of Autonomy boss Mike Lynch?

* New SAS Visual Analytics Helps Organizations Analyze, Visualize Big Data

* Citrusleaf Delivers Real-Time NoSQL Replication

* NuoDB Launches Open Source Initiative on Github

* Actian Teams up With FlyingBinary and Tableau to Unleash Big Data Potential

* DH2i Launches and Unveils DxConsole Next Generation Virtualization Solution to Enable the Agile, Always-On Enterprise

* Acunu Analytics Ready to Preview!

* SAND Technology Announces Second Quarter Results for Fiscal Year 2012

* Idera Announces VMware Database Performance Monitoring Solution

* Idera Announces SQL Compliance Manager 3.6

* WalmartLabs is building big data tools — and will then open source them

* The three waves of opportunities in big data

* 4 Big Data Myths – Part I

* For 451 Research clients

# Drawn to Scale raises funds for Hadoop-based real-time database Impact report

# ParElastic brings elastic parallelism to relational databases Impact report

# DH2i launches with PolyServe-inspired database-virtualization software Impact report

# Tape industry pins future on ‘big data,’ active archiving and LTFS Spotlight report

# Lucid Imagination dreams up new strategy for enterprise search Market development report

# Pentaho identifies ‘big data’ analytics as investment priority, hooks into DataStax Market development report

# GridGain positions in-memory data grid for real-time analytics Market development report

# Having earned its stripes in HPC, Panasas heads for ‘big data’ Market development report

* Google News Search outlier of the day: Top 10 Dog and Cat Medical Conditions of 2011

And that’s the Data Day, today.

Search by another name: enterprise search starts to mature into ‘application era’

Customers of The 451 Group would have seen my report on the enterprise search market published September 15. If you are a client, you can view it here. I thought it would be useful to provide a condensed version of the report to a wider audience, as I think the market is at an important point in its development and merits a broader discussion.

The enterprise search market is morphing before our eyes into something new. Portions of it are disappearing, and others are moving into adjacent markets, but a core part of it will remain intact. A few key factors have caused this, we think. Some are historical, by which we mean they had their largest effect in the past, but the ongoing effect is still being felt, whereas the contemporary factors are the ones that we think are having their largest impact now, and will continue to do so in the short-term future (12-18 months).

Historical factors

  • Over-promising and under-delivery of intranet search between the last two US recessions, roughly between 2002 and 2007, resulting in a lot of failed projects.
  • A lack of market awareness and understanding of the value and risk inherent in unstructured data.
  • The entrance of Google into the market in 2002.
  • The lack of vision by certain closely related players in enterprise content management (ECM) and business intelligence (BI).

Contemporary factors

  • The lack of a clear value proposition for enterprise search.
  • The rise of open source, in particular Apache Lucene/Solr.
  • The emergence of big data, or total data.
  • The social media explosion.
  • The rapid spread of SharePoint.
  • The acquisitive growth of Autonomy Corp.
  • Acquisition of fast-growing players by major software vendors, notably Dassault Systemes, Hewlett-Packard and Microsoft.

The result of all this has been a split into roughly four markets, which we refer to as low-end, midmarket, OEM and high-end search-based applications.

Entry-level search

The low-end, or entry-level, enterprise search market has become, if not commodified, then pretty close to it. It is dominated by Google and open source. Other commercial vendors that once played in it have mostly left the market.

The result is that potential entry-level enterprise search customers are left with a dichotomy of choices: Google’s yellow search appliances that have two-year-term licenses and somewhat limited configurability (but are truly plug-and-play options) on the one hand, and open source on the other. It is a closed versus a very open box, and they have different and equally enthusiastic customer bases. Google is a very popular department-level choice, often purchased by line-of-business knowledge workers frustrated at obsolete and over-engineered search engines. Open source is, of course, popular with those that want to configure their search engine themselves or have a service provider do it and, thus, have a lot of control over how the engine works, as well as the results it delivers. Apache Lucene is also part of many commercial, high-end enterprise search products, including those of IBM.

Midmarket search

Midmarket search is a somewhat vague area, where vendors are succeeding in deals of roughly $75,000-250,000 selling intranet search. This area has thinned out as some vendors have tried to move upmarket into the world of search-based applications, but there are still many vendors making a decent living here. However, SharePoint has had a major effect on this part of the market, and if enterprises already have SharePoint – and Microsoft reckons more than 70% have at least bought a license at some point already – then it can be tough to offer a viable alternative. Where SharePoint isn’t the main focus, though, there is still a decent business to be had offering effective enterprise search, often in specific verticals, albeit without a huge amount of vertical customization.

OEM

The OEM search business has become a lot more interesting recently, in part due to which vendors have left it, leaving space for others. Microsoft’s acquisition of FAST in early 2008 meant one of the two major vendors at the time had essentially left the market entirely, since its focus moved almost entirely to SharePoint, as we recently documented. The other major OEM vendor at the time was Autonomy, and while it would still consider itself to be so, we think much of its OEM business, in fact, comes from document filters, rather than the OEMing of the IDOL search engine. Autonomy would strongly dispute that, but it might be moot soon anyway – it now looks as if it will end up as part of Hewlett-Packard following the announcement of its acquisition at a huge valuation, on August 18.

Those exits have left room for the rise of other vendors in the space. Key markets here include archiving, data-loss prevention and e-discovery. Many tools in these areas have old or quite basic search and text analysis functionality embedded in them, and vendors are looking for more powerful alternatives.

Search-based applications

The high end of the enterprise search market has become, in effect, the market for search-based applications (SBA) – that is, applications that are built on top of a search engine, rather than solely a relational database (although they often work alongside a database). These were touted back in the early 2000s by FAST, but it was too early, and FAST was too complex a set of tools to give the notion widespread acceptance. But in the latter part of the last decade and this one, SBAs have emerged as an answer to the problem of generic intranet search engines getting short shrift from users dissatisfied that the search engines don’t deliver what they want, when they want it.

Until recently, SBAs have mainly been a case of the vendors and their implementation partners building one-off custom applications for customers. But they are now moving to the stage where out-of-the-box user interfaces are being supplied for common tasks. In other words, it’s maturing in a similar way to the application software industry 20 years ago, which was built on top of the explosion in the use of relational databases.

We’ve seen examples in manufacturing, banking and customer service, and one of the key characteristics of SBAs is their ability to combine structured and unstructured data in a single interface. That was also the goal of earlier efforts to combine search with business-intelligence tools, which often simply took the form of adding a search engine to a BI tool. That was too simplistic, and the idea didn’t really take off, in part because search vendors hadn’t paid enough attention to structured data.

But SBAs, which put much more focus on the indexing process than earlier efforts, appear to be gaining traction. If we were to get to the situation where search indexes are considered a better way of manipulating disparate data types than relational databases, that would be a major shift (see big data). Another key element of successful SBAs is that they don’t look like traditional search engines, with a large amount of white space and a search bar in the middle of the screen. Rather, they make use of facets and other navigation techniques to guide users through information, or often simply to present the relevant information to them.
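To make the idea of facets over mixed data a bit more concrete, here is a minimal sketch in Python. It is a toy and not any vendor's implementation: the records, the field names (`region`, `product`) and the `search` function are all invented for illustration. It shows free-text matching against an inverted index, narrowed by structured filters, with facet counts returned alongside the hits so a user can be guided through the results.

```python
import re
from collections import Counter, defaultdict

# Hypothetical records: structured fields plus unstructured text.
DOCS = [
    {"id": 1, "region": "EMEA", "product": "pump",
     "text": "Customer reports pump seal leaking after install"},
    {"id": 2, "region": "EMEA", "product": "valve",
     "text": "Valve shipment delayed, customer asking for refund"},
    {"id": 3, "region": "APAC", "product": "pump",
     "text": "Pump motor noisy; replacement seal ordered"},
]


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc in docs:
        for token in re.findall(r"[a-z0-9]+", doc["text"].lower()):
            index[token].add(doc["id"])
    return index


def search(docs, index, query, filters=None, facet_fields=("region", "product")):
    """Return (hits, facets) for an AND query narrowed by exact-match filters."""
    filters = filters or {}
    terms = re.findall(r"[a-z0-9]+", query.lower())
    if terms:
        # Every query term must appear in the document text.
        ids = set.intersection(*(index.get(term, set()) for term in terms))
    else:
        ids = {doc["id"] for doc in docs}
    hits = [
        doc for doc in docs
        if doc["id"] in ids
        and all(doc.get(field) == value for field, value in filters.items())
    ]
    # Facet counts are computed over the hits, so they reflect the current view.
    facets = {field: Counter(doc[field] for doc in hits) for field in facet_fields}
    return hits, facets


INDEX = build_index(DOCS)
hits, facets = search(DOCS, INDEX, "seal")
for field, counts in facets.items():
    print(field, dict(counts))
```

Calling `search(DOCS, INDEX, "seal", {"region": "EMEA"})` narrows the same query by a structured field, which is roughly what clicking a facet does in an SBA interface.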

As I mentioned, there’s more in the full report, including more about specific vendors, total (or big) data and the impact of social media. If you’d like to know more about it, please get in touch with me.

ILTA 2011 report: Autonomy taking HP to the e-Discovery cleaners?

Not surprisingly, the biggest topic of conversation at the International Legal Technology Association (ILTA) 2011 convention in Nashville is last week’s announcement by Hewlett-Packard (HP) that it was acquiring Autonomy for $11.8bn. The most common reaction–in addition to the rush out the door to buy HP’s now discontinued TouchPad for 99 bucks–was surprise at the healthy purchase price.  Although some ILTA attendees saw how the deal might make sense logistically, virtually no one thought the deal made any sense at all with such a high price tag for Autonomy.

Cloud computing–and law firms’ reluctant move toward it–is another big topic, but another trend that seems to be developing as the e-discovery industry matures is its move away from law firms. Many vendors are reporting that five years ago, their businesses were 70 percent or more in law firms, with the remaining 30 percent or less of the business with corporate clients. Vendors now report that those ratios have flipped, with corporate clients now making up the vast majority of business.

Although the e-discovery market may be shifting away from law firms, at least one vendor hasn’t forgotten them.  Exterro has announced at ILTA the launch of Fusion LawFirm. As the name implies, the new application is a version of Exterro’s Fusion platform designed especially for law firms.

Other vendors meeting with The 451 Group at ILTA to brief us on their product launches and other announcements are:

  • AccessData, which is launching its new early case assessment application, AD ECA
  • kCura and Nexidia, which announced an alliance under which Nexidia’s audio and voice recognition application will be integrated into kCura’s Relativity platform
  • LexisNexis Applied Discovery, which made an ILTA announcement of its new partnership with Equivio to add predictive coding to its platform
  • LexisNexis LAW PreDiscovery with the launch of its new early case assessment (ECA) application, Early Data Analyzer
  • Nuix, which announced a new version of its platform last month
  • Orange Legal Technologies, which did an ILTA launch of PurpleBox, its new collection and ECA tool
  • Recommind, which discussed its predictive coding patent, and may have hosted ILTA’s best party at Nashville’s Country Music Hall of Fame
  • Wave Software, which announced a new version of its Trident e-mail processing application.

Quick HP-Autonomy thoughts

Just after the HP call about its Q3 numbers and the deal, here’s my initial (very) quick take as it’s late here in London:

  • This deal is about getting serious about software under Leo Apotheker. It gives HP a real information management story, greatly boosting its presence in the archiving, e-Discovery and enterprise search businesses.
  • However, the company cultures are not complementary; the HP way is a long way from the hyper-aggressive sales and marketing culture at Autonomy. Maintaining Autonomy as a separate entity run by Mike Lynch proves this and calls into question how much real synergy can be had from such a structure. I cannot see that being sustained.
  • This instantly makes HP a bigger e-Discovery player than IBM or any of the major IT firms.
  • Product overlap exists in document and records management, but the deal gets HP into the web content management and website optimization markets.
  • Autonomy has resisted deals over the years as its market capitalization ballooned as it went on its own acquisition binge. Autonomy couldn’t have waited much longer as it would have grown too big to be swallowed by even the largest predator.
  • At least Autonomy customers will now have a services organization to call on after they’ve bought the software. Customer support and after sales service has not been a strength of Autonomy.
  • This leaves the FTSE 100 with just one software firm of note.

Iron Mountain & Autonomy – between a rock and a hard place?

Two companies central to our coverage of information management are having their own particular – and distinct – issues with shareholders and equity analysts.

Autonomy has been having its run-ins with London’s equity analysts for some time. Not all of them, but a core and increasingly vocal group of them. They regularly question a few things: how the company calculates organic growth of its core IDOL business; cash conversion; and why it hasn’t bought a company after saying it would do and raising £500m of convertible debt to help it do so, back in February 2010. We’ve also weighed in on some of these issues.

Autonomy regularly takes on these doubters on its quarterly calls and also does the same during the quarter on its website, which is at least a refreshing change from companies that stay completely mute on such matters. However, the answers are often very simplistic. In a post dated March 30, 2011 entitled “How should we think about Autonomy’s penetration of its end markets, when we attempt to evaluate the opportunity for growth?” it claims that most of the world’s top software companies OEM IDOL and thus are “building their future products with IDOL deeply embedded and paying Autonomy a royalty.” Are they? Autonomy doesn’t distinguish between its two main OEM products when it announces OEM deals, but there’s a big difference between OEMing IDOL and OEMing its document filters. And as we have discussed before, we think a lot of the OEM deals are for the latter, rather than for IDOL itself, although we have no way of proving that, except to say that we speak regularly to these leading software vendors and they don’t appear to be using IDOL as their core search and classification engine nearly as widely as Autonomy claims. Ironically, given what Autonomy does for a living, a fair bit of the to and fro on the site is semantic-related, e.g. discussion of what “early spring” or “Winter with snowdrops” scenarios mean in terms of the guidance given by the company to analysts. All will no doubt become clearer when it announces its Q1 results, due Thursday April 28.

Over at Iron Mountain, some dissident shareholders have been putting pressure on the company to take on board its slate of directors and eventually turn itself into a Real Estate Investment Trust (REIT), mainly for its beneficial tax status. We cover what used to be called the digital business – the back up and recovery, e-Discovery, archiving and other software that’s mostly been added via acquisitions over the past few years. But that doesn’t seem to hold any attraction to hedge fund Elliott Management, which owns just less than 5%. It was the company that put forward the slate of directors and advised the company to turn itself into a REIT and in general to focus on its core – non-digital – business. Elliott and even larger shareholder Davis Advisors (it owns a shade less than 20% of the outstanding shares) were annoyed when the company dropped a poison pill on March 23 to guard against a takeover. This week Elliott laid out its grievances in another letter to the board, urging it to reverse the poison pill and generally sit up and take notice of what it has to say.

It’s hard to tell where this will end, but it has already caused disruption to Iron Mountain’s business at a time when it is trying to get some of its digital units – notably e-Discovery – back on track after a very tough 2010. We’ll know if it’s had an effect on its Q1 performance when it announces its results, most likely in the last week of April. The shares, as is common with these sorts of investor challenges, have enjoyed a strong run-up, and are currently at or around a 52-week high. The company’s annual shareholders meeting is coming up soon too. Although the date is not yet known, all shareholders of record as of April 12 will be allowed to vote at it. It could get quite lively.

Autonomy’s ‘will it, won’t it’ M&A dance

I knew that as soon as I wrote my updated look at which company Autonomy might buy next, something would come along to diminish the value of all my hard work. 😉 I had expected it to be an acquisition, but instead Autonomy surprised me yesterday afternoon with an announcement that after being in talks regarding an acquisition for “several months,”

Recent developments within these talks have given rise to an additional opportunity that warrants further examination which could give rise to an acquisition process that exceeds our original planned time scale.

In other words, it’s not going to happen any time soon.

As an announcement it’s a bit opaque and strangely structured, coming as the second part of an announcement that also said its Capital Markets Day would be on Monday, November 29.

It could mean:

  • Autonomy is attempting to buy part of a business and instead is now looking to expand into another part or perhaps all of the business. For that to be the case the first part must have been quite a small deal, as Autonomy’s kitty is limited to about $1bn in total.
  • Autonomy has identified a different company that it wants to buy.
  • Autonomy itself has had an offer to be acquired.

This is all of course speculation. But one thing’s a bit more certain: it will probably make a dent in the company’s Q4 results, which were expected to include a boost from an acquisition that now doesn’t look like happening. Whether that means the company will miss the revised expectations it set in the Q3 call, we’ll have to wait and see.

In our report we wondered whether the company’s health care announcement, including a new product called Auminence, may be used as some sort of alternative kicker in the quarter, in lieu of an acquisition. As we said, the product came out of the blue, as do many Autonomy products, and appears to us to be a repackaging of IDOL with some added diagnosis checklists on top.

The shares fell 6% yesterday after the announcement came out, all of that coming late in the day as the announcement was made at 3.51pm London time. Again, that’s slightly strange timing. The shares were creeping back up this morning.

UPDATE: Autonomy clarified its statement with this Q&A:

Would it be right to interpret from today’s announcement regarding the acquisition timetable (Update on Acquisition) that the deal you are negotiating has got larger?

No, the deal remains the same size. The statement clearly refers to the timescale.

So it’s the same company, it will just take longer, which makes most sense, I guess.

Sizing and analyzing the cloud-based archiving market

The cloud archiving market will generate around $193m in revenues in 2010, growing at a CAGR of 36% to reach $664m by 2014.
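For anyone who wants to check how those figures hang together, here is a small Python sketch of the arithmetic. The only inputs are the numbers quoted above; treating 2010 to 2014 as four compounding periods is my assumption, and the function names are just illustrative.

```python
def cagr(start_value, end_value, years):
    """Compound annual growth rate implied by two endpoint values."""
    return (end_value / start_value) ** (1 / years) - 1


def project(start_value, rate, years):
    """Value reached after compounding start_value at rate for the given years."""
    return start_value * (1 + rate) ** years


START_2010 = 193  # $m, from the report
END_2014 = 664    # $m, from the report
YEARS = 4         # assumption: 2010 -> 2014 counted as four periods

implied_rate = cagr(START_2010, END_2014, YEARS)
projected_2014 = project(START_2010, 0.36, YEARS)

print(f"Implied CAGR: {implied_rate:.1%}")
print(f"2014 revenue at a flat 36%: ${projected_2014:.0f}m")
```

The implied rate comes out slightly above 36%, and compounding at a flat 36% lands at roughly $660m, so the small gap to $664m looks like rounding of the growth rate rather than an inconsistency.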

This is a key finding from a new 451 report published this week, which offers an in-depth analysis of the growing opportunity around how the cloud is being utilized to meet enterprise data retention requirements.

As well as sizing the market, the 50-page report – Cloud Archiving; A New Model for Enterprise Data Retention – details market evolution, adoption drivers and benefits, plus potential drawbacks and risks.

These issues are examined in more detail via five case studies offering real world experiences of organizations that have embraced the cloud for archiving purposes. The report also offers a comprehensive overview of the key players from a supplier perspective, with detailed profiles of cloud archive service providers, with discussion of related enabling technologies that will act as a catalyst for adoption, as well as expected future market developments.

Profiled suppliers include:

  • Autonomy
  • Dell
  • Global Relay
  • Google
  • i365
  • Iron Mountain
  • LiveOffice
  • Microsoft
  • Mimecast
  • Nirvanix
  • Proofpoint
  • SMARSH
  • Sonian
  • Zetta

Why a dedicated report on archiving in the cloud, you may ask? It’s a fair question, and one that we encountered internally, since archiving aging data is hardly the most dynamic-sounding application for the cloud.

However, we believe cloud archiving is an important market for a couple of reasons. First, archiving is a relatively low-risk way of leveraging cloud economics for data storage and retention, and is less affected by the performance/latency limitations that have stymied enterprise adoption of other cloud-storage applications, such as online backup. For this reason, the market is already big enough in revenue terms to sustain a good number of suppliers; a broad spectrum that spans from Internet/IT giants to tiny, VC-backed startups. It is also set to experience continued healthy growth in the coming years as adoption extends from niche, highly regulated markets (such as financial services) to more mainstream organizations. This will pull additional suppliers – including some large players – into the market through a combination of organic development and acquisition.

Second, archiving is establishing itself as a crucial ‘gateway’ application for the cloud that could encourage organizations to embrace the cloud for other IT processes. Though it is still clearly early days, innovative suppliers are looking at ways in which data stored in an archive can be leveraged in other valuable ways.

All of these issues, and more, are examined in much more detail in the report, which is available to CloudScape subscribers here and Information Management subscribers here. An executive summary and table of contents (PDF) can be found here.

Finally, the report should act as an excellent primer for those interested in knowing more about how the cloud can be leveraged to help support ediscovery processes; this will be covered in much more detail in another report to be published soon by Katey Wood.

Webinars & public speaking in next few weeks

Katey and I are doing a few webinars at the moment and I’m also speaking at a conference this week, so I just wanted to round them all up here:

One webinar is already in the bag, which Katey did with Digital Reef & legal service provider Precise-Law, entitled ‘The challenges of a  reactive vs. proactive EDRM in the Enterprise.’ A replay is available here.

I’m speaking at Search Solutions 2010 this week  on Oct 21. It’s a one-day event organized by the British Computer Society, which I attended last year as a non-speaker and it was very good, so I hope to be able to contribute to maintaining that high standard! I’m speaking at 11.45 am on ‘The trends shaping the future of enterprise search 2010-2013’ and then I’m participating on a panel at the end of the day on what search will look like in 2015. As I’m already making predictions through 2013, I’m three-fifths of the way there! Oct 21 is also the day of Autonomy’s Q3 results call so the place should be full of lively discussion regarding that.

Come November I’m doing a couple more webinars:

On Nov 11 I’m participating in one with Zylab, the focus of which will be litigation-readiness, moving beyond just eDiscovery to ensuring organizations have their information in a state such that it can be easily searched, accessed, locked down, deleted or produced to an opposing party.

Also in November I’ll be participating in a webinar with Attensity Group, which will be focused on social media and the application of text analytics to that space. Date TBC and links to follow, most likely on my Twitter feed.

Document filters as a search proxy war

Document filters. There’s a phrase to conjure up excitement in any technologist eh? No? Didn’t think so. But look more carefully at what is going on and it does get more interesting, trust me.

I was moved to expand on this by Isys Search Software’s recent attempt at guerrilla marketing at Oracle Open World, which it tweeted about here:

isyssearch: ISYS goes guerrilla; kicked out of Oracle Open World party after projecting our branding on the Metreon http://tinyurl.com/272fync #oow10

Quite apart from what it says about Isys and how much it’s changed in the last two years – a bit like the nerdy guy in the playground trying to act tough – it shows how important some people – including me – think these filters have become.

There are two main companies selling products that enable the opening and viewing of myriad file formats (400 is a common number cited by both the vendors and their customers). So when a search engine comes across a Word 1997 or even something like  Wordstar 4 file, how does it open it? Usually using one of two products: Oracle’s OutsideIn or Autonomy’s IDOL KeyView.
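For readers who haven’t looked at how this fits into a search pipeline, here is a deliberately tiny Python sketch of the general pattern: a registry that maps a file format to a text extractor, which the indexer calls before it does anything else. It is not how OutsideIn or KeyView work internally, and every name in it (`FILTERS`, `extract_text`, the two extractors) is hypothetical. The commercial products earn their keep by covering hundreds of binary formats, whereas this handles two trivial ones using only the standard library.

```python
import os
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring the markup."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def extract_plain(raw: bytes) -> str:
    """Filter for plain text: just decode the bytes."""
    return raw.decode("utf-8", errors="replace")


def extract_html(raw: bytes) -> str:
    """Filter for HTML: strip tags and normalize whitespace."""
    parser = _TextCollector()
    parser.feed(raw.decode("utf-8", errors="replace"))
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


# Hypothetical registry: file extension -> extractor function.
FILTERS = {
    ".txt": extract_plain,
    ".html": extract_html,
    ".htm": extract_html,
}


def extract_text(filename: str, raw: bytes) -> str:
    """Pick a filter by extension and return indexable text."""
    ext = os.path.splitext(filename)[1].lower()
    try:
        handler = FILTERS[ext]
    except KeyError:
        raise ValueError(f"no filter registered for {ext!r}") from None
    return handler(raw)


print(extract_text("page.html", b"<html><body><h1>Q3 results</h1><p>Revenue up</p></body></html>"))
```

The point of the sketch is the dependency it creates: once an application’s indexing path runs through a registry like this, every format it supports is tied to whoever supplies the extractors.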

Both products came to these companies via acquisitions: Autonomy buying Verity in November 2005 and Oracle buying Stellent in 2007 (and Stellent, as it wasn’t known then, buying Inso in 2000). It’s also interesting to note that Isys still refers to them as Inso in its marketing even though the product has been called something else for years.

Like all OEM technology, these filters aren’t easily ripped out and replaced. And that’s what these two vendors like about them. It gives them a foot in the door at software companies that they can try to expand upon, and quite often they do. The temptation of course is to use the difficulty of removing them as a point of leverage to crank up prices.

And that’s what we’re hearing Autonomy is doing from a number of vendors. We haven’t heard anything similar regarding Oracle, it should be noted. Autonomy has a reasonably significant OEM technology stream and, as we have mentioned previously, Autonomy regularly brags about its OEM wins without specifying whether it’s KeyView or the full IDOL engine being OEM’d. Incidentally, after that earlier post Autonomy contacted us to say that KeyView isn’t the result of the acquisition of Verity and all it bought was the name. That’s despite what was said at the time, including its own press release shortly after the acquisition bragging about its features. But then Autonomy’s marketing these days increasingly requires a willing suspension of disbelief.

Isys has had this technology for a while but never sold it separately. But now it is finding quite a bit of success among software vendors nervous about having a key piece of technology owned by Autonomy or Oracle, because they’re often search and/or content management companies: two markets in which both companies play. dtSearch, another veteran OEM provider, also provides similar filters.

So for the first time in a long time, ISVs have a choice beyond the main two in filters and in their close relatives, connectors, the software to connect search engines to databases, content management systems and other repositories. In the often incestuous world of information management software, where vendors both compete and sell to one another, these have become points of leverage that customers may not notice in terms of functionality, but they certainly do in terms of the price they have to pay for their software.