Entries from August 2011 ↓

ILTA 2011 report: Autonomy taking HP to the e-Discovery cleaners?

Not surprisingly, the biggest topic of conversation at the International Legal Technology Association (ILTA) 2011 convention in Nashville is last week’s announcement by Hewlett-Packard (HP) that it was acquiring Autonomy for $11.8bn. The most common reaction–in addition to the rush out the door to buy HP’s now discontinued TouchPad for 99 bucks–was surprise at the healthy purchase price.  Although some ILTA attendees saw how the deal might make sense logistically, virtually no one thought the deal made any sense at all with such a high price tag for Autonomy.

Cloud computing–and law firms’ reluctant move toward it–is another big topic, but another trend that seems to be developing as the e-discovery industry matures is its move away from law firms. Many vendors report that five years ago law firms accounted for 70 percent or more of their business, with corporate clients making up the remaining 30 percent or less. Vendors now report that those ratios have flipped, with corporate clients making up the vast majority of business.

Although the e-discovery market may be shifting away from law firms, at least one vendor hasn’t forgotten them.  Exterro has announced at ILTA the launch of Fusion LawFirm. As the name implies, the new application is a version of Exterro’s Fusion platform designed especially for law firms.

Other vendors meeting with The 451 Group at ILTA to brief us on their product launches and other announcements are:

  • AccessData, which is launching its new early case assessment application, AD ECA
  • kCura and Nexidia, which announced an alliance under which Nexidia’s audio and voice recognition application will be integrated into kCura’s Relativity platform
  • LexisNexis Applied Discovery, which made an ILTA announcement of its new partnership with Equivio to add predictive coding to its platform
  • LexisNexis LAW PreDiscovery, which launched its new early case assessment (ECA) application, Early Data Analyzer
  • Nuix, which announced a new version of its platform last month
  • Orange Legal Technologies, which did an ILTA launch of PurpleBox, its new collection and ECA tool
  • Recommind, which discussed its predictive coding patent, and may have hosted ILTA’s best party at Nashville’s Country Music Hall of Fame
  • Wave Software, which announced a new version of its Trident e-mail processing application.

Red Hat considering NoSQL/Hadoop acquisition

Idle speculation over on our CAOS Theory blog.

Quick HP-Autonomy thoughts

Just after the HP call about its Q3 numbers and the deal, here’s my initial (very) quick take as it’s late here in London:

  • This deal is about getting serious about software under Leo Apotheker. It gives HP a real information management story, greatly boosting its presence in the archiving, e-Discovery and enterprise search businesses.
  • However, the company cultures are not complementary: the HP way is a long way from the hyper-aggressive sales and marketing culture at Autonomy. Maintaining Autonomy as a separate entity run by Mike Lynch proves this and calls into question how much real synergy can be had from such a structure. I cannot see that being sustained.
  • This instantly makes HP a bigger e-Discovery player than IBM or any of the major IT firms.
  • Product overlap exists in document and records management, but the deal gets HP into the web content management and website optimization markets.
  • Autonomy has resisted deals over the years while its market capitalization ballooned on the back of its own acquisition binge. Autonomy couldn’t have waited much longer as it would have grown too big to be swallowed by even the largest predator.
  • At least Autonomy customers will now have a services organization to call on after they’ve bought the software. Customer support and after-sales service have not been strengths of Autonomy.
  • This leaves the FTSE 100 with just one software firm of note.

Beyond ‘big data’

Alistair Croll published an interesting post this week entitled ‘there’s no such thing as big data’ in which he argued, prompted by a friend, that “given how much traditional companies put [big data] to work, it might as well not exist.”

Tim O’Reilly continued the theme in his follow-up post, arguing:

“companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue”

There is much to agree with – in fact I have myself argued that when it comes to data, the key issue is not how much you have, but what you do with it. However, there is also a significant change of emphasis here from the underlying principles that have driven the interest in ‘big data’ in the last 12-18 months.

Compare Tim O’Reilly’s statement with the following, from Google’s seminal research paper The Unreasonable Effectiveness of Data:

“invariably, simple models and a lot of data trump more elaborate models based on less data”

While the two statements are not entirely contradictory, they do indicate a change in emphasis related to data. There has been so much emphasis on the ‘big’ in ‘big data’, as if the growing volume, variety and velocity of data itself would deliver improved business insights.

As I have argued in the introduction to our ‘total data’ management concept and the numerous presentations given on the subject this year, in order to deliver value from that data, you have to look beyond the nature of the data and consider what it is that the user wants to do with that data.

Specifically, we believe that one of the key factors in delivering value is companies focusing on storing and processing all of their data (or at least as much as is economically feasible) rather than analysing samples and extrapolating the results.

The other factor is time, and specifically how fast users can get to the results they are looking for. Another way of looking at this is in terms of the rate of query. Again, this is not about the nature of the data, but what the user wants to do with that data.

This focus on the rate of query has implications on the value of the data, as expressed in the following equation:

Value = (Volume ± Variety ± Velocity) × Totality / Time

The rate of query also has significant implications in terms of which technologies are deployed to store and process the data and to actually put the data to use in delivering business insight and value.

Getting back to the points made by Alistair and Tim in relation to the Unreasonable Effectiveness of Data, it would seem that to date there has been more focus on what Google referred to as “a lot of data”, and less on the “simple models” to deliver value from that data.

There is clearly a balance to be struck, and the answer lies not in ‘big data’ but in “more clue” and in defining and delivering those “simple models”.

Top Issues IT faces with Hadoop MapReduce: a Webinar with Platform Computing

Next Tuesday, August 3, at 8.30 AM PDT I’ll be taking part in a Webinar with Platform Computing to discuss the benefits and challenges of Hadoop and MapReduce. Here are the details:

With the explosion of data in the enterprise, especially unstructured data which constitutes about 80% of the total data in the enterprise, new tools and techniques are needed for business intelligence and big data processing. Apache Hadoop MapReduce is fast becoming the preferred solution for the analysis and processing of this data.

The speakers will address the issues facing enterprises deploying open source solutions. They will provide an overview of the solutions available for Big Data, discuss best practices, lessons learned, case studies and actionable plans to move your project forward.

To register for the event please visit the registration page.

Who is hiring Hadoop and MapReduce skills?

Continuing my recent exploration of Indeed.com’s job posting trends and data, I have been taking a look at which organizations (excluding recruitment firms) are hiring Hadoop and MapReduce skills. The results are pretty interesting.

When it comes to who is hiring Hadoop skills, the answer, put simply, is Amazon, or more generally new media:

Source: Indeed.com. Correct as of August 2, 2011.

This is indicative of the early stage of adoption, and perhaps reflects the fact that many new media Hadoop adopters have chosen to self-support rather than turn to the Hadoop support providers/distributors.

It is no surprise to see those vendors also listed as they look to staff up to meet the expected levels of enterprise adoption (and it is worth noting that Amazon could also be included in the vendors category, given its Elastic MapReduce service).

Fascinating to see that of the vendors, VMware currently has the most job postings on Indeed.com referencing Hadoop, while Microsoft also makes an appearance.

Meanwhile, the appearance of Northrop Grumman and Sears Holdings on this list indicates the potential for adoption among more traditional data management users, such as government and retail.

It is interesting to compare the results for Hadoop job postings with those mentioning Teradata, which shows a much more varied selection of retail, health, telecoms, and financial services providers, as well as systems integrators, government contractors, new media and vendors.

It is also interesting to compare Hadoop-related job postings with those specifying MapReduce skills. There are far fewer of them, for a start, and while new media companies are well-represented, there is much greater interest from government contractors.

Source: Indeed.com. Correct as of August 2, 2011.

Variety, Velocity, and Volume: a Webinar with Azul Systems

This Wednesday, August 3, at 9 AM PDT I’ll be taking part in a Webinar with Azul Systems to discuss the performance challenges of big data in the enterprise. Here are the details:

“Big Data” is a hot topic and the concept of “Big Data” is a useful frame for the challenges of scaling petabyte or terabyte data that typically cannot be addressed with traditional technologies. However, Big Data is no longer just a challenge for large social media companies – enterprises can also benefit from understanding when and how to apply these technologies and architectures.

In this Webinar Matthew Aslett of the 451 Group reviews the taxonomy of Big Data and explains how organizations are employing new data management technologies and approaches to ensure that they turn the data deluge into more accurate and efficient operations.

Gil Tene, CTO and co-founder of Azul Systems, will then highlight in greater detail the infrastructure and building block choices for enterprise architects and how to address the performance, scalability, and velocity challenges of Big Data in the enterprise.

Key takeaways:

  • New strategies for integrating Big Data applications within your existing infrastructure and operations
  • Tradeoffs between capacity and performance
  • The importance and challenges of Java for Big Data in the enterprise

To register for the event please visit the registration page.