The Data Day, Today: Feb 29 2012

Microsoft and Hortonworks expand Hadoop partnership. Oracle ships Exalytics. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Hortonworks to Bring Apache Hadoop to Millions of New Users Hortonworks and Microsoft expanded their relationship around Apache Hadoop.

See also:
# Big Data for Everyone: Using Microsoft’s Familiar BI Tools with Hadoop
# Microsoft’s Hadoop roadmap reveals new big data deliverables
# Karmasphere Expands Big Data Analytics on Hadoop in the Enterprise
# Datameer to Bring Hadoop Analytics to Windows Azure
# HStreaming Brings Real-Time Analytics to Microsoft’s Hadoop-based Services for Windows Server and Windows Azure

* Oracle Announces Availability of Oracle Exalytics In-Memory Machine

* Fujitsu Releases “Interstage Big Data Parallel Processing Server V1.0” to Help Enterprises Utilize Big Data

* Pentaho and DataStax announce strategic partnership delivering the first complete Apache Cassandra-based big data analytics solution to the market

* Cloudant Names Andy Palmer to its Board of Directors

* R integrated throughout the enterprise analytics stack

* Jaspersoft Announces Big Data Index to Track Demand for Big Data Analytics

* 1010data Enables Companies to Rapidly Model and Predict Individual Consumer Behavior and Social Network Relationships

* Tableau Software Teams with Attivio to Tap Unstructured Content and Deliver Deeper Insight to Business Users

* Infochimps and the Future of Data Marketplaces “This is the clearest indication yet that data marketplaces may be the latest ‘Application Service Provider’ cycle, as in right idea, wrong time.”

* HStreaming and RainStor Partner to Lower the Cost of Big-Data Analytics on Hadoop

* JustOne Database Sets the Stage for Accelerated Growth in 2012 and Beyond

* Big Data investment map

* A group of Google Engineers released “vitess” – a project to help scale MySQL databases.

* For 451 Research clients

# Reassessing the M&A potential of NoSQL and NewSQL Sector IQ report

# Sears Holdings creates Hadoop managed service provider MetaScale Impact Report

# Datawatch turns the corner with focus on report analytics suite Impact Report

# arcplan details growth plan, as it expands into front end for SAP HANA and social BI Impact Report

# Objectivity adds reusable queries to InfiniteGraph NoSQL database Market Development Report

# Host Analytics illuminates cloud performance management growth strategy and roadmap Market Development Report

And that’s the Data Day, today.

The Data Day, Today: Jan 13 2012

Splunk files for IPO. Oracle updates its price list. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Splunk Inc. Files Registration Statement for an Initial Public Offering And here it is.

* Oracle updated its Engineered System price list.

* Comparing Hadoop Appliances Great post from Pythian’s Gwen Shapira.

* What is big data? Edd Dumbill provides an introduction to the big data landscape.

* Why Couchbase? Damien Katz clarifies the reasons behind his preference for Couchbase over Apache CouchDB.

* Jaspersoft First to Develop Business Intelligence for Platform-as-a-Service BI suite now available with Red Hat OpenShift.

* Birst and ParAccel Partner to Deliver Scalable and Agile Big Data Analytics in the Cloud. Leverage.

* Recommind Names 451 Research Cofounder Nick Patience Director of Product Marketing and Strategy Our loss is Recommind’s gain.

* Oracle Unveils Oracle TimesTen In-Memory Database 11g Release 2 Performance and scalability improvements.

* Walkie Talkie App Voxer Soars Past a Billion Operations per Day powered by Basho Riak 10-4 good buddy.

* ISYS Search to Provide Enhanced Text Data Extraction Capabilities for New Generation of SAP Solutions OEM deal.

* Using SQLFire as a read-only cache for MySQL. VMware explains why and how.

* Announcing MySQL Enterprise Backup 3.7.0 Self-explanatory.

* Tableau Software Doubles Sales in 2011, Announces Massive Growth in Customer Roster Worldwide Customer base up by 40 percent in 2011.

* VoltDB Completes 2011 With Significant Market Growth and Company Expansion Including growth in new customer accounts of more than 300%.

* Clarabridge Wins Record Number of New Clients in 2011 More than 60 new Clarabridge Enterprise customers and more than 700 new Clarabridge Professional customers.

* For 451 Research clients

# Oracle selects Cloudera for Hadoop-based Big Data Appliance Market development report

# Microsoft may offer ‘big security data’ for free Analyst note

# Zimory considering virtual independence for cloud database business Market development report

# Jitterbit sheds light on growth strategy, integration business under new CEO Market development report

# SnapLogic snaps into the enterprise, shifts gaze away from midmarket integration Market development report

* Google News Search outlier of the day: My Best Friend’s Hair Launches Nationwide Website to Help You Find the Perfect Hairstylist

And that’s the Data Day, today.

Search by another name: enterprise search starts to mature into ‘application era’

Customers of The 451 Group will have seen my report on the enterprise search market, published September 15. If you are a client, you can view it here. I thought it would be useful to provide a condensed version of the report to a wider audience, as I think the market is at an important point in its development and merits a broader discussion.

The enterprise search market is morphing before our eyes into something new. Portions of it are disappearing, and others are moving into adjacent markets, but a core part of it will remain intact. A few key factors have caused this, we think. Some are historical, by which we mean they had their largest effect in the past, but the ongoing effect is still being felt, whereas the contemporary factors are the ones that we think are having their largest impact now, and will continue to do so in the short-term future (12-18 months).

Historical factors

  • Over-promising and under-delivery of intranet search between the last two US recessions, roughly between 2002 and 2007, resulting in a lot of failed projects.
  • A lack of market awareness and understanding of the value and risk inherent in unstructured data.
  • The entrance of Google into the market in 2002.
  • The lack of vision by certain closely related players in enterprise content management (ECM) and business intelligence (BI).

Contemporary factors

  • The lack of a clear value proposition for enterprise search.
  • The rise of open source, in particular Apache Lucene/Solr.
  • The emergence of big data, or total data.
  • The social media explosion.
  • The rapid spread of SharePoint.
  • The acquisitive growth of Autonomy Corp.
  • Acquisition of fast-growing players by major software vendors, notably Dassault Systemes, Hewlett-Packard and Microsoft.

The result of all this has been a split into roughly four markets, which we refer to as low-end, midmarket, OEM and high-end search-based applications.

Entry-level search

The low-end, or entry-level, enterprise search market has become, if not commodified, then pretty close to it. It is dominated by Google and open source. Other commercial vendors that once played in it have mostly left the market.

The result is that potential entry-level enterprise search customers are left with a dichotomy of choices: Google’s yellow search appliances that have two-year-term licenses and somewhat limited configurability (but are truly plug-and-play options) on the one hand, and open source on the other. It is a closed versus a very open box, and they have different and equally enthusiastic customer bases. Google is a very popular department-level choice, often purchased by line-of-business knowledge workers frustrated at obsolete and over-engineered search engines. Open source is, of course, popular with those that want to configure their search engine themselves or have a service provider do it and, thus, have a lot of control over how the engine works, as well as the results it delivers. Apache Lucene is also part of many commercial, high-end enterprise search products, including those of IBM.

Midmarket search

Midmarket search is a somewhat vague area, where vendors are succeeding in deals of roughly $75,000-250,000 selling intranet search. This area has thinned out as some vendors have tried to move upmarket into the world of search-based applications, but there are still many vendors making a decent living here. SharePoint has had a major effect on this part of the market, however, and if enterprises already have SharePoint – and Microsoft reckons more than 70% have at least bought a license at some point – then it can be tough to offer a viable alternative. But where SharePoint isn’t the main focus, there is still a decent business to be had offering effective enterprise search, often in specific verticals, albeit without a huge amount of vertical customization.

OEM

The OEM search business has become a lot more interesting recently, in part because of which vendors have left it, leaving space for others. Microsoft’s acquisition of FAST in early 2008 meant one of the two major vendors at the time essentially left the market, since its focus moved almost entirely to SharePoint, as we recently documented. The other major OEM vendor at the time was Autonomy, and while it would still consider itself one, we think much of its OEM business in fact comes from document filters, rather than the OEMing of the IDOL search engine. Autonomy would strongly dispute that, but it might be moot soon anyway – it now looks as if it will end up as part of Hewlett-Packard following the announcement of its acquisition, at a huge valuation, on August 18.

Those exits have left room for the rise of other vendors in the space. Key markets here include archiving, data-loss prevention and e-discovery. Many tools in these areas have old or quite basic search and text analysis functionality embedded in them, and vendors are looking for more powerful alternatives.

Search-based applications

The high end of the enterprise search market has become, in effect, the market for search-based applications (SBA) – that is, applications that are built on top of a search engine, rather than solely a relational database (although they often work alongside a database). These were touted back in the early 2000s by FAST, but it was too early, and FAST was too complex a set of tools to give the notion widespread acceptance. But in the latter part of the last decade and this one, SBAs have emerged as an answer to the problem of generic intranet search engines getting short shrift from users dissatisfied that the search engines don’t deliver what they want, when they want it.

Until recently, SBAs have mainly been a case of the vendors and their implementation partners building one-off custom applications for customers. But they are now moving to the stage where out-of-the-box user interfaces are being supplied for common tasks. In other words, it’s maturing in a similar way to the application software industry 20 years ago, which was built on top of the explosion in the use of relational databases.

We’ve seen examples in manufacturing, banking and customer service, and one of the key characteristics of SBAs is their ability to combine structured and unstructured data in a single interface. That was also the goal of earlier efforts to combine search with business-intelligence tools, which often simply took the form of adding a search engine to a BI tool. That was too simplistic, and the idea didn’t really take off, in part because search vendors hadn’t paid enough attention to structured data.

But SBAs, which put much more focus on the indexing process than earlier efforts, appear to be gaining traction. If we were to get to the situation where search indexes are considered a better way of manipulating disparate data types than relational databases, that would be a major shift (see big data). Another key element of successful SBAs is that they don’t look like traditional search engines, with a large amount of white space and a search bar in the middle of the screen. Rather, they make use of facets and other navigation techniques to guide users through information, or often simply to present the relevant information to them.
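To make the faceted-navigation point concrete, here is a minimal, hypothetical sketch – a toy in-memory index, not any vendor’s product – of how an SBA-style interface combines structured fields with full-text matching so it can guide users through results rather than just return a ranked list:

```python
from collections import Counter, defaultdict

# Toy document set mixing structured fields (brand, region) with
# unstructured text, much as a search-based application's index does.
# All names and values here are invented for illustration.
DOCS = [
    {"id": 1, "brand": "Acme", "region": "EMEA", "text": "turbine blade fatigue report"},
    {"id": 2, "brand": "Acme", "region": "APAC", "text": "turbine maintenance schedule"},
    {"id": 3, "brand": "Globex", "region": "EMEA", "text": "pump housing inspection"},
]

def search(query, filters=None):
    """Return matching docs plus facet counts for guided navigation."""
    filters = filters or {}
    hits = [d for d in DOCS
            if query in d["text"]
            and all(d[f] == v for f, v in filters.items())]
    # Count the structured field values present in the hit set;
    # these counts drive the facet panel instead of a blank search bar.
    facets = defaultdict(Counter)
    for d in hits:
        for field in ("brand", "region"):
            facets[field][d[field]] += 1
    return hits, dict(facets)

hits, facets = search("turbine")
print([d["id"] for d in hits])   # docs 1 and 2 match
print(dict(facets["region"]))    # one hit each for EMEA and APAC
```

Filtering on a facet value – `search("turbine", {"region": "EMEA"})` – narrows both the hits and the remaining facet counts, which is the drill-down behavior described above.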

As I mentioned, there’s more in the full report, including more about specific vendors, total (or big) data and the impact of social media. If you’d like to know more about it, please get in touch with me.

Who is hiring Hadoop and MapReduce skills?

Continuing my recent exploration of Indeed.com’s job-posting trends and data, I have been taking a look at which organizations (excluding recruitment firms) are hiring Hadoop and MapReduce skills. The results are pretty interesting.

When it comes to who is hiring Hadoop skills, the answer, put simply, is Amazon, or more generally new media:


Source: Indeed.com. Correct as of August 2, 2011.

This is indicative of the early stage of adoption, and perhaps reflects the fact that many new media Hadoop adopters have chosen to self-support rather than turn to the Hadoop support providers/distributors.

It is no surprise to see those vendors also listed as they look to staff up to meet the expected levels of enterprise adoption (and it is worth noting that Amazon could also be included in the vendors category, given its Elastic MapReduce service).

Fascinating to see that of the vendors, VMware currently has the most job postings on Indeed.com referencing Hadoop, while Microsoft also makes an appearance.

Meanwhile the appearance of Northrop Grumman and Sears Holdings on this list indicates the potential for adoption in more traditional data management adopters, such as government and retail.

It is interesting to compare the results for Hadoop job postings with those mentioning Teradata, which shows a much more varied selection of retail, health, telecoms, and financial services providers, as well as systems integrators, government contractors, new media and vendors.

It is also interesting to compare Hadoop-related job postings with those specifying MapReduce skills. There are far fewer of them, for a start, and while new media companies are well represented, there is much greater interest from government contractors.


Source: Indeed.com. Correct as of August 2, 2011.

Sizing and analyzing the cloud-based archiving market

The cloud archiving market will generate around $193m in revenues in 2010, growing at a CAGR of 36% to reach $664m by 2014.

This is a key finding from a new 451 report published this week, which offers an in-depth analysis of the growing opportunity around how the cloud is being utilized to meet enterprise data retention requirements.

As well as sizing the market, the 50-page report – Cloud Archiving: A New Model for Enterprise Data Retention – details market evolution, adoption drivers and benefits, plus potential drawbacks and risks.

These issues are examined in more detail via five case studies offering real-world experiences of organizations that have embraced the cloud for archiving purposes. The report also offers a comprehensive overview of the key players from a supplier perspective, with detailed profiles of cloud archive service providers, discussion of related enabling technologies that will act as a catalyst for adoption, and expected future market developments.

Profiled suppliers include:

  • Autonomy
  • Dell
  • Global Relay
  • Google
  • i365
  • Iron Mountain
  • LiveOffice
  • Microsoft
  • Mimecast
  • Nirvanix
  • Proofpoint
  • SMARSH
  • Sonian
  • Zetta

Why a dedicated report on archiving in the cloud, you may ask? It’s a fair question, and one that we encountered internally, since archiving aging data is hardly the most dynamic-sounding application for the cloud.

However, we believe cloud archiving is an important market for a couple of reasons. First, archiving is a relatively low-risk way of leveraging cloud economics for data storage and retention, and is less affected by the performance/latency limitations that have stymied enterprise adoption of other cloud-storage applications, such as online backup. For this reason, the market is already big enough in revenue terms to sustain a good number of suppliers – a broad spectrum that spans from Internet/IT giants to tiny, VC-backed startups. It is also set to experience continued healthy growth in the coming years as adoption extends from niche, highly regulated markets (such as financial services) to more mainstream organizations. This will pull additional suppliers – including some large players – into the market through a combination of organic development and acquisition.

Second, archiving is establishing itself as a crucial ‘gateway’ application for the cloud that could encourage organizations to embrace the cloud for other IT processes. Though it is still clearly early days, innovative suppliers are looking at ways in which data stored in an archive can be leveraged in other valuable ways.

All of these issues, and more, are examined in much more detail in the report, which is available to CloudScape subscribers here and Information Management subscribers here. An executive summary and table of contents (PDF) can be found here.

Finally, the report should act as an excellent primer for those interested in knowing more about how the cloud can be leveraged to help support ediscovery processes; this will be covered in much more detail in another report to be published soon by Katey Wood.

Innovation vs. M&A at Yahoo

Now that the deal between Microsoft and Yahoo is done, it got me thinking more about Yahoo and its position as a pioneer of the web. That it was one is not in doubt; its directory was indeed a first, and useful. But how much else did it actually pioneer? Have a look at a list of Yahoo’s acquisitions.

  • Search – surely the cornerstone of the business, yet its very first search engine was licensed from Open Text in 1995. Sure, it went on to build its own and do some great work, but now with the deal done with Microsoft, Yahoo has exited the search business.
  • Webmail – I recall getting a great Yahoo email address when that came out in 1997; it was a pioneer, but it got there by purchasing Four11.com and its RocketMail service. Now my account is overrun with spam and unusable (yet the company has the temerity to regularly threaten to cut me off for having too much mail and not using it enough – mail that is mostly spam its own filters can’t control. Hmmm.)
  • Personal publishing – it was an early major player in personal publishing, now called blogging, but again it got there largely by purchasing GeoCities for an enormous amount of Yahoo stock.
  • Advertising – it pioneered banner ads on the web but, as we all know, was caught out by keyword search innovation from Google (which built it, rather than bought it). And it had to buy Overture from Idealab.
  • Rich media – the Broadcast.com deal unleashed Mark Cuban on an unsuspecting world.
  • Photos – to its credit, Yahoo bought Flickr and then largely left it alone
  • Social bookmarking – the same goes for Del.icio.us

But if the best that can be said for Yahoo is that it knew when to leave its acquired companies alone, to give them space to grow and continue innovating, then all that’s left is a dwindling brand and a company with some choice assets left intact for others to pick up over time. As this article pointed out recently, many of these acquired assets have been closed down, some after only a few years.

There is still innovation happening within Yahoo, plenty of it, and I wish the folks at Yahoo working in the labs on some great semantic technology, among other things, the best of luck. But touting a new home page just last week doesn’t give me much hope that Yahoo really gets the distributed, read/write Web. Who cares about home pages?

And of course Yahoo isn’t going anywhere soon; it has plenty of cash in the bank and a new revenue stream courtesy of Microsoft. But I doubt it will last out the 10 years of this deal.

One of the things all this demonstrates is that M&A is different in different parts of the tech industry. In enterprise software, a company is quite often buying a customer base and its ongoing maintenance stream – this is how Oracle has grown. But on the web, where people don’t pay you directly and the cost of switching to another search engine serving up a different set of ads is very low (to nil), things are very different: you have to focus on core competencies, not run after each new fad just as it’s peaking and buy your way into it.

On the opportunities for cloud-based databases and data warehousing

At last year’s 451 Group client event I presented on the topic of database management trends and databases in the cloud.

At the time there was a lot of interest in cloud-based data management as Oracle and Microsoft had recently made their database management systems available on Amazon Web Services and Microsoft was about to launch the Azure platform.

In the presentation I made the distinction between online distributed databases (BigTable, HBase, Hypertable), simple data query services (SimpleDB, Microsoft SSDS as was), and relational databases in the cloud (Oracle, MySQL, SQL Server on AWS etc) and cautioned that although relational databases were being made available on cloud platforms, there were a number of issues to be overcome, such as licensing, pricing, provisioning and administration.

Since then we have seen very little activity from the major database players with regards to cloud computing (although Microsoft has evolved SQL Data Services to be a full-blown relational database as a service for the cloud, see the 451’s take on that here).

In comparison there has been a lot more activity in the data warehousing space with regards to cloud computing. In one sense the data warehousing players are later to the cloud, but in another they are more advanced, and for a couple of reasons I believe data warehousing is better suited to cloud deployments than the general-purpose database.

  • For one thing most analytical databases are better suited to deployment in the cloud thanks to their massively parallel architectures being a better fit for clustered and virtualized cloud environments.
  • And for another, (some) analytics applications are perhaps better suited to cloud environments since they require large amounts of data to be stored for long periods but processed infrequently.
We have therefore seen more progress from analytical than transactional database vendors this year with regards to cloud computing. Vertica Systems launched its Vertica Analytic Database for the Cloud on EC2 in May 2008 (and is working on cloud computing services from Sun and Rackspace), Aster Data followed suit with the launch of Aster nCluster Cloud Edition for Amazon and AppNexus in February this year, and February also saw Netezza partner with AppNexus on a data warehouse cloud service. The likes of Teradata and illuminate are also thinking about, if not talking about, cloud deployments.
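The shared-nothing, scatter-gather pattern that makes those massively parallel architectures such a good fit for clustered cloud environments can be sketched roughly as follows – each node aggregates its own partition locally and ships only a small partial result to a coordinator (the partition data and category names here are invented for illustration):

```python
from concurrent.futures import ThreadPoolExecutor

# Each "node" holds its own partition of the fact table (shared-nothing).
# Threads stand in for cluster nodes in this toy sketch.
PARTITIONS = [
    [("books", 10.0), ("music", 5.0)],   # node 1
    [("books", 7.5), ("video", 3.0)],    # node 2
    [("music", 2.5), ("books", 2.5)],    # node 3
]

def local_aggregate(rows):
    """Scatter phase: sum sales per category over one node's partition."""
    out = {}
    for category, amount in rows:
        out[category] = out.get(category, 0.0) + amount
    return out

def gather(partials):
    """Gather phase: the coordinator merges the small partial results."""
    total = {}
    for partial in partials:
        for category, amount in partial.items():
            total[category] = total.get(category, 0.0) + amount
    return total

with ThreadPoolExecutor() as pool:
    partials = list(pool.map(local_aggregate, PARTITIONS))
totals = gather(partials)
print(totals)  # {'books': 20.0, 'music': 7.5, 'video': 3.0}
```

Only the per-node partials cross the network, not the raw rows, which is why this style of query parallelizes well across commodity cloud instances.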

To be clear, the early interest in cloud-based data warehousing appears to be in development and test rather than mission-critical analytics applications, although there are early adopters: ShareThis, the online information-sharing service, is up and running on Amazon Web Services’ EC2 with Aster Data; search marketing firm Didit is running nCluster Cloud Edition on AppNexus’ PrivateScale; and Sonian is using the Vertica Analytic Database for the Cloud on EC2.

Greenplum today launched its take on data warehousing in the cloud, focusing its attention initially on private cloud deployments with its Enterprise Data Cloud initiative and plans to deliver “a new vision for bringing the power of self-service to data warehousing and analytics”.

That may sound a bit woolly (and we do see the EDC as the first step towards private cloud deployments) but the plan to enable the Greenplum Database to act as a flexible pool of warehoused data, from which business users will be able to provision data marts, makes sense as enterprises look to replicate the potential benefits of cloud computing in their datacenters.

Functionality such as self-service provisioning and elastic scalability is still to come, but version 3.3 does include online data-warehouse expansion capabilities and is available now. Greenplum also notes that it has customers using the Greenplum Database in private cloud environments, including Fox Interactive Media’s MySpace, Zions Bancorporation and Future Group.

The initiative will also focus on agile development methodologies and an ecosystem of partners, and while we were somewhat surprised by the lack of virtualization and cloud provisioning vendors involved in today’s announcement, we are told they are in the works.

In the meantime we are confident that Greenplum’s won’t be the last announcement from a data management vendor focused on enabling private cloud computing deployments. While much of the initial focus around cloud-based data management was naturally on the likes of SimpleDB, the ability to deliver flexible access to, and processing of, enterprise data is more likely to be taking place behind the firewall while users consider what data and which applications are suitable for the public cloud.

Also worth mentioning while we’re on the subject is RainStor, the new cloud archive service recently launched by Clearpace Software, which enables users to retire data from legacy applications to Amazon S3 while ensuring that the data is available for querying on an ad hoc basis using EC2. It’s an idea that resonates thanks to compliance-driven requirements for long-term data storage, combined with the cost of storing and accessing that data.

451 Group subscribers should stay tuned for our formal take on RainStor, which should be published any day now, and I think it’s probably fair to say you can expect more of this discussion at this year’s client event.

Enterprise Search Summit 09 perspectives

I started off this year’s Enterprise Search Summit in New York last week with a dinner sponsored by New Idea Engineering and Attivio on Monday night, which was highly enjoyable despite my jetlag – having to try to stay up the first night in from London. Thanks to those folks for the invite and the conversation.

Katey and I were not allowed to sit in on any of the sessions this year, for some strange reason. So I can’t tell you first hand what was interesting or not, or about attendance in the sessions. Go figure. It also wasn’t that conducive to meeting end users, which is a main objective of attending these things.

Katey reckoned attendance overall was slightly down on last year, but not spectacularly so (I was at a different conference and so had to miss last year’s).

So, away from those two disappointments, we did have a fairly full docket of meetings with vendors, which were generally lively, with good give and take. Where we say ‘451 research to follow,’ it means our clients can expect a research report on the company in the near future.

Some of the highlights:

Attivio – CTO Sid Probstein is always chock-full of ideas, so it’s always good to have a sitdown with him. CEO Ali Riaz is entertaining on a whole different level. The company appears to be going great guns and is at the forefront of the drive to combine structured and unstructured data, as we have said before.

BA-Insight – not really a search company or a text analysis company; more a piece of information management middleware that aims to increase ‘findability’ within SharePoint. As any SharePoint user knows – especially those in an environment with multiple SharePoint sites – that can only be a good thing. Connectors to other search engines are coming. 451 research to follow.

Coveo – the company was out in force at this conference, having just launched version 6.0 of its search platform featuring better scalability, connectors and mobile functionality. We covered that product update a short while back.

Endeca – met chief scientist Daniel Tunkelang for the first time. Clearly the owner of an active mind, Daniel presents a different face to the search company. His thoughts on the conference are here.

Google – the typically on-message briefing from Google. It owns the low end and is increasingly taking chunks out of the mid-tier, but there is still no sign of the management layer enterprises need to get their arms around the myriad Google search appliances lying around most large organizations. It will probably appear out of the blue at some point this year, I’d imagine.

Microsoft – Nate Treloar was a great evangelist for Fast Search & Transfer while a product manager there, and so it seems appropriate that he has the term ‘evangelist’ in his title at Microsoft, where he’s working not only on the SharePoint search ecosystem but also on other programs such as ‘conversational’ and ‘actionable’ search; talking and doing, hey, what else is there? 😉

PerfectSearch – we don’t usually see too many companies at this conference that we haven’t spoken to before, but PerfectSearch is one of them. It sells a search appliance, and some of the founders have a Novell background, hence its Orem, UT HQ. 451 research to follow.

Vivisimo – from what we’ve heard, the company is doing well in both the indirect (OEM) and direct markets. We’ve noticed how often this company is bad-mouthed by its competitors (over and above the usual FUD in any tech market), though we’re not sure why. Perhaps because Pittsburgh isn’t as fashionable as Boston or the Valley? We don’t really know, but it seems misplaced based on our experience. It’s making good headway with Lexis-Nexis, which will be important in the eDiscovery market, as well as with other customers that have demanded confidentiality (pretty common in the eDiscovery market). 451 research to follow.

Microsoft sheds more light on Office 14

Microsoft has begun to share information on what it calls the “waves” of Office 14 products set to hit the market this year and next. Most of the information at this point is on Microsoft Exchange 2010, which has entered public beta. General availability is expected in the second half of this year.

There’s also some info for SharePoint, though little detail. Microsoft SharePoint Server 2010 will go into technical preview in Q3 2009 and be generally available in the first half of 2010. Beyond that, we still don’t know what will and won’t be in SharePoint.next (though we don’t have to call it that anymore).

The part of the Exchange 2010 announcement that caught my attention is the reference to an integrated e-mail archive. Did Microsoft just enter the email archiving market? That would certainly be noteworthy, given that much of the hot email archiving market involves archiving Exchange email. Since Microsoft hasn’t had a horse in this race, this has been the realm of third-party providers like Symantec and Mimosa Systems to date.

On the analyst telebriefing held today by Microsoft on this announcement, I asked about this and about the role for Microsoft’s email archiving partners going forward. Michael Atalla, Group Product Manager for Exchange at Microsoft, told me that Microsoft is out to meet the needs of the 80% of its customers that don’t yet have any email archiving technology, and that existing email archiving products serve a “niche” of the market at the high end, for customers that have to meet regulatory requirements for email archiving.

While I agree there is still a lot of opportunity in the email archiving space, describing existing adoption as limited to those in regulated industries isn’t exactly accurate.

I’ve tried to dig deeper into what this integrated archive includes. Not easy, as there is no mention of archiving at all in the TechNet docs on Exchange 2010 (though there’s quite a bit of interesting detail on records and retention management).

Best I can tell, Exchange 2010 lets you create individual or “personal archives.” This page from Microsoft explains that a personal archive is:

    an additional mailbox associated with a user’s primary mailbox.  It appears alongside the primary mailbox folders in Outlook. In this way, the user has direct access to e-mail within the archive just as they would their primary mailbox. Users can drag and drop PST files into the Personal Archive, for easier online access – and more efficient discovery by the organization. Mail items from the primary archive can also be offloaded to the Personal Archive automatically, using Retention Polices…

    So it moves the PST file from the desktop to the server, which makes it more available for online searching and discovery purposes.  But is that really email archiving?  I can see how that would be attractive to end users who want an easier way to access archived emails, but it seems like it would increase the load on the mail server and not handle things like de-duping, which archiving is generally meant to address.

    I’m not an expert on email archiving though.  I’d love to hear from anyone who has comments.

    What will NOT be in the next version of SharePoint

    I might catch a lot of readers with that title, but of course I don’t really know for sure what will and won’t be in the next version of SharePoint.  Microsoft is still mum on the topic and I suspect will remain so until the SharePoint Conference slated for October.  This event was held in March last year; it seems logical that it has been pushed back this year to time the event with Office 14 announcements specific to SharePoint.

    I read Guy Creese’s post last week on what he thinks will be in the next version of SharePoint and, like Guy, I get a lot of questions in this vein.  I agree with Guy that SharePoint.next will have search improvements (we already know that one) and more sophisticated administration (we all hope). I’ll be surprised to see dramatic improvements in the transition between hosted and on-premise SharePoint in this version; I think the marketing is likely to lead the reality in this area for some time to come, but perhaps I’ll be proven wrong.

    I often get questions more specifically (from vendors) around what Microsoft isn’t going to do and reading Guy’s post, I thought it would be interesting to comment on what’s left out.

    On the social software front…

    There’s been some debate of late about whether or not SharePoint is an “Enterprise 2.0” tool at all (or what, in fact, that even means, if anything). But anyone who saw Lawrence Liu pitch SharePoint versus IBM Lotus Connections to a packed room at Enterprise 2.0 last year would certainly assume Microsoft has ambitions in this area.  It’s worth noting, however, that Liu left Microsoft not long after that for Telligent Systems, which sells community software as an adjunct to SharePoint.  Liu presumably knows more about the SharePoint roadmap than we do, so looking at Telligent’s roadmap (limited version here) is probably a good indication of where Microsoft won’t go in social software in this next release (think community analytics, bridging internal and external communities, and feed aggregation).

    It’s not about WCM.

    Making SharePoint ubiquitous for content-based collaboration is Microsoft’s number one goal, and this means improved admin, search and social software, to my mind.   So what will get left out?   I don’t think we’ll see any major changes on the WCM front.  Microsoft marketed the WCM capabilities in MOSS 2007 when it first came out, as it stopped development on its stand-alone WCM product, Microsoft CMS (which came from its 2001 acquisition of nCompass), in favor of SharePoint.  But this seems to have died down, and vendors like Sitecore are doing well selling more sophisticated WCM with SharePoint integrations, apparently with cooperation from Microsoft.  WCM for large, customer-facing sites is really not where SharePoint’s strengths lie, and Microsoft will likely let this one stand much as it is as it invests in other areas (Sitecore even sells a bundle for intranets, showing some market opportunity for WCM even in SharePoint’s sweet spot).

    What about records management and archiving?

    There’s some records management today in SharePoint, but it’s limited to SharePoint environments.  Improved admin across server farms could help here, but it doesn’t seem likely Microsoft is going to go far beyond this, and this doesn’t address the archiving issue at all.  Vendors like Open Text, Symantec and EMC are banking on their products’ abilities to manage and archive content (including email) from multiple repositories, including SharePoint.  And this seems like a market that will be relatively immune to changes in SharePoint.next — indeed, changes that make SharePoint more popular are likely only good news to these vendors, at least in the short term.

    I’m sure there are other gaps vendors are filling where there may be some continued opportunity after SharePoint.next, but those are the big ones that jump to my mind.