Entries from January 2012 ↓
January 31st, 2012 — Data management
As expected, EMC has announced that it is integrating its Greenplum HD distribution of Apache Hadoop with its Isilon scale-out NAS technology. The move coincides with a re-branding of the company’s Hadoop distributions that, while slight, could prove significant.
Specifically, EMC has enabled the Hadoop Distributed File System (HDFS) as a native protocol supported on OneFS in addition to Network File System (NFS) and Common Internet File System (CIFS) support, enabling Isilon systems to provide the underlying storage layer for Hadoop processing, as well as a common storage pool for Hadoop and other systems.
EMC is talking up the benefits of combining Isilon with Greenplum HD. For the record, that’s the Hadoop distribution previously known as Greenplum HD Community Edition, based on the Apache Hadoop 0.20.1 code branch.
Greenplum HD Enterprise Edition, based on MapR Technologies’ M5 distribution, is now known as Greenplum MR, and is not supported by Isilon because it replaces HDFS with Direct Access NFS.
EMC notes that Greenplum MR is being positioned as a high-performance Hadoop offering for customers that have failed to achieve their required performance from other distributions.
While EMC is quick to maintain its happiness with the MapR relationship and its commitment to Greenplum MR, it’s clear that tight integration with Isilon, particularly in the EMC Greenplum DCA, will result in an expanded role for Greenplum HD.
Additionally, while the company’s Greenplum Command Center provides unified management for the Greenplum Database, Greenplum HD and Greenplum Chorus as part of the recently announced Unified Analytics Platform (UAP), MapR has its own management and monitoring functionality.
Since we expect EMC to pitch the benefits of integrated software in UAP, and of integrated software and hardware in DCA, it is now clear that Greenplum HD, rather than Greenplum MR, is considered the company’s primary Hadoop distribution.
Given Greenplum HD’s starring role in the Unified Analytics Platform (UAP), Data Computing Appliance (DCA) and integration with Isilon, Greenplum MR’s role is likely to become increasingly niche.
January 30th, 2012 — Data management
I put this slide together for my own benefit as I was trying to keep track of the various incarnations of Couchbase’s brands. Looks like I wasn’t the only one, so I thought I’d also make our perspective available.
There are a couple of differences between our slide and Koji Kawamura’s:
Ours contains an extra layer of names (e.g. “Elastic Couchbase”) that were briefly used by Couchbase in discussion and I believe in marketing, although never for shipping product.
Also, ours doesn’t mention memcached. It could be on there given that Membase is based on it, and Couchbase Server can still be deployed in “memcached only mode”, but in that sense it is a feature of Membase/Couchbase Server. And anyway, I couldn’t fit it on.
January 27th, 2012 — Data management
451 Research yesterday announced that it has published its 2012 Previews report, an all-encompassing report highlighting the most disruptive and significant trends that our analysts expect to dominate and drive the enterprise IT industry agenda over the coming year.
The 93-page report provides an outlook and assessment across all 451 Research technology sectors and practice areas – including software infrastructure, cloud enablement, hosting, security, datacenter technologies, hardware, information management, mobility, networking and eco-efficient IT – with input from our team of 40+ analysts. The 2012 Previews report is available upon request here.
IM research director Simon Robinson has already provided a taster of our predictions as they relate to the information-centric landscape. Below I have outlined some of our core predictions related to the data-centric ecosystem:
The overall trend predicted for 2012 could best be described as the shifting focus from volume, velocity and variety, to delivering value. Our concept of Total Data reflects the path from velocity and variety of information sources to the all-important endgame of deriving value from data. We expect to see increased interest in data integration and analytics technologies and approaches designed specifically to exploit the potential benefits of ‘big data’ and mainstream adoption of Hadoop and other new sources of data.
We also anticipate, and are beginning to see, increased focus on technologies that enable access to data in different storage platforms without requiring data movement. We believe there is an emerging role for what we are calling the ‘data hub’ – an independent platform that is responsible for managing access to data on the various data storage and processing technologies.
Increased understanding of the value of analytics will also increase interest in the integration of analytics into operational applications. Embedded analytics is nothing new, but has the potential to achieve mainstream adoption this year as the dominant purveyors of applications used to run operations are increasingly focused on serving up embedded analytics as a key component within their product portfolios. Equally importantly, many of them now have database platforms capable of uniting previously disparate technologies to deliver true embedded analysis.
There has been a growing recognition over the past year or so that any type of data management project – whether focused on master data management (MDM), data or application integration, or data quality – needs to bring real benefits to business processes. Some may see this assertion as obvious and pretty easy to achieve, but that’s not necessarily the case. However, it is likely to become more so in the next 12-18 months as companies realize a process-driven approach to most data management programs makes sense and vendors deliver capabilities to meet this demand.
While ‘big data’ presents a number of opportunities, it also poses many challenges, not the least of which is the lack of developers, managers, analysts and scientists with analytics skills. The users and investors placing a bet on the opportunities offered by new data management products are unlikely to be laughing if it turns out that they cannot employ people to deploy, manage and run those products, or analysts to make sense of the data they produce. It is not surprising, therefore, that the vendors that supply those technologies are investing in ensuring that there is a competent workforce to support existing and new projects.
Finally, while cloud computing may be one of the technology industry’s hot topics, it has had relatively little impact on the data management sector to date. That is not to say that databases are not available on cloud computing platforms, but we must make a distinction between databases that are deployed in public clouds, and ‘cloud databases’ that have the potential to fulfil the role of emerging databases in building private and hybrid clouds. The former have been available for many years. The latter are just beginning to come to fruition based on NoSQL databases, as well as a new breed of NewSQL relational databases, designed to meet the performance, scalability and flexibility needs of large-scale data processing.
451 Research clients can get more details of these specific predictions via our 2012 preview – Information Management, Part 2. Non-clients can apply for trial access at the same link, while the entire 2012 Previews report is available here.
Also, mark your diaries for a webinar discussing report highlights on Thursday Feb 9 at noon ET, which will be open for clients and non-clients to attend. Registration details to follow soon…
January 27th, 2012 — Archiving, Collaboration, Content management, Data management, eDiscovery, Search, Text analysis
Every New Year affords us the opportunity to dust down our collective crystal balls and predict what we think will be the key trends and technologies dominating our respective coverage areas over the coming 12 months. We at 451 Research just published our 2012 Preview report; at almost 100 pages it’s a monster, but offers some great insights across twelve technology subsectors, spanning from managed hosting and the future of cloud to the emergence of software-defined networking and solid state storage, and everything in between. The report is available to both 451 Research clients and non-clients (in return for a few details); access the landing page here. There’s a press release of highlights here. Also, mark your diaries for a webinar discussing report highlights on Thursday Feb 9 at noon ET, which will be open for clients and non-clients to attend. Registration details to follow soon…
Here is a selection of key takeaways from the first part of the Information Management preview, which focuses on information governance, ediscovery, search, collaboration and file sharing. (Matt Aslett will be posting highlights of part 2, which focuses more on data management and analytics, shortly.)
- One of the most obvious common themes that will continue to influence technology spending decisions in the coming year is the impact of continued explosive data and information growth. This continues to shape new legal frameworks and technology stacks around information governance and e-discovery, as well as to drive a new breed of applications growing up around what we term the ‘Total Data’ landscape.
- Data volumes and distributed data drive the need for more automation, and auto-classification capabilities will continue to emerge more successfully in the e-discovery, information governance and data protection veins – indeed, we expect to see more intersection between these, as we noted in a recent post.
- The maturing of the cloud model – especially as it relates to file sharing and collaboration, but also from a more structured database perspective – will drive new opportunities and challenges for IT professionals in the coming year. Looks like 2012 may be the year of “Dropbox for the enterprise.”
- One of the big emerging issues that rose to the fore in 2011, and is bound to get more attention as the New Year proceeds, is around the dearth of IT and business skills in some of these areas, without which the industry at large will struggle to harness and truly exploit the attendant opportunities.
- The changes in information management in recent years have encouraged (or forced) collaboration between IT departments, as well as between IT and other functions. Although this highlights that many of the issues here are as much about people and processes as they are about technology, the organizations able to leap ahead in 2012 will be those that can most effectively manage the interaction of all three.
- We also see more movement of underlying information management infrastructures into the applications arena. This is true with search-based applications, as well as in the Web-experience management vein, which moves beyond pure Web content management. And while Microsoft SharePoint continues to gain adoption as a base layer of content-management infrastructure, there is also growth in the ISV community that can extend SharePoint into different areas at the application-level.
There is a lot more in the report about proposed changes in the e-discovery arena, advances of the cloud, enterprise search and impact of mobile devices and bring-your-device-to-work on information management.
January 24th, 2012 — Data management
January 23rd, 2012 — Data management
If you’re a MySQL user, tell us about your adoption plans by taking our current survey.
Back in late 2009, at the height of the concern about Oracle’s imminent acquisition of Sun Microsystems and MySQL, 451 Research conducted a survey of open source software users to assess their database usage and attitudes towards Oracle.
The results provided an interesting snapshot of the potential implications of the acquisition and the concerns of MySQL users and even, so I am told, became part of the European Commission’s hearing into the proposed acquisition (used by both sides, apparently, which says something about both our independence and the malleability of data).
One of the most interesting aspects concerned the apparently imminent decline in the usage of MySQL. Of the 285 MySQL users in our 2009 survey, only 90.2% still expected to be using it two years later, and only 81.8% in 2014.
Other non-MySQL users expected to adopt the open source database after 2009, but the overall prediction was decline. While 82.1% of our sample of 347 open source users were using MySQL in 2009, only 78.7% expected to be using it in 2011, declining to 72.3% in 2014.
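For readers who want to check the arithmetic behind these figures, here is a minimal sketch in Python. The respondent totals and percentages are the ones quoted above; the derived head counts are approximations, since the published percentages are rounded, and the helper names (`share`, `approx_count`) are purely illustrative.

```python
# Figures quoted in the post for the 2009 survey.
TOTAL_RESPONDENTS = 347   # open source users in the sample
MYSQL_USERS_2009 = 285    # of whom were MySQL users in 2009


def share(count, total):
    """Return count as a percentage of total, rounded to one decimal place."""
    return round(100 * count / total, 1)


def approx_count(percent, total):
    """Approximate head count implied by a published (rounded) percentage."""
    return round(percent * total / 100)


# 285 of 347 respondents matches the quoted 2009 usage share.
usage_2009 = share(MYSQL_USERS_2009, TOTAL_RESPONDENTS)

# Existing MySQL users who expected to still be using it in 2011.
retained_2011 = approx_count(90.2, MYSQL_USERS_2009)

# All respondents who expected to be using MySQL in 2011.
expected_2011 = approx_count(78.7, TOTAL_RESPONDENTS)

# The difference is a rough estimate of non-users planning to adopt MySQL.
implied_new_2011 = expected_2011 - retained_2011

print(usage_2009, retained_2011, expected_2011, implied_new_2011)
```

On these numbers the expected new adopters do not come close to offsetting the existing users who expected to move away, which is why the overall prediction was one of decline.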
This represented an interesting snapshot of sentiment towards MySQL, but the result also had to be taken with a pinch of salt given the significant level of concern regarding MySQL’s future at the time the survey was conducted.
The survey also showed that only 17% of MySQL users thought that Oracle should be allowed to keep MySQL, while 14% of MySQL users were less likely to use MySQL if Oracle completed the acquisition.
That is why we are asking similar questions again, in our recently launched MySQL/NoSQL/NewSQL survey.
More than two years later Oracle has demonstrated that it did not have nefarious plans for MySQL. While its stewardship has not been without controversial moments, Oracle has also invested in the MySQL development process and improved the performance of the core product significantly. There are undoubtedly users that have turned away from MySQL because of Oracle but we also hear of others that have adopted the open source database specifically because of Oracle’s backing.
That is why we are now asking MySQL users to again tell us about their database usage, as well as attitudes to MySQL following its acquisition by Oracle. Since the database landscape has changed considerably since late 2009, we are now also asking about NoSQL and NewSQL adoption plans.
Is MySQL usage really in decline, or was the dip suggested by our 2009 survey the result of a frenzy of uncertainty and doubt given the imminent acquisition? Will our current survey confirm or contradict that result? If you’re a MySQL user, tell us about your adoption plans by taking our current survey.
January 20th, 2012 — Archiving, eDiscovery, M&A
We commented recently on Symantec’s acquisition of cloud archiving specialist LiveOffice. The announcement also afforded Big Yellow an opportunity to unveil what it calls “Intelligent Information Governance,” an over-arching theme that provides the context for some of the product-level integrations it has been working on. For example, it just announced improved integration between its Clearwell eDiscovery suite and its on-premise archive software, EnterpriseVault (stay tuned for more on this following LegalTech later this month).
There’s clearly an opportunity to go deeper than product-level “integration,” however. In a blog post, Symantec VP Brian Dye raised an issue that we have been seeing for a while, especially among some of our larger end-user clients. In the post, Brian discusses the fundamental contention that all of us – from individuals to corporations to governments – face around information governance: striking the right balance between control of information and freedom of information.
Software has emerged to help us manage this contention, most typically through data loss prevention (DLP) tools, to control what data does and doesn’t leave the organization, and eDiscovery and records management tools, to control what data is retained, and for how long. Brian noted that there is an opportunity to do much more here by linking the two sides of what is in many ways the same coin, for example by sharing the classification schemes used to define and manage critical and confidential information.
This is an idea that we have discussed at length internally, with some of our larger end-user clients, and with a good few security and IM vendors. Notably, many vendors responded by telling us that, though a good idea in principle, in reality organizations are too siloed to get value from such capabilities; DLP is owned and operated by the security team, while eDiscovery is managed by legal, records management and technology teams. While some of the end-users we have discussed this with are certainly siloed to a point, they are also working to address this issue by developing a more collaborative approach, establishing cross-functional teams, and so on.
A cynic would point out that some self-interest might be at play here too from a vendor perspective; why sell one integrated product to a company when you can sell them essentially the same technology twice? But of course, we’re not the remotest bit cynical (!) There is also the reality that at most large vendors, product portfolios have been put together at least in part by acquisitions. Security and e-discovery products may be sold separately because they are, in fact, separate products with little to no integration in terms of products or sales organizations. And vendors may not yet be motivated to do the hard integration work (technically, organizationally), if they are not seeing consistent enough demand from consolidated buying teams at large organizations.
Wendy Nather, Research Director of our security practice, notes that such integration is desirable;
– Users don’t WANT to have meta-thoughts about their data; they just want to get their work done, which is why it’s hard to implement a user-driven classification process for DLP or for governance. The alternative is a top-down implementation, and that would work even better with only one ‘top’, that is, the security and legal teams working from the same integrated page.
However, Wendy also notes that such an approach is itself not without complexity;
– Confidential data can be highly contextual in nature (for example, when data samples get small enough to identify individuals, triggering HIPAA or FERPA); you need advanced analytics on top of your DLP to trigger a re-classification when this happens. Why, you might even call this Data Event Management (DEM).
It’s notable that Symantec is now starting to talk up the notion of a unified, or converged approach to data classification. Of course, it is one of the better-positioned vendors to take advantage here, given its acquisitions in both DLP (Vontu in 2007) and eDiscovery (Clearwell in 2011), while LiveOffice adds some intriguing options for doing some of this in the cloud (especially if merged with its hosted security offerings from MessageLabs).
Nonetheless, we look forward to hearing more from Symantec — and others — about progress here through 2012. Indeed, if you are attending LegalTech in New York in a couple of weeks, then our eDiscovery analyst David Horrigan would love to hear your thoughts. Additionally, senior security analyst Steve Coplan will be taking a longer look at the convergence of data management and security in his upcoming report on “The Identities of Data.”
In other words, this is a topic that we’re expending a fair amount of energy on ourselves; watch this space!
January 19th, 2012 — Data management
Amazon launches DynamoDB. Red Hat virtually supports JasperReports. And more.
An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* Amazon Web Services Launches Amazon DynamoDB See also blog posts from Werner Vogels and Jeff Barr, as well as reaction from DataStax and Basho.
* Jaspersoft Delivers Analytics for Red Hat Enterprise Virtualization Customers JasperReports Server is embedded in Red Hat Enterprise Virtualization 3.0.
* Tableau 7.0 Brings Simplicity to Business Intelligence Including new Data Server for data sharing and management.
* Hortonworks to Deliver Next-Generation of Apache Hadoop Pre-announcement (emphasis on the pre).
* RainStor Announces First Enterprise Database Running Natively on Hadoop as well as partnerships with Cloudera, Hortonworks, and MapR, and support from Composite Software.
* Talend Platform for Data Services Operationalizes Information and Data A common development, deployment and monitoring environment for both data management and application integration.
* Fujitsu Launches Cloud Services as a Platform for Big Data Data Utilization Platform Services.
* All you wanted to know about Hadoop, but were too afraid to ask A graphic illustration of the various versions of Apache Hadoop.
* Oracle Database or Hadoop? Another good post from Pythian’s Gwen Shapira. See also Aaron Cordova’s Do I need SQL or Hadoop?
* Meet Code 42, Accel’s first Big Data Fund investment GigaOM has the details.
* MapR CEO Sees Big Changes in Big Data in 2012 Predictive.
* Introducing DataFu: an open source collection of useful Apache Pig UDFs LinkedIn launches open source user-defined functions.
* Big Data Needs Data Scientists, Or Quants, Or Excel Jockeys … or something.
* Career of the Future: Data Scientist [INFOGRAPHIC] Infotaining.
* Knives out for Oracle. SAP and IBM offer some perspectives on Exalytics and Big Data Appliance respectively.
* For 451 Research clients
# Information Builders uses Infobright to take BI in-memory, expands SMB reach Market development report
# RainStor launches database complement to Apache Hadoop Market development report
# Heroku’s Postgres is poised for growing interest in database as a service Market development report
* Google News Search outlier of the day: This Spud’s For All of You: “2012 Is the Year of the Potato”
And that’s the Data Day, today.
January 18th, 2012 — Archiving, eDiscovery, M&A
As if to underscore our belief that the cloud is set to play a bigger role in all things Information Management-related in 2012, Symantec announced this week that it had acquired cloud archiving specialist LiveOffice for $115m, its first acquisition in eight months (451 Research clients can read the full deal-analysis report here).
Though the deal was not a huge surprise (some of LiveOffice’s executive team, including the CEO and COO, hail from Symantec, which has for the last year been reselling LiveOffice, rebranded as EnterpriseVault.Cloud), it is a significant endorsement of the cloud archiving market, a sub-sector that we have been following closely for a couple of years (we published a detailed, long-form report on the market in late 2010), but that has yet to really come to life.
Symantec, which of course dominates the on-premise email archiving market, notes that about half of all archive deployments now go to the cloud. In this respect, cloud archiving is a market that it simply has to participate in more directly. Accordingly, LiveOffice provides Symantec with a better means of serving the smaller organizations that tend to opt for the cloud model, which requires far fewer skills and resources to set up and manage than on-prem models. Of course, it also means Symantec doesn’t have to be religious about which model it promotes; whether on-prem, cloud or a hybrid of the two, it now caters to all requirements.
Symantec also made an interesting comment that LiveOffice is at the right point in its own development where the application of Symantec’s huge scale can help in growing the business, rather than be a hindrance. This is a refreshingly honest acknowledgement that it hasn’t always got the balance right in the past; buy a company that is too small, and the weight of a giant like Symantec risks starving it of oxygen altogether, rather than fanning the flames that made it successful in the first place.
The question now is whether this move may help spark broader growth of the cloud archiving market. LiveOffice was one of the first cloud providers to archive other data types beyond email, and can now store and index a wide variety of data, including from social media, file servers, SharePoint and even SaaS applications; as more data, workloads and applications move to the cloud, so cloud-based archiving will become more relevant. One big factor in the cloud players’ favor is that email is increasingly going the hosted route, especially for SMEs; if you run corporate email as a service, then you aren’t going to deploy an email archive on-premise.
All in all, we think this is a good move by Symantec, and one that could drive interest in the other cloud-archiving pure plays out there.