The Data Day: January 27, 2017

Alternative data platforms and analytics facts.

And that’s the data day, today.

7 Hadoop questions. Q2: Hadoop infrastructure choices

What is your preferred infrastructure for Hadoop deployments? That’s one of the primary questions being asked in the 451 Research 2013 Hadoop survey. The answer will have significant implications for the future direction of Hadoop.


While one of the primary benefits of Hadoop – low cost data storage – means that for many organisations the primary infrastructure for Hadoop has been commodity hardware, many systems and storage vendors now offer their own dedicated appliances and/or reference architecture for Hadoop.

We expect to see more of these dedicated Hadoop configurations as the incumbent infrastructure vendors look to cash in on Hadoop adoption and try to add greater value.

We also see some companies exploring the potential for Hadoop in the cloud, as well as hosted deployments, and on virtual infrastructure – although those are arguably in the early stages of technical maturity, and adoption.


Which infrastructure configurations are most popular? That’s one of the things our survey is designed to find out. The early results perhaps unsurprisingly indicate a greater preference for Hadoop being deployed on commodity hardware. However, cloud and virtual deployments have also scored well.

Interestingly, the early results show the preference for Hadoop on cloud infrastructure is significantly higher among respondents that are still in the development and test stage with Hadoop, which supports our anecdotal evidence about the use-cases for Hadoop in the cloud.

In order to get a little more detail on deployment preferences, the survey also asks about the level of consideration, testing and adoption for dedicated Hadoop hardware and Hadoop-as-a-service offerings respectively.

Among the choices in the dedicated hardware category are offerings from DataDirect Networks, Dell, HP, Oracle, IBM, Pivotal, Teradata, Cisco and NetApp.

The choices in the Hadoop-as-a-service category include Altiscale, Amazon EMR (including MapR), MapR on Google Compute Engine, Microsoft Windows Azure HDInsight Service, Mortar Data, Qubole, Rackspace Big Data, SunGard Unified Analytics Services and Treasure Data.

To give your view on this and other questions related to the adoption of Hadoop, please take our 451 Research 2013 Hadoop survey.

The Data Day, A few days: September 2-6 2013

Where database startups go to die. And more.

And that’s the data day, today.

Forthcoming webinar: Big Data Reconsidered

Next Tuesday, August 13, 2013 at 1:00 PM – 2:00 PM EDT I’ll be presenting a 451 Research webinar entitled Big Data Reconsidered: Separating Hype from Reality for Hosting and Cloud Providers.

Big Data and the Cloud are two of the most hyped terms in the history of the IT industry, and together, would appear to provide opportunities for hosting and cloud providers to offer new revenue-generating services.

In this session I will go back to basics with a hype-free overview of Big Data and the opportunities it provides for hosting and cloud providers, while also previewing the Big Data-related sessions attendees can look forward to at HCTS 2013 in Las Vegas next month.

For more details and to register, click here.

Forthcoming webinar: How to Take Advantage of NewSQL in the Cloud

On February 21, at 10:00am PST / 1:00pm EST, I’ll be taking part in a webinar – How to Take Advantage of NewSQL in the Cloud – in conjunction with Clustrix.

In this free webinar I, along with Mark Sarbiewski, Clustrix CMO, will discuss:

  • The current cloud database inflection point – and how that affects you and your company
  • How to migrate your SQL database to the cloud
  • How to get effortless scale from your database in public or private clouds
  • How to ensure database availability in the cloud for business critical applications

For full details and registration, click here.

The Data Day, Today: October 17 2012

Teradata launches Aster/Hadoop appliance. SAP takes HANA to the cloud.

And that’s the Data Day, today.

The Data Day, Two days: August 27/28 2012

Citrusleaf. Aerospike. AlchemyDB. Sqrrl. Percolator. Dremel. Pregel. And more.

And that’s the Data Day, today.

Previewing Information Management in 2012

Every New Year affords us the opportunity to dust down our collective crystal balls and predict what we think will be the key trends and technologies dominating our respective coverage areas over the coming 12 months. We at 451 Research just published our 2012 Preview report; at almost 100 pages it’s a monster, but it offers some great insights across twelve technology subsectors, ranging from managed hosting and the future of cloud to the emergence of software-defined networking and solid state storage, and everything in between. The report is available to both 451 Research clients and non-clients (in return for a few details); access the landing page here. There’s a press release of highlights here. Also, mark your diaries for a webinar discussing report highlights on Thursday Feb 9 at noon ET, which will be open for clients and non-clients to attend. Registration details to follow soon…

Here is a selection of key takeaways from the first part of the Information Management preview, which focuses on information governance, ediscovery, search, collaboration and file sharing. (Matt Aslett will be posting highlights of part 2, which focuses more on data management and analytics, shortly.)

  • One of the most obvious common themes that will continue to influence technology spending decisions in the coming year is the impact of continued explosive data and information growth.  This  continues to shape new legal frameworks and technology stacks around information governance and e-discovery, as well as to drive a new breed of applications growing up around what we term the ‘Total Data’ landscape.
  • Data volumes and distributed data drive the need for more automation, and auto-classification capabilities will continue to emerge more successfully in the e-discovery, information governance and data protection veins; indeed, we expect to see more intersection between these, as we noted in a recent post.
  • The maturing of the cloud model – especially as it relates to file sharing and collaboration, but also from a more structured database perspective – will drive new opportunities and challenges for IT professionals in the coming year.  Looks like 2012 may be the year of ‘Dropbox for the enterprise.’
  • One of the big emerging issues that rose to the fore in 2011, and is bound to get more attention as the New Year proceeds, is around the dearth of IT and business skills in some of these areas, without which the industry at large will struggle to harness and truly exploit the attendant opportunities.
  • The changes in information management in recent years have encouraged (or forced) collaboration between IT departments, as well as between IT and other functions. Although this highlights that many of the issues here are as much about people and processes as they are about technology, the organizations able to leap ahead in 2012 will be those that can most effectively manage the interaction of all three.
  • We also see more movement of underlying information management infrastructures into the applications arena.  This is true with search-based applications, as well as in the Web-experience management vein, which moves beyond pure Web content management.  And while Microsoft SharePoint continues to gain adoption as a base layer of content-management infrastructure, there is also growth in the ISV community that can extend SharePoint into different areas at the application-level.

There is a lot more in the report about proposed changes in the e-discovery arena, advances in the cloud and enterprise search, and the impact of mobile devices and bring-your-device-to-work on information management.

DLP and e-discovery: two sides of the same governance coin?

We commented recently on Symantec’s acquisition of cloud archiving specialist LiveOffice. The announcement also afforded Big Yellow an opportunity to unveil what it calls “Intelligent Information Governance,” an over-arching theme that provides the context for some of the product-level integrations it has been working on. For example, it just announced improved integration between its Clearwell eDiscovery suite and its on-premises archive software, Enterprise Vault (stay tuned for more on this following LegalTech later this month).

There’s clearly an opportunity to go deeper than product-level ‘integration,’ however.  In a blog post, Symantec VP Brian Dye raised an issue that we have been seeing for a while, especially among some of our larger end-user clients. In the post, Brian discusses the fundamental contention that all of us – from individuals to corporations to governments — face around information governance — striking the right balance between control of information and freedom of information.

Software has emerged to help us manage this contention, most typically through data loss prevention (DLP) tools – to control what data does and doesn’t leave the organization — and eDiscovery and records management tools, to control what data is retained, and for how long. Brian noted that there is an opportunity to do much more here by linking the two sides of what is in many ways the same coin, for example by sharing the classification schemes used to define and manage critical and confidential information.

This is an idea that we have discussed at length internally, with some of our larger end-user clients, and with a good few security and IM vendors. Notably, many vendors responded by telling us that, though a good idea in principle, in reality organizations are too siloed to get value from such capabilities; DLP is owned and operated by the security team, while eDiscovery is managed by legal, records management and technology teams. While some of the end-users we have discussed this with are certainly siloed to a point, they are also working to address this issue by developing a more collaborative approach, establishing cross-functional teams, and so on.

A cynic would point out that some self-interest might be at play here too from a vendor perspective: why sell one integrated product to a company when you can sell them essentially the same technology twice? But of course, we’re not the remotest bit cynical (!). There is also the reality that at most large vendors, product portfolios have been put together at least in part by acquisitions. Security and e-discovery products may be sold separately because they are, in fact, separate products with little to no integration in terms of products or sales organizations. And vendors may not yet be motivated to do the hard integration work (technically, organizationally) if they are not seeing consistent enough demand from consolidated buying teams at large organizations.

Wendy Nather, Research Director of our security practice, notes that such integration is desirable:

– Users don’t WANT to have meta-thoughts about their data; they just want to get their work done, which is why it’s hard to implement a user-driven classification process for DLP or for governance.  The alternative is a top-down implementation, and that would work even better with only one ‘top’ — that is, the security and legal teams working from the same integrated page.

However, Wendy also notes that such an approach is itself not without complexity:

– Confidential data can be highly contextual in nature (for example, when data samples get small enough to identify individuals, triggering HIPAA or FERPA); you need advanced analytics on top of your DLP to trigger a re-classification when this happens.  Why, you might even call this Data Event Management (DEM).

It’s notable that Symantec is now starting to talk up the notion of a unified, or converged approach to data classification. Of course, it is one of the better-positioned vendors to take advantage here, given its acquisitions in both DLP (Vontu in 2007) and eDiscovery (Clearwell in 2011), while LiveOffice adds some intriguing options for doing some of this in the cloud (especially if merged with its hosted security offerings from MessageLabs).

Nonetheless, we look forward to hearing more from Symantec — and others — about progress here through 2012. Indeed, if you are attending LegalTech in New York in a couple of weeks, then our eDiscovery analyst David Horrigan would love to hear your thoughts. Additionally, senior security analyst Steve Coplan will be taking a longer look at the convergence of data management and security in his upcoming report on “The Identities of Data.”

In other words, this is a topic that we’re expending a fair amount of energy on ourselves; watch this space!

Vendors are lining up to get into the cloud file-sharing ‘box’

We recently published a spotlight report on cloud file sharing and sync, file backup and file-oriented collaboration in the cloud – and all the overlaps and intersections between these areas.  The full report is available here for 451 Research subscribers (link requires log-in).

The idea was to shed some light on this sector that often seems to be described by its two best known players – Dropbox and Box.  Despite the similar names, the services offered by these two providers have significant differences.  And each is after a different, though in some cases overlapping, target market.

Dropbox in particular seems to be gaining a lot of attention from enterprise IT departments, and it’s not all good. As compliance, security, risk and IT folks in general try to get their arms around the fact that corporate data is moving to Dropbox (and other services), a number of providers have started to look at providing alternatives. All of this is largely driven, of course, by the widespread use of iPads and other mobile devices by business users and their need to access files from these devices and keep them in sync across mobile and desktop systems. Box exploits this requirement as well, but offers more file-oriented collaboration capabilities, though not full-blown content management in the traditional sense.

Cloud file sharing, sync and mobile support for file collaboration will all be hot topics in 2012. We feel we might quickly be inundated by the number of providers that want to offer some kind of alternative to Dropbox to appease IT departments and/or better mobile access to existing enterprise content systems, like SharePoint. Below is our first-stab attempt to start to map some of this to the sub-sectors within this broader and rapidly shifting landscape. We know it’s not comprehensive; the players here are changing almost daily.

Cloud file backup, sharing, sync and collaboration providers

Source: 451 Research