The Data Day, Today: August 8 2012

Who loves Hadoop? Who doesn’t?

And that’s the Data Day, today.

The Data Day, Today: May 8 2012

IBM acquires Vivisimo. Funding for Birst, ParAccel, Metamarkets and DataSift. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* For 451 Research clients

# IBM picks up Vivisimo to search for value in ‘big data’ (Deal Analysis)

# Teradata delivers on analytic cloud vision with Active Data Warehouse Private Cloud (Impact Report)

# The Big Blue picture for ‘big data’ analytics: IBM sheds light on BigSheets (Impact Report)

# Oversight Systems’ Continuous Analysis extracts actionable insight from data (Impact Report)

# Kalido updates MDM offering with business users, operationalizing master data in mind (Impact Report)

# Delphix reaps reward from agile approach to database virtualization (Impact Report)

# Automated Insights looks to pitch narrative, visuals and stats to enterprises (Impact Report)

# myDIALS eyes indirect sales in quest to be Internet access layer for analytics (Impact Report)

* IBM Advances Big Data Analytics with Acquisition of Vivisimo. Also announces support for Cloudera.

* Teradata Announces 2012 First Quarter Results. Revenue up 21% (PDF).

* Actuate Reports First Quarter 2012 Financial Results. Revenue up 9% (PDF).

* Birst Secures $26 Million in Financing Led By Sequoia Capital

* ParAccel Closes Record Q1 Revenues and $20 Million Investment Round

* Metamarkets Raises $15 Million to Deliver Data Science-as-a-Service

* DataSift adds $7.2M: The story so far and focus for the future

* Teradata to Acquire eCircle (PDF)

* Google BigQuery brings Big Data analytics to all businesses

* TIBCO Spotfire Brings the Power of Data Discovery to Big Data and Extreme Information

* Jaspersoft Teams with VMware To Deliver Business Intelligence for Data-Driven Cloud Applications

* Kalido and Teradata Sign Global Reseller Agreement

* Actuate Announces Cloudera Alliance to Support Apache Hadoop and BIRT Developers in Big Data Integration

* Hortonworks and Kognitio Announce Technical Partnership Driving Apache Hadoop Adoption in Big Data Analytics Implementations

* Tokutek and PalominoDB Partner to Bring Scale, Performance to Database Deployments

* Acunu is pleased to announce v2 of the Acunu Data Platform!

* Is Yahoo really threatening memcached and Open Compute?

* Introducing Zend DBi as a MySQL Replacement on IBM i

* Zettaset and Hyve Solutions Build First Fully Integrated Enterprise OS Hadoop Solution

* Cloudera Announces New Japanese Subsidiary

* Bull Announces the Formation of Database Migration Business Unit

* Couchbase to Run Native with Key-Value API for ioMemory

* The Big Data Value Continuum

* Big Data is Business Intelligence plus Attention Deficit Disorder

* Nokia released Dempsy, an open source stream data processing platform.

And that’s the Data Day, today.

The Data Day, Today: Apr 25 2012

Splunk soars on IPO. VMware acquires Cetas. Vertica retains autonomy. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* For 451 Research clients

# Splunk IPO: $3bn and counting (M&A Insight)

# VMware snaps up Cetas Software for ‘big data’ analytics (Deal Analysis)

# HP’s Vertica retains its autonomy, continues integration with Autonomy (Impact Report)

# SAP makes long-awaited predictive analytics move of its own (Impact Report)

# Sanbolic pitches data management platform for server, desktop and database consolidation (Impact Report)

* Splunk IPO kills, lives up to expectations

* VMware acquires Cetas Software for Cloud and Big Data Analytics

* Opera Solutions Acquires Procurement Analytics Tools and Services from BIQ and Lexington Analytics

* Terascala Announces $14M Series B Funding Round Led by Strategic Partner Consortium

* Ravel Acquired by W2O Group To Expand Big Data Client Services And Enrich In-House Analytics and Insights Technology

* Teradata Active Data Warehouses Provide Private Cloud Benefits

* Pentaho Introduces New Interactive Visualization and Expanded Big Data Analytics

* Teradata Unveils New Purpose-Built Appliance for SAS High-Performance Analytics

* SAP Establishes Global Managing Board to Lead Company

* Oracle to Hadoop Under One Appliance: GridIron Introduces First All-Flash Appliance Line With Unprecedented Performance to Tackle Unified Big Data Processing

* Lucid Imagination Technology Integration with SugarCRM Lets Customers Enjoy Improved Global Search Capabilities with Apache Lucene/Solr

* The Apache Software Foundation Announces Apache Cassandra v1.1

* Miso project: how it will help you make your own Guardian-style infographics and data visualisations

And that’s the Data Day, today.

What’s in a name? Analyzing ‘Dropbox for the enterprise’

We’ve been spending a good deal of time lately talking to vendors looking to deliver ‘Dropbox-for-the-enterprise’ alternatives. By this, providers generally mean that they enable users to sync and share their files across desktops and devices, but in a way that is palatable to corporate IT departments. I’d say we really started to see this activity in earnest about a year ago, when Box started getting serious about the enterprise market and I began to get a lot of briefing requests from the likes of Accellion, Egnyte and others about their enterprise file sharing and sync offerings. Things really started heating up later in 2011: VMware announced its Dropbox-for-the-enterprise in August, Citrix acquired ShareFile in October and open source play ownCloud set sail in December. We have also recently initiated coverage of another startup, Germany-based TeamDrive.

These are only a few of the movements in this emerging market, and things will only become more active in 2012. Perhaps one of the more notable features is the broad background of players entering this space: we see vendors from the virtualization, security, storage, content management and mobility sectors all vying for attention. This is likely to cause a great deal of noise and confusion.

Compounding the matter is that everyone in this market seems to be struggling with what exactly to call it. “Enterprise-grade Dropbox” neatly encapsulates it, but it’s not really a viable way to refer to a market segment. We put out a report on ‘cloud file sharing’ late in 2011, but that report has a broader focus and doesn’t capture what is important and different about this segment in particular. Dropbox is obviously a cloud service, and many of the players that want to offer Dropbox-like services are as well. But while the cloud certainly *can* be an enabling technology, it doesn’t have to be. Indeed, a number of players, such as Accellion, Egnyte, GroupLogic, ownCloud, Oxygen Cloud and, presumably, VMware when it gets to it, are offering private-cloud or on-premises approaches to file sharing and sync.

So we’ve settled on Mobile File Sharing and Sync Platforms as the way that we are going to refer to this segment, at least for now.   The mobility part of this, as opposed to cloud, is what is really new and disruptive.  That is what drives the need for sync and native apps for specific device types.  We also think it is important to identify these emerging products, including Dropbox itself, as ‘platforms’ since we suspect there will be ample opportunity moving forward for customization and plug-ins to these tools.  We are already seeing some of these in the areas of security, content management and collaboration for Dropbox specifically.

Calling a set of Dropbox-like capabilities a platform is interesting, though we can also flip the conversation on its head and ask, as others are doing, whether sync is really a feature. The answer may well be that it is both. In the enterprise, it certainly makes sense as a feature of content management, collaboration and even storage offerings, since business content is generally part of broader business processes and often needs to be retained for compliance reasons. IT also wants to get the most out of existing investments. We are already seeing sync as a feature from the likes of OpenText and Huddle, and this is arguably Box’s approach as well. We are also seeing partnerships, such as the one between Oxygen Cloud and EMC, that layer a sync service on top of storage infrastructure.

We take a more extensive look at the market for Mobile File Sharing and Sync Platforms in a recent report (login required) for 451 clients.  This report looks at user and IT requirements and provides more detail on the enterprise players we’ve begun to track. How this market plays out exactly over time remains to be seen, but we think it has the potential to be extremely disruptive. For that reason it’s a space we’ll continue to watch closely, and from multiple vantage points.

The Data Day, Today: Jan 24 2012

Thoughts on Splunk’s IPO and DynamoDB. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Thoughts on the Splunk IPO and S-1. By Dave Kellogg.

* Thoughts on SimpleDB, DynamoDB and Cassandra. By Adrian Cockcroft.

* Recommind’s Revenue Leaps 95% in Record-Setting 2011. Predictable.

* Hewlett-Packard Expands to Cambridge via Vertica’s “Big Data” Center. Moving.

* Announcing SkySQL Enterprise HA for the MariaDB & MySQL databases

* Membase Server is Now Couchbase Server. But not *the* Couchbase Server.

* Cloudera Teams With O’Reilly Media to Merge Hadoop World and Strata Conferences

* Survey results: How businesses are adopting and dealing with data. 100 Strata Online Conference attendees.

* Big data market survey: Hadoop solutions

* LinkedIn released SenseiDB, an open source distributed, realtime, semi-structured database.

* For 451 Research clients

# VMware: not your father’s database company (Impact Report)

# Sparsity Technologies draws up plans for graph database adoption (Impact Report)

# Amazon launches DynamoDB, an auto-configuring database as a service (Market Development report)

# NuoDB targets Q2 release for elastic relational database (Market Development report)

# ADVIZOR illuminates growth strategy, roadmap in data discovery and analysis (Market Development report)

# Birst adds own analytic engine for BI, OEM agreement with ParAccel (Market Development report)

* Google News Search outlier of the day: RentAGrandma.com Recruiting Wonderful Grandmas

And that’s the Data Day, today.

The Data Day, Today: Jan 13 2012

Splunk files for IPO. Oracle updates its price list. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Splunk Inc. Files Registration Statement for an Initial Public Offering. And here it is.

* Oracle updated its Engineered System price list.

* Comparing Hadoop Appliances. Great post from Pythian’s Gwen Shapira.

* What is big data? Edd Dumbill provides an introduction to the big data landscape.

* Why Couchbase? Damien Katz clarifies the reasons behind his preference for Couchbase over Apache CouchDB.

* Jaspersoft First to Develop Business Intelligence for Platform-as-a-Service. BI suite now available with Red Hat OpenShift.

* Birst and ParAccel Partner to Deliver Scalable and Agile Big Data Analytics in the Cloud. Leverage.

* Recommind Names 451 Research Cofounder Nick Patience Director of Product Marketing and Strategy. Our loss is Recommind’s gain.

* Oracle Unveils Oracle TimesTen In-Memory Database 11g Release 2. Performance and scalability improvements.

* Walkie Talkie App Voxer Soars Past a Billion Operations per Day powered by Basho Riak. 10-4 good buddy.

* ISYS Search to Provide Enhanced Text Data Extraction Capabilities for New Generation of SAP Solutions. OEM deal.

* Using SQLFire as a read-only cache for MySQL. VMware explains why and how.

* Announcing MySQL Enterprise Backup 3.7.0. Self-explanatory.

* Tableau Software Doubles Sales in 2011, Announces Massive Growth in Customer Roster Worldwide. Customer base up by 40 percent in 2011.

* VoltDB Completes 2011 With Significant Market Growth and Company Expansion. Including growth in new customer accounts of more than 300%.

* Clarabridge Wins Record Number of New Clients in 2011. More than 60 new Clarabridge Enterprise customers and more than 700 new Clarabridge Professional customers.

* For 451 Research clients

# Oracle selects Cloudera for Hadoop-based Big Data Appliance (Market development report)

# Microsoft may offer ‘big security data’ for free (Analyst note)

# Zimory considering virtual independence for cloud database business (Market development report)

# Jitterbit sheds light on growth strategy, integration business under new CEO (Market development report)

# SnapLogic snaps into the enterprise, shifts gaze away from midmarket integration (Market development report)

* Google News Search outlier of the day: My Best Friend’s Hair Launches Nationwide Website to Help You Find the Perfect Hairstylist

And that’s the Data Day, today.

Who is hiring Hadoop and MapReduce skills?

Continuing my recent exploration of Indeed.com’s job posting trends and data, I have been looking at which organizations (excluding recruitment firms) are hiring for Hadoop and MapReduce skills. The results are pretty interesting.

When it comes to who is hiring for Hadoop skills, the answer, put simply, is Amazon, or more generally new media:


Source: Indeed.com. Correct as of August 2, 2011.

This is indicative of the early stage of adoption, and perhaps reflects the fact that many new media Hadoop adopters have chosen to self-support rather than turn to the Hadoop support providers/distributors.

It is no surprise to see those vendors also listed as they look to staff up to meet the expected levels of enterprise adoption (and it is worth noting that Amazon could also be included in the vendors category, given its Elastic MapReduce service).

It is fascinating to see that, of the vendors, VMware currently has the most job postings on Indeed.com referencing Hadoop, while Microsoft also makes an appearance.

Meanwhile, the appearance of Northrop Grumman and Sears Holdings on this list indicates the potential for adoption by more traditional data management users in sectors such as government and retail.

It is interesting to compare the results for Hadoop job postings with those mentioning Teradata, which show a much more varied selection of retail, health, telecoms and financial services providers, as well as systems integrators, government contractors, new media and vendors.

It is also interesting to compare Hadoop-related job postings with those specifying MapReduce skills. There are far fewer of them, for a start, and while new media companies are well represented, there is much greater interest from government contractors.


Source: Indeed.com. Correct as of August 2, 2011.

Categorizing the “Foo” fighters – making sense of NoSQL

One of the essential problems with covering the NoSQL movement is that the term describes not what the associated databases are, but what they are not (and doesn’t even do that very well, since SQL itself is in many cases orthogonal to the problem the databases are designed to solve).

It is interesting to see fellow analyst Curt Monash facing the same problem. As he notes, while there seems to be a common theme that “NoSQL is Foo without joins and transactions,” no one has adequately defined what “Foo” is.

Curt has proposed HVSP (High-Volume Simple Processing) as an alternative to NoSQL, and while I’m not jumping on the bandwagon just yet, it does pass the Ronseal test (it does what it says on the tin), and it also matches my view of what defines these distributed data store technologies.

Some observations:

  • I agree with Curt’s view that object-oriented and XML databases should not be considered part of this new breed of distributed data store technologies. There is a danger that NoSQL simply comes to mean non-relational.
  • I also agree that MapReduce and Hadoop should not be considered part of this category of data management technologies (which is somewhat ironic since if there is any technology for which the terms NoSQL or Not Only SQL are applicable, it is MapReduce).
  • The vendors associated with the NoSQL movement (Basho, Couchio and MongoDB) are in a problematic position. While they are benefiting from, and to some extent encouraging, interest in NoSQL, the overall term masks their individual benefits. My sense is they will look to move away from it sooner rather than later.
  • Memcached is not a key value store. It is a cache. Hence the name.
There are numerous categorizations of the various NoSQL technologies available on the Internet. Without wishing to add yet another to the mix, I have created another one, more for my benefit than anything else.

It includes a list of users for the various projects (where available), and also some sense of how the various projects fit into the CAP Theorem, an understanding of which is, to my mind, essential for understanding how and why the NoSQL/HVSP movement has emerged (look out for more on the CAP Theorem in a follow-up post on alternatives to NoSQL).

Here’s my take, for those that are interested. As you can see, there’s a graph database-shaped hole in my knowledge. I’m hoping to fill that sooner rather than later.

By the way, our Spotlight report introducing The 451 Group’s formal coverage of NoSQL databases will be available here imminently.

Update: VMware has announced that it has hired Redis creator Salvatore Sanfilippo and is taking on the Redis key value store project. The image below has been updated to reflect that, as well as the launch of NorthScale’s Membase.
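The point made earlier, that memcached is a cache rather than a key value store, comes down to what happens to data when memory runs out: a cache is allowed to forget. The following is a minimal Python sketch of that behaviour, not memcached’s implementation or protocol. The `LRUCache` class and its `get`/`set` methods are hypothetical, capacity is counted in entries rather than bytes for simplicity, and it assumes least-recently-used eviction, broadly similar to the policy memcached applies when its memory fills up.

```python
from collections import OrderedDict
from typing import Optional


class LRUCache:
    """Toy fixed-capacity cache (hypothetical, not memcached's API).

    When the cache is full, the least recently used entry is evicted, so a
    value that was written can later disappear without ever being deleted.
    """

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # assumption: counted in entries, not bytes
        self._items: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str) -> Optional[str]:
        if key not in self._items:
            return None  # cache miss: the caller must fall back to the real data store
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def set(self, key: str, value: str) -> None:
        self._items[key] = value
        self._items.move_to_end(key)  # new or updated entries become most recent
        if len(self._items) > self.capacity:
            self._items.popitem(last=False)  # evict the least recently used entry


cache = LRUCache(capacity=2)
cache.set("a", "1")
cache.set("b", "2")
cache.set("c", "3")  # over capacity, so "a" is evicted
print(cache.get("a"))  # None
print(cache.get("c"))  # 3
```

A key value store, by contrast, is expected to keep every value until it is explicitly deleted. The get/set interface looks much the same in both cases; the persistence guarantee is what separates them.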