The Data Day, Today: Jan 13 2012

Splunk files for IPO. Oracle updates its price list. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Splunk Inc. Files Registration Statement for an Initial Public Offering And here it is.

* Oracle updated its Engineered System price list.

* Comparing Hadoop Appliances Great post from Pythian’s Gwen Shapira.

* What is big data? Edd Dumbill provides an introduction to the big data landscape.

* Why Couchbase? Damien Katz clarifies the reasons behind his preference for Couchbase over Apache CouchDB.

* Jaspersoft First to Develop Business Intelligence for Platform-as-a-Service BI suite now available with Red Hat OpenShift.

* Birst and ParAccel Partner to Deliver Scalable and Agile Big Data Analytics in the Cloud. Leverage.

* Recommind Names 451 Research Cofounder Nick Patience Director of Product Marketing and Strategy Our loss is Recommind’s gain.

* Oracle Unveils Oracle TimesTen In-Memory Database 11g Release 2 Performance and scalability improvements.

* Walkie Talkie App Voxer Soars Past a Billion Operations per Day powered by Basho Riak 10-4 good buddy.

* ISYS Search to Provide Enhanced Text Data Extraction Capabilities for New Generation of SAP Solutions OEM deal.

* Using SQLFire as a read-only cache for MySQL. VMware explains why and how.

* Announcing MySQL Enterprise Backup 3.7.0 Self-explanatory.

* Tableau Software Doubles Sales in 2011, Announces Massive Growth in Customer Roster Worldwide Customer base up by 40 percent in 2011.

* VoltDB Completes 2011 With Significant Market Growth and Company Expansion Including growth in new customer accounts of more than 300%.

* Clarabridge Wins Record Number of New Clients in 2011 More than 60 new Clarabridge Enterprise customers and more than 700 new Clarabridge Professional customers.

* For 451 Research clients

# Oracle selects Cloudera for Hadoop-based Big Data Appliance Market development report

# Microsoft may offer ‘big security data’ for free Analyst note

# Zimory considering virtual independence for cloud database business Market development report

# Jitterbit sheds light on growth strategy, integration business under new CEO Market development report

# SnapLogic snaps into the enterprise, shifts gaze away from midmarket integration Market development report

* Google News Search outlier of the day: My Best Friend’s Hair Launches Nationwide Website to Help You Find the Perfect Hairstylist

And that’s the Data Day, today.

The Data Day, Today: Jan 10 2012

Oracle OEMs Cloudera. The future of Apache CouchDB. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Oracle announced the general availability of Big Data Appliance, and an OEM agreement with Cloudera for CDH and Cloudera Manager.

* The Future of Apache CouchDB Cloudant confirms intention to integrate the core capabilities of BigCouch into Apache CouchDB.

* Reinforcing Couchbase’s Commitment to Open Source and CouchDB Couchbase CEO Bob Wiederhold attempts to clear up any confusion.

* Hortonworks Appoints Shaun Connolly to Vice President of Corporate Strategy Former vice president of product strategy at VMware.

* Splunk even more data with 4.3 Introducing the latest Splunk release.

* Announcement of Percona XtraDB Cluster (alpha release) Based on Galera.

* Bringing Value of Big Data to Business: SAP’s Integrated Strategy Forbes interview with Sanjay Poonen, President and corporate officer of SAP Global Solutions.

* New Release of Oracle Database Firewall Extends Support to MySQL and Enhances Reporting Capabilities Self-explanatory.

* Big data and the disruption curve “Many efforts are being funded by business units and not the IT department and money is increasingly being diverted from large enterprise vendors.”

* Get your SQL Server database ready for SQL Azure Microsoft “codename” SQL Azure Compatibility Assessment.

* An update on Apache Hadoop 1.0 Cloudera’s Charles Zedlewski helpfully explains Apache Hadoop branch numbering.

* Xeround and the CAP Theorem So where does Xeround fit in the CAP Theorem?

* Can Yahoo’s new CEO Thompson harness big data, analytics? Larry Dignan thinks Scott Thompson might just be the right guy for the job.

* US Companies Face Big Hurdles in ‘Big Data’ Use “21% of respondents were unsure how to best define Big Data”

* Schedule Your Agenda for 2012 NoSQL Events Alex Popescu updates his list of the year’s key NoSQL events.

* DataStax take Apache Cassandra Mainstream in 2011; Poised for Growth and Innovation in 2012 The usual momentum round-up from DataStax.

* Objectivity claimed significant growth in adoption of its graph database, InfiniteGraph and flagship object database, Objectivity/DB.

* Cloudera Connector for Teradata 1.0.0 Self-explanatory.

* For 451 Research clients

# SAS delivers in-memory analytics for Teradata and Greenplum Market Development report

# With $84m in funding, Opera sets out predictive-analytics plans Market Development report

* Google News Search outlier of the day: First Dagger Fencing Competition in the World Scheduled for January 14, 2012

And that’s the Data Day, today.

VC funding for Hadoop and NoSQL tops $350m

451 Research has today published a report looking at the funding being invested in Apache Hadoop- and NoSQL database-related vendors. The full report is available to clients, but below is a snapshot of the report, along with a graphic representation of the recent up-tick in funding.

According to our figures, between the beginning of 2008 and the end of 2010 $95.8m had been invested in the various Apache Hadoop- and NoSQL-related vendors. That figure now stands at more than $350.8m, up 266%.

That statistic does not really do justice to the sudden uptick of interest, however. The figures indicate that funding for Apache Hadoop- and NoSQL-related firms has more than doubled since the end of August, at which point the total stood at $157.5m.
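For anyone who wants to check the arithmetic behind those two claims, here is a minimal sketch in Python. It uses only the three totals quoted above; the function and variable names are ours, chosen for illustration, and are not taken from the report.

```python
def pct_increase(old: float, new: float) -> float:
    """Percentage increase from old to new."""
    return (new - old) / old * 100


# Funding totals in $m, as quoted in the text above.
total_end_2010 = 95.8     # invested between the start of 2008 and the end of 2010
total_end_august = 157.5  # running total at the end of August
total_now = 350.8         # running total at the time of the report

growth_since_2010 = pct_increase(total_end_2010, total_now)
multiple_since_august = total_now / total_end_august

print(f"Increase since end of 2010: {growth_since_2010:.0f}%")
print(f"Multiple of end-of-August total: {multiple_since_august:.1f}x")
```

The first figure comes out at roughly 266%, and the second at roughly 2.2 times the end-of-August total, which is consistent with "more than doubled".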

A substantial reason for that huge jump is the staggering $84m series A funding round raised by Apache Hadoop-based analytics service provider Opera Solutions.

The original commercial supporter of Apache Hadoop, Cloudera, has also contributed strongly with a recent $40m series D round. In addition, MapR Technologies raised $20m to invest in its Apache Hadoop distribution, while we know that Hortonworks also raised a substantial round (unconfirmed, but reportedly $20m) from Benchmark Capital and former parent Yahoo as it was spun off in June. Index Ventures also recently announced that it has become an investor in Hortonworks.

I am reliably informed that if you factor in Hortonworks’ two undisclosed rounds, the total funding for Hadoop and NoSQL vendors is actually closer to $400m.

The various NoSQL database providers have also played a part in the recent burst of investment, with 10gen raising a $20m series D round and Couchbase raising $15m. DataStax, which has interests in both Apache Cassandra and Apache Hadoop, raised an $11m series B round, while Neo Technology raised a $10.6m series A round. Basho Technologies raised $12.5m in series D funding in three chunks during 2011.

Additionally, there are a variety of associated players, including Hadoop-based analytics providers such as Datameer, Karmasphere and Zettaset, as well as hosted NoSQL firms such as MongoLab, MongoHQ and Cloudant.

One investor company name that crops up more than most in the list above is Accel Partners, which was an original investor in both Cloudera and Couchbase, and backed Opera Solutions via its Accel-KKR joint venture with Kohlberg Kravis Roberts.

It appears that those investments have merely whetted Accel’s appetite for big data, however, as the firm last week announced a $100m Big Data Fund to invest in new businesses targeting storage, data management and analytics, as well as data-centric applications and tools.

While Accel is the first VC shop that we are aware of to create a fund specifically for big data investments, we are confident both that it won’t be the last and that other VCs have already informally earmarked funds for data-related investments.

451 clients can get more details on funding and M&A involving more traditional database vendors, as well as our perspective on potential M&A suitors for the Hadoop and NoSQL players.

What is the point of Hadoop?

Among the many calls we have fielded from users, investors and vendors about Apache Hadoop, the most common underlying question we hear could be paraphrased ‘what is the point of Hadoop?’.

It is a more fundamental question than ‘what analytic workloads is Hadoop used for’ and really gets to the heart of uncovering why businesses are deploying or considering deploying Apache Hadoop. Our research suggests there are three core roles:

– Big data storage: Hadoop as a system for storing large, unstructured, data sets
– Big data integration: Hadoop as a data ingestion/ETL layer
– Big data analytics: Hadoop as a platform for new exploratory analytic applications

While much of the attention for Apache Hadoop use-cases focuses on the innovative new analytic applications it has enabled in this latter role thanks to its high-profile adoption at Web properties, for more traditional enterprises and later adopters the first two, more mundane, roles are more likely the trigger for initial adoption. Indeed there are some good examples of these three roles representing an adoption continuum.

We also see the multiple roles playing out at a vendor level, with regards to strategies for Hadoop-related products. Oracle’s Big Data Appliance (451 coverage), for example, is focused very specifically on Apache Hadoop as a pre-processing layer for data to be analyzed in Oracle Database.

While Oracle focuses on Hadoop’s ETL role, it is no surprise that the other major incumbent vendors showing interest in Hadoop can be grouped into three main areas:

– Storage vendors
– Existing database/integration vendors
– Business intelligence/analytics vendors

The impact of these roles on vendor and user adoption plans will be reflected in my presentation at Hadoop World in November, the Blind Men and The Elephant.

You can help shape this presentation, and our ongoing research into Hadoop adoption drivers and trends, by taking our survey into end user attitudes towards the potential benefits of ‘big data’ and new and emerging data management technologies.

The significance of Oracle NoSQL

We have previously speculated at The 451 Group about Oracle’s potential to respond to the growing adoption of NoSQL databases, noting that the company had a number of options at its disposal, including Berkeley DB and projects like HandlerSocket.

While some may wonder about the potential impact of Oracle NoSQL (based indeed on Berkeley DB) on the existing NoSQL vendors, I believe the launch says something very significant about NoSQL itself: specifically that its adoption is driven by more than the nature of the query language.

To get a sense of why Oracle NoSQL is significant, think about the way Oracle has traditionally responded to alternative approaches that threaten the relational model and Oracle’s dominance of it. Oracle’s approach has traditionally been to subsume the alternative approach, at least in part, into Oracle Database, nullifying the competitive threat.

Oracle CEO Larry Ellison explained the approach himself on a recent call with investors:

“We think that data should be integrated with a single database technology. That’s always been our strategy for Oracle. And it started as a relational database then we added objects, then we added text and then we’ve added a variety of other things like video and audio to the Oracle Database. We think that should be unified and that’s how we’re approaching the problem.”

As we recently covered (451 clients only), Oracle is in the process of replicating this strategy with MySQL, adding support for the ability to directly access MySQL’s InnoDB and MySQL Cluster’s NDB storage engines using the memcached API.

This ability to perform non-SQL querying of the database is part of the agility benefit of NoSQL, and if the term NoSQL were to be taken literally would perhaps be enough to discourage would-be NoSQL adopters from turning away from MySQL.

As our NoSQL, NewSQL and Beyond report highlighted, however, agility is just one of six key trends we see driving adoption of NoSQL databases. Scalability, performance, relaxed consistency, intricacy and necessity will not be solved by the ability to query MySQL or MySQL Cluster using the memcached API.

The launch of Oracle NoSQL is therefore a clear indication that there are trends at work here that cannot be solved by adding non-SQL querying to existing relational databases.

There is another significant factor here, which is the fact that Oracle has chosen to name the product NoSQL. In one simple naming move the company has effectively disarmed the NoSQL ‘movement’.

We have previously noted that existing NoSQL vendors were turning away from the term in favor of emphasizing their individual strengths. How many of them are going to want to self-identify with an Oracle product? I’m not convinced any of them believe the brand is worth fighting for.

Our big data/total data survey is now live

The 451 Group is conducting a survey into end user attitudes towards the potential benefits of ‘big data’ and new and emerging data management technologies.

Created in conjunction with TheInfoPro, a division of The 451 Group focused on real-world perspectives on the IT customer, the survey contains fewer than 20 questions and does not ask for details of specific projects. It does cover data volumes and complexity, as well as attitudes to emerging data management technologies – such as Hadoop and exploratory analytics, as well as NoSQL and NewSQL – for certain workloads.

In return for your participation, you will receive a copy of a forthcoming long-format report introducing Total Data, The 451 Group’s concept for explaining the changing data management landscape, which will include the results. Respondents will also have the opportunity to become members of TheInfoPro’s peer network.

The survey is expected to close in late October and we also plan to provide a snapshot of the results in our presentation, The Blind Men and The Elephant, at Hadoop World in early November.

Many thanks in advance for your participation in this survey. We look forward to sharing the results with you. The survey can be found at http://bit.ly/451data

NoSQL Road Show, Hadoop Tuesdays and Hadoop World

I’ll be taking our data management research out on the road in the next few months with a number of events, webinars and presentations.

On October 12 I’m taking part in the NoSQL Road Show Amsterdam, with Basho, Trifork and Erlang Solutions, where I’ll be presenting NoSQL, NewSQL, Big Data…Total Data – The Future of Enterprise Data Management.

The following week, October 18, I’m taking part in the Hadoop Tuesdays series of webinars, presented by Cloudera and Informatica, specifically talking about the Hadoop Ecosystem.

The Apache Hadoop ecosystem will again be the focus of attention on November 8 and 9, when I’ll be in New York for Hadoop World, presenting The Blind Men and the Elephant.

Then it’s back to NoSQL with two more stops on the NoSQL Road Show, in London on November 29 and Stockholm on December 1, where I’ll once again be presenting NoSQL, NewSQL, Big Data…Total Data – The Future of Enterprise Data Management.

I hope you can join us for at least one of these events, and am looking forward to learning a lot about NoSQL and Apache Hadoop adoption, interest and concerns.

Beyond ‘big data’

Alistair Croll published an interesting post this week entitled ‘there’s no such thing as big data’ in which he argued, prompted by a friend, that “given how much traditional companies put [big data] to work, it might as well not exist.”

Tim O’Reilly continued the theme in his follow-up post, arguing:

“companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue”

There is much to agree with – in fact I have myself argued that when it comes to data, the key issue is not how much you have, but what you do with it. However, there is also a significant change of emphasis here from the underlying principles that have driven the interest in ‘big data’ in the last 12-18 months.

Compare Tim O’Reilly’s statement with the following, from Google’s seminal research paper The Unreasonable Effectiveness of Data:

“invariably, simple models and a lot of data trump more elaborate models based on less data”

While the two statements are not entirely contradictory, they do indicate a change in emphasis related to data. There has been so much emphasis on the ‘big’ in ‘big data’, as if the growing volume, variety and velocity of data itself would deliver improved business insights.

As I have argued in the introduction to our ‘total data’ management concept and the numerous presentations given on the subject this year, in order to deliver value from that data, you have to look beyond the nature of the data and consider what it is that the user wants to do with that data.

Specifically, we believe that one of the key factors in delivering value is companies focusing on storing and processing all of their data (or at least as much as is economically feasible) rather than analysing samples and extrapolating the results.

The other factor is time, and specifically how fast users can get to the results they are looking for. Another way of looking at this is in terms of the rate of query. Again, this is not about the nature of the data, but what the user wants to do with that data.

This focus on the rate of query has implications on the value of the data, as expressed in the following equation:

Value = (Volume ± Variety ± Velocity) x Totality/Time

The rate of query also has significant implications in terms of which technologies are deployed to store and process the data and to actually put the data to use in delivering business insight and value.

Getting back to the points made by Alistair and Tim in relation to the Unreasonable Effectiveness of Data, it would seem that to date there has been more focus on what Google referred to as “a lot of data”, and less on the “simple models” to deliver value from that data.

There is clearly a balance to be struck, and the answer lies not in ‘big data’ but “more clue” and defining and delivering those “simple models”.

Top Issues IT faces with Hadoop MapReduce: a Webinar with Platform Computing

Next Tuesday, August 3, at 8.30 AM PDT I’ll be taking part in a Webinar with Platform Computing to discuss the benefits and challenges of Hadoop and MapReduce. Here are the details:

With the explosion of data in the enterprise, especially unstructured data which constitutes about 80% of the total data in the enterprise, new tools and techniques are needed for business intelligence and big data processing. Apache Hadoop MapReduce is fast becoming the preferred solution for the analysis and processing of this data.

The speakers will address the issues facing enterprises deploying open source solutions. They will provide an overview of the solutions available for Big Data, discuss best practices, lessons learned, case studies and actionable plans to move your project forward.

To register for the event please visit the registration page.

Variety, Velocity, and Volume: a Webinar with Azul Systems

This Wednesday, August 3, at 9 AM PDT I’ll be taking part in a Webinar with Azul Systems to discuss the performance challenges of big data in the enterprise. Here are the details:

“Big Data” is a hot topic and the concept of “Big Data” is a useful frame for the challenges of scaling petabyte or terabyte data that typically cannot be addressed with traditional technologies. However, Big Data is no longer just a challenge for large social media companies – enterprises can also benefit from understanding when and how to apply these technologies and architectures.

In this Webinar Matthew Aslett of the 451 Group reviews the taxonomy of Big Data and explains how organizations are employing new data management technologies and approaches to ensure that they turn the data deluge into more accurate and efficient operations.

Gil Tene, CTO and co-founder of Azul Systems, will then highlight in greater detail the infrastructure and building block choices for enterprise architects and how to address the performance, scalability, and velocity challenges of Big Data in the enterprise.

Key takeaways:

  • New strategies for integrating Big Data applications within your existing infrastructure and operations
  • Tradeoffs between capacity and performance
  • The importance and challenges of Java for Big Data in the enterprise

To register for the event please visit the registration page.