The Data Day, Today: August 31 2012

MongoDB. Informatica. Splunk. Stewart Downing. And more.

And that’s the Data Day, today.

The Data Day, Two days: August 29/30 2012

ParStream. MongoDB 2.2. Infochimps. BigQuery. And more.

And that’s the Data Day, today.

The Data Day, Today: Apr 11 2012

IBM launches Galileo database update. SAP outlines database roadmap. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Made in IBM Labs: New IBM Software Accelerates Decision Making in the Era of Big Data IBM launches DB2 10 and InfoSphere Warehouse 10.

* SAP Unveils Unified Strategy for Real-Time Data Management to Grow Database Market Leadership

* SAP Unveils Strategy to Gain Predictive Insights From Big Data

* TIBCO Delivers Breakthrough Software to Analyze Big Data in Motion

* TIBCO Announces Intent to Acquire LogLogic

* TIBCO Spotfire and Attivio Partner to Deliver New Levels of Integration and Discovery for Data and Content

* Mortar Data, Hadoop for the Rest of Us, Gets Seed Funding

* The coming in-memory database tipping point. Microsoft’s perspective on in-memory databases.

* Jaspersoft Extends Partnership with Talend to Deliver Big Data Integration

* Oracle to Hold MySQL Connect Conference in San Francisco September 29 and 30, 2012

* Percona XtraDB Cluster Open Source Software Provides a New Approach to High Availability MySQL

* Tokutek Brings Replication Performance to MySQL and MariaDB

* Continuent Announces Tungsten Enterprise 1.5 for Multi-Master, Multi-Region MySQL Data Services in the Amazon EC2

* SkySQL, hastexo Form Highly Available Partnership

* MySQL at Twitter Twitter releases its MySQL modifications under BSD license.

* Percona Bundles New Relic to Provide Gold and Platinum Support Customers with Comprehensive Application Visibility

* Percona Toolkit 2.1 for MySQL Enables Schema Changes without Scheduling Downtime

* Percona XtraBackup 2.0 for MySQL and Percona Server Provides Increased Performance

* Delphix Expands Agile Data Platform to Support Oracle Exadata

* Red Hat and 10gen Create Compelling Open Source Data Platform

* Announcing Pre-Production MongoDB Subscription from 10gen

* VoltDB Announces Version 2.5

* Red Hat Storage 2.0 Beta: Partners Test Big Data, Hadoop Support

* Sungard wants to sell you Hadoop as a service

* Actian and Lenovo Team to Optimize Big Data and Business Intelligence with New Appliance

* Objectivity Expands European Management Team With Former Sones Founder Mauricio Matthesius

* expressor Expands Data Integration Platform Into Big Data

* The Apache Software Foundation Announces Apache Sqoop as a Top-Level Project

* LucidDB has left Eigenbase, moved to Apache License

* For 451 Research clients

# IBM looks to the stars with Galileo relational database update Impact Report

# Indicee eyes fresh VC as it establishes beachhead for cloud BI service using OEM sales Impact Report

# Percona launches XtraDB Cluster for MySQL database high availability Impact Report

# Tokutek targets replication performance with database update Impact Report

# ‘Big data’ in the datacenter: Vigilent secures $6.7m funding round Impact Report

And that’s the Data Day, today.

Update on the relative popularity of NoSQL database skills

Back in December we ran a series of posts looking at the geographic distribution of NoSQL skills, according to the results of searching LinkedIn member profiles, culminating in a look at the relative overall popularity of the major NoSQL databases.

This week I took another look at LinkedIn to update the results for a forthcoming report, which gives us the opportunity to see how the results have changed over the past quarter:

While this provides us with an interesting opportunity to track LinkedIn profile mentions over time, there isn’t a huge amount we can learn from this first update – other than that MongoDB seems to be increasing its dominance.

The only significant change that isn’t immediately obvious from looking at the chart is that Apache HBase has overtaken Apache CouchDB by a tiny margin to claim third place overall.

As we noted last time, however, Apache HBase is more reliant on the US than other NoSQL databases for its LinkedIn mentions: it is the second most prevalent NoSQL database mentioned in the USA but fourth in the rest of the world.

Two other points to take into consideration:

– The results for Apache Cassandra are probably disproportionately low since we have to search for the full phrase in order to avoid including people called Cassandra.

– Previously we only searched for Membase. This time we added together the search results for both Membase and Couchbase. This may mean the result for Couch/Membase is disproportionately high since some members probably listed both.

This is not meant to be a comprehensive analysis, however, but rather a snapshot of one particular data source.

The Data Day, Today: Feb 24 2012

Teradata partners with Hortonworks. New CEOs for Zettaset and VoltDB. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Teradata-Hortonworks Partnership to Accelerate Business Value from Big Data Technologies

* Skytree Unlocks the Advanced Analytics Power of Big Data with Unprecedented Performance, Scalability and Accuracy

* Big Data Innovator Zettaset Appoints Jim Vogt as New President and CEO

* Zettaset to Create Secure Hadoop with ‘SHadoop’ Initiative

* VoltDB Names Bruce Reading President and Chief Executive Officer

* Basho Unveils New Graphical Operations Dashboard, Diagnostics With Release of Riak 1.1

* Pervasive RushAnalyzer Launches ‘No Compromise’ Predictive Analytics for Hadoop and Big Data

* QlikTech Reveals Pricing for its QlikView Business Discovery Platform

* Kognitio Announces Completely Memory-Based Pricing

* Objectivity Adds New Plugin Framework, Integrated Visualizer And Support For Tinkerpop Blueprints To InfiniteGraph

* Announcing the Infochimps Platform for Big Data

* Big Data, Hadoop and StreamInsight

* Three New Cloud Providers join the MongoDB ecosystem

* Hadoop Has Promise but Also Problems

* Hortonworks: Reaffirming our Commitment to 100% Pure Open Source Despite speculation to the contrary.

* WhySQL? Evernote explains why it continues to use SQL databases.

* More on database consistency Anders Karlsson explains the different definitions of database consistency.

* Graphic proof of big demand for big data talent Or just graphic proof of use of phrase ‘big data’ in jobs ads?

* Will ‘big data’ transform your industry?

* For 451 Research clients

# CrowdFlower – it’s like Hadoop, but with people? Impact Report

# Teradata and Hortonworks strike Hadoop marketing and development deal Market Development report

# Hypertable reemerges with high-performance NoSQL database Market Development report

And that’s the Data Day, today.

The Data Day, today: Jan 5 2012

Apache Hadoop 1.0. The future of CouchDB (or Couchbase anyway). And more.

Welcome to the first in an occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* The Apache Software Foundation Announces Apache Hadoop v1.0 Self-explanatory.

* The Future of CouchDB Apache CouchDB creator Damien Katz explains why he is focusing his attention on Couchbase Server.

* Understanding Microsoft’s big-picture plans for Hadoop and Project Isotope Mary Jo Foley parses Alexander Stojanovic’s presentation.

* MongoDB Extends Leadership in NoSQL 10gen claims more than 400 commercial customers.

* 1010data’s Unique Big Data Analytics Platform Sees Stunning Growth in 2011 1010data runs the numbers on its adoption in 2011.

* TouchDB 1.0 is out TouchDB is a lightweight CouchDB-compatible database engine suitable for embedding into mobile apps.

* Data Scientist = Rock Star, Really? Virginia Backaitis is sceptical.

* Swimming with Dolphins Splunk’s connector for MySQL.

* What the Sumerians can teach us about data Pete Warden finds data inspiration at the British Museum.

* How To (Not) Get Smart About Big Data Wim Rampen on the importance of filtering noise.

* For 451 Research clients

# Total Data: exploratory analytic platforms Spotlight report

# Apache Hadoop reaches version 1.0, with more to come Analyst note

# Acunu hones focus on ‘big data’ platform for operational analytics Market development report

# Jaspersoft gets big into ‘big data,’ illuminates BI business momentum Market development report

* Google News Search outlier of the day: “Bella” Becomes Most Popular Name for Both Dogs and Cats

And that’s the Data Day, today.

The geographic distribution of NoSQL skills – just one more thing

Hidden away amongst the details of our little tour around LinkedIn statistics on NoSQL and Hadoop skills was some interesting information on how many LinkedIn members list the various data management technologies in our sample in their profiles.

Our original post contained the fact that there were 9,079 LinkedIn members with “Hadoop” in their member profiles, for example, compared to 366,084 with “MySQL” in their member profiles.

Later posts showed there were 170 with “Membase” and 1,687 with “HBase”, 787 with “Apache Cassandra” and 376 with “Riak”, 6,048 with “MongoDB” and 2,152 with “Redis”, and finally, 1,844 with “CouchDB” and 268 with “Neo4j”.

This gives us an interesting perspective on the relative adoption of the various NoSQL databases:

If it wasn’t already obvious from the list above, the chart illustrates just how much more prevalent MongoDB skills are compared to the other NoSQL databases, followed by Redis, Apache CouchDB, Apache HBase and Apache Cassandra. The chart also illustrates that while HBase is the second most prevalent NoSQL skill set in the USA, it is only fourth overall given its lower prevalence in the rest of the world.
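For anyone who wants to reproduce the ordering without the chart, here is a minimal Python sketch using only the counts quoted above. The variable names are illustrative, and the share column is a share of total mentions across these eight searches rather than of people, since some members will list more than one database.

```python
# LinkedIn profile-mention counts quoted in the posts (as of December 1).
mentions = {
    "Membase": 170,
    "HBase": 1687,
    "Apache Cassandra": 787,
    "Riak": 376,
    "MongoDB": 6048,
    "Redis": 2152,
    "CouchDB": 1844,
    "Neo4j": 268,
}

# Rank the databases from most to least mentioned.
ranked = sorted(mentions.items(), key=lambda item: item[1], reverse=True)

# Total mentions across all eight searches (not unique members).
total = sum(mentions.values())

for name, count in ranked:
    share = 100 * count / total
    print(f"{name:<17}{count:>6,}  {share:5.1f}%")
```

The ranking it prints matches the order described above: MongoDB, then Redis, CouchDB, HBase and Apache Cassandra.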

In response, a representative from a certain vendor notes “Some skills are more valued not because they are more prevalent, but because they are harder to achieve.” Make of that what you will.

The geographic distribution of NoSQL skills: MongoDB and Redis

Following last week’s post putting the geographic distribution of Hadoop skills, based on a search of LinkedIn members, in context, this week we will be publishing a series of posts looking in detail at the various NoSQL projects.

The posts examine the geographic spread of LinkedIn members citing a specific NoSQL database in their member profiles, as of December 1, and provide an interesting illustration of the state of adoption for each.

We’ve already taken a look at Membase and HBase, and Apache Cassandra and Riak. Part three examines the geographic spread of 10gen’s MongoDB and Redis.

The statistics showed that 41.0% of the 6,048 LinkedIn members with “MongoDB” in their member profiles are based in the US, putting MongoDB in the top half of the table for geographic spread.

Only 11.2% are in the Bay Area, a smaller share than for Hadoop, Membase, HBase, Cassandra, Riak and Redis. The results also indicate that the New York area is a hot-spot for MongoDB skills, with 6.2% – as one might expect given the location of 10gen’s HQ. Other hot-spots include Brazil (4.2%) and Ukraine (2.8%).

Redis is even more geographically dispersed, with only 37% of the 2,152 LinkedIn members with “Redis” in their member profiles based in the US, although 12.0% are in the Bay Area.

Ukraine is also a hot-spot for Redis skills (3.8%), as are France (3.6%) and Spain (2.9%).

The series will conclude later this week with CouchDB and Neo4j.

N.B. The size of the boxes is in proportion to the search result (click each image for a larger version). World map image: Owen Blacker

The geographic distribution of Hadoop skills: in context

NC State University’s Institute for Advanced Analytics recently published some interesting statistics on Apache Hadoop adoption based on a search of LinkedIn data.

The statistics graphically illustrate what a lot of people were already pretty sure of: that the geographic distribution of Hadoop skills (and presumably therefore adoption) is heavily weighted in favour of the USA, and in particular the San Francisco Bay Area.

The statistics showed that 64% of the 9,079 LinkedIn members with “Hadoop” in their member profiles (by no means perfect but an insightful measure nonetheless) are based in the US, and that the vast majority of those are in the Bay Area.

The results are what we would expect to see given the relative level of immaturity of Apache Hadoop adoption, as well as the nature and location of the early Hadoop adopters and Hadoop-related vendors.

The results got me thinking two things:
– how does the geographic spread compare to a more maturely adopted project?
– how does it compare to the various NoSQL projects?

So I did some searching of LinkedIn to find out.

To answer the first question I performed the same search for MySQL, as an example of a mature, widely-adopted open source project.

The results show that just 32% of the 366,084 LinkedIn members with “MySQL” in their member profiles are based in the US (precisely half that of Hadoop) while only 4.4% are in the Bay Area, compared to 28.2% of the 9,079 LinkedIn members with “Hadoop” in their member profiles.

The charts below illustrate the difference in geographic distribution between Hadoop and MySQL. The size of the boxes is in proportion to the search result (click each image for a larger version).

With regard to the second question, I also ran searches for MongoDB, Riak, CouchDB, Apache Cassandra*, Membase*, Neo4j, HBase, and Redis.

I’ll be posting the results for each of those over the next week or so, but in the meantime, the graphic below shows the split between the USA and Rest of the World (ROW) for all ten projects.

It illustrates, as I suspected, that the distribution of skills for NoSQL databases is more geographically dispersed than for Hadoop.

I have some theories as to why that is – but I’d love to hear anyone else’s take on the results.

*I had to use the ‘Apache’ qualifier with Cassandra to filter out anyone called Cassandra, while Membase returned a more statistically relevant result than Couchbase.

World map image: Owen Blacker

VC funding for Hadoop and NoSQL tops $350m

451 Research has today published a report looking at the funding being invested in Apache Hadoop- and NoSQL database-related vendors. The full report is available to clients, but below is a snapshot of the report, along with a graphic representation of the recent up-tick in funding.

According to our figures, between the beginning of 2008 and the end of 2010 $95.8m had been invested in the various Apache Hadoop- and NoSQL-related vendors. That figure now stands at more than $350.8m, up 266%.

That statistic does not really do justice to the sudden uptick of interest, however. The figures indicate that funding for Apache Hadoop- and NoSQL-related firms has more than doubled since the end of August, at which point the total stood at $157.5m.
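The arithmetic behind those two comparisons can be checked with a short Python sketch. The figures are the ones quoted above; the function name `pct_increase` and the constant names are illustrative only.

```python
def pct_increase(old: float, new: float) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100


# Cumulative funding totals quoted in the post, in millions of US dollars.
END_2010 = 95.8      # invested between the beginning of 2008 and the end of 2010
END_AUGUST = 157.5   # total at the end of August
CURRENT = 350.8      # total at the time of writing

# Growth since the end of 2010, as a percentage.
print(f"Since end of 2010: up {pct_increase(END_2010, CURRENT):.0f}%")

# Growth since the end of August, as a multiple.
print(f"Since end of August: {CURRENT / END_AUGUST:.2f}x")
```

The first line reproduces the 266% figure, and the second confirms that the total has more than doubled since the end of August.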

A substantial reason for that huge jump is the staggering $84m series A funding round raised by Apache Hadoop-based analytics service provider Opera Solutions.

The original commercial supporter of Apache Hadoop, Cloudera, has also contributed strongly with a recent $40m series D round. In addition, MapR Technologies raised $20m to invest in its Apache Hadoop distribution, while we know that Hortonworks also raised a substantial round (unconfirmed, but reportedly $20m) from Benchmark Capital and former parent Yahoo as it was spun off in June. Index Ventures also recently announced that it has become an investor in Hortonworks.

I am reliably informed that if you factor in Hortonworks’ two undisclosed rounds, the total funding for Hadoop and NoSQL vendors is actually closer to $400m.

The various NoSQL database providers have also played a part in the recent burst of investment, with 10gen raising a $20m series D round and Couchbase raising $15m. DataStax, which has interests in both Apache Cassandra and Apache Hadoop, raised an $11m series B round, while Neo Technology raised a $10.6m series A round. Basho Technologies raised $12.5m in series D funding in three chunks during 2011.

Additionally, there are a variety of associated players, including Hadoop-based analytics providers such as Datameer, Karmasphere and Zettaset, as well as hosted NoSQL firms such as MongoLab, MongoHQ and Cloudant.

One investor company name that crops up more than most in the list above is Accel Partners, which was an original investor in both Cloudera and Couchbase, and backed Opera Solutions via its Accel-KKR joint venture with Kohlberg Kravis Roberts.

It appears that those investments have merely whetted Accel’s appetite for big data, however, as the firm last week announced a $100m Big Data Fund to invest in new businesses targeting storage, data management and analytics, as well as data-centric applications and tools.

While Accel is the first VC shop that we are aware of to create a fund specifically for big data investments, we are confident both that it won’t be the last and that other VCs have already informally earmarked funds for data-related investments.

451 clients can get more details on funding and M&A involving more traditional database vendors, as well as our perspective on potential M&A suitors for the Hadoop and NoSQL players.