February 3rd, 2012 — Data management
New CEO at Revolution. Pentaho goes big data. EMC Hadoop gets Isilon. And more.
An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* Revolution Analytics Names David Rich New CEO
* Pentaho Open Sources Big Data Capabilities to Further Fuel Widespread Adoption
* EMC Isilon is Industry’s First Scale-Out NAS System with Native Hadoop Support
* Actuate Reports Fourth Quarter and Fiscal Year 2011 Financial Results
* Sumo Logic Raises $15M Series B Round for Next Generation Log Management and Analytics
* Announcing Oracle R Enterprise 1.0
* Paul Cormier Joins Hortonworks’ Board of Directors
* DataStax Launches First Complete Solution for Cassandra Development on Windows and Mac
* Latest Release of Kalido Information Engine Eliminates Data Mart Migration and Consolidation Hassles
* Karmasphere Brings More Power, Collaboration, and Faster Insights to Big Data Analytics Teams on Hadoop
* Why Big Data Won’t Make You Smart, Rich, Or Pretty
* SAP HANA – slowly moving out of hype into actual projects
* For 451 Research clients
# Actuate gets ready to go shopping in the ‘big data’ mall Acquirer IQ
# Couchbase cites enterprise adoption, clarifies distributed NoSQL database strategy Impact report
# SpagoBI illuminates 2012 roadmap, takes open source model to US, Latin America Impact report
# Customer data analysis provider nPario combines big data and smart segmentation Impact report
# Tableau details 2012 growth strategy, gets semantic for visual analytics Market development report
# EMC integrates re-branded Hadoop distribution with Isilon NAS Market development report
# Quiterian seeks funding for new customer analytics in the cloud focus Market development report
# Hortonworks refines its commercial strategy for Apache Hadoop Market development report
# Digital Reasoning pledges to automate the analysis of complex data Market development report
And that’s the Data Day, today.
January 30th, 2012 — Data management
I put this slide together for my own benefit as I was trying to keep track of the various incarnations of Couchbase’s brands. Looks like I wasn’t the only one, so I thought I’d also make our perspective available.
There are a couple of differences between our slide and Koji Kawamura’s:
Ours contains an extra layer of names (e.g. “Elastic Couchbase”) that were briefly used by Couchbase in discussion and I believe in marketing, although never for shipping product.
Also ours doesn’t mention memcached. It could be on there given that Membase is based on it, and Couchbase Server can still be deployed in “memcached only mode”, but in that sense it is a feature of Membase/Couchbase Server. And anyway, I couldn’t fit it on 🙂
January 24th, 2012 — Data management
January 13th, 2012 — Data management
January 10th, 2012 — Data management
Oracle OEMs Cloudera. The future of Apache CouchDB. And more.
An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* Oracle announced the general availability of Big Data Appliance, and an OEM agreement with Cloudera for CDH and Cloudera Manager.
* The Future of Apache CouchDB Cloudant confirms intention to integrate the core capabilities of BigCouch into Apache CouchDB.
* Reinforcing Couchbase’s Commitment to Open Source and CouchDB Couchbase CEO Bob Wiederhold attempts to clear up any confusion.
* Hortonworks Appoints Shaun Connolly to Vice President of Corporate Strategy Former vice president of product strategy at VMware.
* Splunk even more data with 4.3 Introducing the latest Splunk release.
* Announcement of Percona XtraDB Cluster (alpha release) Based on Galera.
* Bringing Value of Big Data to Business: SAP’s Integrated Strategy Forbes interview with Sanjay Poonen, President and corporate officer of SAP Global Solutions.
* New Release of Oracle Database Firewall Extends Support to MySQL and Enhances Reporting Capabilities Self-explanatory.
* Big data and the disruption curve “Many efforts are being funded by business units and not the IT department and money is increasingly being diverted from large enterprise vendors.”
* Get your SQL Server database ready for SQL Azure Microsoft “codename” SQL Azure Compatibility Assessment.
* An update on Apache Hadoop 1.0 Cloudera’s Charles Zedlewski helpfully explains Apache Hadoop branch numbering.
* Xeround and the CAP Theorem So where does Xeround fit in the CAP Theorem?
* Can Yahoo’s new CEO Thompson harness big data, analytics? Larry Dignan thinks Scott Thompson might just be the right guy for the job.
* US Companies Face Big Hurdles in ‘Big Data’ Use “21% of respondents were unsure how to best define Big Data”
* Schedule Your Agenda for 2012 NoSQL Events Alex Popescu updates his list of the year’s key NoSQL events.
* DataStax take Apache Cassandra Mainstream in 2011; Poised for Growth and Innovation in 2012 The usual momentum round-up from DataStax.
* Objectivity claimed significant growth in adoption of its graph database, InfiniteGraph and flagship object database, Objectivity/DB.
* Cloudera Connector for Teradata 1.0.0 Self-explanatory.
* For 451 Research clients
# SAS delivers in-memory analytics for Teradata and Greenplum Market Development report
# With $84m in funding, Opera sets out predictive-analytics plans Market Development report
* Google News Search outlier of the day: First Dagger Fencing Competition in the World Scheduled for January 14, 2012
And that’s the Data Day, today.
January 5th, 2012 — Data management
Apache Hadoop 1.0. The future of CouchDB (or Couchbase anyway). And more.
Welcome to the first in an occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* The Apache Software Foundation Announces Apache Hadoop v1.0 Self-explanatory.
* The Future of CouchDB Apache CouchDB creator Damien Katz explains why he is focusing his attention on Couchbase Server.
* Understanding Microsoft’s big-picture plans for Hadoop and Project Isotope Mary Jo Foley parses Alexander Stojanovic’s presentation.
* MongoDB Extends Leadership in NoSQL 10gen claims more than 400 commercial customers.
* 1010data’s Unique Big Data Analytics Platform Sees Stunning Growth in 2011 1010data runs the numbers on its adoption in 2011.
* TouchDB 1.0 is out TouchDB is a lightweight CouchDB-compatible database engine suitable for embedding into mobile apps.
* Data Scientist = Rock Star, Really? Virginia Backaitis is sceptical.
* Swimming with Dolphins Splunk’s connector for MySQL.
* What the Sumerians can teach us about data Pete Warden finds data inspiration at the British Museum.
* How To (Not) Get Smart About Big Data Wim Rampen on the importance of filtering noise.
* For 451 Research clients
# Total Data: exploratory analytic platforms Spotlight report
# Apache Hadoop reaches version 1.0, with more to come Analyst note
# Acunu hones focus on ‘big data’ platform for operational analytics Market development report
# Jaspersoft gets big into ‘big data,’ illuminates BI business momentum Market development report
* Google News Search outlier of the day: “Bella” Becomes Most Popular Name for Both Dogs and Cats
And that’s the Data Day, today.
November 15th, 2011 — Data management
451 Research has today published a report looking at the funding being invested in Apache Hadoop- and NoSQL database-related vendors. The full report is available to clients, but below is a snapshot of the report, along with a graphic representation of the recent uptick in funding.
According to our figures, between the beginning of 2008 and the end of 2010 $95.8m had been invested in the various Apache Hadoop- and NoSQL-related vendors. That figure now stands at more than $350.8m, up 266%.
That statistic does not really do justice to the sudden uptick of interest, however. The figures indicate that funding for Apache Hadoop- and NoSQL-related firms has more than doubled since the end of August, at which point the total stood at $157.5m.
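The two growth figures above can be checked with a few lines of arithmetic. This is only a sketch using the totals quoted in this post; the variable and function names are ours, chosen for illustration.

```python
# Funding totals quoted in the post, in millions of US dollars.
TOTAL_END_2010 = 95.8      # invested between the start of 2008 and the end of 2010
TOTAL_END_AUGUST = 157.5   # running total at the end of August 2011
TOTAL_NOW = 350.8          # running total at the time of writing


def pct_increase(old: float, new: float) -> float:
    """Return the percentage growth from old to new."""
    return (new - old) / old * 100


def growth_multiple(old: float, new: float) -> float:
    """Return how many times larger new is than old."""
    return new / old


increase_since_2010 = pct_increase(TOTAL_END_2010, TOTAL_NOW)
multiple_since_august = growth_multiple(TOTAL_END_AUGUST, TOTAL_NOW)

print(f"Increase since end of 2010: {increase_since_2010:.0f}%")       # 266%
print(f"Multiple since end of August: {multiple_since_august:.2f}x")   # 2.23x
```

In other words, the total has grown roughly 2.2 times in the period since the end of August alone.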
A substantial reason for that huge jump is the staggering $84m series A funding round raised by Apache Hadoop-based analytics service provider Opera Solutions.
The original commercial supporter of Apache Hadoop, Cloudera, has also contributed strongly with a recent $40m series D round. In addition, MapR Technologies raised $20m to invest in its Apache Hadoop distribution, while we know that Hortonworks also raised a substantial round (unconfirmed, but reportedly $20m) from Benchmark Capital and former parent Yahoo as it was spun off in June. Index Ventures also recently announced that it has become an investor in Hortonworks.
I am reliably informed that if you factor in Hortonworks’ two undisclosed rounds, the total funding for Hadoop and NoSQL vendors is actually closer to $400m.
The various NoSQL database providers have also played a part in the recent burst of investment, with 10gen raising a $20m series D round and Couchbase raising $15m. DataStax, which has interests in both Apache Cassandra and Apache Hadoop, raised an $11m series B round, while Neo Technology raised a $10.6m series A round. Basho Technologies raised $12.5m in series D funding in three chunks during 2011.
Additionally, there are a variety of associated players, including Hadoop-based analytics providers such as Datameer, Karmasphere and Zettaset, as well as hosted NoSQL firms such as MongoLab, MongoHQ and Cloudant.
One investor company name that crops up more than most in the list above is Accel Partners, which was an original investor in both Cloudera and Couchbase, and backed Opera Solutions via its Accel-KKR joint venture with Kohlberg Kravis Roberts.
It appears that those investments have merely whetted Accel’s appetite for big data, however, as the firm last week announced a $100m Big Data Fund to invest in new businesses targeting storage, data management and analytics, as well as data-centric applications and tools.
While Accel is the first VC shop that we are aware of to create a fund specifically for big data investments, we are confident both that it won’t be the last and that other VCs have already informally earmarked funds for data-related investments.
451 clients can get more details on funding and M&A involving more traditional database vendors, as well as our perspective on potential M&A suitors for the Hadoop and NoSQL players.
July 29th, 2011 — Data management
NoSQL has never really been about SQL. As we pointed out in our NoSQL, NewSQL and Beyond report, “[one] of the NoSQL idiosyncrasies is that in most cases SQL itself is not the ‘problem’ being avoided. Indeed, a better term might be ‘NoSchema,’ given that a more common quality is a rejection of fixed table schema and join operations”.
Nevertheless the NoSQL term has stuck, and also inspired NewSQL (which, as critics have pointed out, is not really about SQL either), while a number of NoSQL providers started to look at how they could actually add support for SQL queries to their respective databases.
The recently released version 0.8 of Apache Cassandra features the first implementation of Cassandra Query Language (CQL), an SQL-like query language.
Meanwhile Couchbase and SQLite have teamed up to create UnQL (Unstructured Query Language), a new data query language for unstructured data. Pronounced ‘uncle’, UnQL is designed to remove the burden of query planning, optimization and execution from NoSQL developers by providing an adaptation of the SQL structured query language for unstructured data models.
As can be seen by an example of the draft syntax, UnQL is designed to be familiar to SQL developers, while also enabling querying over complex and unstructured storage models, such as document models.
UnQL was created by Couchbase CTO and CouchDB creator Damien Katz, along with SQLite creator and founder Richard Hipp, and both Couchbase and SQLite have committed to implementing UnQL in future versions of their database products.
UnQL is not designed to be specific to select database products, however, and the specification is being released to the public domain at www.unqlspec.org. There is also the potential that open source parsers and query planning implementations will be created to foster adoption.
One of the principal drivers behind UnQL’s development is that a common query language is necessary to drive NoSQL adoption in the same way SQL drove adoption in the relational database market.
It remains to be seen whether UnQL will be picked up by other projects, although the release to the public domain should give confidence that this is not an attempt to force the industry to adopt a ‘standard’ from a single vendor.
April 20th, 2011 — Data management
As we noted last week, necessity is one of the six key factors that are driving the adoption of alternative data management technologies identified in our latest long format report, NoSQL, NewSQL and Beyond.
Necessity is particularly relevant when looking at the history of the NoSQL databases. While it is easy for the incumbent database vendors to dismiss the various NoSQL projects as development playthings, it is clear that the vast majority of NoSQL projects were developed by companies and individuals in response to the fact that the existing database products and vendors were not suitable to meet their requirements with regard to the other five factors: scalability, performance, relaxed consistency, agility and intricacy.
The genesis of much – although by no means all – of the momentum behind the NoSQL database movement can be attributed to two research papers: Google’s Bigtable: A Distributed Storage System for Structured Data, presented at the Seventh Symposium on Operating Systems Design and Implementation, in November 2006, and Amazon’s Dynamo: Amazon’s Highly Available Key-Value Store, presented at the 21st ACM Symposium on Operating Systems Principles, in October 2007.
The importance of these two projects is highlighted by The NoSQL Family Tree, a graphic representation of the relationships between (most of) the various major NoSQL projects:
Not only were the existing database products and vendors not suitable to meet their requirements, but Google and Amazon, as well as the likes of Facebook, LinkedIn, Powerset and Zvents, could not rely on the incumbent vendors to develop anything suitable, given the vendors’ desire to protect their existing technologies and installed bases.
Werner Vogels, Amazon’s CTO, has explained that as far as Amazon was concerned, the database layer required to support the company’s various Web services was too critical to be trusted to anyone else – Amazon had to develop Dynamo itself.
Vogels also pointed out, however, that this situation is suboptimal. The fact that Facebook, LinkedIn, Google and Amazon have had to develop and support their own database infrastructure is not a healthy sign. In a perfect world, they would all have better things to do than focus on developing and managing database platforms.
That explains why the companies have also all chosen to share their projects. Google and Amazon did so through the publication of research papers, which enabled the likes of Powerset, Facebook, Zvents and LinkedIn to create their own implementations.
These implementations were then shared through the publication of source code, which has enabled the likes of Yahoo, Digg and Twitter to collaborate with each other and additional companies on their ongoing development.
Additionally, the NoSQL movement boasts a significant number of developer-led projects initiated by individuals – in the tradition of open source – to scratch their own technology itches.
Examples include Apache CouchDB, originally created by the now-CTO of Couchbase, Damien Katz, to be an unstructured object store to support an RSS feed aggregator; and Redis, which was created by Salvatore Sanfilippo to support his real-time website analytics service.
We would also note that even some of the major vendor-led projects, such as Couchbase and 10gen, have been heavily influenced by non-vendor experience. 10gen was founded by former DoubleClick executives to create the software they felt was needed at the digital advertising firm, while online gaming firm Zynga was heavily involved in the development of the original Membase Server memcached-based key-value store (now Elastic Couchbase).
In this context it is interesting to note, therefore, that while the majority of NoSQL databases are open source, the NewSQL providers have largely chosen to avoid open source licensing, with VoltDB being the notable exception.
These NewSQL technologies are no less a child of necessity than NoSQL, although it is a vendor’s necessity to fill a gap in the market, rather than a user’s necessity to fill a gap in its own infrastructure. It will be intriguing to see whether the various other NewSQL vendors will turn to open source licensing in order to grow adoption and benefit from collaborative development.
NoSQL, NewSQL and Beyond is available now from both the Information Management and Open Source practices (non-clients can apply for trial access). I will also be presenting the findings at the forthcoming Open Source Business Conference.
February 8th, 2011 — Data management, M&A
The predicted consolidation of the NoSQL database landscape has begun. Membase and CouchOne have announced that they are merging to form Couchbase.
And in more interesting NoSQL news, Danish IT company Trifork has announced that it has acquired an 8% stake in Basho as part of the NoSQL vendor’s $7.4m series D round, and has become the European distributor for Riak.
The formation of Couchbase brings together two of the leading companies in the NoSQL space, and the complementary nature of their technology and business plans highlights that the term NoSQL has been applied to many different database technologies which are being adopted for different reasons.
While Membase had focused on improving the performance of distributed applications through its Membase Server distributed database, CouchOne focused on developer interest in flexible document data stores and mobile applications, rather than performance at scale.
Additionally, while Membase was focused on operational adoption with a small (albeit significant) developer community, the priority with CouchOne has been on growing adoption of Apache CouchDB, with commercial efforts only recently becoming the focus of attention.
The technology is also complementary. Couchbase will combine the Membase and CouchDB projects to form a new distributed document store project of the same name that combines the caching and clustering technology of Membase with the CouchDB document data store.
The result will be a new distributed document database covering a variety of use cases from mobile applications (Mobile Couchbase) to scalable clusters (Elastic Couchbase), with synchronization of data between the various Couchbase implementations enabled by CouchSync.
The merged company will be led by Bob Wiederhold, formerly CEO of Membase, while Damien Katz, formerly CEO of CouchOne and creator of the CouchDB database, becomes CTO.
Couchbase is claiming more than 200 customers, which would indicate phenomenal growth for both companies since the launch of their CouchOne Mobile and Membase Server products in September and October 2010 respectively.
Prior to the launch of those products, the two companies claimed just a handful of customers each, although CouchOne had signed up thousands of users to its free hosted services, so it had a large and willing audience ready for conversion.
Additionally the company claims millions of combined users since CouchDB has been included in every installation of the Ubuntu Linux distribution since late 2009 and Heroku (now part of Salesforce.com) offers a Membase-driven service to thousands of its hosting customers.
We previously predicted that we would see the NoSQL market both consolidate and proliferate this year, and it is worth noting that the merger of CouchOne and Membase will not result in a similar consolidation of open source projects.
While Couchbase.org can be expected to replace membase.org over time, the Couchbase project will be independent of the Apache CouchDB project, which will not be impacted by the merger. Couchbase will continue to contribute to both CouchDB and the memcached project.
While we’re on the subject of NoSQL, it is also interesting to see that Danish IT vendor Trifork has not only signed up to be European distributor of the Riak database, but has also taken a stake in Basho Technologies.
Trifork has acquired newly issued shares in Basho representing 8.35% of the company as part of its series D round, with an option to acquire an additional 3.96% at the end of Q1 2011.