February 24th, 2012 — Data management
February 14th, 2012 — Data management
January 5th, 2012 — Data management
Apache Hadoop 1.0. The future of CouchDB (or Couchbase anyway). And more.
Welcome to the first in an occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* The Apache Software Foundation Announces Apache Hadoop v1.0 Self-explanatory.
* The Future of CouchDB Apache CouchDB creator Damien Katz explains why he is focusing his attention on Couchbase Server.
* Understanding Microsoft’s big-picture plans for Hadoop and Project Isotope Mary Jo Foley parses Alexander Stojanovic’s presentation.
* MongoDB Extends Leadership in NoSQL 10gen claims more than 400 commercial customers.
* 1010data’s Unique Big Data Analytics Platform Sees Stunning Growth in 2011 1010data runs the numbers on its adoption in 2011.
* TouchDB 1.0 is out TouchDB is a lightweight CouchDB-compatible database engine suitable for embedding into mobile apps.
* Data Scientist = Rock Star, Really? Virginia Backaitis is sceptical.
* Swimming with Dolphins Splunk’s connector for MySQL.
* What the Sumerians can teach us about data Pete Warden finds data inspiration at the British Museum.
* How To (Not) Get Smart About Big Data Wim Rampen on the importance of filtering noise.
* For 451 Research clients
# Total Data: exploratory analytic platforms Spotlight report
# Apache Hadoop reaches version 1.0, with more to come Analyst note
# Acunu hones focus on ‘big data’ platform for operational analytics Market development report
# Jaspersoft gets big into ‘big data,’ illuminates BI business momentum Market development report
* Google News Search outlier of the day: “Bella” Becomes Most Popular Name for Both Dogs and Cats
And that’s the Data Day, today.
December 8th, 2011 — Data management
Following last week’s post putting the geographic distribution of Hadoop skills, based on a search of LinkedIn members, in context, this week we will be publishing a series of posts looking in detail at the various NoSQL projects.
The posts examine the geographic spread of LinkedIn members citing a specific NoSQL database in their member profiles, as of December 1, and provide an interesting illustration of the state of adoption for each.
We’ve already taken a look at Membase and HBase, and Apache Cassandra and Riak. Part three examines the geographic spread of 10gen’s MongoDB and Redis.
The statistics showed that 41.0% of the 6,048 LinkedIn members with “MongoDB” in their member profiles are based in the US, putting MongoDB in the top half of the table for geographic spread.
Only 11.2% are in the Bay area, fewer than Hadoop, Membase, HBase, Cassandra, Riak and Redis. The results also indicate that the New York area is a hot-spot for MongoDB skills, with 6.2% – as one might expect given the location of 10gen’s HQ. Other hot-spots include Brazil (4.2%) and Ukraine (2.8%).
Redis is even more widely adopted, with only 37% of the 2,152 LinkedIn members with “Redis” in their member profiles based in the US, although 12.0% are in the Bay area.
Ukraine is also a hot-spot for Redis skills (3.8%), as are France (3.6%) and Spain (2.9%).
The series will conclude later this week with CouchDB and Neo4j.
N.B. The size of the boxes is in proportion to the search result (click each image for a larger version). World map image: Owen Blacker
November 15th, 2011 — Data management
451 Research has today published a report looking at the funding being invested in Apache Hadoop- and NoSQL database-related vendors. The full report is available to clients, but below is a snapshot of the report, along with a graphic representation of the recent up-tick in funding.
According to our figures, between the beginning of 2008 and the end of 2010 $95.8m had been invested in the various Apache Hadoop- and NoSQL-related vendors. That figure now stands at more than $350.8m, up 266%.
That statistic does not really do justice to the sudden uptick of interest, however. The figures indicate that funding for Apache Hadoop- and NoSQL-related firms has more than doubled since the end of August, at which point the total stood at $157.5m.
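To make the arithmetic behind those figures explicit, here is a minimal sketch that recomputes the growth rates from the three totals quoted above. The variable and function names are our own, chosen for illustration, and since the current total is described as "more than $350.8m" the results should be read as lower bounds.

```python
# Funding totals quoted in the post, in millions of US dollars.
TOTAL_END_2010 = 95.8        # invested between start of 2008 and end of 2010
TOTAL_END_AUG_2011 = 157.5   # running total at the end of August 2011
TOTAL_NOV_2011 = 350.8       # running total at time of writing (a lower bound)


def pct_increase(old: float, new: float) -> float:
    """Return the percentage increase from old to new."""
    return (new - old) / old * 100


growth_since_2010 = pct_increase(TOTAL_END_2010, TOTAL_NOV_2011)
multiple_since_august = TOTAL_NOV_2011 / TOTAL_END_AUG_2011

print(f"Increase since end of 2010: {growth_since_2010:.0f}%")
print(f"Multiple of end-of-August total: {multiple_since_august:.2f}x")
```

The first figure rounds to the 266% quoted above, and the second confirms that the total has more than doubled since the end of August.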
A substantial reason for that huge jump is the staggering $84m series A funding round raised by Apache Hadoop-based analytics service provider Opera Solutions.
The original commercial supporter of Apache Hadoop, Cloudera, has also contributed strongly with a recent $40m series D round. In addition, MapR Technologies raised $20m to invest in its Apache Hadoop distribution, while we know that Hortonworks also raised a substantial round (unconfirmed, but reportedly $20m) from Benchmark Capital and former parent Yahoo as it was spun off in June. Index Ventures also recently announced that it has become an investor in Hortonworks.
I am reliably informed that if you factor in Hortonworks’ two undisclosed rounds, the total funding for Hadoop and NoSQL vendors is actually closer to $400m.
The various NoSQL database providers have also played a part in the recent burst of investment, with 10gen raising a $20m series D round and Couchbase raising $15m. DataStax, which has interests in both Apache Cassandra and Apache Hadoop, raised an $11m series B round, while Neo Technology raised a $10.6m series A round. Basho Technologies raised $12.5m in series D funding in three chunks during 2011.
Additionally, there are a variety of associated players, including Hadoop-based analytics providers such as Datameer, Karmasphere and Zettaset, as well as hosted NoSQL firms such as MongoLab, MongoHQ and Cloudant.
One investor company name that crops up more than most in the list above is Accel Partners, which was an original investor in both Cloudera and Couchbase, and backed Opera Solutions via its Accel-KKR joint venture with Kohlberg Kravis Roberts.
It appears that those investments have merely whetted Accel’s appetite for big data, however, as the firm last week announced a $100m Big Data Fund to invest in new businesses targeting storage, data management and analytics, as well as data-centric applications and tools.
While Accel is the first VC shop that we are aware of to create a fund specifically for big data investments, we are confident both that it won’t be the last and that other VCs have already informally earmarked funds for data-related investments.
451 clients can get more details on funding and M&A involving more traditional database vendors, as well as our perspective on potential M&A suitors for the Hadoop and NoSQL players.
November 12th, 2010 — Data management
CouchOne has become the first of the major NoSQL database vendors to publicly distance itself from the term NoSQL, something we have been expecting for some time.
While the term NoSQL enabled the likes of 10gen, Basho, CouchOne, Membase, Neo Technology and Riptano to generate significant attention for their various database projects/products, it was always something of a flag of convenience.
Somewhat less convenient is the fact that grouping the key-value, document, graph and column family data stores together under the NoSQL banner masked their differentiating features and potential use cases.
As Mikael notes in the post: “The term ‘NoSQL’ continues to lump all the companies together and drowns out the real differences in the problems we try to tackle and the challenges we face.”
It was inevitable, therefore, that as the products and vendors matured the focus would shift towards specific use cases and the NoSQL movement would fragment.
CouchOne is by no means the only vendor thinking about distancing itself from NoSQL, especially since some of them are working on SQL interfaces. Again, we would see this fragmentation as a sign of maturity, rather than crisis.
The ongoing differentiation is something we plan to cover in depth with a report looking at the specific use cases of the “database alternatives” early in 2011.
It is also interesting that CouchOne is distancing itself from NoSQL in part due to the conflation of the term with Big Data. We have observed this ourselves and would agree that it is a mistake.
While some of the use cases for some of the NoSQL databases do involve large distributed data sets, not all of them do, and we had noted that the launch of the CouchOne Mobile development environment was designed to play to the specific strengths of Apache CouchDB: peer-based bidirectional replication, including disconnected mode, and a crash-only design.
Incidentally, Big Data is another term we expect to diminish in usage in 2011, since Bigdata is a trademark of a company called SYSTAP.
Witness the fact that the Data Analytics Summit, which I’ll be attending next week, was previously the Big Data Summit. We assume that is also the reason Big Data News has been upgraded to Massive Data News.
The focus on big data sets and solving big data problems will continue, of course, but expect much less use of Big Data as a brand.
Similarly, while we expect many of the “NoSQL” databases to have a bright future, expect much less focus on the term NoSQL.
February 25th, 2010 — Data management
As a company, The 451 Group has built its reputation on taking a lead in covering disruptive technologies and vendors. Even so, with a movement as hyped as NoSQL databases, it sometimes pays to be cautious.
In my role covering data management technologies for The 451 Group’s Information Management practice I have been keeping an eye on the NoSQL database movement for some time, taking the time to understand the nuances of the various technologies involved and their potential enterprise applicability.
That watching brief has now spilled over into official coverage, following our recent assessment of 10gen. I also recently had the chance to meet up with Couchio’s VP of business development, Nitin Borwankar (see coverage initiation of Couchio). I’ve also caught up with Basho Technologies sooner rather than later. A report on that is now imminent.
There are a couple of reasons why I have formally begun covering the NoSQL databases. The first is the maturing of the technologies, and the vendors behind them, to the point where they can be considered for enterprise-level adoption. The second is the demand we are getting from our clients to provide our view of the NoSQL space and its players.
This is coming both from the investment community and from existing vendors, either looking for potential partnerships or fearing potential competition. The number of queries we have been getting related to NoSQL and big data has encouraged articulation of my thoughts, so look out for a two-part spotlight on the implications for the operational and analytical database markets in the coming weeks.
The biggest reason, however, is the recognition that the NoSQL movement is a user-led phenomenon. There is an enormous amount of hype surrounding NoSQL but for the most part it is not coming from vendors like 10gen, Couchio and Basho (although they may not be actively discouraging it) but from technology users.
A quick look at the most prominent key-value and column-table NoSQL data stores highlights this. Many of these have been created by user organizations themselves in order to fill a void and overcome the limitations of traditional relational databases – for example Google (BigTable), Yahoo (HBase), Zvents (Hypertable), LinkedIn (Voldemort), Amazon (Dynamo), and Facebook (Cassandra).
It has become clear that traditional database technologies do not meet the scalability and performance requirements of dealing with big data workloads, particularly at a scale experienced by social networking services.
That does raise the question of how applicable these technologies will be to enterprises that do not share the architecture of the likes of Google, Facebook and LinkedIn – at least in the short term. There are users, though – Cassandra users include Rackspace, Digg, Facebook, and Twitter, for example.
What there isn’t – for the likes of Cassandra and Voldemort, at least – is vendor-based support. That inevitably raises questions about the general applicability of the key-value/column table stores. As Dave Kellogg notes, “unless you’ve got Google’s business model and talent pool, you probably shouldn’t copy their development tendencies”.
Given the levels of adoption it seems inevitable that vendors will emerge around some of these projects, not least since, as Dave puts it, “one day management will say: ‘Holy Cow folks, why in the world are we paying programmers to write and support software at this low a level?'”
In the meantime, it would appear that the document-oriented data stores (Couchio’s CouchDB, 10gen’s MongoDB, Basho’s Riak) are much more generally applicable, both technologically and from a business perspective. (UPDATE – You can also add Neo Technology and its graph database technology to that list.)
In our forthcoming two-part spotlight on this space I’ll articulate in more detail our view on the differentiation of the various NoSQL databases and other big data technologies and their potential enterprise applicability. The first part, on NoSQL and operational databases, is here.
February 18th, 2009 — Data management
Back in July last year we reported on the formation of a new open source cloud computing start-up called 10gen on our Cloud Cover and CAOS Theory blogs.
Seven months later and there have been a few changes at 10gen, such that this information management blog is arguably the most suitable venue for discussion of the implications of 10gen’s MongoDB, the cloud computing database which has now become its major focus.
A quick recap: 10gen launched as an open source platform-as-a-service play offering the MongoDB object database as well as an application server and file system. So far, so cloud stack.
However, the file system quickly became an interface layer to MongoDB while the company more recently decided that its application server runtime and MongoDB are better off apart and shifted its attention to the database, a standalone beta version of which was released last week.
As the two projects have diverged so will this post. To continue reading about the future of the Babble application server head for CAOS Theory, otherwise:
As this post from Geir Magnusson Jr, 10gen VP of Engineering & Co-Founder, at Codehaus describes, MongoDB is not your traditional database.
“As I argue when people give me the chance to speak about it, databases are changing – just look at what is available in the so-called “cloud” arena. It tends not to be a RDBMS if it’s scalable. The storage engine under AppEngine, or Amazon’s SimpleDB, or any of the Dynamo implementations, etc, all of which change your programming model to one that isn’t “tables and joins”. Or look at the excellent CouchDB, a JSON store. If the RDBMS isn’t being replaced outright (like it has to be in “the cloud”), it can be augmented with other persistence technologies that are better suited for a portion of the data requirements of a system.”
This was one of the themes of my talk at our client event in Boston last year, and nothing has happened since then to change my mind. As Geir explains, the interesting thing about the new cloud databases (for want of a better term) is that they force users to think differently about what a database is for – and specifically to think beyond the realms of the relational.
We see similar forces at work in the data warehousing space driven by column-oriented architectures, but the end result is the same as users are increasingly thinking beyond what they already know to consider the best database management tools for the job at hand.
As Geir adds of MongoDB: “It works fine as a database, but you can’t think relational. If you want to just replace MySQL with something else, but don’t want to rethink your data model, MongoDB isn’t for you.”
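To illustrate what rethinking the data model might look like, here is a minimal sketch contrasting a “tables and joins” layout with a single nested, JSON-like document. It uses plain Python data structures only, not MongoDB’s API or any driver, and the blog-post data and all names in it are hypothetical, invented purely for illustration.

```python
# Relational shape: two flat "tables" linked by a key and reassembled
# at read time with a join. (Hypothetical data, plain Python lists.)
posts = [{"id": 1, "title": "Hello"}]
comments = [
    {"post_id": 1, "author": "ann", "text": "First"},
    {"post_id": 1, "author": "bob", "text": "Second"},
]


def comments_for(post_id):
    """Emulate a join: scan the comments table for rows matching the post."""
    return [c for c in comments if c["post_id"] == post_id]


# Document shape: one self-contained, JSON-like record in which the
# comments are nested inside the post they belong to.
post_document = {
    "title": "Hello",
    "comments": [
        {"author": "ann", "text": "First"},
        {"author": "bob", "text": "Second"},
    ],
}

# The same question ("who commented on this post?") answered both ways.
relational_authors = [c["author"] for c in comments_for(1)]
document_authors = [c["author"] for c in post_document["comments"]]

print(relational_authors)
print(document_authors)
```

Both approaches yield the same list of authors; the difference is that the document version needs no join because the related data is stored together, which is the kind of modelling decision Geir is referring to.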