Necessity is the mother of NoSQL

As we noted last week, necessity is one of the six key factors that are driving the adoption of alternative data management technologies identified in our latest long format report, NoSQL, NewSQL and Beyond.

Necessity is particularly relevant when looking at the history of the NoSQL databases. While it is easy for the incumbent database vendor to dismiss the various NoSQL projects as development playthings, it is clear that the vast majority of NoSQL projects were developed by companies and individuals in response to the fact that the existing database products and vendors were not suitable to meet their requirements with regard to the other five factors: scalability, performance, relaxed consistency, agility and intricacy.

The genesis of much – although by no means all – of the momentum behind the NoSQL database movement can be attributed to two research papers: Google’s Bigtable: A Distributed Storage System for Structured Data, presented at the Seventh Symposium on Operating Systems Design and Implementation, in November 2006, and Amazon’s Dynamo: Amazon’s Highly Available Key-Value Store, presented at the 21st ACM Symposium on Operating Systems Principles, in October 2007.

The importance of these two projects is highlighted by The NoSQL Family Tree, a graphic representation of the relationships between (most of) the various major NoSQL projects:

Not only were the existing database products and vendors not suitable to meet their requirements, but Google and Amazon, as well as the likes of Facebook, LinkedIn, Powerset and Zvents, could not rely on the incumbent vendors to develop anything suitable, given the vendors’ desire to protect their existing technologies and installed bases.

Werner Vogels, Amazon’s CTO, has explained that as far as Amazon was concerned, the database layer required to support the company’s various Web services was too critical to be trusted to anyone else – Amazon had to develop Dynamo itself.

Vogels also pointed out, however, that this situation is suboptimal. The fact that Facebook, LinkedIn, Google and Amazon have had to develop and support their own database infrastructure is not a healthy sign. In a perfect world, they would all have better things to do than focus on developing and managing database platforms.

That explains why the companies have also all chosen to share their projects. Google and Amazon did so through the publication of research papers, which enabled the likes of Powerset, Facebook, Zvents and LinkedIn to create their own implementations.

These implementations were then shared through the publication of source code, which has enabled the likes of Yahoo, Digg and Twitter to collaborate with each other and additional companies on their ongoing development.

The NoSQL movement also boasts a significant number of developer-led projects initiated by individuals – in the tradition of open source – to scratch their own technology itches.

Examples include Apache CouchDB, originally created by the now-CTO of Couchbase, Damien Katz, to be an unstructured object store to support an RSS feed aggregator; and Redis, which was created by Salvatore Sanfilippo to support his real-time website analytics service.

We would also note that even some of the major vendor-led projects, such as Couchbase and 10gen, have been heavily influenced by non-vendor experience. 10gen was founded by former DoubleClick executives to create the software they felt was needed at the digital advertising firm, while online gaming firm Zynga was heavily involved in the development of the original Membase Server memcached-based key-value store (now Elastic Couchbase).

In this context it is interesting to note, therefore, that while the majority of NoSQL databases are open source, the NewSQL providers have largely chosen to avoid open source licensing, with VoltDB being the notable exception.

These NewSQL technologies are no less a child of necessity than NoSQL, although it is a vendor’s necessity to fill a gap in the market, rather than a user’s necessity to fill a gap in its own infrastructure. It will be intriguing to see whether the various other NewSQL vendors will turn to open source licensing in order to grow adoption and benefit from collaborative development.

NoSQL, NewSQL and Beyond is available now from both the Information Management and Open Source practices (non-clients can apply for trial access). I will also be presenting the findings at the forthcoming Open Source Business Conference.


Quick thoughts on Microsoft-Powerset:

  • This is about scaling the original Powerset vision and by extension, the vision of the Xerox PARC engineers that developed the technology on which Powerset is built.

  • This isn’t an alternative to buying Yahoo – that’s overly simplistic apples vs oranges stuff.

  • But this is about improving Live Search and as such, the Powerset guys probably have a greater chance of influencing Microsoft’s search direction than the semantic-technology focused people within Yahoo would have had within MicroHoo.

  • The semantic technology crowd just lost its poster child; will the next one please step forward?

I told you they were quick! More tomorrow to 451 clients via TechDealmaker.

Semantic technology conference

This blog’s title alludes to the well-documented problem of people not being able to find things, mainly within an enterprise environment. Semantic technology has the potential to change that. And having just spent a day and a half at the Semantic Technology conference in San Jose (it goes on for another three days), I can verify that there are plenty of people who think that it’s on the verge of doing so.

The promise of semantic technology, as Jeff Pollock of Oracle put it at the end of his presentation, is that “finding stuff just got easier.”

I spoke to a lot of people and will be talking to numerous vendors, customers and investors in the coming months, but here’s my initial take:

  • Semantic technology will succeed if it’s led out by the consumer market, followed by the enterprise. This is despite the widespread skepticism about the Semantic Web, you know, the bit about it being an untenable, top-down approach to applying meaning to web pages, which I think is itself misunderstood. There are a couple of major attempts at bringing sem tech to the consumer right now: Powerset, a web search engine that launched this month, initially searching Wikipedia articles; and Twine, a social network based on shared interests built by Radar Networks, which will launch fully in the fall but is in private beta now. Should either of those succeed in terms of usage and prove itself at web scale, VC funding would follow.
  • Semantic technology isn’t a market, it’s an enabling technology. Pollock asserted that and I’d agree with him; it’s much like text analysis in that regard (see below).
  • Standards are baked – RDF/OWL/XML. There may be some fiddling around the edges, but judging by the number of standards bodies endorsing or adopting them (OASIS, ISO, W3C and OMG), the job seems to be largely done for now.
  • Sem tech vendors and users need to understand that text analysis using NLP or statistical methods isn’t the enemy here, and if you can fix search rather than scoff at it, you might have a winner. I saw too much berating of Google as ‘not getting it’ and text analysis as being ‘shallow’ for my liking.
  • Finally, 1,000 people is a lot of people to attract to a conference about semantic technology. It was 600 last year and, although I wasn’t there last year, those who were told me it was the first year that people started to move beyond the theoretical (semantic technology has the potential to do this!) to the actual. And given that there were a very large number of Europeans there, a similar conference on that continent would seem to make sense to me – attendees I spoke with didn’t know of such a thing, but if you do, please let me know in the comments.