NoSQL ≠ open source

I thought we finished with trying to define NoSQL in 2010 but Martin Fowler has raised the question again with his recent post – although he has a good reason to do so since he is collaborating on a book on the subject.

Fowler’s list of common characteristics (which he acknowledges is not definitional) is as follows:

  • Not using the relational model (nor the SQL language)
  • Open source
  • Designed to run on large clusters
  • Based on the needs of 21st century web properties
  • No schema, allowing fields to be added to any record without controls
  • You could argue about whether all NoSQL databases are designed to run on large clusters, but the characteristic from the list above that I would dispute is open source.

    While it is undoubtedly true to say that most NoSQL databases are open source, I don’t believe it defines them in the same way that other common characteristics do.

    The main argument for making open source licensing a requirement of NoSQL seems to me to be historical. The first NoSQL meeting, cited by Fowler, specified that it was about “open source, distributed, non-relational databases”.

    However, making open source licensing a defining characteristic of NoSQL would also exclude a number of products that would otherwise clearly fit the definition of NoSQL, as well as projects such as Google’s BigTable and Amazon’s Dynamo which were the genesis of much – although by no means all – of the momentum behind the NoSQL database movement.

    For the sake of argument let’s assume Amazon decided to release a version of Dynamo that could be deployed on-premise and for whatever reason decided not to release “Dynamo-on-premise” under an open source license.

    Is anyone seriously going to argue that a closed source “Dynamo-on-premise” wouldn’t be a NoSQL database?

    For what it’s worth since our NoSQL, NewSQL and Beyond report the description of NoSQL I have been using is:

  • A new breed of non-relational database products
  • sharing a rejection of fixed table schema and join operations
  • designed to meet scalability requirements of distributed architectures
  • and/or schema-less data management requirements
  • Although, like Fowler I would not claim this to be a definition.