The geographic distribution of Hadoop skills: in context

NC State University’s Institute for Advanced Analytics recently published some interesting statistics on Apache Hadoop adoption based on a search of LinkedIn data.

The statistics graphically illustrate what a lot of people were already pretty sure of: that the geographic distribution of Hadoop skills (and presumably therefore adoption) is heavily weighted in favour of the USA, and in particular the San Francisco Bay Area.

The statistics showed that 64% of the 9,079 LinkedIn members with “Hadoop” in their member profiles (by no means a perfect measure, but an insightful one nonetheless) are based in the US, and that a large proportion of those are in the Bay Area.

The results are what we would expect to see given the relative immaturity of Apache Hadoop adoption, as well as the nature and location of the early Hadoop adopters and Hadoop-related vendors.

The results got me thinking about two things:
– how does the geographic spread compare to a more maturely adopted project?
– how does it compare to the various NoSQL projects?

So I did some searching of LinkedIn to find out.

To answer the first question I performed the same search for MySQL, as an example of a mature, widely-adopted open source project.

The results show that just 32% of the 366,084 LinkedIn members with “MySQL” in their member profiles are based in the US (precisely half the proportion for Hadoop), while only 4.4% are in the Bay Area, compared to 28.2% of the 9,079 LinkedIn members with “Hadoop” in their member profiles.
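To put those percentages into rough headcounts, here is a minimal Python sketch. It uses only the totals and percentages quoted above; the function names are my own, and because the published percentages are rounded, the resulting counts are approximations rather than exact LinkedIn figures.

```python
# Approximate headcounts implied by the quoted LinkedIn percentages.
# The inputs are the rounded figures from the post, so outputs are estimates.

def estimated_members(total: int, percent: float) -> int:
    """Return the approximate number of members that `percent` of `total` represents."""
    return round(total * percent / 100)


def bay_share_of_us(bay_percent: float, us_percent: float) -> float:
    """Return the Bay Area members as a percentage of the US-based members."""
    return bay_percent / us_percent * 100


# Totals and percentages as quoted in the post.
projects = {
    "Hadoop": {"total": 9_079, "us": 64.0, "bay_area": 28.2},
    "MySQL": {"total": 366_084, "us": 32.0, "bay_area": 4.4},
}

for name, p in projects.items():
    us_count = estimated_members(p["total"], p["us"])
    bay_count = estimated_members(p["total"], p["bay_area"])
    share = bay_share_of_us(p["bay_area"], p["us"])
    print(f"{name}: ~{us_count:,} in the US, ~{bay_count:,} in the Bay Area "
          f"({share:.1f}% of the US-based members)")
```

Expressing the Bay Area figure as a share of US-based members, rather than of all members, makes the two projects easier to compare on how concentrated their US skills base is.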

The charts below illustrate the difference in geographic distribution between Hadoop and MySQL. The size of the boxes is in proportion to the search result.

With regard to the second question, I also ran searches for MongoDB, Riak, CouchDB, Apache Cassandra*, Membase*, Neo4j, HBase, and Redis.

I’ll be posting the results for each of those over the next week or so, but in the meantime, the graphic below shows the split between the USA and Rest of the World (ROW) for all ten projects.

It illustrates, as I suspected, that the distribution of skills for NoSQL databases is more geographically dispersed than for Hadoop.

I have some theories as to why that is – but I’d love to hear anyone else’s take on the results.

*I had to use the ‘Apache’ qualifier with Cassandra to filter out anyone called Cassandra, while Membase returned a more statistically relevant result than Couchbase.

World map image: Owen Blacker


4 comments ↓

#1 Henrik Ingo on 12.09.11 at 6:46 am

It’s because in a country like Finland, there are pretty much just 3 companies with enough data for Hadoop to be meaningful: Nokia, Habbo Hotel (Sulake), and since this year, Angry Birds (Rovio).

I know some companies that operate on the Finnish market that have some terabytes – enough to make life with MySQL a struggle – but since there is no Hadoop competence to recruit, they buy SSDs and get on with their MySQL lives.

Excellent research, again!

#2 Tempt tech talent without Googlesque mega perks | Matias Vangsnes on 12.20.11 at 2:48 pm

[…] as New York City and Seattle altogether. The analysts at 451 Group have analysed where skills for Hadoop and also NoSQL congregate: by a huge margin, engineers with these talents live in Silicon Valley. […]

#3 Tempt tech talent without Googlesque mega perks | Adoption from Ukraine on 12.21.11 at 9:55 pm

[…] as New York City and Seattle altogether. The analysts at 451 Group have analysed where skills for Hadoop and also NoSQL congregate: by a huge margin, engineers with these talents live in Silicon Valley. […]

#4 NoSQL para no programadores | La Pastilla Roja on 02.12.12 at 4:36 pm

[…] Too much information Share: This entry was posted in Cloud Computing, Data Mining, Morfeo […]