451 research — Too much information

The geographic distribution of Hadoop skills: in context

December 2nd, 2011 — Data management

NC State University’s Institute for Advanced Analytics recently published some interesting statistics on Apache Hadoop adoption based on a search of LinkedIn data.

The statistics graphically illustrate what a lot of people were already pretty sure of: that the geographic distribution of Hadoop skills (and presumably therefore adoption) is heavily weighted in favour of the USA, and in particular the San Francisco Bay Area.

The statistics showed that 64% of the 9,079 LinkedIn members with “Hadoop” in their member profiles (by no means a perfect measure, but an insightful one nonetheless) are based in the US, and that a large proportion of those are in the Bay Area: 28.2% of the total, or roughly 44% of the US-based members.

The results are what we would expect to see given the relative immaturity of Apache Hadoop adoption, as well as the nature and location of the early Hadoop adopters and Hadoop-related vendors.

The results got me thinking about two questions:
– how does the geographic spread compare to a more maturely adopted project?
– how does it compare to the various NoSQL projects?

So I did some searching of LinkedIn to find out.

To answer the first question I performed the same search for MySQL, as an example of a mature, widely adopted open source project.

The results show that just 32% of the 366,084 LinkedIn members with “MySQL” in their member profiles are based in the US (precisely half the Hadoop figure), and only 4.4% are in the Bay Area, compared with 28.2% of the 9,079 LinkedIn members with “Hadoop” in their member profiles.
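As a rough sanity check, the quoted percentages can be turned back into approximate member counts. The Python sketch below is purely illustrative: the `SEARCHES` table and the `implied_counts` helper are hypothetical names of my own, not part of the NC State analysis, and because the published percentages are rounded, the derived counts are only approximate. It makes the concentration explicit: roughly 44% of US-based Hadoop members are in the Bay Area, against under 14% for MySQL.

```python
# Figures quoted in the post. Percentages are rounded, so derived counts are approximate.
SEARCHES = {
    "Hadoop": {"total": 9_079, "us_pct": 64.0, "bay_area_pct": 28.2},
    "MySQL": {"total": 366_084, "us_pct": 32.0, "bay_area_pct": 4.4},
}


def implied_counts(total, us_pct, bay_area_pct):
    """Hypothetical helper: convert quoted percentages back into approximate member counts."""
    us = round(total * us_pct / 100)
    bay_area = round(total * bay_area_pct / 100)
    rest_of_world = total - us  # everyone not based in the US
    # Share of the US-based members who are in the Bay Area, in percent.
    bay_share_of_us_pct = bay_area_pct / us_pct * 100
    return {
        "us": us,
        "rest_of_world": rest_of_world,
        "bay_area": bay_area,
        "bay_share_of_us_pct": bay_share_of_us_pct,
    }


for name, figures in SEARCHES.items():
    counts = implied_counts(**figures)
    print(
        f"{name}: ~{counts['us']:,} US, ~{counts['rest_of_world']:,} ROW, "
        f"~{counts['bay_area']:,} Bay Area "
        f"({counts['bay_share_of_us_pct']:.1f}% of US members)"
    )
```

Working from the ratio of percentages rather than from the rounded counts keeps the Bay Area share independent of rounding in the totals.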

The charts below illustrate the difference in geographic distribution between Hadoop and MySQL. The size of each box is in proportion to the search result.
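One detail worth keeping in mind when reading proportional boxes: if “size” means area (an assumption on my part, since the charts themselves are not reproduced here), each square's side has to scale with the square root of its count. Scaling the side linearly would turn the roughly 40x gap between MySQL and Hadoop into an apparent area gap of about 1,600x. The `box_sides` function below is a hypothetical sketch of that scaling, not the tool used to draw the charts.

```python
import math


def box_sides(counts, largest_side=100.0):
    """Side lengths of square boxes whose *areas* are proportional to counts.

    The largest count gets a box of side `largest_side`; every other box is
    scaled by the square root of its count relative to the largest.
    """
    biggest = max(counts.values())
    return {name: largest_side * math.sqrt(n / biggest) for name, n in counts.items()}


sides = box_sides({"Hadoop": 9_079, "MySQL": 366_084})
for name, side in sides.items():
    print(f"{name}: side {side:.1f}, area {side * side:.0f}")
```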

With regard to the second question, I also ran searches for MongoDB, Riak, CouchDB, Apache Cassandra*, Membase*, Neo4j, HBase, and Redis.

I’ll be posting the results for each of those over the next week or so, but in the meantime, the graphic below shows the split between the USA and Rest of the World (ROW) for all ten projects.

It illustrates, as I suspected, that the distribution of skills for NoSQL databases is more geographically dispersed than for Hadoop.

I have some theories as to why that is – but I’d love to hear anyone else’s take on the results.

*I had to use the ‘Apache’ qualifier with Cassandra to filter out anyone called Cassandra, while Membase returned a more statistically relevant result than Couchbase.
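The effect of the ‘Apache’ qualifier can be illustrated with a simple phrase match over profile text. This is a hypothetical sketch: it does not reproduce LinkedIn's search, and the profile snippets are invented for illustration. It also shows the trade-off. The qualifier removes people who are simply named Cassandra, but it may also miss members who list the database as plain “Cassandra”, so the qualified figure probably undercounts somewhat.

```python
import re


def matches(profile_text, term):
    """Case-insensitive whole-phrase match of `term` in a profile string."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, profile_text, flags=re.IGNORECASE) is not None


# Hypothetical profile snippets, invented purely for illustration.
profiles = [
    "Cassandra Smith, marketing manager",
    "Built ingestion pipelines on Apache Cassandra and Hadoop",
    "DBA: MySQL, Apache Cassandra, Redis",
]

naive = [p for p in profiles if matches(p, "Cassandra")]
qualified = [p for p in profiles if matches(p, "Apache Cassandra")]
print(len(naive), len(qualified))
```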

World map image: Owen Blacker
