Beyond ‘big data’

Alistair Croll published an interesting post this week entitled ‘there’s no such thing as big data’, in which he argued, prompted by a friend, that “given how much traditional companies put [big data] to work, it might as well not exist.”

Tim O’Reilly continued the theme in his follow-up post, arguing:

“companies that have massive amounts of data without massive amounts of clue are going to be displaced by startups that have less data but more clue”

There is much to agree with – in fact I have myself argued that when it comes to data, the key issue is not how much you have, but what you do with it. However, there is also a significant shift of emphasis here away from the underlying principles that have driven the interest in ‘big data’ over the last 12-18 months.

Compare Tim O’Reilly’s statement with the following, from Google’s seminal research paper ‘The Unreasonable Effectiveness of Data’:

“invariably, simple models and a lot of data trump more elaborate models based on less data”

While the two statements are not entirely contradictory, they do indicate a change in emphasis related to data. There has been so much emphasis on the ‘big’ in ‘big data’, as if the growing volume, variety and velocity of data would, by itself, deliver improved business insights.

As I have argued in the introduction to our ‘total data’ management concept, and in the numerous presentations given on the subject this year, in order to deliver value from data you have to look beyond the nature of the data itself and consider what it is that the user wants to do with it.

Specifically, we believe that one of the key factors in delivering value is a focus on storing and processing all of a company’s data (or at least as much as is economically feasible), rather than analysing samples and extrapolating the results.
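To make that contrast concrete, here is a minimal sketch (with entirely hypothetical numbers, not figures from this post) comparing a full aggregation against a sample-and-extrapolate estimate:

```python
import random

# Hypothetical dataset: revenue values for one million transactions.
# (Illustrative numbers only; nothing here comes from the post.)
random.seed(42)
transactions = [random.lognormvariate(3, 1) for _ in range(1_000_000)]

# 'Total data' approach: store and process every record.
total_revenue = sum(transactions)

# Sampling approach: analyse 1% of the records and extrapolate.
sample = random.sample(transactions, 10_000)
estimated_revenue = sum(sample) * (len(transactions) / len(sample))

print(f"Processed in full: {total_revenue:,.0f}")
print(f"Extrapolated:      {estimated_revenue:,.0f}")
print(f"Estimation error:  {abs(total_revenue - estimated_revenue) / total_revenue:.2%}")
```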

The other factor is time, and specifically how fast users can get to the results they are looking for. Another way of looking at this is in terms of the rate of query. Again, this is not about the nature of the data, but about what the user wants to do with it.

This focus on the rate of query has implications for the value of the data, as expressed in the following equation:

Value = (Volume ± Variety ± Velocity) × Totality / Time
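The equation can be read as a function: the same underlying data becomes more valuable as more of it is retained (Totality) and as the time to get a result shrinks. The sketch below encodes that reading with made-up, normalised inputs; treating the volume, variety and velocity terms as a plain sum is a simplifying assumption on my part, since the equation leaves their exact combination open:

```python
def data_value(volume, variety, velocity, totality, time_to_result):
    """Illustrative encoding of Value = (Volume ± Variety ± Velocity) × Totality / Time.

    Combining volume, variety and velocity as a simple sum is an assumption
    made purely for illustration; the weighting is not fixed by the post.
    """
    return (volume + variety + velocity) * totality / time_to_result

# Hypothetical, normalised scores (0-1) for the data characteristics.
# Retaining all of the data and answering queries quickly...
print(data_value(volume=0.8, variety=0.5, velocity=0.6, totality=1.0, time_to_result=0.5))

# ...scores far higher than sampling heavily and waiting longer for results.
print(data_value(volume=0.8, variety=0.5, velocity=0.6, totality=0.3, time_to_result=2.0))
```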

The rate of query also has significant implications for which technologies are deployed to store and process the data, and to put that data to use in delivering business insight and value.

Getting back to the points made by Alistair and Tim in relation to ‘The Unreasonable Effectiveness of Data’, it would seem that to date there has been more focus on what Google referred to as “a lot of data”, and less on the “simple models” needed to deliver value from that data.

There is clearly a balance to be struck, and the answer lies not in ‘big data’ but in “more clue”, and in defining and delivering those “simple models”.