Entries from January 2012 ↓

451 Research MySQL/NoSQL/NewSQL survey

I’ve just launched a new survey that should be of interest if you are currently using or actively considering MySQL or any of the NoSQL or NewSQL offerings.

The aim of the survey is threefold:

– identify trends in database usage over time
– explore changing attitudes to MySQL following its acquisition by Oracle
– examine the competitive dynamic between MySQL and other database technologies, including NoSQL and NewSQL

There are just 12 questions to answer, spread over four pages, and the entire survey should take no longer than five minutes to complete.

All individual responses are of course confidential. The results will be published as part of a major research report due at the end of Q1. Thanks in advance for your participation.

The survey can be found at: http://www.surveymonkey.com/s/MySQLNoSQLNewSQL

The Data Day, Today: Jan 13 2012

Splunk files for IPO. Oracle updates its price list. And more.

An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

* Splunk Inc. Files Registration Statement for an Initial Public Offering And here it is.

* Oracle updated its Engineered System price list.

* Comparing Hadoop Appliances Great post from Pythian’s Gwen Shapira.

* What is big data? Edd Dumbill provides an introduction to the big data landscape.

* Why Couchbase? Damien Katz clarifies the reasons behind his preference for Couchbase over Apache CouchDB.

* Jaspersoft First to Develop Business Intelligence for Platform-as-a-Service BI suite now available with Red Hat OpenShift.

* Birst and ParAccel Partner to Deliver Scalable and Agile Big Data Analytics in the Cloud. Leverage.

* Recommind Names 451 Research Cofounder Nick Patience Director of Product Marketing and Strategy Our loss is Recommind’s gain.

* Oracle Unveils Oracle TimesTen In-Memory Database 11g Release 2 Performance and scalability improvements.

* Walkie Talkie App Voxer Soars Past a Billion Operations per Day powered by Basho Riak 10-4 good buddy.

* ISYS Search to Provide Enhanced Text Data Extraction Capabilities for New Generation of SAP Solutions OEM deal.

* Using SQLFire as a read-only cache for MySQL. VMware explains why and how.

* Announcing MySQL Enterprise Backup 3.7.0 Self-explanatory.

* Tableau Software Doubles Sales in 2011, Announces Massive Growth in Customer Roster Worldwide Customer base up by 40 percent in 2011.

* VoltDB Completes 2011 With Significant Market Growth and Company Expansion Including growth in new customer accounts of more than 300%.

* Clarabridge Wins Record Number of New Clients in 2011 More than 60 new Clarabridge Enterprise customers and more than 700 new Clarabridge Professional customers.

* For 451 Research clients

# Oracle selects Cloudera for Hadoop-based Big Data Appliance Market development report

# Microsoft may offer ‘big security data’ for free Analyst note

# Zimory considering virtual independence for cloud database business Market development report

# Jitterbit sheds light on growth strategy, integration business under new CEO Market development report

# SnapLogic snaps into the enterprise, shifts gaze away from midmarket integration Market development report

* Google News Search outlier of the day: My Best Friend’s Hair Launches Nationwide Website to Help You Find the Perfect Hairstylist

And that’s the Data Day, today.

NoSQL ≠ open source

I thought we had finished trying to define NoSQL in 2010, but Martin Fowler has raised the question again with his recent post – although he has a good reason to do so, since he is collaborating on a book on the subject.

Fowler’s list of common characteristics (which he acknowledges is not definitional) is as follows:

  • Not using the relational model (nor the SQL language)
  • Open source
  • Designed to run on large clusters
  • Based on the needs of 21st century web properties
  • No schema, allowing fields to be added to any record without controls

    You could argue about whether all NoSQL databases are designed to run on large clusters, but the characteristic from the list above that I would dispute is open source.

    While it is undoubtedly true to say that most NoSQL databases are open source, I don’t believe it defines them in the same way that other common characteristics do.

    The main argument for making open source licensing a requirement of NoSQL seems to me to be historical. The first NoSQL meeting, cited by Fowler, specified that it was about “open source, distributed, non-relational databases”.

    However, making open source licensing a defining characteristic of NoSQL would also exclude a number of products that would otherwise clearly fit the definition of NoSQL, as well as projects such as Google’s BigTable and Amazon’s Dynamo which were the genesis of much – although by no means all – of the momentum behind the NoSQL database movement.

    For the sake of argument let’s assume Amazon decided to release a version of Dynamo that could be deployed on-premise and for whatever reason decided not to release “Dynamo-on-premise” under an open source license.

    Is anyone seriously going to argue that a closed source “Dynamo-on-premise” wouldn’t be a NoSQL database?

    For what it’s worth, since our NoSQL, NewSQL and Beyond report, the description of NoSQL I have been using is:

  • A new breed of non-relational database products
  • sharing a rejection of fixed table schema and join operations
  • designed to meet scalability requirements of distributed architectures
  • and/or schema-less data management requirements

    Although, like Fowler, I would not claim this to be a definition.

    Who said you can’t go home again?

    Every new year represents some change; the hope of new challenges and opportunities. It is not all that often that a fresh new year also brings such literal and fundamental change, as it has for me this year. I ended 2011 on the vendor-side of things – and I am starting 2012 on the analyst side.

    Of course, this is highly familiar ground for me. I was not only an analyst with 451 Research previously, but I have also been a demi-analyst of sorts through my blogging and other non-traditional marketing activities with both SugarCRM and Basho Technologies.

    Coming back to 451 Research is exciting for many reasons: this has always been a great team of highly intelligent individuals with great vision, and the type of analysis here is right up my alley.

    In that vein, I wanted to give a heads up around the kinds of technology innovation I plan to make my area of focus. I will focus, as I did in my first go-round here, on core CRM, ERP and other packaged applications. But the world of applications is changing, rapidly and in fascinating ways. I will also cover how social media (and other data sets) are influencing how developers build applications – and how end-users interact with them. Also, I see the cloud and platform-as-a-service creating new and exciting application choices for businesses of all sizes. PaaS means many things to many people, but I believe we will see even more PaaS development around enterprise apps in the coming months.

    As noted above, data in all its forms and sources is changing how we approach business. We have moved from leaving most of our enterprise data out of the applications we use daily to thinking about “Total Data” in just a few short years. This is an exciting area of technology development, and how data analysis plays into modern apps will be a focus. I am excited about working with the likes of Matt Aslett and other team members on this research.

    I am also excited to be working with Kathleen Reidy around how technologies such as enterprise search, text analytics, and collaboration/content management tools are shaping new concepts like the “social enterprise.”

    Mobile apps – in the business sense – have taken much more of a front seat since I last covered applications, so I will try to keep on top of mobile as well. And again, this will be a collaborative effort to augment the existing strong mobile coverage here.

    To sum up…essentially, if it lies at the top of the stack, and is indicative of “cool new tech” – I will probably be interested.

    I look forward to speaking with some new, old and familiar technology providers. A lot has changed in the last five years since I last wore an analyst’s hat. But following this change from the vendor-side has given me an interesting angle. I hope my research and ideas offered through 451 Research’s many outlets reflect this in a positive and valuable manner for our ever-growing audience.

    The Data Day, Today: Jan 10 2012

    Oracle OEMs Cloudera. The future of Apache CouchDB. And more.

    An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

    * Oracle announced the general availability of Big Data Appliance, and an OEM agreement with Cloudera for CDH and Cloudera Manager.

    * The Future of Apache CouchDB Cloudant confirms intention to integrate the core capabilities of BigCouch into Apache CouchDB.

    * Reinforcing Couchbase’s Commitment to Open Source and CouchDB Couchbase CEO Bob Wiederhold attempts to clear up any confusion.

    * Hortonworks Appoints Shaun Connolly to Vice President of Corporate Strategy Former vice president of product strategy at VMware.

    * Splunk even more data with 4.3 Introducing the latest Splunk release.

    * Announcement of Percona XtraDB Cluster (alpha release) Based on Galera.

    * Bringing Value of Big Data to Business: SAP’s Integrated Strategy Forbes interview with Sanjay Poonen, President and corporate officer of SAP Global Solutions.

    * New Release of Oracle Database Firewall Extends Support to MySQL and Enhances Reporting Capabilities Self-explanatory.

    * Big data and the disruption curve “Many efforts are being funded by business units and not the IT department and money is increasingly being diverted from large enterprise vendors.”

    * Get your SQL Server database ready for SQL Azure Microsoft “codename” SQL Azure Compatibility Assessment.

    * An update on Apache Hadoop 1.0 Cloudera’s Charles Zedlewski helpfully explains Apache Hadoop branch numbering.

    * Xeround and the CAP Theorem So where does Xeround fit in the CAP Theorem?

    * Can Yahoo’s new CEO Thompson harness big data, analytics? Larry Dignan thinks Scott Thompson might just be the right guy for the job.

    * US Companies Face Big Hurdles in ‘Big Data’ Use “21% of respondents were unsure how to best define Big Data”

    * Schedule Your Agenda for 2012 NoSQL Events Alex Popescu updates his list of the year’s key NoSQL events.

    * DataStax take Apache Cassandra Mainstream in 2011; Poised for Growth and Innovation in 2012 The usual momentum round-up from DataStax.

    * Objectivity claimed significant growth in adoption of its graph database, InfiniteGraph and flagship object database, Objectivity/DB.

    * Cloudera Connector for Teradata 1.0.0 Self-explanatory.

    * For 451 Research clients

    # SAS delivers in-memory analytics for Teradata and Greenplum Market Development report

    # With $84m in funding, Opera sets out predictive-analytics plans Market Development report

    * Google News Search outlier of the day: First Dagger Fencing Competition in the World Scheduled for January 14, 2012

    And that’s the Data Day, today.

    Total Data: delivering value from big data

    One of the problems I have with the term ‘big data’ is that it covers a diverse set of products that can be applied to different problems. While ‘big data’ highlights the problem – volume/variety/velocity – and promises a solution – value – it doesn’t provide a path in between the two.

    The selection of appropriate technologies to deliver the value required from big data is central to the Total Data management concept, recently introduced in our long format report of the same name.

    In determining the potential value of a specific technology, we acknowledged that the volume, variety and velocity of data must be taken into account. However, we also considered the impact of processing the data in its totality, the query frequency, the desire to explore data rather than simply query it, and the dependency on existing skills and resources.

    This can be expressed as:

    ‘Total data’ = (Volume +/- Variety +/- Velocity)
    + (Totality +/- Exploration +/- Dependency +/- Frequency)

    The various technologies that can be considered ‘big data’ technologies have individual benefits based on the seven factors expressed in the equation above, with a significant amount of overlap. As such, mapping a combination of factors to a specific data management technology is no simple task. However, it is possible to express an approximation of how individual technologies relate to these seven factors.

    The graphic below illustrates the relationship between the factors impacting the generation of value from big data and the technologies discussed in the report.

    The fact that some technologies overlap does not necessarily mean that they should be considered appropriate for the same workloads, but it does illustrate that similar factors are driving the adoption of those technologies for their respective workloads. It illustrates, for example, that among the analytic technologies in particular, there is significant potential overlap in the drivers encouraging the adoption of EDWs and exploratory analytic platforms (EAPs), and EAPs and Hadoop.

    Our research indicates that these three platforms (and others) are being used across different companies for the same workloads.

    Although Hadoop is largely a complement to the EDW, we see a lot of confusion from would-be adopters about what workloads should be deployed on Hadoop, rather than on the EDW. While Hadoop is better suited to unstructured and semi-structured data and workloads that benefit from a more relaxed approach to schema, unfortunately there is no shortcut to determining which is the best technology to deploy for a particular workload.

    However, we have seen several companies discuss the approaches they have taken to solving this problem, which does provide some general guidance.

    For example, JPMorgan Chase has created a spider chart that assesses the relative strengths and weaknesses of traditional relational databases and what it calls ‘big-data analytics’ on Hadoop. While there is a small amount of overlap, the company has found that the strengths of traditional databases lie in transactional data update patterns, concurrent jobs, responsiveness and table join complexity. In comparison, Hadoop’s strengths lie in data volume per job, schema complexity, processing freedom and data volume in general.

    Another company that has built its own model is eBay, which has an added level of complexity with its Singularity EAP. The model describes how queries perform on the various platforms – in terms of system unit cost, units consumed, query cost, latency and parallel efficiency – to help users decide whether they should be running queries against the EDW, Singularity or Hadoop. Using a standard Hive query, eBay was able to demonstrate that Hadoop performed well in terms of parallel efficiency and unit cost, while the EDW performed well in terms of units consumed and latency, and Singularity performed well in terms of query cost, latency and units consumed.

    Disney is another company that has taken its own approach to comparing potential deployment options, also adding NoSQL databases to its financial estimates and net-present-value analysis. While the company faced hardware, support, training and learning-curve costs in adopting Hadoop and NoSQL databases, it had to weigh that against the hardware, licensing and support costs of traditional relational databases. The most critical factor, however – and the most difficult to calculate – was the lost opportunity cost of not adopting new technologies, which was likely to limit Disney’s ability to execute on its strategic initiatives.

    451 Research clients can get more detail about these projects, as well as a definition of exploratory analytic platform, datastructure, and queryable archive, by taking a look at our Total Data report.

    Welcome (back) Martin Schneider!

    It’s my pleasure to announce that we recently recruited Martin Schneider as a Research Manager (press release here), based in our San Francisco office. Martin will be focused primarily on the innovation and disruption taking place at ‘the top of the stack’ in the application software space; a part of the industry that is undergoing huge change through the combined impact of cloud, software-as-a-service and social media.

    This is actually Martin’s second stint with the company, and is part of the reason why we are so excited to have him back in the building. Martin first joined 451 in 2004, where as analyst and then senior analyst he spearheaded our coverage of the CRM software market. Since being tempted away to join the vendor side in 2007, Martin has worked in various senior marketing roles for two software startups, first for CRM specialist SugarCRM, and then for cloud storage and data management firm Basho.

    This experience at the sharp-end of the startup world, allied with Martin’s extensive industry knowledge, contacts, prolific work-rate and unbridled enthusiasm, means that we now have a top-class, commercially-minded analyst to spearhead coverage and help lead the broader debate in a critical part of the industry.

    Equally important is that Martin will be a key link within the 451 Research chain; after all, applications are what enterprise IT really cares about, and what the rest of the stack is optimized for. Therefore, Martin’s role will involve extensive collaboration with multiple 451 Research practices, analysts and research directors, especially around infrastructure and cloud computing, as well as information management.

    In other good news, we also announced that Matt Aslett has been promoted to the role of Research Manager. Frequent readers of this blog will already be very familiar with Matt, and this is a well-deserved promotion. Matt is recognized by the industry as being at the forefront in his field. His published analysis is thoroughly informed, insightful and prolific, our clients love him, and he has become a popular public speaker for all things data-related, especially around his concept of “total data.”

    With these two promotions 451 Research is starting 2012 with a bang. Welcome Martin and congratulations Matt.

    One other announcement is that Nick Patience recently left the company to pursue an opportunity in the vendor world. As a co-founder, Nick has played an instrumental role in the company’s growth and development over the last 12 years. We’re sorry to see him go, but wish him the best of luck in his new role on the dark side!

    More M&A to come in the name of “customer experience”

    When SDL finally came to terms with Alterian in December, we were inspired to take a look at this and other recent acquisitions that have been done as part of the broadening of WCM into Web-experience (or customer-experience) management. Alterian brings SDL another WCM product, since Alterian acquired Mediasurface in 2008, but SDL is really after the real-time analytics and campaign management tools that are part of Alterian’s marketing automation portfolio.

    It strikes us that these areas are fairly far afield from SDL’s origins in language technology and services. The deal wasn’t surprising, though, given how far SDL has gone into WCM. At the high end of the market, at least, it is no longer enough to be in WCM without a broader play for online marketing / marketing automation.

    While there are some vendor attempts to grow web-experience management organically (Sitecore is probably most notable here), there has been a good deal of M&A inspired by bringing together WCM, web analytics, content targeting/recommendations, social and testing technologies, among others.

    We’ve put together a report that reviews many of these past deals and provides some predictive analysis of M&A in this sector — available here for 451 Research subscribers.

    Some forward-looking takeaways from this are:

    • There are few WCM independents left to be acquired, particularly in the non-.NET camp, though there are several potential acquirers that might still want a stronger WCM component.
    • CoreMedia may become a desirable target, as a rare independent with a Java codebase and high-end customers. Both SAP and IBM could pursue, though SAP seems more likely as CoreMedia is a German company and already plays the WCM part in SAP’s Web Channel Experience Management initiative.
    • WCM isn’t the only field for potential targets in the name of customer-experience or even more strictly in web-experience management. Content targeting, analytics, and testing/optimization will all likely hold interest in 2012.
    • It’s not just the big IT players that have a role in this consolidating landscape, though Adobe, Oracle and IBM are key players to be sure. We’ve also seen smaller players, like Norway’s eZ Systems, making small technology buys to round out their portfolios. eZ bought two companies in 2011: YOUCHOOSE for its recommendations engine and odoscope for web analytics.
    • There are lots of small technology providers in this sector, most of them SaaS, and we expect there will be more acquisitions like these to come.

    Reconsidering Oracle’s antitrust commitments to MySQL

    As I mentioned earlier this week, a major research focus for Q1 is the MySQL ecosystem, the positives and negatives of Oracle’s MySQL strategy, and the competitive overlap between MySQL, NoSQL and NewSQL.

    It is impossible to think about this without reconsidering the commitments made by Oracle to customers, developers and users of MySQL in late December 2009, which played a significant part in satisfying European Commission concerns about Oracle’s acquisition of Sun.

    While the commitments were both welcomed and derided when they were announced, it is worth considering today whether those commitments have been as significant in practice as they appeared to be two years ago.

    For example, Oracle’s commitment to and investment in InnoDB – while positive for MySQL users – has arguably diminished the relevance of some of the storage engine-related commitments.

    We will be coming to our own conclusions based on our research over the coming weeks, but I am interested in any feedback from MySQL customers, developers and users about how well Oracle has kept to its commitments and their significance in hindsight.

    You can find a full list of the commitments here but the edited highlights are below:

    1. Continued Availability of Storage Engine APIs.

    2. Non-assertion of copyright and no requirement for a commercial license related to implementing the storage engine APIs.

    3. Extension of any existing commercial storage engine licenses until December 10, 2014.

    4. Commitment to continue licensing MySQL using the GNU GPL.

    5. Customers would not be required to purchase support services from Oracle as a condition of obtaining a commercial license to MySQL.

    6. Increase spending on MySQL research and development.

    7. Commitment to create and fund a customer advisory board.

    8. Commitment to create and fund a MySQL Storage Engine Vendor Advisory Board.

    9. Commitment to retain the free MySQL Reference Manual.

    10. Retention of annual or multi-year subscription renewals for end-users and embedded customers.

    The Data Day, today: Jan 5 2012

    Apache Hadoop 1.0. The future of CouchDB (or Couchbase anyway). And more.

    Welcome to the first in an occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.

    * The Apache Software Foundation Announces Apache Hadoop v1.0 Self-explanatory.

    * The Future of CouchDB Apache CouchDB creator Damien Katz explains why he is focusing his attention on Couchbase Server.

    * Understanding Microsoft’s big-picture plans for Hadoop and Project Isotope Mary Jo Foley parses Alexander Stojanovic’s presentation.

    * MongoDB Extends Leadership in NoSQL 10gen claims more than 400 commercial customers.

    * 1010data’s Unique Big Data Analytics Platform Sees Stunning Growth in 2011 1010data runs the numbers on its adoption in 2011.

    * TouchDB 1.0 is out TouchDB is a lightweight CouchDB-compatible database engine suitable for embedding into mobile apps.

    * Data Scientist = Rock Star, Really? Virginia Backaitis is sceptical.

    * Swimming with Dolphins Splunk’s connector for MySQL.

    * What the Sumerians can teach us about data Pete Warden finds data inspiration at the British Museum.

    * How To (Not) Get Smart About Big Data Wim Rampen on the importance of filtering noise.

    * For 451 Research clients

    # Total Data: exploratory analytic platforms Spotlight report

    # Apache Hadoop reaches version 1.0, with more to come Analyst note

    # Acunu hones focus on ‘big data’ platform for operational analytics Market development report

    # Jaspersoft gets big into ‘big data,’ illuminates BI business momentum Market development report

    * Google News Search outlier of the day: “Bella” Becomes Most Popular Name for Both Dogs and Cats

    And that’s the Data Day, today.