SAP-Sybase: the database rationale

The 451 Group has published its take on the proposed acquisition of Sybase by SAP. The full report provides details on the deal, valuation and timing, as well as assessing the rationale and competitive impact in three core areas: data management, mobility, and applications.

As a taster, here’s an excerpt from our view of the deal from a database perspective:

The acquisition of Sybase significantly expands SAP’s interests in database technology, and the improved ability of the vendor to provide customers with an alternative to rival Oracle’s database products is, alongside mobile computing, a significant driver for the deal. Oracle and SAP have long been rivals in the enterprise application space, but Oracle’s dominance in the database market has enabled it to wield significant influence over SAP accounts. For instance, Oracle claims to be the most popular database for deploying SAP, and that two-thirds of all SAP customers run on Oracle Database. Buying a database platform of its own will enable SAP to break any perceived dependence on its rival, although this is very much a long-term play: Sybase’s database business is tiny compared to Oracle, which reported revenue from new licenses for database and middleware products of $1.2bn in the third quarter alone.

The long-term acquisition focus is on the potential for in-memory database technology, which has been a pet project for SAP cofounder and supervisory board chairman Hasso Plattner for some time. As the performance of systems hardware has improved, it is now possible to run more enterprise workloads in memory, rather than on disk. By using in-memory database technology, SAP is aiming to improve the performance of its transactional applications and BI software while also hoping to leapfrog rival Oracle, which has its disk-based database installed base to protect. Sybase also has a disk-based database installed base, but has been actively exploring in-memory database technology, and SAP can arguably afford to be much more aggressive about a long-term in-memory vision since its reliance on that installed base is much less than Sybase’s or Oracle’s.

SAP has already delivered columnar in-memory database technology to market via its Business Warehouse Accelerator (BWA) hardware-based acceleration engine and the SAP BusinessObjects Explorer data-exploration tool. Sybase has also delivered in-memory database technology for its transactional ASE database with the release of version 15.5 earlier this year. By acquiring Sybase, SAP has effectively delivered on Plattner’s vision of in-memory databases for both analytical and transaction processing, albeit with two different products. At this stage, it appears that SAP’s in-memory functionality will quickly be applied to the IQ analytic database while ASE will retain its own in-memory database features. Over time, expect R&D to focus on delivering column-based in-memory database technology for both operational and analytic workloads.

In addition, SAP touted the applicability of its in-memory database technology to Sybase’s complex-event-processing (CEP) technology and Risk Analytics Platform (RAP). Sybase was already planning to replicate the success of RAP in other verticals following its acquisition of CEP vendor Aleri in February, and we would expect SAP to accelerate that.

Meanwhile, SAP intends to continue to support databases from other vendors. In the short term, this will be a necessity since SAP’s application software does not currently run on Sybase’s databases. Technically, this should be easy to overcome, although clearly it will take time, and we would expect SAP to encourage its application and BI customers to move to Sybase ASE and IQ for new deployments in the long term. One of the first SAP products we would expect to see ported to Sybase IQ is the NetWeaver Business Warehouse (BW) model-driven data-warehouse environment. SAP’s own MaxDB is currently the default database for BW, although BW also supports deployment on Oracle, IBM DB2, Microsoft SQL Server, Teradata and Hewlett-Packard’s Neoview. Expect IQ to be added to that list sooner rather than later, and to potentially replace MaxDB as the default database.

I have some views on how SAP could accelerate the migration of its technology and users to Sybase’s databases but – for reasons that will become apparent – they will have to wait until next week.

Categorizing the “Foo” fighters – making sense of NoSQL

One of the essential problems with covering the NoSQL movement is that the term describes not what the associated databases are, but what they are not (and doesn’t even do that very well, since SQL itself is in many cases orthogonal to the problems the databases are designed to solve).

It is interesting to see fellow analyst Curt Monash facing the same problem. As he notes, while there seems to be a common theme that “NoSQL is Foo without joins and transactions,” no one has adequately defined what “Foo” is.

Curt has proposed HVSP (High-Volume Simple Processing) as an alternative to NoSQL, and while I’m not jumping on the bandwagon just yet, it does pass the Ronseal test (it does what it says on the tin), and it also matches my view of what defines these distributed data store technologies.
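To make ‘simple processing’ concrete: the interface these stores expose is typically little more than put/get/delete on a key, with no joins and no multi-row transactions to coordinate. A minimal sketch in Python (the class and method names are illustrative, not any particular product’s API):

    # Illustrative sketch of the minimal interface a key-value store
    # exposes: every operation addresses a single key, so there are no
    # joins and no multi-row transactions -- "simple processing".

    class KeyValueStore:
        def __init__(self):
            self._data = {}  # a real store would shard this across nodes

        def put(self, key, value):
            self._data[key] = value

        def get(self, key, default=None):
            return self._data.get(key, default)

        def delete(self, key):
            self._data.pop(key, None)

    store = KeyValueStore()
    store.put("user:42", {"name": "Alice", "visits": 17})
    print(store.get("user:42"))  # {'name': 'Alice', 'visits': 17}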

Some observations:

  • I agree with Curt’s view that object-oriented and XML databases should not be considered part of this new breed of distributed data store technologies. There is a danger that NoSQL simply comes to mean non-relational.
  • I also agree that MapReduce and Hadoop should not be considered part of this category of data management technologies (which is somewhat ironic since if there is any technology for which the terms NoSQL or Not Only SQL are applicable, it is MapReduce).
  • The vendors associated with the NoSQL movement (Basho, Couchio and 10gen, the company behind MongoDB) are in a problematic position. While they are benefiting from, and to some extent encouraging, interest in NoSQL, the overall term masks their individual benefits. My sense is they will look to move away from it sooner rather than later.
  • Memcached is not a key value store. It is a cache. Hence the name. (A sketch of the practical difference follows this list.)
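The distinction matters in practice: a cache may evict or lose an entry at any time, so the canonical pattern is cache-aside, falling back to the system of record on a miss. A minimal sketch, with plain dictionaries standing in for memcached and the durable store (all names here are hypothetical):

    # Cache-aside: the cache is an optimization, not the system of record.
    # A key-value *store* owns the data; a *cache* may evict it at any time.

    import time

    cache = {}          # stand-in for memcached: entries may vanish
    TTL_SECONDS = 60

    def get_user(user_id, database):
        entry = cache.get(user_id)
        if entry and time.time() - entry["at"] < TTL_SECONDS:
            return entry["value"]              # cache hit
        value = database[user_id]              # miss: fall back to the store
        cache[user_id] = {"value": value, "at": time.time()}
        return value

    users = {42: "Alice"}       # the durable store (hypothetical)
    print(get_user(42, users))  # first call misses, populates the cache
    print(get_user(42, users))  # second call hits the cache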
There are numerous categorizations of the various NoSQL technologies available on the Internet. At the risk of adding yet another to the mix, I have created my own – more for my benefit than anything else.

It includes a list of users for the various projects (where available), as well as some sense of where each project fits within the CAP theorem, an understanding of which is, to my mind, essential to understanding how and why the NoSQL/HVSP movement has emerged (look out for more on the CAP theorem in a follow-up post on alternatives to NoSQL).
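For readers who haven’t met it, the CAP theorem holds that when a distributed store suffers a network partition, it must trade consistency against availability. One concrete place the trade-off surfaces is quorum configuration: with N replicas, a read of R replicas and a write of W replicas must overlap whenever R + W > N. A toy illustration in Python (the function and numbers are mine, not any particular product’s):

    # Toy quorum check: with N replicas, a read of R and a write of W
    # overlap in at least one replica whenever R + W > N, so a read is
    # guaranteed to see the latest acknowledged write.

    def is_strongly_consistent(n_replicas, r, w):
        return r + w > n_replicas

    N = 3
    for r, w in [(1, 1), (2, 2), (1, 3)]:
        mode = "consistent" if is_strongly_consistent(N, r, w) else "eventual"
        print(f"N={N} R={r} W={w}: {mode}")
    # N=3 R=1 W=1: eventual    -- fast, but a read may miss a recent write
    # N=3 R=2 W=2: consistent  -- read and write quorums must intersect
    # N=3 R=1 W=3: consistent  -- every replica has the write before the ack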

Here’s my take, for those who are interested. As you can see, there’s a graph database-shaped hole in my knowledge. I’m hoping to fill that sooner rather than later.

By the way, our Spotlight report introducing The 451 Group’s formal coverage of NoSQL databases will be available here imminently.

Update: VMware has announced that it has hired Redis creator Salvatore Sanfilippo, and is taking on the Redis key value store project. The image below has been updated to reflect that, as well as the launch of NorthScale’s Membase.

The future of the database is… plaid?

Oracle has introduced a hybrid column-oriented storage option for Exadata with the release of Oracle Database 11g Release 2.

Ever since Mike Stonebraker and fellow researchers at MIT, Brandeis University, the University of Massachusetts and Brown University presented (PDF) C-Store, a column-oriented database, at the 31st VLDB Conference in 2005, the database industry has debated the relative merits of row- and column-store databases.

While row-based databases dominate the operational database market, column-based databases have made inroads in the analytic database space, with Vertica (based on C-Store) as well as Sybase, Calpont, Infobright, Kickfire, Paraccel and SenSage pushing column-based data-warehousing products, based on the argument that column-based storage favors the read performance required for query processing.
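To make that trade-off concrete, here’s a schematic illustration in Python of the same table in both layouts (the data and query are invented for the example):

    # Same table, two physical layouts. An analytic query that aggregates
    # one column touches every field in a row store, but only one list
    # (one contiguous region on disk) in a column store.

    rows = [  # row store: all values for each record stored together
        ("alice", "2010-05-01", 120.0),
        ("bob",   "2010-05-01",  80.0),
        ("carol", "2010-05-02", 200.0),
    ]

    columns = {  # column store: all values for each attribute stored together
        "customer": ["alice", "bob", "carol"],
        "date":     ["2010-05-01", "2010-05-01", "2010-05-02"],
        "amount":   [120.0, 80.0, 200.0],
    }

    # SELECT SUM(amount): the row store scans whole rows...
    total_row_store = sum(r[2] for r in rows)
    # ...the column store scans only the one column it needs.
    total_col_store = sum(columns["amount"])
    assert total_row_store == total_col_store == 400.0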

The debate took a fresh twist recently when former SAP chief executive Hasso Plattner presented a paper (PDF) calling for the use of in-memory column-based databases for both analytical and transaction processing.

As interesting as that is in theory, of more immediate interest is the fact that Oracle – so often the target of column-based database vendors – has introduced a hybrid column-oriented storage option with the release of Oracle Database 11g Release 2.

As Curt Monash recently noted, there are a couple of approaches emerging to hybrid row/column stores.

Oracle’s approach, as revealed in a white paper (PDF), has been to add new hybrid columnar compression capabilities to its Exadata Storage servers.

This approach maintains row-based storage in the Oracle Database itself while enabling the use of column storage to improve compression rates in Exadata; Oracle claims a compression ratio of up to 10x without any loss of query performance, and up to 40x for historical data.

As Oracle’s Kevin Closson explains in a blog post: “The technology, available only with Exadata storage, is called Hybrid Columnar Compression. The word hybrid is important. Rows are still used. They are stored in an object called a Compression Unit. Compression Units can span multiple blocks. Like values are stored in the compression unit with metadata that maps back to the rows.”
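As a rough sketch of what that description amounts to: take a batch of rows, pivot it so that like values sit together, then compress each column run. The Python below is a generic illustration of the idea only – it is not Oracle’s on-disk format, and zlib merely stands in for whatever codec a real system would use:

    # Generic sketch of columnar compression within a row-organized store:
    # a batch of rows is pivoted so like values sit together, then each
    # column is compressed. Illustration only -- not Oracle's actual format.

    import json
    import zlib

    def build_compression_unit(rows):
        n_cols = len(rows[0])
        columns = [[row[i] for row in rows] for i in range(n_cols)]  # pivot
        return [zlib.compress(json.dumps(col).encode()) for col in columns]

    rows = [("2010-05-01", "EUR/USD", 1.23)] * 1000  # repetitive data
    unit = build_compression_unit(rows)

    raw_size = len(json.dumps(rows).encode())
    compressed_size = sum(len(c) for c in unit)
    print(f"ratio: {raw_size / compressed_size:.0f}x")

Repetitive columns such as dates and ticker symbols are exactly where ratios in the 10x-40x range become plausible, since grouping like values gives the codec long runs to work with.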

Vertica took a different hybrid approach with the release of Vertica Database 3.5, which introduced FlexStore, a new version of the column-store engine, including the ability to group a small number of columns or rows together to reduce input/output bottlenecks. Grouping can be done automatically based on data size (grouped rows can use up to 1MB) to improve the query performance of whole rows, or specified based on the nature of the column data (for example, bid, ask and date columns for a financial application).

Likewise, the Ingres VectorWise project (previously mentioned here) will create a new storage engine for the Ingres Database, positioned as a platform for data-warehouse and analytic workloads, making use of vectorized execution, which sees multiple data values processed simultaneously. The VectorWise architecture makes use of Partition Attributes Across (PAX), which similarly groups multiple rows into blocks to improve processing, while storing the data in columns.
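To see what vectorized execution changes, compare a tuple-at-a-time loop with a block-at-a-time one. This is a schematic Python sketch, not VectorWise’s implementation; in a real engine the gain comes from running tight compiled loops (and SIMD instructions) over cache-resident vectors:

    # Tuple-at-a-time vs. vectorized execution, schematically. The point
    # is per-call overhead: the vectorized operator is invoked once per
    # block of values rather than once per row.

    def scan_tuple_at_a_time(prices, tax_rate):
        results = []
        for price in prices:              # one operator call per row
            results.append(price * (1 + tax_rate))
        return results

    def scan_vectorized(prices, tax_rate, block_size=1024):
        results = []
        for start in range(0, len(prices), block_size):
            block = prices[start:start + block_size]
            # one call processes a whole block; a real engine would run
            # a tight compiled loop over the cache-resident vector
            results.extend(p * (1 + tax_rate) for p in block)
        return results

    prices = [10.0, 20.0, 30.0] * 1000
    assert scan_tuple_at_a_time(prices, 0.2) == scan_vectorized(prices, 0.2)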

Update: Daniel Abadi has provided an overview of the different approaches to hybrid row-column architectures and suggests something I had suspected: that Oracle is also using the PAX approach, except outside the core database, while Vertica is using what he calls a fine-grained hybrid approach. He also speculates that Microsoft may end up going a third route, fractured mirrors.

Perhaps the future of the database is not row- or column-based, but plaid.