Entries from March 2008 ↓

Is H-Store the future of database management systems?

Given his past involvement in the creation of Ingres and Postgres (not to mention Vertica and StreamBase), when Michael Stonebraker starts talking about a new database research project, the world tends to sit up and listen.

Such is the case with H-Store, a new approach to the OLTP database proposed by Stonebraker, along with Samuel Madden at MIT, Daniel Abadi at Yale, and Stan Zdonik at Brown, amongst others.

Details of the H-Store project were presented (PDF) at the VLDB conference in September 2007 but have returned to the agenda in recent weeks thanks to an online discussion between Curt Monash and Stonebraker.

Monash has also put together a nice Q&A for ZDNet that introduces H-Store, from which we learn that Stonebraker “proposes to manage high-end OLTP databases entirely in RAM, with no disks, no redo logs, little multi-threading, and only optimistic locking. And by the way, he wants to replace SQL with Ruby or Python.”

As you might guess from that description, H-Store calls for a complete re-write of OLTP systems, and it does so on the quite simple premise that current OLTP engines were not designed to run on today’s architectures. “Current OLTP database designs, which date largely from the 1970s, are based on several assumptions about the architecture of database applications and hardware that are less true today than they were 30 years ago,” states the overview.

The H-Store project contends that for the most part OLTP systems can be run in memory on a shared-nothing cluster of machines, and that doing so increases performance. Early results (in the PDF above) indicate that H-Store ran transactions 82x faster than a traditional relational database setup.
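To make the shared-nothing, in-memory idea a little more concrete, here is a minimal Python sketch. It is purely illustrative and is not taken from the H-Store project: the names `Partition`, `Cluster` and `deposit` are hypothetical, and durability, replication and multi-partition transactions are deliberately left out. Each partition owns its rows in RAM and runs one transaction at a time, and each transaction is an ordinary function rather than a SQL statement.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List
import zlib

# Hypothetical type alias: a transaction is a function over one partition's rows.
Txn = Callable[[Dict[str, int]], int]


@dataclass
class Partition:
    """One shared-nothing node: it owns its rows in RAM and runs work serially."""
    rows: Dict[str, int] = field(default_factory=dict)

    def run(self, txn: Txn) -> int:
        # Transactions run one at a time per partition, so no locks are taken here.
        return txn(self.rows)


class Cluster:
    """Routes each single-key transaction to the partition that owns the key."""

    def __init__(self, partition_count: int) -> None:
        self.partitions: List[Partition] = [Partition() for _ in range(partition_count)]

    def owner(self, key: str) -> Partition:
        # crc32 is a stable hash, so a key always maps to the same partition.
        index = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        return self.partitions[index]

    def execute(self, key: str, txn: Txn) -> int:
        return self.owner(key).run(txn)


def deposit(key: str, amount: int) -> Txn:
    """Build a transaction written in the host language rather than SQL."""
    def txn(rows: Dict[str, int]) -> int:
        rows[key] = rows.get(key, 0) + amount
        return rows[key]
    return txn


cluster = Cluster(partition_count=4)
cluster.execute("alice", deposit("alice", 100))
balance = cluster.execute("alice", deposit("alice", 25))
print(balance)  # prints 125
```

Because no partition shares data with another, adding machines adds capacity without adding lock contention, which is the intuition behind the performance claim. A real system would still have to survive node failures without a redo log, which this sketch does not attempt.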

Current OLTP systems were also designed when there was a single database market, which brings us back to Monash’s post on the diversity of database management systems and his attempts to categorize them. Monash’s four categories are: high-end OLTP (Oracle, DB2…), specialty data warehouse (Teradata, Netezza, Vertica…), mid-range relational (MySQL, PostgreSQL, OpenEdge…) and embedded relational (SQL Anywhere, TimesTen).

Although Stonebraker argues there should be two data warehouse sectors (row- and column-based), both agree that H-Store will be aimed squarely at the high-end OLTP category (so going up against Oracle, but only for a portion of what Oracle is currently used for).

So can we expect to see Oracle handing over the OLTP database crown to Stonebraker et al.? Perhaps, but not for a very long time to come. As Monash notes in his ZDNet article:

“H-Store lags C-Store [which became Vertica] by about three years, and Vertica started enjoying significant sales late last year, so a first approximation would suggest H-Store will be useful some time in 2010, with the first serious academic prototype being finished late this year. But let’s not assume H-Store will succeed commercially as fast as C-Store did. It’s one thing to adopt a complex-analytics product that will only ever have a handful of actual users, and quite another to bet a super-high-volume OLTP system on unproven technology.”

We will be watching with interest.

Welcome to Too Much Information

Welcome to the new 451 Group blog about information management. What’s information management, you may ask?

It’s the confluence of a variety of strategies organizations employ to get their arms around and exploit the myriad sources of data and information at their disposal. Specifically, this means 451’s coverage of the following areas:

  • Search
  • Collaboration
  • Content management
  • Text analysis
  • eDiscovery
  • Archiving
  • Storage
  • Databases (relational & otherwise)
  • Business intelligence
  • Master & metadata management

It is written mainly by Kathleen Reidy and myself, and both of us will be at the AIIM Expo this week in Boston where we will be taking the temperature of the content management market & talking with a bunch of vendors and end users.

More on that and Drupalcon this week.