Amazon launches DynamoDB. Red Hat virtually supports JasperReports. And more.
An occasional series of data-related news, views and links posts on Too Much Information. You can also follow the series @thedataday.
* Amazon Web Services Launches Amazon DynamoDB See also blog posts from Werner Vogels and Jeff Barr, as well as reaction from DataStax and Basho.
* Jaspersoft Delivers Analytics for Red Hat Enterprise Virtualization Customers JasperReports Server is embedded in Red Hat Enterprise Virtualization 3.0.
* Tableau 7.0 Brings Simplicity to Business Intelligence Including new Data Server for data sharing and management.
* Hortonworks to Deliver Next-Generation of Apache Hadoop Pre-announcement (emphasis on the pre).
* RainStor Announces First Enterprise Database Running Natively on Hadoop as well as partnerships with Cloudera, Hortonworks, and MapR, and support from Composite Software.
* Talend Platform for Data Services Operationalizes Information and Data A common development, deployment and monitoring environment for both data management and application integration.
* Fujitsu Launches Cloud Services as a Platform for Big Data Data Utilization Platform Services.
* All you wanted to know about Hadoop, but were too afraid to ask A graphic illustration of the various versions of Apache Hadoop.
* Oracle Database or Hadoop? Another good post from Pythian’s Gwen Shapira. See also Aaron Cordova’s Do I need SQL or Hadoop?
* Meet Code 42, Accel’s first Big Data Fund investment GigaOM has the details.
* MapR CEO Sees Big Changes in Big Data in 2012 Predictive.
* Introducing DataFu: an open source collection of useful Apache Pig UDFs LinkedIn launches open source user-defined functions.
* Big Data Needs Data Scientists, Or Quants, Or Excel Jockeys … or something.
* Career of the Future: Data Scientist [INFOGRAPHIC] Infotaining.
* Knives out for Oracle. SAP and IBM offer some perspectives on Exalytics and Big Data Appliance respectively.
* For 451 Research clients
# Information Builders uses Infobright to take BI in-memory, expands SMB reach Market development report
# RainStor launches database complement to Apache Hadoop Market development report
# Heroku’s Postgres is poised for growing interest in database as a service Market development report
* Google News Search outlier of the day: This Spud’s For All of You: “2012 Is the Year of the Potato”
And that’s the Data Day, today.
At last year’s 451 Group client event I presented on the topic of database management trends and databases in the cloud.
At the time there was a lot of interest in cloud-based data management as Oracle and Microsoft had recently made their database management systems available on Amazon Web Services and Microsoft was about to launch the Azure platform.
In the presentation I made the distinction between online distributed databases (BigTable, HBase, Hypertable), simple data query services (SimpleDB, Microsoft SSDS as was), and relational databases in the cloud (Oracle, MySQL, SQL Server on AWS etc) and cautioned that although relational databases were being made available on cloud platforms, there were a number of issues to be overcome, such as licensing, pricing, provisioning and administration.
Since then we have seen very little activity from the major database players with regard to cloud computing (although Microsoft has evolved SQL Data Services to be a full-blown relational database as a service for the cloud, see the 451’s take on that here).
In comparison there has been a lot more activity in the data warehousing space with regard to cloud computing. On the one hand the data warehousing players are later to the cloud, but on the other they are more advanced, and for a couple of reasons I believe data warehousing is better suited to cloud deployments than the general purpose database.
For one thing, most analytical databases are better suited to deployment in the cloud because their massively parallel architectures are a better fit for clustered and virtualized cloud environments.
And for another, (some) analytics applications are perhaps better suited to cloud environments since they require large amounts of data to be stored for long periods but processed infrequently.
We have therefore seen more progress from analytical than transactional database vendors this year with regard to cloud computing. Vertica Systems launched its Vertica Analytic Database for the Cloud on EC2 in May 2008 (and is working on cloud computing services from Sun and Rackspace). Aster Data followed suit with the launch of Aster nCluster Cloud Edition for Amazon and AppNexus in February this year, and February also saw Netezza partner with AppNexus on a data warehouse cloud service. The likes of Teradata and illuminate are also thinking about, if not talking about, cloud deployments.
To be clear, the early interest in cloud-based data warehousing appears to be in development and test rather than mission-critical analytics applications, although there are early adopters. ShareThis, the online information-sharing service, is up and running on Amazon Web Services’ EC2 with Aster Data, while search marketing firm Didit is running nCluster Cloud Edition on AppNexus’ PrivateScale, and Sonian is using the Vertica Analytic Database for the Cloud on EC2.
Greenplum today launched its take on data warehousing in the cloud, focusing its attention initially on private cloud deployments with its Enterprise Data Cloud initiative and plans to deliver “a new vision for bringing the power of self-service to data warehousing and analytics”.
That may sound a bit woolly (and we do see the EDC as the first step towards private cloud deployments) but the plan to enable the Greenplum Database to act as a flexible pool of warehoused data from which business users will be able to provision data marts makes sense as enterprises look to replicate the potential benefits of cloud computing in their datacenters.
Functionality including self-service provisioning and elastic scalability is still to come, but version 3.3 does include online data-warehouse expansion capabilities and is available now. Greenplum also notes that it has customers using the Greenplum Database in private cloud environments, including Fox Interactive Media’s MySpace, Zions Bancorporation and Future Group.
The initiative will also focus on agile development methodologies and an ecosystem of partners, and while we were somewhat surprised by the lack of virtualization and cloud provisioning vendors involved in today’s announcement, we are told they are in the works.
In the meantime we are confident that Greenplum’s won’t be the last announcement from a data management vendor focused on enabling private cloud computing deployments. While much of the initial attention around cloud-based data management was naturally focused on the likes of SimpleDB, the ability to deliver flexible access to, and processing of, enterprise data is more likely to be taking place behind the firewall while users consider what data and which applications are suitable for the public cloud.
Also worth mentioning while we’re on the subject is RainStor, the new cloud archive service recently launched by Clearpace Software, which enables users to retire data from legacy applications to Amazon S3 while ensuring that the data is available for querying on an ad hoc basis using EC2. It’s an idea that resonates thanks to compliance-driven requirements for long-term data storage, combined with the cost of storing and accessing that data.
451 Group subscribers should stay tuned for our formal take on RainStor, which should be published any day now, while I think it’s probably fair to say you can expect more of this discussion at this year’s client event.