To make such an impact Redshift obviously fulfilled a pent-up need. As a Data-Warehouse-as-a-Service it taps into the trend for ever-growing data volumes and more varied, flexible analytics delivered on-demand.
Looking At Redshift
If you look at a Redshift database in a query tool it looks a lot like a traditional RDBMS such as Oracle, Microsoft SQL Server or PostgreSQL. In fact, Redshift is based on PostgreSQL 8.0.2.
However, rather than being optimised to support transactional workloads it is optimised to support analytic workloads. Typically the workload for a transactional database is based around storing, retrieving and updating the properties of a single item, e.g. order 67632. In contrast, analytic workloads usually operate on just a few properties but do so for a massive number of items, e.g. the average of all sales for each region. When the data volume is comparatively small an RDBMS is suited to either task, but as the quantity of data grows the time taken to execute analytic-style queries in an ad-hoc fashion grows with it.
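The contrast between the two workload styles can be sketched with a toy example. This is a minimal illustration only; the `sales` table and its columns are assumptions invented for the sketch, not schema from the talk, and SQLite stands in for any SQL database:

```python
import sqlite3

# Hypothetical sales table, used only to contrast the two query styles.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [(67632, "North", 120.0), (67633, "South", 80.0), (67634, "North", 200.0)],
)

# Transactional style: fetch every property of one item by its key.
order = conn.execute(
    "SELECT * FROM sales WHERE order_id = ?", (67632,)
).fetchone()

# Analytic style: touch one property (amount) across every row,
# aggregated per region -- the kind of full scan Redshift is built for.
averages = conn.execute(
    "SELECT region, AVG(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```

The transactional query is served well by a row-oriented index lookup; the analytic one must read a single column across the whole table, which is where a columnar store like Redshift pulls ahead as row counts climb.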
Find Out More: Introducing Redshift
In the session we looked at Redshift flexing its muscles on a two node, dense compute cluster, with a test data set of over 2 billion rows. Simple queries requiring a full table scan completed in between 10 and 30 seconds. More complex workloads with joins, sub-queries, unions and where clauses took between 1 minute and 2 minutes 30 seconds. The data was organised using a star schema and, as we heard later in the evening, this gives significant performance advantages over a traditional relational model.
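For readers unfamiliar with star schemas, here is a minimal sketch of the shape: a central fact table of measures keyed to small dimension tables. The table and column names are assumptions for illustration (the talk's actual schema wasn't shown), and SQLite again stands in for Redshift:

```python
import sqlite3

# A toy star schema: one wide fact table, small dimension tables around it.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE dim_date   (date_id   INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE fact_sales (
    region_id INTEGER REFERENCES dim_region(region_id),
    date_id   INTEGER REFERENCES dim_date(date_id),
    amount    REAL
);
INSERT INTO dim_region VALUES (1, 'North'), (2, 'South');
INSERT INTO dim_date   VALUES (10, 2013);
INSERT INTO fact_sales VALUES (1, 10, 120.0), (2, 10, 80.0), (1, 10, 200.0);
""")

# The typical star-schema query: join the fact table to a dimension
# and aggregate one measure per dimension attribute.
rows = conn.execute("""
    SELECT r.region_name, SUM(f.amount)
    FROM fact_sales f
    JOIN dim_region r ON r.region_id = f.region_id
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
```

Because every join is a single hop from fact to dimension, the query planner's job stays simple even as the fact table grows to billions of rows, which is part of why the star layout suited the cluster so well.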
Slides for the talk can be downloaded from Introduction to Amazon Redshift.