Amazon Redshift in Cambridge

Peter Marriott

After Amazon Web Services' Redshift was first launched in 2012, it became the fastest-growing service in AWS's history (according to Werner Vogels at AWS Summit 2015).

To make such an impact, Redshift must have met a pent-up need. As a Data-Warehouse-as-a-Service, it taps into the trend for ever-growing data volumes and more varied, flexible analytics delivered on demand.

Looking At Redshift

If you look at a Redshift database in a query tool it looks a lot like a traditional RDBMS such as Oracle, Microsoft SQL Server or PostgreSQL. In fact, Redshift is based on PostgreSQL 8.0.2.

However, rather than being optimised to support transactional workloads, it is optimised to support analytic workloads. Typically the workload for a transactional database is based around storing, retrieving and updating the properties of a single item, e.g. order 67632. In contrast, analytic workloads usually operate on only a few properties but do so for a massive number of items, e.g. the average of all sales for each region. When the data volume is comparatively small a traditional RDBMS is suited to either task, but as the quantity of data grows, the time taken to execute analytic-style queries in an ad-hoc fashion increases and performance diminishes.
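To make the two query shapes concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in rather than Redshift itself, and the `orders` table, its columns and its rows are hypothetical, chosen only to mirror the examples above.

```python
import sqlite3

# SQLite stands in for the database here; this is not Redshift.
# The table, column names and rows are hypothetical illustrations.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, region TEXT, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (67631, "East", "Acme", 100.0),
        (67632, "East", "Bolt", 300.0),
        (67633, "West", "Cargo", 50.0),
        (67634, "West", "Delta", 150.0),
    ],
)

# Transactional shape: all the properties of one item, looked up by key.
single_order = conn.execute(
    "SELECT order_id, region, customer, amount FROM orders WHERE order_id = ?",
    (67632,),
).fetchone()

# Analytic shape: a couple of properties, aggregated across every item.
avg_by_region = conn.execute(
    "SELECT region, AVG(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()

print(single_order)   # (67632, 'East', 'Bolt', 300.0)
print(avg_by_region)  # [('East', 200.0), ('West', 100.0)]
```

The first query touches one row but every column; the second touches every row but only two columns. That second access pattern is the one an analytic database is designed around.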

Find Out More: Introducing Redshift

I gave an introduction to Redshift at the Cambridge AWS User Group as part of their Big Data! meeting.

In the session we looked at Redshift flexing its muscles on a two-node Dense Compute cluster, with a test data set of over 2 billion rows. Simple queries that require a full table scan returned results in between 10 and 30 seconds. More complex workloads with joins, sub-queries, unions and where clauses took between 1 minute and 2 minutes 30 seconds. The data was organised using a star schema and, as we heard later on in the evening, this gives significant performance advantages over a traditional relational model.
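For readers unfamiliar with the term, a star schema keeps the measurements in one central fact table and the descriptive attributes in small dimension tables joined to it by key. The sketch below shows the general shape only; it again uses sqlite3 as a stand-in, and the `fact_sales`, `dim_region` and `dim_date` tables are hypothetical, not the schema used in the talk.

```python
import sqlite3

# SQLite stands in for the warehouse; all table and column names are
# hypothetical and are not taken from the talk's data set.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: small, descriptive lookups.
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, sales_year INTEGER);

    -- Fact table: one row per sale, holding keys and the measure.
    CREATE TABLE fact_sales (
        region_id INTEGER REFERENCES dim_region(region_id),
        date_id INTEGER REFERENCES dim_date(date_id),
        amount REAL
    );

    INSERT INTO dim_region VALUES (1, 'East'), (2, 'West');
    INSERT INTO dim_date VALUES (1, 2014), (2, 2015);
    INSERT INTO fact_sales VALUES
        (1, 1, 100.0), (1, 2, 300.0), (2, 2, 50.0), (2, 2, 150.0);
    """
)

# A typical star-schema query: join the fact table to its dimensions,
# then aggregate the measure by the dimension attributes.
sales_by_region_year = conn.execute(
    """
    SELECT r.region_name, d.sales_year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_region AS r ON r.region_id = f.region_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY r.region_name, d.sales_year
    ORDER BY r.region_name, d.sales_year
    """
).fetchall()

print(sales_by_region_year)
# [('East', 2014, 100.0), ('East', 2015, 300.0), ('West', 2015, 200.0)]
```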

Slides for the talk can be downloaded from Introduction to Amazon Redshift.