In the previous articles (here
[https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-1] and here
[https://www.rittmanmead.com/blog/2016/12/etl-offload-with-spark-and-amazon-emr-part-2-code-development-with-notebooks-and-docker/])
I gave the background to a project we did for a client, exploring the benefits
of Spark-based ETL processing running on Amazon's Elastic MapReduce