Well I am feeling a bit smug
Rang the user who created last week's query from hell... he still needed to find that mystery transaction. It turns out the police were investigating a credit card fraud and knew the amount but no other details (strange that they did not know the date, even to the nearest month, or the store, but hey).
So this afternoon I decided to give the guy a lesson in writing efficient SQL. First things first: he had built the query using drag and drop in Business Objects. The main problem was that the resulting query joined three chunky tables (600M, 20M and 0.6M rows) to output a single row (possibly more, since more than one transaction could have exactly the same spend). His query read all of that data and then sorted it!
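To give an idea of the shape of the problem, the generated query would have looked roughly like this (the table and column names are my stand-ins for the warehouse schema, not the actual Business Objects output):

SELECT   c.customer_name, h.invoice_number, h.invoice_date, SUM(l.sale_value)
FROM     invoice_lines l                                   -- 600M rows, all read
JOIN     invoice_headers h ON  h.invoice_date   = l.invoice_date
                           AND h.invoice_number = l.invoice_number   -- 20M rows
JOIN     customers c       ON  c.customer_key   = h.customer_key     -- 0.6M rows
GROUP BY c.customer_name, h.invoice_number, h.invoice_date
HAVING   SUM(l.sale_value) = 1234.56
ORDER BY h.invoice_date;

All three tables get joined and sorted before the HAVING clause throws almost everything away.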
The most performant way to tackle this is to minimise the amount of data we touch. As a minimum we need to calculate the total value of each sales transaction over the last 2.25 years (so that's the 600M row table). This information is held at day level (the partition key) and is identified by a unique invoice number. So our simplified query becomes:
SELECT   invoice_date, invoice_number, SUM(sale_value)
FROM     invoice_lines
GROUP BY invoice_date, invoice_number
HAVING   SUM(sale_value) = 1234.56

Bingo - this returned 1 row in around 15 minutes, far better than the user's attempt on Friday, which was aborted after 8 hours. By wrapping this query as an inline view we can then join to the invoice header (on date, the partition key, and invoice number) to get the customer key, and thence the customer details such as name. So we look for the document header in a single partition (say 0.2M rows instead of 20M) and then find the single customer (1 row instead of 600,000).
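Sketched out, the wrapped version looks something like this (again, invoice_headers and customers are my placeholder names for the header and customer tables; the real schema will differ):

SELECT h.invoice_date,
       h.invoice_number,
       c.customer_name,
       v.total_value
FROM  (SELECT   invoice_date, invoice_number, SUM(sale_value) AS total_value
       FROM     invoice_lines
       GROUP BY invoice_date, invoice_number
       HAVING   SUM(sale_value) = 1234.56) v
JOIN   invoice_headers h
       ON  h.invoice_date   = v.invoice_date     -- partition key, so we hit one partition
       AND h.invoice_number = v.invoice_number
JOIN   customers c
       ON  c.customer_key = h.customer_key;      -- single-row lookup for the name

The big table is still scanned once, but the two joins now operate on the one row (or handful of rows) the inline view returns, not on 600M.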
Result: one impressed user (but I won't tell him I also changed the DOP on the table from 8 to default - the query ran on 56 slaves!)
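For anyone wondering, that table change is just the standard Oracle parallel setting (assuming the fact table is the invoice_lines above):

-- previously the table had a fixed degree of parallelism of 8:
--   ALTER TABLE invoice_lines PARALLEL 8;
-- switching to DEFAULT lets the instance size the degree from its CPU settings
ALTER TABLE invoice_lines PARALLEL;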