Know who your customers are – part 3
So you managed to jump through all those hoops and now have a way to tie (at least some) transactions to identifiable individuals. But can you be sure that each customer in your customer table is a different person?
Cleaning customer data is notoriously difficult. True duplicates should not happen if your data-load cleansing rules (look-ups, constraints, etc.) work, but what about non-identical records that refer to the same person? Going through my junk snail-mail I often find two envelopes with the same mail shot (strangely, it is always opportunities for me to invest my cash and never cash-off vouchers for something I buy every day). Closer examination of the envelopes gives some idea of the ways in which I get misidentified. Ignoring the mailing company that has bought my address on two or more mailing lists, I find: misspellings of my name (!) or street address (I could understand that one); various combinations of first names and initials; changes to my title (I’ve been Mr, Mrs, Ms, Dr); omission of part of my address (if the district is omitted the post office can still find me, as my street name is unique in Great Britain, if not the world); the wrong postal / ZIP code (they can change!); and finally, the times when (my fault here) I register for something twice and end up with two unique customer numbers. And that’s not counting the stuff that gets sent to former addresses.
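To make those failure modes concrete, here is a minimal sketch of how near-duplicate customer records might be flagged for review. It is only an illustration: the record layout (`id`, `name`, `street`), the helper names, the list of titles and the 0.85 threshold are all assumptions of mine, and it uses Python's standard `difflib` rather than any particular data-quality product.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Assumed list of titles to ignore; extend to suit your own data.
TITLES = {"mr", "mrs", "ms", "miss", "dr"}


def normalise(text):
    """Lower-case, replace punctuation with spaces and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def name_key(name):
    """Drop titles and reduce forenames to initials, keeping the surname whole."""
    tokens = [t for t in normalise(name).split() if t not in TITLES]
    if not tokens:
        return ""
    *forenames, surname = tokens
    return " ".join([t[0] for t in forenames] + [surname])


def similarity(a, b):
    """Similarity ratio between 0.0 and 1.0 from difflib."""
    return SequenceMatcher(None, a, b).ratio()


def likely_duplicates(customers, threshold=0.85):
    """Return id pairs whose name AND street both score at or above the threshold."""
    pairs = []
    for left, right in combinations(customers, 2):
        name_score = similarity(name_key(left["name"]), name_key(right["name"]))
        street_score = similarity(normalise(left["street"]), normalise(right["street"]))
        if min(name_score, street_score) >= threshold:
            pairs.append((left["id"], right["id"]))
    return pairs


# Made-up sample records: a changed title, a dropped initial, a misspelt street.
customers = [
    {"id": 1, "name": "Mr Peter J. Smith", "street": "12 Acacia Avenue"},
    {"id": 2, "name": "Dr P Smith", "street": "12 Acacia Avenu"},
    {"id": 3, "name": "Ms Jane Doe", "street": "4 High Street"},
]

print(likely_duplicates(customers))
```

With this sample data, records 1 and 2 are flagged as a candidate pair and record 3 is left alone. Treat the output as candidates for review rather than automatic merges: reducing forenames to initials means that Peter Smith and Paul Smith at the same address would also match, and comparing every pair of records will not scale to a large customer table without first grouping records (by postcode, say).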
So if mailing companies get it wrong, how do I get my own act together in a data warehouse? Addresses can be cleansed by validating them against a third-party address validator; tools such as Oracle Warehouse Builder have wizards that use these third-party products to construct address-cleansing maps with relative ease. These products can correct addresses, standardise name and address components and add extra content (such as geocoding). But is the data warehouse the right place to cleanse customer data? Surely this should happen further upstream in the business data flow.
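As a rough idea of what "standardising address components" means in practice, the sketch below tidies a UK-style postcode and a street line. It is a stand-in of my own, not how Oracle Warehouse Builder or any third-party validator works: the function names and the suffix table are hypothetical, and the postcode check only looks at the general shape of the code. A real validator checks against postal reference data, which this does not attempt.

```python
import re

# Hypothetical suffix table; a real product would use postal reference data.
STREET_SUFFIXES = {"rd": "road", "st": "street", "ave": "avenue", "ln": "lane"}

# Rough shape of a UK postcode: outward part, optional space, inward part.
UK_POSTCODE = re.compile(r"^([A-Z]{1,2}[0-9][A-Z0-9]?)\s*([0-9][A-Z]{2})$")


def standardise_postcode(raw):
    """Return the postcode as 'OUTWARD INWARD', or None if the shape is wrong."""
    match = UK_POSTCODE.match(raw.strip().upper())
    return f"{match.group(1)} {match.group(2)}" if match else None


def standardise_street(raw):
    """Strip punctuation, expand a trailing abbreviation and title-case the line."""
    words = re.sub(r"[^\w\s]", " ", raw.lower()).split()
    if words:
        # Only the last word is expanded, so a leading "St" (Saint) is left alone.
        words[-1] = STREET_SUFFIXES.get(words[-1], words[-1])
    return " ".join(words).title()


print(standardise_postcode(" sw1a1aa "))
print(standardise_street("12 acacia ave."))
```

Running the same standardisation over every record before comparing them makes matching far more reliable, and the logic is the same wherever it runs, whether in the warehouse load or further upstream.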