Data warehouses, a multi-dimensional being (2)

Keen eyed readers would have observed that the original two “part ones” of this article merged and slightly downsized, as part of this process the name ‘1b’ was deleted from history. This posting is dedicated to the words and memory of that fallen ‘b-ing’

Attributes and Hierarchies

Attributes :- ‘A quality or characteristic inherent in or ascribed to someone or something.’ I would add two further qualifications for a data warehousing definition: The ‘something’ that has the attribute is a dimensional quantity (such as customer or product) and the attribute could be used as a the basis of a query. For example if the product was a lady’s green t-shirt, size small, then the colour attribute would ‘green’ and a the size attribute could be ‘lady’s small’. The key thing here is that each attribute would only have a single value for a given product – a green t-shirt is green, not red. We could write queries against our clothing data warehouse to find total sales for green clothes; this total would include all products that shared the ‘green’ attribute. To be useful as an attribute I would also say that more than one member of a dimension should share the same attribute. For a customer attribute, street address (such as ‘23 Railway Cuttings, East Cheam’) would not be useful as this would usually identify a single customer; a possible exception being geo-coded addresses on customers which could have benefits in a spatially aware DWH; city however could be better as an attribute as it may identify 100s of customers. Likewise date of birth may not be a good attribute choice but age range or (for demented data-miners) star sign could be useful.

If you consider a geographically organised dimension such as store location it is possible to create a hierarchy of attribute levels using say Store, City, State, Region, and Country. Each level of this inverted tree structure is another potential level of aggregation or roll-up in the data warehouse. Most of these hierarchies will have a top level of ‘TOTAL’ or ‘ALL’. A possible exception to this would be the time dimension where ‘TOTAL for EVER’ is not the most meaningful of measures (time hierarchies commonly stop at the year level) Dimensions can have more than one hierarchy going up through them; again time is common example of this where fiscal reporting may not match with calendar reporting – here only the base level (in my example, day) is common to both hierarchies. Another example is store type and store location, location is a geographic hierarchy and store type could have single intermediate level that contains the attribute values of convenience stores, city centre and out-of-town; both hierarchies share the common base level of stores and the top level of ‘Total’