Degenerate Dimensions Defined

Much of my work involves designing and reviewing dimensional data models, and an interesting issue that often comes up is how to deal with data items such as invoice number, order number and so on, that are not strictly facts - you're not going to want to add them up, or average them, or perform any other maths on them - but they don't seem to fit into existing dimensions.

Ralph Kimball coined the term 'Degenerate Dimensions' for these data items, as they perform much the same function as dimensions: they sit in the fact table and allow you to limit down or 'slice and dice' your fact table measures, but they aren't foreign key links through to dimension tables, as all the information you want - the invoice number, or the order number - is contained in the degenerate dimension column itself. Degenerate dimensions are useful as they tie the transactions, or events, in the fact table back to real-life items - invoices, orders and so on - and they can be a quick way to group together similar transactions for further analysis.

The key here is not to go overboard and make these degenerate dimensions into full dimension tables - for example, an Invoice dimensions - as in all likelihood this dimension table will grow at the same rate as your fact table. If there is other interesting information to go with the invoice - for example, who the customer was, what products were ordered - this is better placed in specific dimensions for customers and products where it can be stored as a kind of 'master copy', rather than storing it alongside each order in a balooning Invoice dimension.

The other advantage with degenerate dimensions is that they're a lot easier to build and maintain when using ETL tools such as Oracle Warehouse Builder, as you don't have to create dimension lookup tables, create synthetic keys, sequences and so on. Indeed, if you're loading your dimensional model into a multidimensional database such as Oracle OLAP, your database will be much smaller in size and easier to handle if you can keep the number of formal dimensions to a minimum, as they tend to 'explode' in size the more dimensions you add to the database.

Judicious use of degenerate dimensions keeps your dimensional model rational and your database size reasonable, whilst allowing you to keep useful items in the fact table that help us tie the data warehouse back to the original source systems.