Monday, April 28, 2008

Slowly Changing Dimensions

Slowly Changing Dimensions

Dimensions that change over time are called Slowly Changing Dimensions. For instance, a product price changes over time; People change their names for some reason; Country and State names may change over time. These are a few examples of Slowly Changing Dimensions since some changes are happening to them over a period of time.

Slowly Changing Dimensions are often categorized into three types namely Type1, Type2 and Type3.

The following section deals with how to capture and handling these changes over time.
The "Product" table mentioned below contains a product named, Product1 with Product ID being the primary key. In the year 2004, the price of Product1 was $150 and over the time, Product1's price changes from $150 to $350. With this information, let us explain the three types of Slowly Changing Dimensions.

Product Price in 2004:

Product ID(PK) Year Product Name Product Price
1 2004 Product1 $150

Type 1: Overwriting the old values.

In the year 2005, if the price of the product changes to $250, then the old values of the columns "Year" and "Product Price" have to be updated and replaced with the new values. In this Type 1, there is no way to find out the old value of the product "Product1" in year 2004 since the table now contains only the new price and year information.


Product ID (PK) Year Product Name Product Price
1 2005 Product1 $250

Type 2: Creating an another additional record.

In this Type 2, the old values will not be replaced but a new row containing the new values will be added to the product table. So at any point of time, the difference between the old values and new values can be retrieved and easily be compared. This would be very useful for reporting purposes.

Product Product ID(PK) Year Product Name Product Price
1 2004 Product1 $150
1 2005 Product1 $250

The problem with the above mentioned data structure is "Product ID" cannot store duplicate values of "Product1" since "Product ID" is the primary key. Also, the current data structure doesn't clearly specify the effective date and expiry date of Product1 like when the change to its price happened. So, it would be better to change the current data structure to overcome the above primary key violation.Product

Product ID(PK) EffectiveDateTime(PK) Year Product Name ProductPrice ExpiryDateTime
1 01-01-2004 12.00AM 2004 Product1 $150 12-31-200411.59PM
1 01-01-2005 12.00AM 2005 Product1 $250

In the changed Product table's Data structure, "Product ID" and "Effective DateTime" are composite primary keys. So there would be no violation of primary key constraint. Addition of new columns, "Effective DateTime" and "Expiry DateTime" provides the information about the product's effective date and expiry date which adds more clarity and enhances the scope of this table.

Type2 approach may need additional space in the data base, since for every changed record, an additional row has to be stored. Since dimensions are not that big in the real world, additional space is negligible.

Type 3: Creating new fields.

In this Type 3, the latest update to the changed values can be seen. Example mentioned below illustrates how to add new columns and keep track of the changes. From that, we are able to see the current price and the previous price of the product, Product1.Product

Product ID(PK) CurrentYear ProductName CurrentProductPrice Old ProductPrice Old Year
1 2005 Product1 $250 $150 2004

The problem with the Type 3 approach, is over years, if the product price continuously changes, then the complete history may not be stored, only the latest change will be stored. For example, in year 2006, if the product1's price changes to $350, then we would not be able to see the complete history of 2004 prices, since the old values would have been updated with 2005 product information.Product

Product ID(PK) Year Product Name Product Price Old Product Price Old Year
1 2006 Product1 $350 $250 2005