Back in 2012 we introduced a way to “fix” the issues with Slowly Changing Dimensions in Dimensional Modeling. That script was actually sent to Ralph Kimball, but seeing as not much has happened in the following seven years, we decided to become a bit more formal and write a paper about it. It is entitled “Temporal Dimensional Modeling” and can be read on ResearchGate. Even if you are not interested in Dimensional Modeling (disclaimer: we are not either), you can learn a lot about temporality and improper ways to manage it, through the existing SCD types. Of particular general interest is the twine, a clever way to find historically correct relationships, given information that is stored differently.
Here is the abstract from the paper:
One of the prevalent techniques for modeling data warehouses is and has for the last decade been dimensional modeling. As initially defined, it had no constructs for keeping a record of changes and only provided the as-is latest view of available information. Since its introduction and from increasing requirements to record changes, different approaches have been suggested to manage change, mainly in the form of slowly changing dimensions of various types. This paper will show that every existing type of slowly changing dimension may lead to undesired anomalies, either at read or at write time, making them unsuitable for application in performance critical or near real-time data warehouses. Instead, based on current research in temporal database modeling, we introduce temporal dimensions that make facts and dimensions temporally independent, and therefore suffer from none of said anomalies. In our research, we also discovered the twine, a new concept that may significantly improve performance when loading dimensions. Code samples, along with query results showing the positive impact of implementing temporal dimensions compared to slowly changing dimensions are also presented.
The experiments in which performance was measured was done using Transact-SQL in Microsoft SQL Server. The code for reproducing the tests is available on GitHub.
Hi,
Thanks for sharing.
I was reading the paper “Temporal Dimensional Modeling” and looking at the query for the twine (Fig. 11) and I think I’m getting something the wrong way. Intuitively I would think I’m only looking for dim ValidFrom that are older or equal to factDate which I would expect to be achieved using a frame along the order by Timepoint in the window function. But there’s no such limit. Am I wrong?
The ”order by Timepoint” in the over-clause will (somewhat unintuitively) already make the MAX cumulative. I don’t think adding a frame like “preceding unbounded and current row” will change the execution plan, but it’s worth checking.
Thanks Lars,
Do you know if the running max behavior when window is defined with order by is specific to SQL Server or common to other SQL databases?
I found this related post in a forum:
https://www.sqlservercentral.com/forums/topic/max-window-function-with-order-by