Hi Lars,
Your approach for making anchor modeling bi temporal sound very interesting and useful.
Let me summarize for myself, to be sure that I understand you correctly:
1. Add Recording Time to every attribute, tie and knot table (every table except anchors).
2. Also add reliability bit to every table that has Recording Time.
3. Add the recording time to the primary key.
This allows us to:
1. Store revisions of attribute and knot values. (and tie's). E.g. the rating was set to A, but that was based on a wrong calculation. It should be B.
2. Store revisions of the changing timeline. E.g. the rating was valid from the 17th of March, but it should be valid starting from the 16th of march.
It seems like an elegant solution, but I do have some questions and suggestions.
Suggestion 1: retracting and asserting. The insert statements with reliability = 0 retract previous knowledge, while the statements with reliability = 1 assert new knowledge. (makes me think about Prolog...).
Maybe it's useful to use this terminology. A property of these two concepts is that something that's asserted can always be retracted and
something that's retracted cannot be retracted, only re-asserted.
Observation 1: value is irrelevant for retracted recordsRetraction is done by inserting a new record with identical anchor key + valid date, but a new recording date and a reliability value of 0. The value
doesn't really matter.
E.g. On Day3 some late arriving facts are handled as followed
1. retract B rating on the 16th
2. retract A rating on the 18th
3. retract ? rating on the 19th
4. assert C rating on the 15th
5. assert ? rating on the 20th
1. insert into FI_RAT_FinancialInstrument_Rating values (1, 'B', '2011-03-16', '2011-03-19', 0, 42);
has the same meaning as
1. insert into FI_RAT_FinancialInstrument_Rating values (1, 'Z', '2011-03-16', '2011-03-19', 0, 42);
Suggestion 2: ETLThe ETL should be smart enough to retract all existing records that overlaps the changing time interval of the newly asserted row.
(e.g. in day 3 you insert a new C rating valid from the 16th, but this new record overlaps with rating records on the 16,18 and 19 of March,
which all need to be retracted).
So the ETL should be two phased. Retract overlapping records and insert the new record.
Maybe it's usefull to automate the retraction so that it retracts overlapping records by default.
For example by using a stored procedure retractAll (tableName, FromValidDate, AnchorKey).
Only this degrades performance, because it is row based and this will not work on a batch insert.
Same reasoning applies to triggers.
Of course you can make your ETL in such a way that it batch inserts all retractions first and then batch insert the new values.
Because the same table stores assertions and retractions, it might grow very fast and become slower, depending on the rate of retractions.
It might be better to move all outdated rows (with respect to recording time) into another table (like you suggest in another forum post) in one batch.
Then delete them in another batch. and then insert the new values in a third batch.
Question 1: End-dating an intervalOn Day1 you show a way to end an attribute interval.
1. insert into FI_RAT_FinancialInstrument_Rating values (1, 'A', '2011-03-17', '2011-03-17', 1, 42);
2. insert into FI_RAT_FinancialInstrument_Rating values (1, '?', '2011-03-19', '2011-03-17', 1, 42);
The A rating is not known anymore starting from the 19th of March.
Why use the '?' character and what do we use when the datatype is datetime or int? Isn't it better to allow NULL values for these cases?
If we look at the latest view, do we want to make a distinction between values that were never known (which are NULL) and values which valid
interval ended, like in the example above?
How does this work for ties?
MariedTo
person, mariedTo, ChangedAt , RecordedAt, Reliability
1 ,2 , 2001-04-01, 2009-02-01, 1
Now at 2004-02-01 the marriage is ended. The prescribed way to model this is to make this tie a knotted tie.
person, mariedTo, ChangedAt , RecordedAt, Reliability, isMaried
1 ,2 , 2001-04-01, 2003-02-01, 1 , 1
1 ,2 , 2004-02-01, 2005-10-01, 1 , 0
It seems like someone made a mistake with the papers, so the marriage was not really ended. The mistake is
corrected at 2005-10-12.
person, mariedTo, ChangedAt , RecordedAt, Reliability, isMaried
1 ,2 , 2001-04-01, 2003-02-01, 1 , 1
1 ,2 , 2004-02-01, 2005-10-01, 1 , 0
1 ,2 , 2004-02-01, 2005-10-12, 0 , 0
So the interval was ended, but this was retracted a few days later and now the interval is open again.
The difference between end dating attributes and ties is that we cannot use NULL values in ties, so we need to add a knot.
The disadvantage is that we have to model ties as knotted if you want to be able to end-date an interval.
Built-in support would be nice here, also because we prefer to handle end-dating in a more generic way.
Suggestion 3: End-Dating an intervalHow about adding an isValid bit to every table, that tells us whether the record is valid or not. If we use this for the attribute tables as well,
we don't have to use NULL values.
1. assert A rating from 17th of March.
(anchor key, value, changedAt, recordedAt, Reliability, isValid, metaData)
insert into FI_RAT_FinancialInstrument_Rating values (1, 'A', '2011-03-17', '2011-03-17', 1, 1, 42);
2. assert end of A rating from the 19th of March.
insert into FI_RAT_FinancialInstrument_Rating values (1, '?', '2011-03-19', '2011-03-17', 1, 0, 42);
3. retract 2.
insert into FI_RAT_FinancialInstrument_Rating values (1, '?', '2011-03-19', '2011-03-18', 0, 0, 43);
Suggestion 4: Use different table for retracted values. Instead of adding a Reliability column to every table we could just move all retracted records into another table
(like suggested by you and Jorg Jansen on the AM forum). When a record is in this table you know that there is another record with
the same anchor key and valid date, but a more recent recording date in the original table.
A record being in this table is equivalent to having a record with reliability = 0 in the other approach.
This way we keep the model simple, because we don't need a reliability column and also because we don't have to insert
a new record in order to retract another record.
The above example becomes as follows:
1. assert A rating from 17th of March.
(anchor key, value, changedAt, recordedAt, isValid, metaData)
insert into FI_RAT_FinancialInstrument_Rating values (1, 'A', '2011-03-17', '2011-03-17', 1, 42);
2. assert end of A rating from the 19th of March.
insert into FI_RAT_FinancialInstrument_Rating values (1, '?', '2011-03-19', '2011-03-17', 0, 42);
3. retract 2.
(anchor key, value, changedAt, recordedAt, isValid, metaData)
insert into HISTORY_FI_RAT_FinancialInstrument_Rating values (1, '?', '2011-03-19', '2011-03-17', 0, 44); *
delete from FI_RAT_FinancialInstrument_Rating where FI_ID = 1 and ChangedAt = '2011-03-19' and RecordedAt = '2011-03-17'
* we can get the retracted timestamp from the metaData in the retracted table.
We can simplify it even further, by removing the recordedAt time:
1. assert A rating from 17th of March.
(anchor key, value, changedAt, isValid, metaData)
insert into FI_RAT_FinancialInstrument_Rating values (1, 'A', '2011-03-17', 1, 42);
2. assert end of A rating from the 19th of March.
insert into FI_RAT_FinancialInstrument_Rating values (1, 'A', '2011-03-19', 0, 42);
3. retract 2.
(anchor key, value, changedAt, recordedAt, isValid, metaData)
insert into HISTORY_FI_RAT_FinancialInstrument_Rating values (1, 'A', '2011-03-19', '2011-03-17', 0, 44); *
delete from FI_RAT_FinancialInstrument_Rating where FI_ID = 1 and ChangedAt = '2011-03-19'
* we can get the retracted timestamp from the metaData in the retracted table.
** we can get the recordedAt time from the metaData in the original table.
This way we have the original recorded time and the recorded time of the retraction.
After we retracted the ending of the A rating at 2011-03-19, we can of course assert a new rating starting
on the 19th of march, or even before this date (keeping in mind that
inserting a new rating before or on the 17th involves retracting the rating record on the 17th as well).
FinallyI have modified your sql in order to test my approach and I get exactly the same results.
Get the SQL here:
http://pastebin.com/9dZLAfve Note that I use a stored procedure for retraction, but this can be changed into ETL code that supports batch insertion (if performance is an issue).
Please let me know what you think.
Kind regards,
Bas van den Berg