Background
Some database vendors are adding support for ranges as new data types in their database engines. Ranges are intervals whose boundaries are defined using some other data type. For example [22, 42) is an integer range starting at and including the value 22 and ending with but excluding the value 42. When creating a table, vendors will add some way of specifying that a data type is a range, somewhat similar to the following:
CREATE TABLE myTable ( myNormalColumn int, myRangeColumn int range)For example PostgreSQL will have this support in the upcoming 9.2 version. With the new ANSI/ISO standard SQL:2011, periods are introduced, and a periods are ranges over time. Once vendors implement support for this, periods will be defined using the following syntax:
CREATE TABLE myTable ( myNormalColumn int, myPeriodStart date, myPeriodEnd date, PERIOD FOR myPeriod(myPeriodStart, myPeriodEnd))With the addition of ranges/periods a lot of functionality is also added in order to work with these. For example, specifying constraints that ranges may or may not overlap, conditional operators for comparing if one range follows after another and functions for getting the start or end point of a range.
Motivation
In a certain type of data modeling called
bitemporal modeling, currently gaining a lot of traction due to its ability to handle data according to new legislations like the Sarbanes-Oxley Act and BASEL3, ranges play an important role. Bitemporal modeling use two temporal ranges to keep track of the history of the information. One for when different versions of the information was valid, and one for when that information was known to some agent (or stored in some memory). This way, a financial report can for example at any time be recreated exactly as it looked at its original time of creation, even if it contained errors that later were corrected. The same report can also be recreated as it should have looked at its original time of creation given the corrections made.
In order to retrieve the temporally stored information very complex queries have to be written. These can, however, be drastically simplified with the addition of a new type of operation on ranges in the database.
Innovation
Given a number of ranges and some way of ordering these, it is possible to do a parallel projection in the plane to retrieve what these ranges look like seen from the top or bottom.
Say we have the following set of ranges and an additional column used for the ordering:
1, [12, 16]2, [10, 18]3, [13, 20] These ranges can be viewed as an ordered stack of lines:
10 11 12 13 14 15 16 17 18 19 201 --------------2 --------------------------3 -----------------------Seeing this stack from the top (using a plane parallel projection) we see the following (now non-overlapping) ranges:
1, [12, 16] (the whole range can be seen)2, [10, 11] (one of the protruding parts)2, [17, 18] (the other protruding part)3, [19, 20] (most of the range is hidden from view)Seeing this stack from the bottom we instead see the following ranges:
3, [13, 20]2, [10, 12]Changing the ordering from ascending to descending produces the same result as for viewing from top or bottom, so these two methods are interchangeable for producing the same result.
There are several different algorithms for doing parallel projections, all of which are low cost in comparison to what would have to be done in a database lacking this functionality to achieve the same result. How to syntactically specify that a projection should be done in a query is left up to the vendor or the SQL standardization committee.
The above example of a projection does not cover all cases. We also need to deal with non-inclusive intervals, for example [12, 16), in which the end point is not included. Open-ended intervals should also be handled, like [12, *), in which there is no specified end point. Finally, the values used in the ordering may not be unique, possibly causing ranges to merge for some ordinal. Neither of these cases is particularly complicated, but the above should convey the idea for now.
Applicability
Assume that a third column is added, for example containing the rating of a financial instrument.
1, [12, 16], A2, [10, 18], B3, [13, 20], CThis could represent bitemporal data, where the first column indicates the time when we received the information and the second column the period during which the rating was valid. For example, at time 1 we are told that the rating was A during the period between 12 and 16. In a real-life scenario, times would most likely not be integers but dates.
A parallel projection from the top when ordering over the first column can now be used to find the latest available information:
1, [12, 16], A2, [10, 11], B2, [17, 18], B3, [19, 20], CWhich seen as a timeline becomes:
10 11 12 13 14 15 16 17 18 19 20B B A A A A A B B C CThe same procedure can be used on a subset, say where the first column is > 1, in order to find what the latest available information looked like at a previous recording time. Such a condition is very simple to add using available database language constructs. Parallel projections in a database makes it very simple to “time travel” in bitemporally modeled data. There may of course also be many other areas in which such projections can be useful.