Somewhere in between data and metadata there is another kind of information, which we will name peridata. Perhaps you have found yourself looking at some piece of information and thinking, is this data or metadata? In this article, not only will you get a precise definition of what is what, but also a term for data living on the fringe of its classification. In order to achieve these definitions, we will turn to the posit, which is the fundamental building block of transitional modeling.
Posits
A posit essentially captures a piece of information. Here are two examples:
p1 = [{(Archie, beard)}, fluffy red, 2020-01-01]
p2 = [{(Archie, husband), (Bella, wife)}, married, 2004-06-19]
The first posit, p1, captures the information that Archie had a fluffy red beard on the 1st of January 2020. The second posit, p2, captures the information that Archie and Bella have been married since the 19th of June 2004. Posits can express properties, as in p1, and relationships, as in p2. In transitional modeling, relationships are properties that require more than one thing to take on a value. Such an approach may be unfamiliar, since most other modeling techniques have separate constructs for properties and relationships. The proper way to read those two posits, using the notion of roles, is:
When Archie filled the beard role the value ‘fluffy red’ appeared on 2020-01-01.
When Archie filled the husband role and Bella the wife role the value ‘married’ appeared on 2004-06-19.
A single thing filling a single role gives rise to what we usually call properties or attributes, whereas a combination of things filling a combination of roles gives rise to relationships. Whenever roles are filled, some value appears. In the case of Bella and Archie it could just as well have been ‘divorced’, ‘planned’, or ‘not applicable’. In fact, for the vast majority of pairs of people who could fill these roles, the relationship is ‘not applicable’, but we tend to document such posits only in the rare cases where they carry valuable information.
Given the terminology of things (Archie, Bella) and roles (beard, husband, wife), the structure of a posit can be formalized as:
posit = [ {(thing 1, role 1), ..., (thing n, role n)}, appearing value, time of appearance ]
The set in the first position of the posit is called an appearance set. It is followed by the value appearing for that set and the time of that appearance. Posits are just pieces of information and there is no requirement that they must be true. After all, there is a lot of untrue information out there and much more, maybe even most, that is uncertain to some degree. We do not want to disqualify any information from being recorded based on its certainty.
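The formal structure above can be sketched as a small data type. This is only an illustration, not how any transitional modeling engine stores posits: the names `Posit`, `posit`, and `is_property` are made up here, and times are assumed to be ISO 8601 date strings.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Iterable, Tuple


@dataclass(frozen=True)
class Posit:
    """[ {(thing, role), ...}, appearing value, time of appearance ]"""
    appearance_set: FrozenSet[Tuple[Any, str]]  # pairs of (thing, role)
    value: Any                                  # the appearing value
    time: str                                   # assumed ISO 8601 date string


def posit(pairs: Iterable[Tuple[Any, str]], value: Any, time: str) -> Posit:
    # A frozenset makes the order of (thing, role) pairs irrelevant
    # and keeps the posit hashable.
    return Posit(frozenset(pairs), value, time)


def is_property(p: Posit) -> bool:
    # One thing filling one role: what is usually called a property.
    # More than one pair: a relationship.
    return len(p.appearance_set) == 1


p1 = posit([("Archie", "beard")], "fluffy red", "2020-01-01")
p2 = posit([("Archie", "husband"), ("Bella", "wife")], "married", "2004-06-19")
```

Making the type immutable and hashable is a deliberate choice in this sketch, since it lets a posit appear as a thing inside another posit's appearance set.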
Data and Metadata
We will now make the distinction between data and metadata. Given an appearance set, if none of the things it contains is a posit, then posits containing that set are classified as data. Correspondingly, given an appearance set, if at least one of the things it contains is a posit, then posits containing that set are classified as metadata. The examples given so far are data, since neither Archie nor Bella is a posit. By contrast, one of the most important examples of metadata in transitional modeling is:
p3 = [{(p1, posit), (Bella, ascertains)}, 1.00, 2020-01-02]
There is no way to determine the truthfulness of a posit from the posit alone, so an additional construct is needed. An assertion is a posit that assigns a certainty to another posit. In the example above, Bella ascertains the posit about Archie’s beard, with absolute certainty, on the 2nd of January 2020. This is metadata, since p1 is a posit. Assertions are subjective, and so far we only have Bella’s view of p1. Certainty is expressed by a real number in the interval [-1, 1], where 1 is being absolutely certain of what the posit is stating, 0 is having no idea whatsoever, and -1 is being certain of the opposite of what the posit is stating. If you want to delve deeper into the expressiveness given by this machinery, you can read the paper “Modeling Conflicting, Unreliable, and Varying Information”.
Another common type of metadata, particularly in data warehouses, records the source from which posits originated.
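As a rough sketch of the assertion construct, the following builds p3 from p1 and rejects certainties outside [-1, 1]. The helper name `make_assertion` and the tuple-based `Posit` type are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Tuple


@dataclass(frozen=True)
class Posit:
    appearance_set: FrozenSet[Tuple[Any, str]]  # pairs of (thing, role)
    value: Any
    time: str  # assumed ISO 8601 date string


def make_assertion(target: Posit, asserter: str, certainty: float, time: str) -> Posit:
    """Hypothetical helper: a posit assigning a certainty to another posit."""
    if not -1.0 <= certainty <= 1.0:
        raise ValueError("certainty must lie in the interval [-1, 1]")
    # The roles 'posit' and 'ascertains' are the ones used in p3.
    return Posit(frozenset({(target, "posit"), (asserter, "ascertains")}),
                 certainty, time)


p1 = Posit(frozenset({("Archie", "beard")}), "fluffy red", "2020-01-01")

# Bella is absolutely certain of p1 on the 2nd of January 2020.
p3 = make_assertion(p1, "Bella", 1.00, "2020-01-02")
```

Because assertions are subjective, nothing prevents a second asserter from recording a different certainty, for example -1, for the same p1.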
p4 = [{(p3, source)}, The Horse's Mouth, 2020-01-01]
There could be a whole range of information related to the posit itself, like who or what recorded it, when it was entered into a database, its associated security or sensitivity, effective constraints at the time, or rules to apply in certain scenarios. These are just some examples, all of which would be classified as metadata, because they involve a posit in their appearance sets.
Since metadata is also expressed using posits, these can be parts of appearance sets as well. For example, the assertion p3 is a part of the appearance set of p4, so p4 is also metadata, but on a different “level” than p3, which is itself already metadata. In such a case it makes sense to distinguish these as level-1 metadata and level-2 metadata, which could be extended up to any level-n metadata. I believe that going beyond level-1 metadata is unusual in existing implementations, and that there may be few use cases that need additional levels. However, when they are needed, they are probably also very important.
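One way to make the notion of levels concrete is a recursive function that counts how deeply posits are nested. The function name `level` and the `Posit` type below are illustrative assumptions. Level 0 stands for plain data.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Tuple


@dataclass(frozen=True)
class Posit:
    appearance_set: FrozenSet[Tuple[Any, str]]  # pairs of (thing, role)
    value: Any
    time: str


def level(p: Posit) -> int:
    """0 for data, n for level-n metadata."""
    nested = [level(thing) for thing, _ in p.appearance_set
              if isinstance(thing, Posit)]
    # A posit sits one level above the deepest posit it talks about.
    return 1 + max(nested) if nested else 0


p1 = Posit(frozenset({("Archie", "beard")}), "fluffy red", "2020-01-01")
p3 = Posit(frozenset({(p1, "posit"), ("Bella", "ascertains")}), 1.00, "2020-01-02")
p4 = Posit(frozenset({(p3, "source")}), "The Horse's Mouth", "2020-01-01")

print(level(p1), level(p3), level(p4))  # prints: 0 1 2
```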
Peridata
While the rules separating data and metadata are clear cut, the way to tell data from peridata is less straightforward. In transitional modeling it is possible to reserve roles for particular purposes. One such example is used for classification.
p5 = [{(Archie, thing), (Person, class)}, active, 1972-08-20]
This posit tells us that Archie has belonged to the Person class since 1972-08-20, using the reserved class role. Thanks to classification being expressed through posits, it is possible to disagree on these using assertions. It is also possible to have multiple classifications at once and to let classifications expire or become active at different points in time.
As you can see, there is no posit in the appearance set of p5, so it is not metadata by our previous definition. The model, however, is likely something that would traditionally have been classified as metadata. In order to distinguish this type of data from regular data, we will use the concept of reserved roles. But then, what are reserved roles? Well, you can think of them as being similar to reserved keywords in a programming language. In fact, in the examples so far, the roles posit, ascertains, thing, and class are already reserved in transitional modeling. The roles beard, husband, and wife depend on your domain and are instead something you as a modeler will have to bring into existence.
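To illustrate classifications that become active or expire over time, here is a minimal sketch that finds the classes a thing belongs to at a given date. Several things in it are assumptions and not part of transitional modeling as described here: the value ‘expired’, the Student class and its dates, the helper names, and the rule that the latest posit per class wins.

```python
from dataclasses import dataclass
from typing import Any, FrozenSet, Iterable, Set, Tuple


@dataclass(frozen=True)
class Posit:
    appearance_set: FrozenSet[Tuple[Any, str]]  # pairs of (thing, role)
    value: Any
    time: str  # ISO 8601 dates compare correctly as strings


def classification(thing: str, cls: str, value: str, time: str) -> Posit:
    return Posit(frozenset({(thing, "thing"), (cls, "class")}), value, time)


def classes_of(thing: str, posits: Iterable[Posit], as_of: str) -> Set[str]:
    latest = {}  # class name -> most recent classification posit up to as_of
    for p in posits:
        by_role = {role: t for t, role in p.appearance_set}
        if set(by_role) != {"thing", "class"} or by_role["thing"] != thing:
            continue  # not a classification of this thing
        if p.time > as_of:
            continue  # had not appeared yet
        cls = by_role["class"]
        if cls not in latest or p.time > latest[cls].time:
            latest[cls] = p
    return {cls for cls, p in latest.items() if p.value == "active"}


p5 = classification("Archie", "Person", "active", "1972-08-20")
posits = [
    p5,
    classification("Archie", "Student", "active", "1991-09-01"),   # hypothetical
    classification("Archie", "Student", "expired", "1996-06-15"),  # hypothetical
]
```

With these posits, `classes_of("Archie", posits, "1995-01-01")` yields both Person and Student, while the same query for 2020-01-01 yields only Person.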
With this we can get definitions for all three categories.
- If at least one of the things contained in an appearance set is a posit, then all posits with this set are classified as metadata.
- If at least one of the roles contained in an appearance set is reserved, then all posits with this set are classified as peridata.
- If neither of the above applies to an appearance set, then all posits with that set are classified as data.
Peridata exists among your data, but sort of on the fringe, given that it requires these reserved roles. Note that it is possible to have peridata for your metadata as well, when both the first and the second rule apply. Transitional modeling will come with a set of reserved roles, all of which are domain independent, but there will also be an option for end users to reserve roles of their own.
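The three rules can be sketched as a single function over an appearance set. The function name `categories` is made up for this example, and the reserved set contains only the four roles named earlier in the article. The rules are read literally here, so a posit can land in both the metadata and the peridata category.

```python
from dataclasses import dataclass
from typing import AbstractSet, Any, FrozenSet, Set, Tuple


@dataclass(frozen=True)
class Posit:
    appearance_set: FrozenSet[Tuple[Any, str]]  # pairs of (thing, role)
    value: Any
    time: str


# The roles the article names as already reserved.
RESERVED_ROLES = frozenset({"posit", "ascertains", "thing", "class"})


def categories(p: Posit, reserved: AbstractSet[str] = RESERVED_ROLES) -> Set[str]:
    found = set()
    # Rule 1: at least one thing is a posit.
    if any(isinstance(thing, Posit) for thing, _ in p.appearance_set):
        found.add("metadata")
    # Rule 2: at least one role is reserved.
    if any(role in reserved for _, role in p.appearance_set):
        found.add("peridata")
    # Rule 3: neither of the above applies.
    return found or {"data"}


p1 = Posit(frozenset({("Archie", "beard")}), "fluffy red", "2020-01-01")
p3 = Posit(frozenset({(p1, "posit"), ("Bella", "ascertains")}), 1.00, "2020-01-02")
p4 = Posit(frozenset({(p3, "source")}), "The Horse's Mouth", "2020-01-01")
p5 = Posit(frozenset({("Archie", "thing"), ("Person", "class")}), "active", "1972-08-20")
```

Under this literal reading p1 is data, p5 is peridata, p4 is metadata, and p3 is both metadata and peridata, because its roles are reserved. Passing a larger `reserved` set models end users reserving roles of their own.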
Remarks
Thanks to transitional modeling, we have been able to break down what is traditionally thought of as a single metadata concept into two categories, metadata and peridata. On the fringe of your data you will find peridata, short for peripheral data, which captures such things as the classifications in your domain. Metadata is restricted to those pieces of information that explicitly talk about other pieces of information. Whether this distinction is useful remains to be seen, but it is certainly interesting. In a relational database, for example, the classifications in the modeled domain exist as a schema. Schemas are therefore peridata. Perhaps you can think of other commonly used model artifacts that fall within the scope of peridata or metadata?
On a side note, there are already some indications that the use of reserved roles can improve performance in a database engine based on posits. If you are interested in following the development of such an engine, check out bareclad.
OK, new names for old problems. In FBM/ORM this pattern has been observed for several decades. It has its own name, object to role transformations. There are several “solutions” for this, based on what you try to achieve, but basically it is about deciding where to draw your modeling abstractions. Since transitional modeling is used as a baseline here, you can classify it from there. But this is certainly not a generic modeling approach.
I have to look at the “object to role transformations”. Do you have any reading material you can suggest?