We want to thank Certus for their guest blog. Part 3 in a 3-part series on revisiting the definition of Information.
In this third post of a three-part series, we explore the implications of both the definition of information and the Laws of Nature, and how they apply to your Data Vault. Read the first and second blog posts.
Here’s a quick recap of the complete characterisation for reference:
And the 10 Laws of Nature for information:
LAW 1: Anything material, such as physical/chemical processes, cannot create something non-material.
LAW 2: Information is a non-material fundamental entity and not a property of matter.
LAW 3: Information requires a material medium for storage and transmission.
LAW 4: Information cannot originate in statistical processes.
LAW 5: There can be no information without a code.
LAW 6: All code results from an intentional choice and agreement between sender and recipient.
LAW 7: The determination of meaning for and from a set of symbols is a mental process that requires intelligence.
LAW 8: There can be no new information without a purposeful and intelligent sender.
LAW 9: Any given chain of information can be traced back to an intelligent source.
LAW 10: Information comprises the non-material foundation for all:
- Technological systems
- Works of art (music, visual etc.)
- Biological systems.
Let’s explore the implications of both the definition and the laws. The first two layers we call data.
When we add the last three layers it becomes information.
When you evaluate a platform such as SAP, PeopleSoft, TechnologyOne, or basically any operational system within your organisation, and measure it against the definition of information, the results are often very interesting:
- Do we have repeated symbols within the system (alphabet and numerals) – Yes
- Do they appear syntactically as phrases and sentences – Yes
- Does the data hold meaning in context of the business process and definitions – Yes
- Are there practical tasks and follow-ups required in those processes – Yes
- Are senders and receivers within business processes and across business processes present with an expected result or action – Yes
However, if we take the data out of the overall system, we have just data. Even though there may be thousands of business rules and complex orchestrations, the only transferable element is the data.
This is what we struggle with when we deal with data and information. It keeps on phasing in and out of the full definition. We are confusing the necessary mechanical processes of storage and transmission (data) with the notion of information.
Business rules in business intelligence suffer from the same issue. If we run more business rules (which require data as inputs), then momentarily we have a fraction of information while the rules execute, but the result – the only evidence of that whole process – is just more data.
To deliver information in any data warehouse, we must build the metadata that captures the true context in addition to the received data (definitions, ownership, classification, lineage, appropriate use and more). Only then can we truly call it information.
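As a rough illustration of that point, the sketch below pairs a received value with the kind of context listed above. The class names, field names and sample values are hypothetical and chosen only for this example; they are not part of any Data Vault standard or tool.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Context:
    """Hypothetical metadata record; fields mirror the list in the paragraph above."""
    definition: str
    owner: str
    classification: str
    lineage: Tuple[str, ...]
    appropriate_use: str


@dataclass(frozen=True)
class DescribedValue:
    """A received data value, optionally paired with the context needed to interpret it."""
    name: str
    value: object
    context: Optional[Context] = None

    def is_interpretable(self) -> bool:
        # Without context we hold only symbols and syntax, that is, data.
        return self.context is not None


# The same value, first as bare data and then with its context attached.
raw = DescribedValue(name="cust_bal", value=1250.0)
described = DescribedValue(
    name="cust_bal",
    value=1250.0,
    context=Context(
        definition="Outstanding customer balance at end of day",
        owner="Finance",
        classification="Internal",
        lineage=("billing_system.accounts", "staging.accounts"),
        appropriate_use="Credit control reporting",
    ),
)
```

The value 1250.0 is identical in both objects; only the second carries enough context for a reader to act on it.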
So we can observe how these laws play out in the architecture of the Data Vault.
Data Vault receives data, integrates it, addresses missing data (technical debt or business rules, with the resulting data held in the business vault) and delivers data to end users, at which point the data becomes information. Only then does the full definition of information apply, because the end user has a specific purpose in mind, an expected action sequence and therefore an expected result.
That is why the consumption layer of the Data Vault refers to Information Marts – and the warehouse is called a Data Vault – recognising the shift in state of information.
There are further parallels between the Data Vault definition and other science themes. We tend to find that everything stable, reliable, universally defined and constant can be expressed or described via its components:
- We describe the universe in context of time, space (where matter is managed in relation to other matter – relational) and matter (identifying).
- In matter (with all the laws that apply) we have components that build up matter – electrons, protons and neutrons. These can be expressed in combination or relationships and are described by the number of electrons they hold or attract.
- Music can only exist if all three of its components are present – melody (identifying), harmony (relational) and rhythm (time context).
We find that all the above components happen to be described in context of their phases as well.
- Matter can exist in three phases – solid, liquid and gas.
- Time can be expressed as past, present and future.
We also find that information (with all the laws that apply) has components as well, on which Data Vault is based – the business key (the identifying element), relationships between those keys, and descriptive elements over time that describe the key or the relationships.
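In Data Vault modelling these three components are commonly known as hubs, links and satellites. The sketch below shows how they relate, using plain Python dataclasses. The field names and sample keys are assumptions made for illustration, and a real model would carry further columns (such as hash keys and record sources) that are omitted here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Hub:
    """Identifying element: one entry per business key."""
    business_key: str


@dataclass(frozen=True)
class Link:
    """Relationship between two or more business keys."""
    business_keys: tuple


@dataclass(frozen=True)
class Satellite:
    """Descriptive attributes for a hub or a link, as loaded on load_date."""
    parent: object      # a Hub or a Link
    load_date: date
    attributes: tuple   # (name, value) pairs


# Hypothetical example: a customer places an order.
customer = Hub("CUST-001")
order = Hub("ORD-778")
placed = Link((customer.business_key, order.business_key))
customer_details = Satellite(customer, date(2024, 1, 5), (("city", "Perth"),))
```

The hub identifies, the link relates, and the satellite describes over time, which mirrors the identifying, relational and time-context pattern in the parallels above.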
Where Data Vault goes one step further in helping to preserve as much of the information state as possible is by:
- Integrating on the business key
- Providing the opportunity to associate data around the same component, recording a partial representation of the full business process by capturing how these keys are used within business processes
- Providing a timeline of change of the particulars over time for such a key or relationship and how they are expressed
- Adapting without re-engineering to ingest new or changed data and relationships, rather than dealing only with what is evident now
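The timeline and adaptability points in the list above can be sketched with a small in-memory history. This is a simplified illustration and not how any particular Data Vault tool stores data: the `load` and `as_of` functions and the sample keys are hypothetical, and the sketch assumes loads arrive in date order.

```python
from datetime import date

# history[business_key] is a list of (load_date, attributes) rows, oldest first.
history = {}


def load(business_key, load_date, attributes):
    """Append a new descriptive row only when something has changed."""
    rows = history.setdefault(business_key, [])
    if not rows or rows[-1][1] != attributes:
        rows.append((load_date, dict(attributes)))


def as_of(business_key, when):
    """Return the attributes in effect for the key on the given date, or None."""
    current = None
    for load_date, attributes in history.get(business_key, []):
        if load_date <= when:
            current = attributes
    return current


load("CUST-001", date(2024, 1, 5), {"city": "Perth"})
load("CUST-001", date(2024, 3, 1), {"city": "Perth"})  # unchanged, so no new row
# A new attribute ("segment") arrives later and is ingested without changing the structure.
load("CUST-001", date(2024, 6, 9), {"city": "Sydney", "segment": "Retail"})
```

Because earlier rows are never overwritten, a question such as "what did we know about this customer in February?" can still be answered after later changes have been loaded.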
Data Vault still needs to be augmented by a good metadata framework of glossaries, lineage, definitions, ownership, sensitivity classifications, access requirements and other governance aspects to ensure it has the most value when it is used. The Laws of Nature exist and govern the expression of these components that make up the building blocks of all that we observe.