The amazing world of active metadata

By Tom Khabaza, Principal AI Consultant at Solidatus

Earlier this year, we kicked off a blog series on active metadata. In that first blog post, entitled From data to metadata to active metadata, I provided an overview of what it is and the challenges it addresses. 

To recap, metadata is data about an organization’s data and systems, about its business structures, and about the relationships between them. How can such data be active? It is active when it is used to produce insight or recommendations in an automated fashion. Below I’ll use a very simple example to demonstrate the automation of insight with active metadata.

In this, the second instalment of the ongoing four-part series, we expand on the subject to look at the role of metadata in organizations, and at how dynamic visualization and inference across metadata are essential to provide an up-to-date picture of the business. This is active metadata, and it is fundamental to the well-being of organizations.

Let’s start with a look at metadata in the context of an organization.

An organization and its metadata

Imagine a simple commercial organization. It has operational departments for sales, production and other operations, and corresponding systems which record their function. Transactions from each of these systems are copied into an accounting system, and from there into a financial report. The systems and the flow of data between them are depicted in a metadata model below:  

Metadata showing systems and information flow 

Data quality is also metadata

Financial reports must be accurate, both because they are used by executives in decision-making, and because they are part of the public face of the company, enabling it to attract investment and comply with regulations. The quality of the data in all the systems which feed these reports is therefore monitored closely. In this example we use a data quality metric, “DQ”: a property of each metadata entity, scored from 1 (very poor quality) to 10 (top quality). In the metadata view below, these DQ values are shown as tags; here all the data is of top quality (DQ 10). Assuming that data quality measurement is automatic, these values will be updated automatically on a regular basis. 

Metrics showing top-quality data 

Data quality flows through the organization

A data quality metric is calculated from the data in the system to which it applies, but such metrics will often assume that the data being fed into the system is correct. This means that upstream data quality problems can have an undetectable impact on downstream data quality, so it’s important for data quality monitoring to take account of data flowing between systems. The metadata view below indicates the flow of good-quality data by a green arrow, and the presence of good-quality data by a green box. Again, this view indicates that top-quality data is present throughout. Unlike the previous view, however, constructing this view requires inference across the metadata: an entity is displayed as green only if it both contains top-quality data and receives top-quality data on all of its incoming connections. This is a more informative view that will show the propagation of data quality problems when they occur. 
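This kind of inference can be sketched in a few lines of Python. The system names, DQ scores and flow edges below are illustrative stand-ins for the model in the screenshots, not an actual Solidatus API: an entity is “green” (trusted) only if its own DQ is top quality and every entity feeding it is also green.

```python
# Illustrative metadata model: DQ scores per entity, and directed
# data flows between entities. All names and values are invented.
DQ = {
    "Sales": 10, "Production": 10, "Operations": 10,
    "Accounting": 10, "Financial Report": 10,
}

# Directed edges: data flows from source to target.
FLOWS = [
    ("Sales", "Accounting"),
    ("Production", "Accounting"),
    ("Operations", "Accounting"),
    ("Accounting", "Financial Report"),
]

def trusted(entity, dq=DQ, flows=FLOWS):
    """An entity is 'green' only if its own DQ is top quality (10)
    and every upstream entity feeding it is also green."""
    if dq[entity] < 10:
        return False
    upstream = [src for src, dst in flows if dst == entity]
    return all(trusted(src, dq, flows) for src in upstream)

# With perfect data everywhere, every entity is green:
print(all(trusted(e) for e in DQ))  # True

# A local problem in Sales (DQ drops from 10 to 9) propagates:
DQ["Sales"] = 9
print(trusted("Sales"))             # False (the local problem)
print(trusted("Financial Report"))  # False (inferred downstream impact)
print(trusted("Production"))        # True  (unaffected branch)
```

The recursive check is the “active” part: no system stores the downstream impact explicitly; it is inferred from the flow metadata every time the view is drawn.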

Every data flow is top-quality 

Spotting local data quality problems

Now suppose that a data quality problem occurs in the operational systems: the data quality metric is reduced from 10 to 9. The data quality metrics view shows the problem clearly but does not show its consequences; local data quality metrics imply that everything is all right downstream from the problem. 

A data quality problem shows up locally 

Inferring data quality problems actively

However, the more dynamic view of data quality shows the problem immediately. The active, inferential nature of this metadata allows us to color data flows red if they may be carrying low-quality data, and the systems red if they may hold low-quality data as a result of upstream data quality problems.  

Active metadata inference shows the flow of data quality problems 

The downstream transmission of data quality problems is obvious in this view: the financial report cannot be trusted. This dynamic alert is the first step towards fixing the problem, but it goes further than that. An alert system of this kind means that, when no alert appears, the data and reports are trustworthy, an important attribute of data systems both for management decision-making and for regulatory compliance. 

Active metadata, inference and dynamic insight

To sum up, an organization must recognize that its metadata is dynamic in nature, and must make that dynamism easy to visualize and act upon. It’s not enough to hold static information about the qualities of our systems and data; we must also keep it up to date with automatic metadata feeds, continuously infer the consequences of any changes, and make those consequences visible immediately. Here I have illustrated the consequences of this approach for data quality and trust in financial reporting, but the same principles apply in a wide variety of contexts. 

Wherever data is used – which today is almost everywhere – active, dynamic, inferential metadata makes it more informative, trustworthy, controllable and, ultimately, governable. Active, inferential, visual metadata is essential for the well-governed organization. 

Datactics, an award-winning self-service data quality and matching software vendor, and Solidatus, the leading data lineage discovery, visualization and management solution, are today announcing a new technology partnership.

The partnership will see clients able to leverage the combined capabilities of the two firms to visualize and understand data quality throughout the enterprise, without being tied to one single enterprise data governance and quality platform. This provides greater flexibility to firms exploring their data governance strategies, empowering them to make use of ‘best of breed’ technology that aligns with their business operational models.

Additionally, the partnership aligns well with industry definitions of a data fabric, allowing firms to complement their own-built technology stacks with specialist off-the-shelf data software.

Datactics and Solidatus were delighted to both win awards at the Data Management Insight Awards Europe 2022 – a testament to our respective technology teams. Solidatus won ‘Best Data Governance Solution’ and Datactics won ‘Best Data Quality Analysis Tool’. You can read the full winners report here.

Datactics CEO, Stuart Harvey commented,

“By combining Datactics’ Self-Service Data Quality, a platform that identifies broken critical data elements and returns them to a data steward for fixing or automatic correction, and Solidatus’ superior data lineage visualization, the client gains a unique, reliable, and high-quality insight into their data. It will empower them to make better decisions, prioritize and fix key data quality issues, and address the negative impact of poor data quality wherever the data resides or flows throughout the enterprise.”

Solidatus CEO, Phil Dutton said,

“This partnership underlines our vision to revolutionize the data economy. Our commitment to lead market-disrupting innovation and first-class service is something we prioritize in all our partnerships and we’re eagerly looking forward to the opportunities this new relationship will bring, particularly in the fast-growing demand for best-in-class data fabric architecture.”

– Ends –

For more information please contact:

Solidatus press office
press@solidatus.com

Website
www.solidatus.com

About Solidatus

Solidatus is an innovative data management solution that empowers organizations to connect and visualize their data relationships, simplifying how they identify, access, and understand them. With a sustainable data foundation in place, data-rich enterprises can meet regulatory requirements, drive digital transformation, capture business insights, and make better, less risky and more informed data-driven decisions. We provide solutions to several key areas of endeavor, including: governance and regulatory compliance; data risk and controls; business integration; environment, social, governance (ESG); and data sharing. Our clients and investors include top-tier global financial services brands such as Citi and HSBC, healthcare, and retail organizations as well as government institutions.

About Datactics

Datactics offers a self-service, low-code data quality platform for non-technical users by providing AI/ML-driven automation in many data quality processes, especially in data profiling, monitoring and detection, and entity matching. ML augmentation includes auto suggestion of data quality rules based on analysis of the underlying dataset and many different prebuilt algorithms for outlier detection.

For more information please contact:

Roisin Floyd
roisin.floyd@datactics.com
+44 (0) 2890 233900

Back in 1995, a young futures trader, Nick Leeson, was working in Singapore for Barings Bank, arbitrage trading on the main Tokyo index – the Nikkei 225 – when he fraudulently hid massive financial losses from the bank in both London and Singapore. The losses incurred by the 200-year-old bank from his unauthorised trades were estimated at $1.3 billion.

“I’m sorry” – two words left on a note in his apartment – were the only admission of wrongdoing on Leeson’s part. And on the face of it, the story was simple: a rogue trader defrauds a bank of millions. At the time, his plan was thought to be so ingenious, like that of a James Bond villain, that the senior management of the bank were powerless, and there was no way for Peter Norris, the head of Barings Bank, to discover the fraud until it was too late. Norris later called Leeson ‘an agent of destruction’. Further investigations revealed that Norris probably overstated Leeson’s capabilities: everything started with a simple entitlement error, and was followed by a systemic failure at the bank.

During his time at Barings, Leeson was promoted from bookkeeper to general manager and chief trader, while also being responsible for settling his own trades. These jobs are typically held by two different people: one running the back office and one running the front. But because an entitlement error allowed Leeson to run both, he was able to hide his losses from his superiors in both Singapore and London. There was no grand plan, nor was Leeson a hyper-intelligent villain. Had he held just one of those positions, Barings Bank, in some form, would probably still be here today.

With access to both front- and back-office systems, Leeson was actively defrauding the bank. He had already accrued huge losses when he decided to sell TOPIX volatility, a position that imploded with the Kobe earthquake, creating financial losses from which the bank could not recover. All his losses were funnelled through the infamous ‘five 8s’ account, while at the same time he reported falsified profits back to London. Although some in London suspected something was seriously wrong, the bank ultimately fell prey to Leeson’s inflated profit claims, which leaves us to wonder: why did they keep advancing Leeson more money when the settlements were so low?

There is a famous phrase that states ‘In the Land of the Blind, the One-Eyed Man is King’ and this is probably the best analogy I can give. The senior managers were blinded by profits and did not understand the complexities of the markets and details of the trades, giving Leeson a place to hide his growing losses. To everyone’s amazement, Leeson was the only one who understood how the system worked and, ultimately, how to move data to exploit it.

When looking at banking data lineage now, we can appreciate how difficult it is to track a single thread of data through the data landscape without a solution like Solidatus. What would Barings have given to be able to visualise how data moved through their organisation? With additional data quality scoring, tracking data anomalies upstream and downstream would have highlighted Leeson’s hidden account.

The Solidatus Solution

Think of a large international bank. Let’s say it has 20,000 traders and back-office staff across the world, each of whom has access to hundreds of systems and hundreds of thousands of data points. With hundreds of organisational changes at the role and access level every day, managing entitlements is, just as it was back in 1995, an extremely complex and often manual process requiring an exacting focus from everyone involved.

To understand the complexity of what a bank is facing, we have modelled an Entitlement Process in Solidatus. We imported users and their business roles from an HR system and associated them with Active Directory (AD) Groups and AD Group capabilities. Business rules have been applied to confirm whether individuals have appropriate access to systems and data based on location, function or role.

Entitlement Process:

Entitlements Process Model

Looking at the model, you can see where a user is based from the country code in their name, along with their country of employment. Selecting one user highlights their assigned AD Groups. Adding rules to highlight access across multiple domains can also expose toxic combinations of access rights. When you can visualise the complexity of the Entitlement Process, it becomes obvious why the high risk of granting Leeson a toxic combination of access rights at Barings was not appreciated.
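The kind of rule described above can be sketched as a simple check over group memberships. Everything below is invented for illustration – the user names, AD group names and the toxic pair are hypothetical, and a real model would import them from HR and Active Directory as described:

```python
# Hypothetical entitlement data: each user's AD group memberships.
USER_GROUPS = {
    "n.leeson": {"FrontOffice-Trading", "BackOffice-Settlement"},
    "a.trader": {"FrontOffice-Trading"},
    "b.clerk":  {"BackOffice-Settlement"},
}

# Pairs of AD groups that one person must never hold together,
# with the reason the combination is toxic.
TOXIC_PAIRS = [
    ({"FrontOffice-Trading", "BackOffice-Settlement"},
     "can both execute and settle their own trades"),
]

def toxic_combinations(user_groups=USER_GROUPS, rules=TOXIC_PAIRS):
    """Yield (user, reason) for every user whose memberships
    include a forbidden combination of groups."""
    for user, groups in user_groups.items():
        for pair, reason in rules:
            if pair <= groups:  # user holds every group in the pair
                yield user, reason

for user, reason in toxic_combinations():
    print(f"ALERT: {user} {reason}")
# → ALERT: n.leeson can both execute and settle their own trades
```

Run continuously against a live feed of role and access changes, a check like this is exactly the sort of automated inference that would have flagged Leeson’s front-plus-back-office entitlement the day it was granted.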

Data Quality:

Data Quality Model

As we look back on the 25th anniversary of the Barings Bank collapse, if there is one lesson Leeson taught us, it is that complexity creates opportunity, both good and bad. Complexity absolves us of responsibility and creates shadows and dark places in which to hide data. So we have to ask ourselves: if we don’t understand the complexity of our data, are we creating a perfect storm for the next Nick Leeson?