Data lineage: The cornerstone of modern data governance

What's data lineage?

Data lineage maps the origin and flow of data across enterprise systems, showing upstream sources, downstream usage, interconnected applications, and dependencies. It has a history spanning several decades, originating in the early days of database management systems. Yet, only in the last two decades has it become mainstream with the increasing complexity of data ecosystems and the growing importance of data governance and compliance. 

Legacy vs modern data lineage

Traditional data lineage tools offer static diagrams that mostly trace technical flows, quickly becoming outdated and providing limited governance. In contrast, tools like Solidatus enhance these flows by incorporating business context, ownership and accountability – bridging the business-technical divide. 

A new era of data lineage

As enterprise data ecosystems grow more complex, comprehensive visibility into lineage becomes critical, shifting businesses from a fragmented view of data assets to total visibility across interconnected systems. 

This form of ‘connected governance’ allows leaders to see where information enters, where it’s stored, and what impact it has on operations and key decisions. Enhanced control and deeper understanding lead to risk reduction, improved operational efficiency, and substantial time and cost savings.

The Solidatus way

While there are numerous quality data lineage tools in the market that offer enterprise visualizations, Solidatus distinguishes itself in several key areas:

1. Enterprise data blueprints that are highly customizable

Solidatus transforms data blueprints from static diagrams to dynamic, interactive reflections of the enterprise, enabling the exploration of evolving interdependencies. With user-driven highlighting and manipulation of connections, teams can intuitively explore and reveal powerful insights across siloed systems. Evergreen data blueprints bring understanding by unveiling the living dynamics between data and decisions. 

2. Deep impact analysis that packs business power

Solidatus brings Git-style version control to data models, enabling decentralized collaboration. Groups can modify data models, advancing independently, before integrating updates into an authoritative enterprise data blueprint. This facilitates efficient communication and exploration of impact change scenarios for both business and IT users. For example, users can evaluate the downstream impacts and integration points of replacing a legacy system with newer technologies or simulate what impact a supply chain disruption might have on downstream manufacturing processes and business units.
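
To make the branching-and-merging idea concrete, here is a minimal sketch in Python – purely illustrative, and not the Solidatus API – of how independent teams might evolve copies of a lineage model before folding their changes back into an authoritative blueprint:

```python
from copy import deepcopy

# Hypothetical sketch of Git-style versioning for a lineage model:
# a "model" is a set of nodes and directed edges, a "branch" is an
# independent working copy, and a merge folds branch additions back
# into the authoritative blueprint. All names are made up.

def branch(model):
    """Create an independent working copy of a model."""
    return deepcopy(model)

def merge(base, *branches):
    """Union the nodes and edges added on each branch into the base model."""
    merged = deepcopy(base)
    for b in branches:
        merged["nodes"] |= b["nodes"]
        merged["edges"] |= b["edges"]
    return merged

main = {"nodes": {"crm.customer", "dw.customer"},
        "edges": {("crm.customer", "dw.customer")}}

# Two teams evolve their own copies independently...
team_a = branch(main)
team_a["nodes"].add("report.churn")
team_a["edges"].add(("dw.customer", "report.churn"))

team_b = branch(main)
team_b["nodes"].add("ml.features")
team_b["edges"].add(("dw.customer", "ml.features"))

# ...then both sets of changes are integrated into the blueprint.
main = merge(main, team_a, team_b)
print(sorted(main["nodes"]))
```

A real system would also need conflict detection when two branches touch the same element, which is exactly where the version-control analogy earns its keep.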

3. Evergreen data intelligence through supercharged connectivity

Connectors link data lineage tools with applications and infrastructure across enterprise tech stacks; unlike typical approaches, Solidatus connectors don’t just copy schema: they conduct deep, real-time analysis to map system architecture and define relationships. By comparing each metadata snapshot against previous versions, users can pinpoint fine-grained changes and assess their impact on connected systems, maximizing insights. 
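
As a rough illustration of version-based change detection (the function and snapshot format here are hypothetical, not the Solidatus connector interface), consider diffing two schema snapshots:

```python
# Illustrative sketch: detect fine-grained schema changes by diffing a
# current metadata snapshot against the previously captured version.
# Snapshots are plain {table: {column: type}} dictionaries.

def diff_schema(previous, current):
    """Compare two {table: {column: type}} snapshots and report changes."""
    changes = []
    for table in previous.keys() | current.keys():
        old, new = previous.get(table, {}), current.get(table, {})
        for col in old.keys() - new.keys():
            changes.append(("removed", table, col))
        for col in new.keys() - old.keys():
            changes.append(("added", table, col))
        for col in old.keys() & new.keys():
            if old[col] != new[col]:
                changes.append(("retyped", table, col))
    return sorted(changes)

v1 = {"orders": {"id": "int", "amount": "decimal(10,2)"}}
v2 = {"orders": {"id": "int", "amount": "decimal(18,4)", "region": "text"}}
print(diff_schema(v1, v2))
```

Each detected change can then be traced through the lineage graph to the downstream systems it affects.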

Discover how data lineage can help overcome critical business challenges in our new white paper: Next-gen data lineage: A blueprint for success.

Solidatus joins the AWS Marketplace to bring connected governance to AWS customers

Unlocking the true power of AWS with the world’s leading data lineage solution 

Solidatus, the leading enterprise data lineage and metadata management provider, today announced its availability in the Amazon Web Services (AWS) Marketplace, allowing AWS customers to seamlessly discover, purchase, and deploy Solidatus. With integrations to AWS services like S3, Redshift, and Glue, plus connectivity to other AWS data sources, Solidatus delivers an integrated data landscape view across AWS and non-AWS environments. This unified approach allows for effective governance and risk management by bridging cloud, on-premises, and hybrid systems in a single pane of glass. 

Simon Bustamante-Dick, Partner Analytics Leader, EMEA AWS, said: “AWS Marketplace transforms how enterprises worldwide find, subscribe to, deploy, and govern third-party software, data, containers, machine learning models, and professional services. It’s also become the most strategic channel for ISVs, data providers, and resellers to acquire new customers, migrate existing customers to the cloud, and grow revenue. I’m excited to welcome Solidatus to the AWS Marketplace. Their solution lets customers create integrated data maps spanning AWS and non-AWS environments, and this consolidated visibility allows organizations to simplify governance across AWS cloud, multicloud, hybrid, and on premises systems. Solidatus fills a key need for our customers seeking end-to-end data insight.”

With an extensive list of trusted partnerships and out-of-the-box integrations with leading data management vendors like Collibra, Snowflake, Alteryx, Google BigQuery, BigID, Denodo, Data.World, Informatica, Oracle, Microsoft, and IBM Red Hat, Solidatus gives AWS customers confidence in leveraging their data from end to end.

Philip Dutton, Solidatus CEO, added: “Through AWS Marketplace, it’s now simpler than ever for customers to leverage Solidatus to visualize their data journeys and unlock deeper insights across their business. Our technology’s flexibility is a key asset, and we look forward to supporting AWS users with diverse data intelligence needs, from governance to analytics.”  

Key benefits of Solidatus for AWS users:

It doesn’t matter how data enters your business, how it’s transported, or where it’s stored: Solidatus consolidates your enterprise metadata into a single, manageable view. This gives users the power to create dynamic blueprints that visualize real-time data flow, providing an operational control center for comprehensive data management and analysis at scale.

  • Automate end-to-end data lineage across AWS services  
  • Connect data sources for integrated analytics and improved decisions 
  • Map data to critical business processes to streamline operations 
  • Streamline data governance workflows  
  • Identify security gaps for enhanced data protection
  • Rapidly embed risk controls into data and processes  
  • Identify data quality issues affecting reporting 

View us on the Marketplace

For more information contact hello@solidatus.com

For AWS customers contact aws-marketplace@solidatus.com 

Solidatus and HSBC shortlisted for prestigious banking technology award

We are thrilled to announce that Solidatus and our customer, HSBC, have been selected as finalists for the Banking Tech Awards by FinTech Futures. Our nomination falls under the category of “Best Use of Tech in Business Lending”. 

Our joint submission detailed how HSBC leveraged the power of Solidatus’ next-generation version-controlled graph technology to revolutionize their Wholesale Lending business under a multimillion-dollar transformation program that helped them:  

  • Slash credit decisions from months to minutes 
  • Reduce fund distribution to customers from months to hours 
  • Comply with evolving regulations and internal risk management controls 

Sid Mubashar, Head of Credit Insights & Intelligence, Wholesale Credit & Lending, HSBC, said: “At HSBC, we serve millions of customers across 63 countries and territories, with over one million customers utilising $3.6 trillion in approved credit limits – resulting in a highly complex data infrastructure. With Solidatus, I can visualize the data model for each business outcome, manage the requirements of key stakeholders and provide greater clarity over the intricacies involved in addressing their requests. This new data-led framework plays a major role in HSBC’s global strategy.” 

How HSBC transformed their data capabilities with Solidatus

Within six months of launch, a small business team documented and modeled HSBC’s entire wholesale lending book, demonstrating traceability from source to consumption. The team has now successfully modeled 2,000 source tables with 80,000+ fields, and 20,000+ data linkages across 45 source systems used globally.

By seamlessly ingesting metadata via Solidatus’ connectors, HSBC gained complete lending book transparency with dynamic end-to-end lineage visualization. They can efficiently evaluate change impacts by referencing current, historical, and future data views, substantially reducing manual workload. Additionally, HSBC established a shared data language, eliminating redundant efforts for the bank and its customers, with significant financial benefits.

This self-service, single source of truth is available to hundreds of internal users, visualizing all data requirements across the lending book. In contrast, given the complexity and scale of the organization, traditional solutions with the same-sized team would require an estimated 18-24 months and would not provide the same breadth of transparency and capability.

Philip Dutton, CEO, Solidatus, added: “HSBC understands the complexities of modern banking better than most – and seeing such a small team deliver such a big impact using Solidatus is extremely powerful. Being shortlisted for this prestigious award is a testament to their innovative work and the quality of our technology.”

Read the full case study by following the link below.

Transforming HSBC's Lending Business with Solidatus

HSBC now has a highly scalable and automated solution that is being applied to several applications, from ESG to liquidity calculations and other regulatory uses.

Urban planning is a lineage problem. Well, to be completely honest, at Solidatus, we’re convinced that all manner of things are lineage problems.

What does this have to do with data management? Well, bear with us, as the parallels we explore at the start of this blog post – the third and penultimate part in our lineage series – will resonate with anyone planning to alter or enhance their data estate in any way.

Let’s start by looking at an aerial view of two cities and their approaches to development. The first is Kolkata in India; the second is Shanghai in China.

Above: aerial view of Kolkata in India

Above: Shanghai in China before expansion (left); after expansion (right)

You can clearly see the differences in the planned expansion of the city. Kolkata has expanded into new territory and planned its road systems, fitting the various city zones around it, whereas Shanghai has just replaced the old with the new. SimCity, anyone?

It’s about the availability of resources – just like in SimCity, the urban planner has choices about what they can do – Do I have land? Do I have transport links? Do I have money? What’s most important? In both cases, there’s a river – a vital resource for most of the world’s great cities (New York has the Hudson, Rome has the Tiber, London the Thames, Paris the Seine…). So, building near to this really important asset is essential – nature is the best builder after all. Just as with the expansion of London to the south of the Thames, Kolkata had the option to expand into new territory on the opposite bank and could then plan for this. In the case of Shanghai, they clearly had built-up areas on both sides of their river, meaning a choice was needed on whether to bulldoze and replace. This is what they went with.

Why this is important

Is the project planned or under-planned? Note: very little is totally unplanned!

No one goes into any building project with no plan at all – they always have some idea of what they’re building. They can be qualified for this and have the right tools, they can consult others and try to make something that fits into the surrounding area’s plans – or they can just put up their building not knowing about utilities or the type of ground (Pisa, we’re looking at you) and hope for the best.

Let’s bring it a little closer to Solidatus’ UK HQ in London and the Thames.

This river was once much wider than it is now (four times wider, as it happens), and when London needed to expand, it exploited this space, most recently to create room for underground tube lines and sewerage (both in the 1860s and again in the 2020s). Below are a couple of images giving you an idea of the construction projects where the cut-and-cover techniques were used to expand the Victoria Embankment, first for the removal of waste – thank you, Joseph Bazalgette – and then for the commuting public. We have a new(ish) road above it as a result, and the north side of the river is that bit bigger.

Above: historical view of the Victoria Embankment on London’s River Thames in the UK

Above: detail of historical cut-and-cover work on the Victoria Embankment

Above: further historical construction work in London

In the planned city, it’s easier to see how to add in utilities, and how to modify things to bring in new systems. One side of Kolkata will be easier to modernize than the other. Shanghai was built with a modern plan and these things in mind.

London certainly wasn’t built like this. It is a city built on a city built on a city approaching the end of its second millennium. That is why it is so expensive and increasingly impossible to add in new facilities, as the environment is so difficult to upgrade.

Cities as enterprises

Let’s bring this back to the ground and think of a large enterprise as a city. It has buildings, utilities, people and so on. It has systems and controls; it has legacy and an aspiration to modernize. Like London, it has archaeology, and decisions made decades (or longer) ago have created the foundations upon which to build the new. Like London, it wants to be modern and be able to use the latest and greatest of everything. Fitting it all together will determine whether it can be a successful company that can be trusted and governed – indeed, whether it can even survive.

We can see how decisions such as the height and width of tunnels determine the dimensions of a London tube train.

So it is with the size of data centres, the choice of technology, and the network connections and locations in an enterprise.

Bringing it back to data

As we know, how we connect everything in an organization is most aptly modelled by lineage. This is best used in the first instance as a planning tool by providing an understanding of how elements are related and how they make the system work. This information helps inform decisions about the architecture, processes, and systems needed to support the data.

It also provides insight into what changes need to be made when new data sources, applications or utilities are introduced. Lineage can also be used to identify potential risks associated with data and ensure that it is managed in accordance with policies and regulations. Finally, lineage can help identify opportunities for optimization, such as reducing redundant processing or combining multiple data sources into a single source.
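
The impact-analysis use of lineage described above can be pictured as a walk over a directed graph of data flows. The sketch below, with made-up system names, shows the idea in Python:

```python
from collections import deque

# A minimal illustration of lineage-driven impact analysis: lineage as a
# directed graph of data flows, walked breadth-first to find every
# downstream asset affected by a change to one source. The system names
# are invented for the example.

lineage = {
    "crm_db":            ["staging.customers"],
    "staging.customers": ["dw.dim_customer"],
    "dw.dim_customer":   ["report.revenue", "ml.churn_model"],
}

def downstream_impact(source):
    """Return every asset reachable from `source` via data-flow edges."""
    seen, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for target in lineage.get(node, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(seen)

# Replacing the CRM database would touch all of these:
print(downstream_impact("crm_db"))
```

Running the walk in the opposite direction (upstream) answers the complementary question: where did this report's numbers come from?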

Putting lineage-centred planning into practice in the data office

Lineage can be used for planning in a variety of ways. First, it can provide an overview of the entire system and identify any weak points or errors in the process. By understanding the full scope of the system, organizations can better plan out their strategy and make decisions about which areas need improvement. Additionally, lineage can be used to track changes over time and identify trends that could be useful when making decisions about future projects or initiatives.

Lineage can be used for planning to help organizations understand how their data is being used by different stakeholders. By understanding where data comes from, who has access to it, and how it is being utilized, organizations can develop strategies that are tailored towards specific groups of users. This will help them identify potential opportunities for growth or areas where they need to focus more attention on improving their processes or systems.

Finally, lineage can also help organizations plan their resources more effectively and conduct the sort of impact analysis necessary for any transformation project. By understanding where data comes from and how it is used throughout the organization, teams can better allocate resources such as personnel or technology and meet their goals more efficiently.

Insights for success

At its core, lineage provides us with a framework for understanding how certain events or ideas are connected to one another. This helps us see patterns in our life that may not be immediately obvious. For example, if you look at your family tree, you may notice that certain traits or interests run through generations. If you look at the history of an industry or field of study, you can identify key moments when certain innovations emerged and how they affected the development of that particular area.

By looking at lineage as a planning tool we can also gain insights into how different elements interact with each other in order to achieve success or failure. For instance, if we want to start a business, we need to consider all the factors involved such as market trends, customer preferences, competition etc., and then plan accordingly so that we maximize our chances of success. We can also use lineage to predict potential outcomes by looking at past successes or failures and using them as indicators for what might happen in the future.

Greenfield will always be easier, and if we plan with a lineage-first approach we’ll make a more sustainable environment. However, if we have legacy and stop to map it, then we can make the best of the situation we find ourselves in.

If Bazalgette were an enterprise architect, he might say: The principle in planning with lineage first was to divert the cause of the mischief to a locality where it can do no mischief.

The Happy CDO Project

We asked at the top of this article what this all has to do with data management, and lineage is a huge part of good data management. But it’s more than that. We think that lineage tools, better data management technology more generally, and methodology fit for the 2020s are central to being a happy CDO. These are core findings of proprietary research we recently commissioned, as discussed in a new white paper, Data Distress: Is the Data Office on the Brink of Breakdown? Part of The Happy CDO Project, this research highlights that 71% of the 300 senior data leaders in financial services in the US and UK surveyed have considered quitting their jobs as a result of a phenomenon we define as ‘data distress’. This is just one of the findings that we explore – along with suggested remedies.

Data Distress: Is the Data Office on the Brink of Breakdown?

Solidatus extends technology partnership with Alteryx to demonstrate ‘connected governance’

Enhanced alliance will help customers visualize data flow across their Alteryx workflows through Solidatus interface, demonstrating governance and building greater confidence to grow their Alteryx footprint

London and Houston, 4th July 2023. Solidatus, the leading data management solution that empowers organizations to connect, visualize and govern their data relationships through data lineage, has extended its strategic technology partnership with Alteryx, the preeminent analytics automation platform.

Leveraging a partnership with a leading Global Systems Integrator (GSI), the two companies have been able to successfully facilitate two large-scale transformation projects: one for a global systemically important bank (G-SIB); the other, a multinational financial services company.

Both shared customers derived value from Alteryx-enabled workflows, which brought significant improvements to their business-critical analysis capabilities by streamlining the production of regulatory reports and reducing the frustrations and challenges felt by their internal IT groups. In order to grow, they faced a business requirement to address data governance concerns over audit, processing and best practices. Each wanted to demonstrate end-to-end lineage to fulfil regulatory obligations specifically around MiFID and Solvency II (capital adequacy), as well as ESG disclosures.

This is where Solidatus came to the fore: by mapping the Alteryx workflows into Solidatus and then through their companies’ systems, processes, controls and their governance platform, Solidatus could easily demonstrate governance processes, leading the senior business leaders of both customers to extend their Alteryx footprint and bring further value to their organizations.

Following the success of these projects, Solidatus has extended its support for more Alteryx tool sets and fully automated the integration. Customers can not only get fine-grained, field-level lineage but can show that their data is under control by allowing them to complete an end-to-end picture of the whole process and resulting data estate.

Håkan Söderbom, Sr. Director, Technology Alliances, at Alteryx, said: “Working with Solidatus enables shared customers to benefit from lineage to quickly demonstrate that their data is properly governed. Solidatus lineage makes it easy to connect technical and non-technical applications and platforms that maintain policies, controls, CDEs and regulations to visually prove compliance much faster than would typically be possible.”

Howard Travers, Head of Technology Alliances at Solidatus, said: “Alteryx workflows are a critical part of the management of organizational data flows for regulatory reporting at many of the largest banks in the world, especially in the USA. The Solidatus-Alteryx integration creates a field-level lineage-focused view of an Alteryx workflow, which can be automatically linked to upstream and downstream lineage information to provide a clear end-to-end view. We believe the Solidatus-Alteryx connector will provide confidence to any financial or regulated organization wishing to expand their use of Alteryx.

“Our vision is to ‘make the unknown known’ by enabling organizations to rapidly build an enterprise data blueprint. Solidatus is a powerful data management solution in its own right. However, we are being increasingly used as a lineage bridge to other best-of-breed applications like our partners Alteryx, Snowflake, Collibra and BigID, to name but a few. Our relationship with Alteryx proves the value of ‘connected governance’ to demonstrate governance, removing concerns, building confidence and driving expansion.”

– Ends –

For more information please contact:

Solidatus press office

About Solidatus

Solidatus is an innovative data lineage and management solution that empowers organizations to connect and visualize their data relationships, simplifying how they identify, access, and understand them. With a sustainable data foundation in place, data-rich enterprises can meet regulatory requirements, drive digital transformation, capture business insights, and make better, less risky and more informed data-driven decisions. We provide solutions to several key areas of endeavor, including: governance and regulatory compliance; data risk and controls; business integration; environment, social, governance (ESG); and data sharing. Our clients and investors include top-tier global financial services brands such as Citi and HSBC, healthcare, and retail organizations as well as government institutions.

www.solidatus.com

June is LGBT Pride Month.

If you’re reading this in the West, this is unlikely to have escaped your notice; companies across the internet have proudly changed their corporate colours in their tens of thousands and are publishing supportive feelgood stories on their social media channels left, right and centre.

We don’t disparage this. And Solidatus is no different: this time last year we published blog posts that looked at the foundations of the pride movement and sought the perspectives of a couple of our colleagues from the LGBTQ+ community:

This year, though, we thought to ourselves: we have some cool software that can model data and systems and their lineage, and offer all sorts of other insights from active metadata, as used for data governance and other use cases in banking, finance and beyond. But what if we make some interactive models of the LGBTQ+ landscape so we can bring something new to this space?

Well, following our success last month in modeling Eurovision data from 1956 onwards, we’ve done just that.

We drew this information semi-automatically from the sources shown below, something that our connectors hugely simplify. And then we adjusted it to elevate stories that speak for themselves.

So, come with us and take a look!

In this model, we provide a detailed view of different countries’ LGBTQ+ laws, with green and red transition lines indicating whether those rules are permissive or restrictive respectively. Click on the dropdown arrows next to a continent’s name to view info on individual countries, each of which you can click on. (This data was drawn from the International Lesbian, Gay, Bisexual, Trans and Intersex Association (ILGA) via data.world, as detailed in the ‘Inspector’ panel when you click on ‘Acknowledgements’.)

View model.

In this model, we provide a view on different countries’ evolving travel ratings from 2012 to 2021. The lower the number is in the top left-hand corner of the country’s card, the more openly LGBTQ+ people can be when they travel. (This data was also drawn from data.world, as shown in the equivalent ‘Inspector’ panel.)

View model.

Interesting info, we hope you agree, and even if you don’t, we wish you, the whole LGBTQ+ community and all of its many millions of allies a very happy Pride Month.

For more guidance on how to read Solidatus models, there’s a handy primer in our Eurovision blog post under the ‘A note on Solidatus models’ heading.

Solidatus brings attribute lineage to IBM z/OS mainframes

Move will enable users to automatically map mainframe metadata into Solidatus to create active blueprints from which they can derive actionable insights and de-risk their operations

London and Houston, 18th May 2023. Solidatus, the leading data management solution that empowers organizations to connect, visualize and govern their data relationships, can now draw metadata from IBM z/OS mainframe scanners that operate at an enterprise-wide level on applications and systems. This metadata is then automatically mapped into lineage models so that users can fully visualize their systems and plan for change, such as migrating to the cloud.

With IBM z/OS mainframes heavily embedded into so many organizations’ IT infrastructures, and many components susceptible to end-of-life risks, it’s crucial for efficient and effective systems analysts to be able to: understand the business logic and data transformation rules embedded in code; understand data quality and validation rules for comprehensiveness and accuracy; and identify integration points to assess the impact of data migration on these integrations.

A resilient data architecture relies on capabilities that support the availability, integrity, and security of data within an organization.

Software already exists to support these aims but to bring things to the next level, Solidatus now allows users to effortlessly view IBM mainframe lineage through an ‘enterprise inventory’, with content clustered however the customer requires. In turn, this means that users can improve their data governance processes and controls through the sophisticated but easy-to-use models that can be built in Solidatus and the display rules that can be applied to them.

Howard Travers, Head of Technology Alliances at Solidatus, said: “I’m hugely excited by this initiative to automate the process of visualising IBM z/OS attribute lineage, which will save our customers a huge amount of time and cost by reducing manual effort. By visually identifying lost, redundant and orphaned code we can: simplify understanding; demonstrate risk and compliance; and, where required, accelerate the move to the AWS, Google and Azure Clouds. These are all things that Solidatus will make infinitely easier for our customers.”

Travers added: “This work addresses a major pain point for a massive chunk of industry around the world, especially in the USA. In the banking sector alone, 43% of existing systems were built using COBOL, a legacy programming language associated with z/OS.”


Solidatus’ representatives, including CEO and Founder, Philip Dutton, look forward to discussing this, and other connectors and technology partnerships, with delegates at next week’s Gartner® Data & Analytics Summit 2023 in London. Dutton will also co-host a presentation alongside BNY Mellon’s Lewis Reeder entitled Solidatus: How BNY Mellon is Visualizing a Holistic View from a Complex World.

– Ends –

For more information please contact:

Solidatus press office

About Solidatus

Solidatus is an innovative data management solution that empowers organizations to connect and visualize their data relationships, simplifying how they identify, access, and understand them. With a sustainable data foundation in place, data-rich enterprises can meet regulatory requirements, drive digital transformation, capture business insights, and make better, less risky and more informed data-driven decisions. We provide solutions to several key areas of endeavor, including: governance and regulatory compliance; data risk and controls; business integration; environment, social, governance (ESG); and data sharing. Our clients and investors include top-tier global financial services brands such as Citi and HSBC, healthcare, and retail organizations as well as government institutions.

www.solidatus.com

Come closer, come closer and listen.

The beat of my heart keeps on missing.

“Listen to what,” you might reasonably ask, “and, more importantly,” you might go on, “what on earth has your heart got to do with it? Isn’t this a blog about data?”

Well, in a world-first – and, we imagine, much to the joy of 1969-era Lulu – we at Solidatus have linked the Eurovision Song Contest to the realm of data governance and regulatory compliance.

Boom bang-a-bang!

Reality check

“That’s great but so what?”

It’s a fair question. The answer is that this genuinely fascinating work doesn’t just reveal otherwise hard-to-find insights into Eurovision across the ages; the upcoming song contest has provided us with an excuse to develop some models that beautifully illustrate the importance of lineage and how powerful it can be to properly map and visualize data, or rather metadata, whether for business or pleasure.

So back to Eurovision. And in a move that we’re praying won’t equally alienate its two very different target audiences – Eurovision fans and data professionals – we’ve taken a range of rich and granular datasets stretching back to Eurovision’s founding contest in 1956 and fed them into a versatile piece of software that’s more usually used by people working in complex multinational banks and other big businesses.

In this blog post, we’ll:

  • Explain how these models work;
  • Provide access to them so you can dig around in them yourself;
  • Highlight some of our sample findings; and
  • Show the parallels between this work and more typical use cases of this software, such as data governance and regulatory compliance.

But first, and to whet your appetite, here are some sample Eurovision findings, upon which we expand later:

  • Cyclic voting graphics reveal patterns of pairs of countries each awarding the other top marks – and stats that will be familiar to anyone who knows Greece and Cyprus’s history in this practice.
  • In the longest period of consistent voting methodology (1975 to 2015), the highest-scoring winner scored more than three times the points of the lowest-scoring winner, revealing what is arguably the most successful Eurovision song of all time – see who it is below.
  • The UK’s belief that it’s getting progressively worse is exaggerated, and we have the analysis to prove it.

But how did we get there? Let’s first take a look at what a Solidatus model is.

A note on Solidatus models

Solidatus models aren’t databases and they don’t store data – or at least not the primary data you’d find in a typical database, with row upon row of similar information on names, addresses, dates of birth etc. across thousands of similar records. Rather, they display metadata – data about data – through visualizations that enable users to see how data and systems relate to each other, how data flows between them, the journey it takes and how it interacts with other data.

In the case of Solidatus, we can meaningfully and with justification describe this metadata as ‘active metadata’, a concept you can read about in our blog post, From data to metadata to active metadata. You can read about other concepts in this field in Key concepts in data governance and management: an A to not-quite-Z guide.

But lest we stray off topic, let’s take a quick look at what a model looks like.

Below is a section of a typical model, not dissimilar to those we’ve used for our Eurovision research. This one, though, was built for a more typical business use case. (We cater for many solutions and sub-solutions.)

A Solidatus model comprises:

  • Layers – these are effectively columns. User-defined, they provide a way of grouping together things that belong in the same broad category, whether by time, sequential ‘position’ in a chain of systems or however the user sees fit; in our three Eurovision models, we layer by year, country and placement (see below).
  • Objects – these are what you might describe as a primary record. In a normal Solidatus use case, they might represent a database or perhaps a table in a more complex database. Users set their own hierarchy.
  • Attributes – these are more granular details of objects.
  • Transitions – these are a special type of metadata, shown here as arrowed lines between attributes in different layers, but they’re also recorded in a model’s ‘Relationships’ panel. Typically, they show the flow of data between systems, but they can be used to describe any relationship.
  • Properties – these allow you to dig into the weeds of attributes, with information being added or viewed through the ‘Inspector’ tab on the right-hand side of the screen, or on an object or attribute.
  • Relationships – also available through the ‘Inspector’ tab, these show relationship information (transitions) in a tabular form.
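To make the hierarchy above concrete, here’s a minimal sketch in Python. To be clear, this is our own illustrative shorthand – the class and field names are invented and are not Solidatus’s internal data model:

```python
# Illustrative sketch of the layer/object/attribute/transition hierarchy
# described above. Names are invented, not Solidatus's actual internals.
from dataclasses import dataclass, field

@dataclass
class Attribute:
    name: str
    properties: dict = field(default_factory=dict)  # fine-grained metadata

@dataclass
class ModelObject:
    name: str  # e.g. a database, a table, or here a country's entry
    attributes: list = field(default_factory=list)

@dataclass
class Layer:
    name: str  # e.g. a year, country or placement
    objects: list = field(default_factory=list)

# Transitions link attributes across layers; each is a (source, target) pair
transitions = []

year_2021 = Layer("2021", [ModelObject("Ukraine", [Attribute("entry")])])
year_2022 = Layer("2022", [ModelObject("Ukraine",
                           [Attribute("entry", {"artist": "Kalush"})])])

transitions.append((year_2021.objects[0].attributes[0],
                    year_2022.objects[0].attributes[0]))

def upstream(attr):
    """Which attributes flow into the given one?"""
    return [src for src, dst in transitions if dst is attr]

print(len(upstream(year_2022.objects[0].attributes[0])))  # one upstream link
```

The same traversal in the opposite direction gives downstream impact – the basis of the ‘show trace’ view described below.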

When viewing a Solidatus model, bear in mind:

  • All of the rich info that can be found in the aforementioned ‘Inspector’ panel, which can be exported into CSV files.
  • A range of views are available on the left-hand side, based on what the model builder has built.
  • The ‘show trace’ view and ‘target’ icon will help you isolate flow-related information on specific objects and attributes.
  • As with our Eurovision models outlined below, skilled model builders might have set up display rules – here, for example, we have gold, silver and bronze boxes around entries that were placed first, second and third, and small arrows in country entries that show whether they did better or worse the following year.

And finally, a note on how this information is actually brought into Solidatus in the first place:

This is the clever bit. Models can be built manually, and there’s usually manual intervention. But we also have a series of connectors that can automate much of the process.

In the case of this project, and alongside some other data repositories, we drew a lot of the info from Wikidata, the central storage area for Wikimedia’s structured data, including the many records it holds on Eurovision. The connector we built for this project – and for wider Wikipedia-related metadata ingestion – is for a query language called SPARQL.
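To give a flavour of what such a query looks like, here’s a hedged sketch of the general shape of a SPARQL request to Wikidata. Note that the item and property identifiers (the QXXXX/PXXX placeholders) are deliberately not the real ones we used – they’re stand-ins to show the structure:

```python
# Illustrative only: the general shape of a SPARQL query for pulling
# Eurovision metadata from Wikidata. QXXXX/PXXX are placeholders, not
# the identifiers used in the real models.
query = """
SELECT ?entry ?entryLabel ?countryLabel ?year WHERE {
  ?entry wdt:PXXX wd:QXXXX .   # entry participated in a given contest
  ?entry wdt:PYYY ?country .   # entry represented a country
  ?entry wdt:PZZZ ?year .      # contest year
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?year
"""

# In practice this is sent to the Wikidata Query Service endpoint and the
# JSON results are flattened into attributes and transitions in the model.
def clauses(q):
    return [line.strip() for line in q.strip().splitlines()]

print(clauses(query)[0])
```

The point is less the syntax than the workflow: a declarative query pulls structured metadata out of a public graph, and a connector turns the result set into model elements.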

Because here’s the thing: Solidatus doesn’t deal in data that’s unavailable to you elsewhere – provided, crucially, you know where to look; rather, it elevates that data into a more visually digestible environment, where it can be interrogated in a meaningful context.

But enough of the sales pitch. Let’s dive into the models!

Our three models

By year
This model shows layers arranged by year:

In this view, we’ve scrolled to the right of the screen so that the years 2020, 2021 and 2022 can be seen. But there’s more to the left, going back to 1956, and more below – in each layer, the countries are arranged in descending order of votes garnered.

Here, we’ve clicked on Ukraine in 2022, which shows a transition line from its position the previous year (and the one before that and so on), and changes the focus of the ‘Inspector’ on the right. This panel shows info such as:

  • The artist’s name (Kalush);
  • The song name (Stefania);
  • The language it was sung in (Ukrainian); and
  • Its genre(s) (contemporary folk music and, if you can believe it, hip hop).

If you click on ‘Views’ on the left-hand side of the screen, you can also isolate data along the lines shown below, as built by our data modeller.

Cyclic voting, based on who gave whom the top score of douze points, is an interesting one to explore, as illustrated by the transition lines here for the years 2013, 2014 and 2015:

So, when you dig into the model, you can see, for example, that in 2013, Sweden’s judges awarded their two top scores to Denmark and Norway, whereas their entry received only one top score, which they got from Norway’s judges.
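The cyclic-voting check itself is conceptually simple: given each country’s top-score award in a year, look for reciprocal pairs. A minimal sketch – the votes below are invented for illustration, not real results:

```python
# Detect 'cyclic voting' pairs: two countries that each awarded the other
# their top score (douze points) in the same year. Votes are invented.
def reciprocal_pairs(top_awards):
    """top_awards maps giver -> receiver of the 12-point score."""
    pairs = set()
    for giver, receiver in top_awards.items():
        if top_awards.get(receiver) == giver:
            pairs.add(frozenset((giver, receiver)))
    return pairs

awards = {
    "Cyprus": "Greece",
    "Greece": "Cyprus",   # reciprocated -> a cyclic pair
    "Norway": "Sweden",
    "Sweden": "Denmark",  # not reciprocated
}

print(sorted(sorted(p) for p in reciprocal_pairs(awards)))
# [['Cyprus', 'Greece']]
```

In the model, the same logic is expressed visually: a reciprocal pair shows up as two transition lines forming a loop between the same two countries.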

By country
This model shows layers arranged by country:

In this view, we’ve scrolled to the left, where we can see the first three countries by alphabetical order – Albania, Andorra and Armenia. By layer, each object is arranged by year, so Albania, for example, first took part in 2004.

At random, we’ve clicked on its record for 2008, the card itself showing that it came 17th that year. The transition lines pointing into it show which countries (to the right of the model) gave it points. And in the ‘Inspector’, the usual info is available along the lines of the bullets listed in the ‘by year’ model.

‘Zemrën e lamë peng’ was its entry’s catchy title, for example.

By placement
This model shows layers arranged by placement from first to tenth:

Here, we’ve clicked on Sweden’s winning entry in 2015, the transition lines showing whom Sweden gave its votes to and who voted for its winning song, Heroes, sung in English by Måns Zelmerlöw.

And the opportunities for data analysis just go on.

The point of this exercise is to illustrate the richness and granularity of this easily visualizable data, something that of course has more practical applications in the world of big business, rather than to home in on any particular stats. Nonetheless, we feel compelled to highlight a handful of key findings, which you can supplement with your own digging around.

Some sample findings

By choosing the ‘Cyclic Voting (top points)’ view in the ‘by year’ model, we can see the pairs of countries that each gave the other 12 points:

  • The UK and Switzerland, and Italy and Ireland in 1976, the first year this phenomenon arose;
  • Cyprus and Greece in 1986, 1987, 1994, 1997, 1998, 2002, 2003, 2004, 2005, 2010, 2012, 2017 and 2019; and
  • Many others we won’t list but you can find yourself, which might raise an eyebrow, given the geopolitical landscape now and in the past.

Cyclic voting in 1976, 1978 and 1979, extracted from a wider model

By exporting the ‘Inspector’ info from the ‘1st’ layer in the ‘by placement’ model, we can see that:

  • These are the most successful languages in terms of winning songs: English (with 33 wins); Dutch, Hebrew and Italian (with three apiece); German, Norwegian, Serbo-Croat, Spanish, Swedish and Ukrainian (with two each); and Crimean Tatar, Danish and Portuguese (with one each);
  • Grand final points for the winning entry range from 18 all the way up to Portugal’s barnstorming 758 in 2017, though the huge disparity is partly explained by the scoring system changing over the years;
  • To conduct a more meaningful analysis, we could look at the years 1975 to 2015, the longest period of consistent methodology, for which the range is 123 (Norway in 1985) to 387 (also Norway, this time in 2009);
  • On that last point, with more than three times the score of the lowest-scoring winner in this 40-year period, there’s an argument that Alexander Rybak’s Fairytale (also sung in English) is the ‘best’ song in Eurovision history, although, despite moderate chart success for his song, Rybak is hardly a household name beyond Norway’s shores; and
  • The winning song with the shortest title is Netta’s Toy, Israel’s entry for 2018, and the one with the longest is Poupée de cire, poupée de son, sung by Luxembourg’s France Gall in 1965 – who demonstrated that French is sometimes better than English, which would have rendered the song Wax doll, sound doll.

The United Kingdom’s sense that it has done progressively worse in recent years (last year’s second place aside) is exaggerated, given the increasing number of participating countries. The two graphics below show:

  • A random portion of the ‘by year’ model, in this case for the period 1964 to 1968, which shows how easy it is to trace the progress – or lineage – of the UK’s performance; and
  • A graph derived from this model, which shows the UK’s absolute position alongside a line indicating the percentage of countries that finished above it. While the trend has been downwards, the orange line has been more constant because the number of countries competing has gone up.

Lineage on the UK’s position from 1964 to 1968 inclusive, extracted from a wider model

The UK’s position (blue) and percentage of countries that finished above it (orange) vs year from 1957 to 2022 (with the years it didn’t compete removed)
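The normalization behind the orange line is straightforward: divide the number of countries that finished above the UK by the number of entrants that year. A quick sketch with invented figures shows why absolute position can mislead:

```python
# Why raw finishing position can mislead: normalize it by the number of
# entrants that year. The positions and field sizes here are invented.
def pct_above(position, entrants):
    """Percentage of competing countries that finished above this one."""
    return 100 * (position - 1) / entrants

# 4th in a field of 16 vs 8th in a field of 40: the absolute position is
# twice as bad, but relatively fewer countries finished above.
early = pct_above(4, 16)
late = pct_above(8, 40)
print(early, late)  # 18.75 17.5
```

This is exactly the kind of derived metric that falls out naturally once the underlying lineage has been captured consistently across years.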

But take a look at these models yourself – and if you happen to be a journalist writing about Eurovision, give us a shout at hello.marketing@solidatus.com and we can walk you through our work.

Lineage, metadata and the world of data compliance

Now, if you’re a data professional with little interest in Eurovision, we’re grateful, frankly, that you’ve stuck with us. Maybe we’ve converted you along the way.

But let’s bring this back to the real world, or at least your world.

The beauty of a visualization tool like Solidatus is that there are virtually no limits to the applications its graph technology can be put to, all of it exploiting and promoting active metadata.

We have, though, found that Solidatus particularly lends itself to these solutions: governance and regulatory compliance; data risk and controls; data sharing; business integration; and environmental, social and governance (ESG).

We’re going to end with a quick review of governance and regulatory compliance.

Using Solidatus, you create living blueprints that map how your data flows – a.k.a. lineage – as it moves through your systems – both now and at other points in time. You can connect your data to the processes that create it, to the policies that guide it, and to the obligations that regulate it. With this framework in place, you can maintain transparency across your business, meet ever-evolving regulatory requirements, and accelerate change programs.
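In data-structure terms, this amounts to a graph whose nodes are not just systems but also processes, policies and obligations, and whose edges let you ask impact questions in either direction. A toy illustration – every name below is invented:

```python
# A toy 'connected governance' graph: data elements linked to the systems
# that hold them and the obligations that regulate them. Names invented.
edges = [
    ("customer_address", "held_in", "CRM"),
    ("customer_address", "regulated_by", "GDPR Art. 17"),
    ("trade_amount", "held_in", "LedgerDB"),
    ("trade_amount", "regulated_by", "BCBS 239"),
    ("LedgerDB", "feeds", "RiskEngine"),
]

def related(node, relation):
    """Follow one edge type outward from a node."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]

# Impact question: which obligations touch data held in LedgerDB?
impacted = [related(el, "regulated_by")
            for el, rel, system in edges
            if rel == "held_in" and system == "LedgerDB"]
print(impacted)  # [['BCBS 239']]
```

Swap the invented names for real systems and regulations and the same one-line query answers “if this system changes, which obligations do we need to re-check?”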

That’s the boilerplate. But what does it mean in practice?

Well, let’s finish with a few excerpts from our recently published case study, Solidatus models HSBC’s global lending book (PDF). This use case – alongside many others, including business integrations, and data risks and controls – was a key component of the bank’s objectives.

In under six months, a team of two was able to document and model the global bank’s entire credit and lending book, demonstrating traceability from source to consumption. They now have a highly scalable and automated solution that is being applied to several applications from ESG to liquidity calculations and other regulatory uses.

Do read the case study (PDF) to see how they reduced a project’s cost from $5,000,000 to under $500,000, a saving of more than 90%.

And don’t let inefficient data management practices be your Waterloo.

By Philip Miller, Co-Founder and Chief Innovation Officer

In the first part of this series on data lineage, we talked about how lineage is a tool for context and how if we use it first, we get better outcomes. In this second instalment, let’s take this a bit further and say that without lineage, governance is a false economy.

I consider myself a bit of an all-around geek, not just of data but of technology in general and of ideas. There is power in ideas, which is why one of my favourite films is Inception.

You’ll see how this relates to lineage and your work in a moment. Until then, please bear with me and enjoy the ride.

Back to Inception

For the uninitiated (how?), Inception’s premise is that there can be a tiny moment of inspiration that can affect everything. Another is The Terminator (and Terminator 2 – that is as far as I am willing to go!), where the protection of one single life echoes into the future.

These two films have much in common, but they clearly demonstrate the value of understanding reality.

To an extent, both deal with time and the proposition of ‘what if’, raising questions such as…

Inception: what if I question and change someone’s reality?

The Terminator: what if I could kill Hitler before he committed his crimes?

Christopher Nolan, director of the first, spent almost ten years polishing a grand idea and created a clear closed loop in which he could simply explain the whole premise.

His original diagram is shown below (there is a cleaned-up version at the end):

For those unfamiliar with the film, there are many levels of simulated realities in which the hero characters have to unlock the secrets to a character’s deepest desires. There is a messiness as one of the characters (Cobb) hasn’t disclosed some of his own secrets, leaving the reality at risk.

In The Terminator, a ‘mad’ computer decides to wipe out its only nemesis by killing his mother before he is born. The computer goes into this with partial information and in the end ensures that its enemy is actually born – whoops. From there, the Terminator film series goes on to prove what happens when you don’t plan properly. As you can see below, there is real confusion about reality, and anyone trying to pick up the pieces and add in more lore is left with less and less sense.

Back to lineage

Let’s bring this all back to the point in hand – lineage.

Lineage is an expression of reality; it is a record of what is actually happening. If you take the time to build up context as you go along, you find yourself in a good place where you can show what’s actually happening around you.

The Law of Unintended Consequences comes into play here – The Terminator’s mission is to travel back in time and kill Sarah Connor, the mother of John Connor, who is destined to lead humanity in a future war against machines. By doing this, it hopes to change the course of history and prevent its own destruction at John’s hands. The Terminator didn’t have enough information and brought about its own demise. It was missing reference data (context) – the identity of its foe’s father!

Lineage is important to the film Inception because it helps to explain the origins of the characters and their relationships. It also provides a sense of continuity, which is essential when trying to piece together the complex plot of the film. Ultimately, Cobb didn’t disclose all his relationships to his teammates and thereby put his mission at risk.

Data lineage applications

Now let’s apply these lessons.

There are relationships everywhere; they can be simple, they can be complex, but they all have context. When you spend the time to build up these connections, you find that you can describe what’s going on better and better. The fewer holes there are in your lineage view, the more chance you have of influencing the system.

For example, the machines missed the fact that a child has a father as well as a mother and never sought to complete that information (which we know was freely available from the later films). We know that no one really questioned why Cobb didn’t dream more than superficially; those that did, didn’t grasp the implications properly.

We should always be looking for gaps in our reality, noting them and investigating them. To do this we need a mechanism that exposes the gaps clearly, visually, transparently. If you don’t know, you are missing an important piece of information and will end up making bad decisions. Sometimes these bad decisions will compound over time – building up until they cause big problems, often catastrophic and wide-reaching in nature. Acting on incomplete intelligence is sometimes necessary, but it’ll more likely than not lead to a paradox of some sort.

The world is overflowing with examples of lineage not being taken seriously. It shows up as unmanaged risks, cyber-defence failures, over-budget or unsuccessful transformation projects, and corporate failures.

Ultimately, who pays? In the end it’s penny-wise, pound-foolish to ignore this. It’s always the innocent who pay – with their cash, their jobs, their prospects. Does that sound dramatic? Well, in 26 years of IT projects, I don’t think there’s one that wouldn’t have been more efficient, more sustainable and more pleasant with lineage baked in.

The reality of lineage is that when you do it right, you’re planning to succeed.

If Cobb were working in data governance, I think he’d say: “Solidatus specializes in a very specific type of governance. Lineage governance!”

If Kyle Reese were in the same field, he might comment: “The lineage is not set. There is no governance but what we make for ourselves.”

…………………………………………………….

Below: simplified Inception diagram

Running a tight data governance and regulatory compliance regime is nigh on impossible without good data lineage and active metadata management software.

In fact, in-depth knowledge of data and how it’s used is crucial to all organizations above a certain size, pretty much regardless of their data- or systems-related area of endeavor. Apart from a regulatory imperative to understand the data flows within your organization, data lineage is fundamental to getting the right data to the right people, at the right time – this is the basis of sound decision-making, risk management and new data-driven initiatives.

So we’re agreed on the value of data lineage. But how do you map your data and systems into lineage models – or end-to-end metadata maps – in the first place? And how can you derive end-to-end data flows where documentation doesn’t exist or is outdated?

Mapping, fast and slow

Well, one way is to do it manually. This is tried-and-tested and usually the starting point. With user-friendly software and a good UI, it can be done, and manual intervention almost always plays a part in model-building.

But it’s not always very quick.

So what about when you’re mapping the Big Beasts of the data world? Your Snowflakes, Oracles and Google BigQuerys? Your MySQLs, SAPs and Salesforces? The list goes on. These platforms and systems can be so vast and complex that relying solely on manual methods isn’t viable.

This is where automatic connectors come in. But not all connectors are created equal.

Let’s take a look at five of the most important attributes that the best data lineage and metadata connectors bring to your work.

1. Depth and automation

The best connectors are supercharged, providing exceptional depth and automation. Instead of just copying schema into a model and cataloging your current systems, they perform deep, real-time analysis to map the architecture of your systems and the relationships that define them.

To illustrate the value of this attribute, let’s take Google BigQuery as an example; with the right connector, you can comprehend intricate modifications to the metadata extracted from this popular data warehouse. Rather than merely updating metadata, it analyzes and visualizes the exact fine-grained change within your lineage model – something that’s critical in a data world that never stops moving.

Philosophically speaking, code is metadata. So a good connector will harvest, parse and analyze data flows within complex SQL code so you can understand exactly what’s happening and how data is moving and mutating around your systems.
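As a toy illustration of what ‘code is metadata’ means in practice, even a naive parse of a single INSERT…SELECT statement recovers a table-level flow. Production connectors use full SQL parsers, not regexes – this is just a sketch of the idea, with invented table names:

```python
import re

# Naive table-level lineage from one INSERT ... SELECT statement.
# Real connectors use proper SQL parsers; this regex sketch is illustrative.
sql = """
INSERT INTO reporting.daily_positions
SELECT t.account_id, SUM(t.amount)
FROM staging.trades t
JOIN reference.accounts a ON a.id = t.account_id
GROUP BY t.account_id
"""

# The INSERT target is the downstream node; FROM/JOIN tables are upstream.
target = re.search(r"INSERT\s+INTO\s+([\w.]+)", sql, re.I).group(1)
sources = re.findall(r"(?:FROM|JOIN)\s+([\w.]+)", sql, re.I)

for src in sources:
    print(src, "->", target)
# staging.trades -> reporting.daily_positions
# reference.accounts -> reporting.daily_positions
```

Run across a codebase of such statements, the extracted (source, target) pairs become the edges of a lineage graph – no one had to draw them by hand.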

2. Detecting and recording change

Top-quality automatic connectors harvest and discover metadata, and they identify and document the exact changes to your metadata by comparing it to a previous version before incorporating it so you can assess the impact of fine-grained changes on connected systems.

Harvesting deep-level connections ensures that the business-friendly metadata views you need are backed up by sound information under the surface about the systems you map. This provides a richer understanding of the ‘before’, ‘during’ and ‘after’, which is essential for reliable planning and risk mitigation.

To add more context, if a connector is executing every day, it isn’t just copying the schema into your metadata graph; it’s doing a daily comparison – or ‘diff’ – to help you understand the change and merge it into your graph in a safe and conflict-free way. So not only are you synced with the systems you’re connecting to, you’re detecting exactly when and what is being changed.

Pulling the latest structure and that alone, which is what some providers offer, is a very poor cousin of this functionality.
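Conceptually, that daily ‘diff’ is a comparison of two schema snapshots. A minimal sketch – the snapshots below are invented:

```python
# Compare two schema snapshots (column -> type) and report the exact change,
# rather than just overwriting yesterday's copy. Snapshots are invented.
def diff_schema(old, new):
    added = {c: new[c] for c in new.keys() - old.keys()}
    removed = {c: old[c] for c in old.keys() - new.keys()}
    changed = {c: (old[c], new[c]) for c in old.keys() & new.keys()
               if old[c] != new[c]}
    return added, removed, changed

yesterday = {"id": "INT", "amount": "FLOAT", "region": "TEXT"}
today = {"id": "INT", "amount": "DECIMAL(18,2)", "currency": "TEXT"}

added, removed, changed = diff_schema(yesterday, today)
print(added)    # {'currency': 'TEXT'}
print(removed)  # {'region': 'TEXT'}
print(changed)  # {'amount': ('FLOAT', 'DECIMAL(18,2)')}
```

It’s the `changed` bucket that matters most for impact analysis: a type change on one column can ripple through every downstream system that consumes it.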

3. Code analysis

Expanding on our first point, a deep-level connector worth its salt will analyze code, ETLs, BI reports, schemas, catalogs, glossaries, dictionaries, data types, data quality issues and more.

This transports the fruits of data lineage modeling into the world of your colleagues in technical departments, who will derive their own insights from these capabilities, adding to the multifaceted in-house understanding of all of your systems.

After all, as part of holistic commercial organisms, the most effective data analysts don’t work in silos.

4. Automating technical lineage

Perhaps the most important attribute is the automation of technical lineage.

The most effective way to deliver this to a user is by parsing ETL logic and SQL code, and linking data flow and transformations to a standard schema.

This enables simple and quick data-flow capture, modeling, and visualization within and between applications.

Quite simply, there are few better ways of automating data lineage than by capitalizing on this method of parsing.
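Once the parsing step has produced column-level (source, target) pairs, they become edges in a standard lineage graph that can be walked end to end. A hedged sketch, assuming a parser has already extracted the (invented) edges below:

```python
# Walk column-level lineage end to end, given (source, target) edges that a
# parser has already extracted from ETL/SQL code. Edges here are invented.
from collections import defaultdict

edges = [
    ("trades.amount", "staging.amount"),
    ("staging.amount", "report.total"),
    ("fx.rate", "report.total"),
]

downstream = defaultdict(list)
for src, dst in edges:
    downstream[src].append(dst)

def trace(col, seen=None):
    """All columns ultimately fed by `col`, however many hops away."""
    seen = seen if seen is not None else set()
    for nxt in downstream.get(col, []):
        if nxt not in seen:
            seen.add(nxt)
            trace(nxt, seen)
    return seen

print(sorted(trace("trades.amount")))  # ['report.total', 'staging.amount']
```

The multi-hop traversal is the payoff: `trades.amount` reaches `report.total` even though no single statement connects them directly.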

5. Unlimited metadata ingestion

But we’ve saved one of the most topical attributes for last: metadata ingestion. Or perhaps we should say ‘active metadata ingestion’, because, really, what other metadata is worth working with?

With the right setup, you can take advantage of unlimited metadata ingestion through ‘plug n play’ connectors, an open API and SDK framework, and a suite of file import templates.

Companies thrive or fail on the strength of their intelligence.

To build a metadata fabric, all information about data and its usage must be coalesced, linked together and made to speak the same language. Ingesting it into a common format means you can easily query, analyze and present it.

A data analyst’s work is at the heart of their enterprise’s intellectual heft. Active metadata feeds into this at a fundamental level, and the quicker and more accurately you can ingest this information into your models such that it can be easily interrogated by your colleagues, the better.
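Whatever the ingestion route – connector, API, SDK or file import – the end state is the same: metadata from heterogeneously shaped sources normalized into one queryable form. An illustrative sketch with invented payloads and field names:

```python
# Normalize metadata from two differently shaped sources into one common
# record format so it can be queried uniformly. All names are invented.
def from_warehouse(payload):
    """Adapt a nested, warehouse-style metadata payload."""
    return [{"source": "warehouse", "name": c["column"], "type": c["dtype"]}
            for c in payload["columns"]]

def from_csv_catalog(rows):
    """Adapt flat (name, type) rows from a file-based catalog export."""
    return [{"source": "catalog", "name": name, "type": dtype}
            for name, dtype in rows]

common = (from_warehouse({"columns": [{"column": "id", "dtype": "INT"}]})
          + from_csv_catalog([("amount", "DECIMAL")]))

# One query now works across both origins
print([m["name"] for m in common])  # ['id', 'amount']
```

Each new source needs only a small adapter into the common shape; everything downstream – querying, analysis, presentation – stays unchanged.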

The Solidatus way

Other lineage software platforms provide connectors, but don’t be fooled into thinking that they all value these attributes as highly as we do at Solidatus – or even deliver them at all. We live by them.

No matter where your metadata sits, we can harvest it.

Whether you’re looking to map data and catalogs, databases, or BI, ERP and CRM tools, we’re the best place to start that journey.

At the time of writing, our connector-count is hovering close to the 60-mark – and rising. Where will it be when you next check our connectors page?