In fact, in-depth knowledge of data and how it’s used is crucial to every organization above a certain size, pretty much regardless of what it does with its data and systems. Beyond the regulatory imperative to understand the data flows within your organization, data lineage is fundamental to getting the right data to the right people at the right time. This is the basis of sound decision-making, risk management and new data-driven initiatives.
So we’re agreed on the value of data lineage. But how do you map your data and systems into lineage models (end-to-end metadata maps) in the first place? And how can you derive end-to-end data flows where documentation doesn’t exist or is outdated?
Mapping, fast and slow
Well, one way is to do it manually. This is tried and tested, and usually the starting point. With user-friendly software and a good UI it works, and manual intervention almost always plays a part in model-building.
But it’s not always very quick.
So what about when you’re mapping the Big Beasts of the data world? Your Snowflakes, Oracles and Google BigQuerys? Your MySQLs, SAPs and Salesforces? The list goes on. These platforms and systems can be so vast and complex that relying solely on manual methods isn’t viable.
This is where automatic connectors come in. But not all connectors are created equal.
Let’s take a look at five of the most important attributes that the best data lineage and metadata connectors bring to your work.
1. Depth and automation
The best connectors are supercharged, providing exceptional depth and automation. Instead of just copying schemas into a model and cataloging your current systems, they perform deep, real-time analysis to map the architecture of your systems and the relationships that define them.
Take Google BigQuery as an example. With the right connector, you can understand intricate modifications to the metadata extracted from this popular data warehouse. Rather than merely overwriting the old metadata, the connector analyzes and visualizes the exact, fine-grained change within your lineage model, which is critical in a data world that never stops moving.
Philosophically speaking, code is metadata. So a good connector will harvest and parse complex SQL code and analyze the data flows within it, so you can understand exactly how data is moving and mutating around your systems.
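To make the idea of harvesting lineage from SQL a little more concrete, here is a deliberately naive Python sketch that pulls table-level source-to-target edges out of simple `INSERT INTO ... SELECT` and `CREATE TABLE ... AS SELECT` statements. It assumes well-formed SQL with no subqueries, CTEs or quoted identifiers; a production connector would use a proper, dialect-aware SQL parser and work down to column level. The function name `table_lineage` and the example tables are hypothetical.

```python
import re

# Toy patterns (an assumption for illustration): they only cover simple,
# well-formed statements. Real connectors need a full SQL parser.
TARGET_RE = re.compile(
    r"\b(?:INSERT\s+INTO|CREATE\s+(?:OR\s+REPLACE\s+)?TABLE)\s+([\w.]+)",
    re.IGNORECASE,
)
SOURCE_RE = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.IGNORECASE)


def table_lineage(sql: str) -> list[tuple[str, str]]:
    """Return (source_table, target_table) edges found in the SQL text."""
    edges = []
    for statement in sql.split(";"):
        target = TARGET_RE.search(statement)
        if not target:
            continue  # not a write statement, so no lineage edge to record
        for source in SOURCE_RE.findall(statement):
            edge = (source.lower(), target.group(1).lower())
            if edge not in edges:  # keep first-seen order, drop duplicates
                edges.append(edge)
    return edges


# Hypothetical example: one transformation feeding a reporting table.
sql = """
INSERT INTO mart.daily_sales
SELECT o.order_date, SUM(o.amount)
FROM staging.orders o
JOIN staging.customers c ON o.customer_id = c.id
GROUP BY o.order_date;
"""
print(table_lineage(sql))
# [('staging.orders', 'mart.daily_sales'), ('staging.customers', 'mart.daily_sales')]
```

Even this toy version shows why code counts as metadata: neither edge is visible in the schemas themselves, only in the SQL that moves the data.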
2. Detecting and recording change
Top-quality automatic connectors don’t just harvest and discover metadata. Before incorporating it, they compare it with the previous version to identify and document the exact changes, so you can assess the impact of fine-grained changes on connected systems.
Harvesting deep-level connections ensures that the business-friendly metadata views you need are backed by sound information about what lies beneath the surface of the systems you map. This gives a richer understanding of the ‘before’, ‘during’ and ‘after’ of each change, which is essential for reliable planning and risk mitigation.
For example, if this sort of connector runs every day, it isn’t just copying the schema into your metadata graph. It performs a daily comparison (a ‘diff’) that helps you understand the change and merge it into your graph safely and without conflicts. So not only are you synced with the systems you’re connecting to, you’re also detecting exactly what is changing and when.
Simply pulling the latest structure, which is all some providers offer, is a very poor cousin of this functionality.
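A daily schema diff can be illustrated with a minimal Python sketch that compares two snapshots and reports added, removed and retyped columns. The snapshot layout (table name mapped to column names and types) and the names `diff_schemas` and `SchemaDiff` are assumptions for illustration, not how any particular product stores metadata, and the conflict-free merge into a lineage graph is left out.

```python
from dataclasses import dataclass, field

# Assumed snapshot format: table name -> {column name: data type}.
Schema = dict[str, dict[str, str]]


@dataclass
class SchemaDiff:
    added: list[str] = field(default_factory=list)    # "table.column"
    removed: list[str] = field(default_factory=list)  # "table.column"
    retyped: list[tuple[str, str, str]] = field(default_factory=list)  # (column, old, new)


def diff_schemas(previous: Schema, current: Schema) -> SchemaDiff:
    """Compare yesterday's snapshot with today's, column by column."""
    diff = SchemaDiff()
    for table in sorted(previous.keys() | current.keys()):
        old_cols = previous.get(table, {})
        new_cols = current.get(table, {})
        for col in sorted(old_cols.keys() | new_cols.keys()):
            name = f"{table}.{col}"
            if col not in old_cols:
                diff.added.append(name)
            elif col not in new_cols:
                diff.removed.append(name)
            elif old_cols[col] != new_cols[col]:
                diff.retyped.append((name, old_cols[col], new_cols[col]))
    return diff


# Hypothetical snapshots, using BigQuery-style type names for flavor.
yesterday = {"orders": {"id": "INT64", "amount": "NUMERIC", "note": "STRING"}}
today = {"orders": {"id": "INT64", "amount": "FLOAT64", "region": "STRING"}}

changes = diff_schemas(yesterday, today)
print(changes.added)    # ['orders.region']
print(changes.removed)  # ['orders.note']
print(changes.retyped)  # [('orders.amount', 'NUMERIC', 'FLOAT64')]
```

Merely overwriting yesterday’s structure with today’s would lose all three of these facts; keeping the diff is what lets you see exactly what changed and when.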
3. Code analysis
Expanding on our first point, a deep-level connector worth its salt will analyze code, ETLs, BI reports, schemas, catalogs, glossaries, dictionaries, data types, data quality issues and more.
This brings the fruits of data lineage modeling into the world of your colleagues in technical departments, who can draw their own insights from it and add to a multifaceted in-house understanding of all your systems.
After all, a business works as a whole organism, and the most effective data analysts don’t work in silos.
4. Automating technical lineage
Perhaps the most important attribute is the automation of technical lineage.
The most effective way to deliver this to a user is by parsing ETL logic and SQL code, and linking data flows and transformations to a standard schema.
This enables simple and quick data-flow capture, modeling, and visualization within and between applications.
Quite simply, there are few better ways to automate data lineage than this kind of parsing.
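Once ETL logic and SQL have been parsed, each flow can be linked to a standard naming scheme and stored as a graph. The Python sketch below assumes a hypothetical `system.table.column` identifier convention and a hypothetical `LineageGraph` class. It records column-level flows with a transformation label and walks the graph breadth-first to answer the classic lineage question: where does this number come from?

```python
from collections import defaultdict, deque


class LineageGraph:
    """Hypothetical column-level lineage store keyed by 'system.table.column' IDs."""

    def __init__(self) -> None:
        self._upstream: dict[str, set[str]] = defaultdict(set)  # target -> sources
        self._transform: dict[tuple[str, str], str] = {}        # (source, target) -> label

    def add_flow(self, source: str, target: str, transform: str = "copy") -> None:
        """Record that `target` is derived from `source` via `transform`."""
        self._upstream[target].add(source)
        self._transform[(source, target)] = transform

    def transform_between(self, source: str, target: str) -> str:
        return self._transform[(source, target)]

    def upstream_of(self, column: str) -> list[str]:
        """Every column feeding `column`, directly or indirectly (breadth-first)."""
        seen: set[str] = set()
        queue = deque([column])
        while queue:
            for parent in self._upstream.get(queue.popleft(), ()):
                if parent not in seen:  # the seen-set also guards against cycles
                    seen.add(parent)
                    queue.append(parent)
        return sorted(seen)


# Flows as a parser might emit them from an ETL job and a SQL script (illustrative).
g = LineageGraph()
g.add_flow("crm.customers.email", "staging.customers.email", "lower()")
g.add_flow("erp.orders.amount", "staging.orders.amount")
g.add_flow("staging.orders.amount", "mart.daily_sales.total", "SUM()")

print(g.upstream_of("mart.daily_sales.total"))
# ['erp.orders.amount', 'staging.orders.amount']
```

Because every flow uses the same identifier scheme, flows captured from different applications join up automatically, which is what makes lineage between applications, not just within them, possible.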
5. Unlimited metadata ingestion
But we’ve saved one of the most topical attributes for last: metadata ingestion. Or perhaps we should say ‘active metadata ingestion’, because, really, what other metadata is worth working with?
With the right setup, you can take advantage of unlimited metadata ingestion through plug-and-play connectors, an open API and SDK framework, and a suite of file import templates.
Companies thrive or fail on the strength of their intelligence.
To build a metadata fabric, all information about data and its usage must be brought together, linked and expressed in the same language. Ingesting it into a common format means you can easily query, analyze and present it.
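As a minimal illustration of ingesting metadata into a common format, the Python sketch below maps two differently shaped exports (a database catalog row and a BI report field) into one `MetadataRecord` type, then runs a single query across both. The export layouts, field names and the `pii` tag are all hypothetical; real sources will differ, which is exactly why the mapping step matters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """Common format every source is mapped into before loading."""
    source: str            # system the metadata came from
    kind: str              # e.g. "column" or "report_field"
    name: str              # fully qualified name
    tags: frozenset[str]   # normalized, lower-case labels


def from_database(row: dict) -> MetadataRecord:
    # Hypothetical catalog export: {"schema", "table", "column", optional "labels"}.
    return MetadataRecord(
        source="warehouse",
        kind="column",
        name=f'{row["schema"]}.{row["table"]}.{row["column"]}',
        tags=frozenset(label.lower() for label in row.get("labels", [])),
    )


def from_bi_tool(item: dict) -> MetadataRecord:
    # Hypothetical BI export: {"report", "field", optional "classification"}.
    tags = {item["classification"].lower()} if item.get("classification") else set()
    return MetadataRecord("bi", "report_field", f'{item["report"]}.{item["field"]}', frozenset(tags))


records = [
    from_database({"schema": "crm", "table": "customers", "column": "email", "labels": ["PII"]}),
    from_database({"schema": "crm", "table": "customers", "column": "created_at"}),
    from_bi_tool({"report": "Churn Dashboard", "field": "Customer Email", "classification": "PII"}),
]

# One query now spans both sources, because they share a format.
pii = [r.name for r in records if "pii" in r.tags]
print(pii)  # ['crm.customers.email', 'Churn Dashboard.Customer Email']
```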
Data analysts’ work is at the heart of their enterprise’s intellectual heft. Active metadata feeds into that work at a fundamental level: the faster and more accurately you can ingest it into your models, where colleagues can easily interrogate it, the better.
The Solidatus way
Other lineage platforms provide connectors, but don’t be fooled into thinking that they all value these attributes as highly as we do at Solidatus, or even offer them at all. We live by them.
No matter where your metadata sits, we can harvest it.
Whether you’re looking to map data and catalogs, databases, or BI, ERP and CRM tools, we’re the best place to start that journey.
At the time of writing, our connector count is close to 60, and rising. Where will it be when you next check our connectors page?