The 48-Hour Test: Does Your AI Have Complete Data Lineage?

How leading banks use data lineage to accelerate AI deployment while others face months of delays

Three institutions receive the same question during Model Risk Management reviews: “Walk us through the complete data lineage for your credit decisioning models. Show us every source, transformation, and quality control from origin to decision.”

Institution A retrieves its current bi-temporal documentation within 48 hours. Every model required a complete lineage before production approval, with quarterly updates mandated, and their documentation matches reality. Institution B has the original lineage from model validation two years ago, but admits they haven’t maintained it. They promise to reconcile documentation with the current state “within the quarter.” Institution C’s documentation is fragmented—created for initial approval but never updated, rendering it obsolete. They need six months to document what their models actually do today versus what they did at launch.

The examiner’s follow-up is rigorous: “How do you validate data quality at each transformation stage? How do you assess the impact of upstream changes on model accuracy? How do you demonstrate compliance with fair lending requirements when you cannot trace input features to authoritative sources?”

Institution A secures immediate approval, while Institution C faces months of enhanced scrutiny.

This scenario plays out weekly across the financial services industry. Model Risk Management examinations increasingly treat data lineage as core documentation under SR 11-7 guidance. Fair lending exposure alone makes this a critical issue. Credit decisioning models require auditable input lineage to demonstrate non-discrimination. Yet most institutions still scramble when examiners ask these fundamental questions.

Andrew Foster, Chief Data Officer at M&T Bank, frames it simply: Lineage isn’t “in response to gen AI”, but a “core capability” that answers one question: “Do we know where our data comes from and how we use it?”1

  • Bi-temporal traceability. The ability to recreate the exact data state at any point in history. When examiners ask you to prove a model was compliant on June 15, 2024, you need this capability to show them precisely what data existed at that moment.
  • End-to-end visibility. Tracking data from source systems through every transformation. Major banks rely on Solidatus to interpret legacy mainframe COBOL code because visibility must extend even into legacy systems that predate modern documentation.
  • Dependency mapping. Knowing which models consume any data element. When a data source changes, you instantly see every affected model downstream, ensuring no surprises or hidden breaks.
  • Business context integration. Linking technical flows to ownership, policies, and risk classifications. Chris Skinner, Data Governance Director at LSEG, describes this as a shift from ‘spreadsheets and hearsay’ to documented accountability.2

The EU AI Act now mandates data lineage for high-risk AI systems, effectively codifying data lineage into law. US regulators require it for AI model validation, a process taking “at minimum nine months, but typically 12 months,” according to Solidatus VP of Product Tina Chace. This complexity crisis makes AI deployment without lineage impossible.

Without tracing data through transformations and proving source, usage rights, and compliance under the EU AI Act and NIST AI RMF, institutions cannot deploy AI at scale. Institutions with comprehensive lineage deploy confidently, while those lacking it remain stalled in pilot phases

Achieving speed through certainty

HSBC’s approval for cross-border data sharing took months, creating bottlenecks across 64 countries. After implementing Solidatus, the same process takes minutes. They achieved up to 90% efficiency gain and redeployed 400 operational staff to strategic work.3

M&T Bank eliminated time spent on ‘data archaeology,’ removing the need to hunt for sources and validate trustworthiness. They now answer in minutes what took weeks: “If Standard & Poor’s changes their data feed format tomorrow, which models break?”

Consider what happens when someone upstream changes how they calculate a derived variable for their own needs. A credit risk score that combined five factors now uses four. A different team required faster processing. A customer lifetime value metric shifts from 36-month to 24-month projections because finance wanted quarterly comparisons. Downstream model teams are not notified, causing models to ingest corrupted inputs. Accuracy degrades from highly reliable to dangerously incorrect.

Impact analysis prevents these avoidable failures. Before any upstream modification, comprehensive lineage automatically identifies every affected model. Manual assessment becomes automated certainty.

Credit models extract dozens of features from multiple systems and perform complex calculations. AML monitoring uses hundreds of derived variables through transformation chains. Third-party feeds change without warning. BCBS 239 already requires lineage for risk data. Yet most institutions haven’t extended these capabilities to AI/ML pipelines.

Real implementation shows dramatic results. Approval cycles drop from 87 days to single digits. Model governance maps lineage directly to ownership and risk classification, satisfying SR 11-7 requirements. Trust scoring automatically restricts models to high-quality sources. This eliminates the validation bottlenecks that consume data science teams.

Eric Hirschhorn at BNY found that Solidatus helped “drive better data outcomes” by integrating disparate metadata. Governance enables rather than obstructs. Andrew Foster describes lineage as “the spine that runs through the organization.” It allows “contextually relevant” conversations about quality and impact.

The fastest AI deployers built infrastructure, automating governance. They achieve speed through certainty.

GenAI Disrupts Traditional Governance

Financial institutions deployed traditional ML without comprehensive lineage, and now they need to catch up. Historically, they relied on manual controls and institutional knowledge; however, GenAI eliminates this option. Retrieval-Augmented Generation (RAG) and agentic AI make lineage mandatory for operations, not just compliance.

Why RAG can’t operate without lineage
Consider a scenario in which a risk team deploys a GenAI assistant to answer regulatory compliance questions using RAG. Testing goes perfectly with accurate, context-aware responses. Then someone asks: “What documents or data sets did the model access for that answer?”

This inability to verify sources can halt the project immediately. The model mixed current policies with draft documents, deprecated procedures, random emails, and SharePoint files of unknown origin.

Without lineage, RAG systems face critical operational failures:

  • Verify if answers came from authoritative sources
  • Troubleshoot when the model gives wrong information
  • Prevent access to inappropriate or outdated data
  • Explain to users or auditors how it reached conclusions

Every RAG query potentially accesses different sources. This requires real-time, adaptable governance. Applications for regulatory Q&A, customer service, and compliance assistance all require provenance tracking for every context that influences outputs.

Agentic AI amplifies the problem
Autonomous agents make sequential decisions, invoke tools, and retrieve data based on intermediate results. Without lineage showing exactly what data influenced each decision, you can’t:

  • Reproduce why the agent took specific actions
  • Debug when automated processes go wrong
  • Prove the agent acted appropriately
  • Control what data it accesses at each step

Model Risk Management guidance now covers AI agents. Documentation must explain “what actions it can take” and “where that data comes from,” according to Solidatus CTO Danny Waddington.

The infrastructure reality
These operational requirements demand purpose-built infrastructure with both technical and business lineage. You need systems that record every dataset retrieved for RAG workflows, tag sources by trust scores before they reach the model, and capture complete decision chains for autonomous agents—including why specific data was selected at each step. Manual processes can’t scale to handle the volume and velocity of GenAI data access.

BCBS 239 provides unexpected leverage. Institutions with risk data aggregation infrastructure can extend to AI governance rather than starting from scratch. The same proven capabilities that ensure accurate risk reporting can also extend to AI governance.

GenAI makes comprehensive lineage unavoidable. Build it proactively with proper infrastructure or watch manual controls fail at scale.

The Widening Competitive Divide

With the EU AI Act effectively codifying data lineage for high-risk systems, the separation between leaders and laggards is becoming permanent. Institutions lacking this infrastructure face a stark choice: urgently build under regulatory duress, restrict AI to low-risk applications, or exit markets with strict governance requirements. Unfortunately, none of these paths offers a sustainable competitive advantage.

While forward-thinking banks capitalize on their lineage investments to deploy AI safely, other organizations face reviews and examinations that extend for quarters rather than days. In these environments, teams invest months in “data archaeology” exercises that rarely reach completion, stalling AI scalability because compliance remains unproven.

Regulatory expectations are shifting
Under SR 11-7 and Model Risk Management guidance, examiners now expect AI/ML lineage documentation to match the rigor of traditional models. This includes covering dynamic access, autonomous decisions, and real-time retrieval. Institutions that can demonstrate this level of comprehensive lineage pass quickly, while others face scrutiny, delays, and potential enforcement.

Making lineage a strategic asset
Success requires treating lineage as core infrastructure rather than a compliance checklist. This shift enables institutions to operate at a different velocity: deploying AI in weeks, responding to regulators in hours, and surfacing quality issues before they impact production.

To determine your institution’s readiness, ask these five questions:

  • Speed: Can you trace the complete lineage for high-risk models within 48 hours?
  • Impact: Do upstream changes automatically surface affected models?
  • History: Can you demonstrate historical data states to regulators?
  • Workflow: Do data scientists spend more time validating sources than building models?
  • Control: Can GenAI applications access only approved, context-appropriate data?

Answering “yes” to all five indicates a maturity level that transforms governance from a bottleneck into an accelerator.

The data lineage imperative
When institutions controlling $50 trillion in assets align on the same solution—and partners like Microsoft validate it—it signals that comprehensive lineage has shifted from a differentiator to table stakes.

First-movers are already compounding their advantages through operational experience, regulatory trust, and the ability to attract talent seeking modern infrastructure rather than legacy workarounds. Your institution faces a clear choice: build the infrastructure that enables AI leadership now, or risk documenting why you fell behind later.

The divide sharpens daily. HSBC, M&T Bank, and BNY prove that lineage infrastructure accelerates rather than constrains innovation. It removes uncertainty, automates governance, and enables confident scale. Your institution faces a choice. Build the infrastructure that enables AI leadership, or risk being left behind as competitors pull ahead while you document why you can’t keep up.

1.“Unifying the Enterprise: How M&T Bank Is Rewriting the Rules of Data Governance with Solidatus.” 2025. Solidatus. September 30, 2025. https://www.solidatus.com/resource/unifying-the-enterprise-how-mt-bank-is-rewriting-the-rules-of-data-governance-with-solidatus,/

2.“Building Data Trust in LSEG.” 2025. Solidatus.” July 2025. https://www.solidatus.com/resource/building-data-trucame into effectst-in-lseg/.

3.“Solidatus Models HSBC’s Global Lending Book.” 2024. https://www.solidatus.com/wp-content/uploads/2024/01/Case-study-HSBC.pdf.

Published on: January 14, 2026

Author Name: Tina Chace

Author Title: VP Product Management

Contents

Related articles