Proving data lineage to regulators

What regulators expect when they ask you to prove data lineage

SR 26-2, the ECB’s May 2024 guide, and what data catalogs miss.

On April 17, 2026, the Federal Reserve, OCC, and FDIC jointly released SR 26-2, the first major update to US model risk management guidance in fifteen years.¹ The new tri-agency guidance is principles-based, applies to banking organizations with over $30 billion in total assets, and excludes generative and agentic AI from its scope. Footnote 3 of the attachment adds the twist. Banks are still expected to build their own governance frameworks for those systems based on the principles laid out in the guidance.

SR 26-2 never uses the phrase “data lineage,” yet every principle in it rests on evidence that only granular lineage can produce. The European Central Bank’s May 2024 guide said the same thing in plain language eighteen months earlier, calling for “complete and up-to-date data lineages on data attribute level.”² Different regulatory styles land on the same evidentiary floor, and catalog-level lineage will not clear it. The rest of this piece walks through the five questions an examiner will ask and the evidence a compliance team needs to produce under exam conditions.

The principles SR 26-2 quietly rest on

The guidance reads as a high-level framework, yet the data requirements sit in almost every section. Inherent risk (Section III) depends on “the quality of inputs for the model, and data constraints.” Model testing (Section IV) must include a “a critical assessment of data quality, relevance, and inputs.” Validation work (Section V) documents “data selection” as part of conceptual soundness, then monitors “data relevance” over time. Even vendor products fall under the principle, with Section VII requiring evaluation of “data, parameter values, or complete models” supplied by third parties. Each of these assumes an underlying evidence trail.

“Generative AI and agentic AI models are novel and rapidly evolving. As such, they are not within the scope of this guidance.”
— SR 26-2 Attachment, Footnote 3, April 17, 2026

The ECB put it plainly eighteen months earlier. The May 2024 Guide on Effective Risk Data Aggregation and Risk Reporting requires “complete and up-to-date data lineages on data attribute level” as the minimum for an effective data governance framework. The Bank Policy Institute reported “whispers” among modeling teams that examiners had already begun changing data lineage requirements for certain models under SR 11-7.³ SR 26-2 formalizes what the field was already hearing in audit rooms.

Five questions an examiner will ask

Inside the audit room, the abstraction becomes specific. When the Head of Risk Data at a G-SIB faces her first validation review under SR 26-2, the examiner does not ask which system holds the credit portfolio data. She asks how a specific probability-of-default (PD) estimate arrived at the model, and what the lineage looked like six months ago versus today. The examiner expects a path from source to model input, not a dataset label.

“Where did this data come from?”

Catalogs answer this question with a source system tag. An examiner applying SR 26-2’s “data selection” requirement needs more than that. She needs the full column-level path from the PD value back to the point of capture, through every staging layer, ETL job, and rating-to-PD mapping that touched it. The ECB guide spells out the standard. Data lineage has to be complete, up-to-date, and “on data attribute level (starting from data capture and including extraction, transformation and loading).”⁴ A dataset-level answer cannot demonstrate provenance for the single PD figure the examiner just named. That evidence comes from a data lineage platform built around attribute-level provenance, not a catalog built around the dataset.

“What transformed it, and when?”

A catalog records the schema at the time the connector last ran. The transformation logic lives elsewhere, and so does a time stamp on the rules. SR 26-2’s model-testing principle wants “a critical assessment of data quality, relevance, and inputs,” and there’s no way to satisfy that without knowing which rule applied on which date. A PD estimate isn’t produced by one calculation. It passes through rating-scale translations and through-the-cycle adjustments, then regulatory floors and economic-cycle overlays, before reaching the model. Without a rule-level change history, the examiner has nothing to check. The fix is a lineage platform that treats transformation logic as its own object, versioned every time a rule changes.

“Who owns it, and what policy governs its use?”

SR 26-2’s governance section calls for clear roles and responsibilities across the model lifecycle. EU AI Act Article 10 extends that requirement to every high-risk AI system that processes training, validation, or test data.⁵ Catalogs hold stewardship fields that name a team or a person, yet the examiner’s question goes beyond that. A credible answer pulls three things together for each PD estimate: who owns it, which Internal Ratings-Based (IRB) policy governs the calculation, and which regulatory regime applies. Then it pushes that context upward to every risk-weighted asset (RWA) figure and every Pillar 3 public-risk disclosure downstream. A catalog entry doesn’t reach that far. A lineage layer can, if it carries ownership, policy, and regime on every element, and can traverse to every downstream consumer.

“Ongoing model monitoring involves an evaluation of the extent to which a model is performing as expected given potential changes in products, exposures, activities, clients, data relevance, or market conditions.”

— SR 26-2 Attachment, Section V, April 17, 2026

“What breaks if this data changes?”

Take a third-party ratings vendor that moves from a 15-grade to an 18-grade scale. The rating-to-PD mapping table breaks silently, and every downstream consumer inherits the shift. A catalog can list which tables reference the feed, but it can’t simulate what breaks downstream, whether that’s the PD calculation, the RWA figure, or the Pillar 3 disclosure. A lineage-first platform connects the feed all the way through to the disclosure and runs impact analysis before the change hits production.

“Can you show me this same picture six months ago?”

An examiner does not want today’s map. She wants the map as of the date the model was last validated, so that she can compare the PD inputs at validation against the PD inputs today. Generally, a catalog overwrites the historical state with each scan, and there is no way to reproduce the map on validation day. Bi-temporal lineage, by contrast, restores the full map to any point in time, shows the data as it was, and contrasts it against today. SR 26-2’s “data relevance” monitoring principle depends on exactly that capability, which is the evidence chain a bank needs to defend a model years after it went live.

Why catalog-level data lineage is not enough on its own

Data catalogs are purpose-built to index assets, and they do that work well for users who need to search, discover, and classify. Data lineage in most catalogs is a view derived from connector-captured metadata, which means the map is only as complete as the automation behind it. That leaves manual flows, legacy systems without machine-readable schemas, and column-level transformations buried in business rules all outside the map. An examiner has no reason to care about that limitation. She still has to apply SR 26-2’s principles on data quality, data selection, and data relevance, which are not questions a discovery catalog can answer.

The US and EU are converging on the same evidence

Different regulatory styles land on the same architectural floor. The table below maps the five regimes a $30B+ international bank is likely to answer to in 2026, and what each one asks of lineage.

Regulation	Effective	What it asks of data lineage
BCBS 239⁶	2013 (G-SIBs)	Risk data aggregation and reporting, reinforced by the ECB 2024 guide
ECB Guide on Effective Risk Data Aggregation ⁷	May 2024	Complete, up-to-date, attribute-level data lineage is a minimum requirement
DORA⁸	January 2025	ICT risk management that covers traceability for third-party data
EU AI Act, Article 10⁹	2024–2026 phased	Data governance and traceability for high-risk AI training, validation, and test data
SR 26-2¹⁰	April 2026	Principles-based data quality, selection, relevance, and inputs across the model lifecycle

A bank that builds to the ECB’s attribute-level requirement already clears the SR 26-2 principles-based standard, but the reverse does not hold. A bank optimizing for the SR 26-2 text alone falls short of the ECB’s specificity, and every bank with European operations has to meet both. The real decision for a CDO is which data lineage architecture can satisfy the most specific requirement while remaining defensible under a principles-based review.

The GenAI carve-out amplifies the point. SR 26-2 tells $30B+ US banks to build their own governance for generative and agentic AI without any specification for what that looks like. Article 10 of the EU AI Act is the only prescriptive rulebook either regulator has actually published for training data governance, and it already works at the attribute level. A bank that wants to clear both exams needs column-level lineage that also carries a bi-temporal view.

What to produce

The five questions map to a concrete set of evidence. A bank that can produce these items clears both regimes:

End-to-end column-level lineage from the point of capture through every transformation to the model input
Transformation logic as a first-class object, time-stamped at the rule level
Bi-temporal snapshots that restore any historical state and contrast it against today
Impact analysis from any data element to every downstream feature, report, and disclosure
A governed change record tracking recommendations, responses, and exceptions across the model lifecycle

The fifth item is the one most banks underestimate. SR 26-2 asks for a record of how the state moved, not a static map of today. The record must name the proposed changes, the ones the team approved, and when each landed. Solidatus’s AI Lineage Assistant was built for that job. Every AI-proposed lineage change gets logged with a reviewer, and the record survives the next audit. The ECB put the same expectation at the attribute level back in May 2024, and SR 26-2 §VI brings it into US banks.

Any missing item on the checklist turns into a finding waiting to happen at the next exam. Benchmark your readiness against the full Solidatus BCBS 239 whitepaper.

¹Board of Governors of the Federal Reserve System, Federal Deposit Insurance Corporation, and Office of the Comptroller of the Currency. “Supervisory uidance on Model Risk Management.” SR Letter 26-2 Attachment. April 17, 2026.
https://www.federalreserve.gov/supervisionreg/srletters/SR2602a1.pdf.

²European Central Bank Banking Supervision. Guide on Effective Risk Data Aggregation and Risk Reporting. May 2024.
https://www.bankingsupervision.europa.eu/ecb/pub/pdf/ssm.supervisory_guides240503_riskreporting.en.pdf.

³Bank Policy Institute. “The Most Damaging ‘Guidance’ in Banking.” 2025. https://bpi.com/the-most-damaging-guidance-in-banking/.

⁴European Central Bank Banking Supervision. Guide on Effective Risk Data Aggregation and Risk Reporting. May 2024. https://www.bankingsupervision.europa.eu/ecb/pub/pdf/ssm.supervisory_g,uides240503_riskreporting.en.pdf.

⁵European Union. Regulation (EU) 2024/1689 of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). Article 10. 2024. https://eur-lex.europa.eu/eli/reg/2024/1689/oj.

⁶Bank for International Settlements, Basel Committee on Banking Supervision. “Principles for Effective Risk Data Aggregation and Risk Reporting” (BCBS 239). January 2013. https://www.bis.org/publ/bcbs239.pdf.

⁷European Central Bank Banking Supervision. Guide on Effective Risk Data Aggregation and Risk Reporting. May 2024.
https://www.bankingsupervision.europa.eu/ecb/pub/pdf/ssm.supervisory_guides240503_riskreporting.en.pdf.

⁸European Union. Regulation (EU) 2022/2554 on Digital Operational Resilience for the Financial Sector (DORA). Effective January 17, 2025. https://eur-lex.europa.eu/eli/reg/2022/2554/oj.

⁹European Union. Regulation (EU) 2024/1689 of the European Parliament and of the Council Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act). Article 10. 2024. https://eur-lex.europa.eu/eli/reg/2024/1689/oj.

¹⁰Board of Governors of the Federal Reserve System, Federal Deposit Insurance Corporation, and Office of the Comptroller of the Currency. “Supervisory Guidance on Model Risk Management.” SR Letter 26-2 Attachment. April 17, 2026. https://www.federalreserve.gov/supervisionreg/srletters/SR2602a1.pdf.

Frequently asked questions

01.

Does SR 26-2 require data lineage?

SR 26-2 never uses the phrase “data lineage,” yet its principles-based requirements for model risk management cannot be met without it. Section III ties inherent risk to “the quality of inputs for the model, and data constraints.” Section IV requires testing to include “a critical assessment of data quality, relevance, and inputs.” Section V requires validation to document “data selection” and monitor “data relevance.” Producing evidence for each of those principles requires column-level, time-stamped data lineage. The guidance is non-enforceable, but examiners apply the principles as their evaluation lens.

02.

What does the ECB's May 2024 guide say about data lineage?

The European Central Bank’s Guide on Effective Risk Data Aggregation and Risk Reporting, published in May 2024, specifies “complete and up-to-date data lineages on data attribute level (starting from data capture and including extraction, transformation, and loading)” as a minimum requirement of an effective data governance framework. Attribute level means column level — not dataset level. The guide applies to Eurozone banks under ECB supervision and reinforces the BCBS 239 principles published in 2013.

03.

How does SR 26-2 differ from SR 11-7?

SR 26-2, issued April 17, 2026, by the Federal Reserve, OCC, and FDIC, supersedes SR 11-7 (2011) and SR 21-8 (2021). It retains the fundamental framework covering model development, validation, governance, and vendor oversight, but introduces a risk-based, materiality-tailored approach. Generative AI and agentic AI are explicitly carved out of scope under Footnote 3, with banks expected to build their own governance frameworks based on the guidance’s principles.

04.

Is a data catalog enough to satisfy regulatory data lineage requirements?

No. Data catalogs are purpose-built to index assets for search, discovery, and classification. The lineage a catalog produces is a view derived from connector-captured metadata, which means the map ends where the connectors end. Manual flows, legacy systems without machine-readable schemas, column-level transformations inside business rules, and historical state all sit outside catalog coverage. Regulatory lineage evidence requires a purpose-built data lineage layer that models transformation logic as a first-class object and supports bi-temporal restoration.

05.

What does "attribute-level" data lineage mean in practice?

Attribute-level lineage traces a single column (for example, a probability-of-default value) from the point of data capture through every staging layer, ETL job, calculated field, transformation rule, and business override, to the exact attribute that feeds a model or report. Dataset-level lineage stops at the table or dataset object, which cannot answer provenance questions about a single value inside that dataset. The ECB’s May 2024 guide explicitly requires attribute-level coverage.

06.

How should banks prepare data lineage for a model validation review?

Produce five pieces of evidence: end-to-end column-level lineage from source to model input; transformation logic modeled as a first-class, time-stamped object; bi-temporal snapshots that restore any historical state; impact analysis from any data element to every downstream consumer; and a governed change record tracking recommendations, responses, and exceptions across the model lifecycle. Banks that can present this set meet both the SR 26-2 principles lens and the ECB’s attribute-level requirement.

Published on: June 9, 2026

Blog

Proving data lineage to regulators

What regulators expect when they ask you to prove data lineage

The principles SR 26-2 quietly rest on

Five questions an examiner will ask

“Where did this data come from?”

“What transformed it, and when?”

“Who owns it, and what policy governs its use?”

“What breaks if this data changes?”

“Can you show me this same picture six months ago?”

Why catalog-level data lineage is not enough on its own

The US and EU are converging on the same evidence

What to produce

Frequently asked questions

Does SR 26-2 require data lineage?

What does the ECB's May 2024 guide say about data lineage?

How does SR 26-2 differ from SR 11-7?

Is a data catalog enough to satisfy regulatory data lineage requirements?

What does "attribute-level" data lineage mean in practice?

How should banks prepare data lineage for a model validation review?

Contents

Related articles

Your riskiest change ticket says low risk

Why banks can’t meet modern regulations without data lineage

Your Data Governance Team Already Built Your AI Governance Foundation

The Data Fairy is Dead

Detailed Expectations Around End-to-End Data Lineage and BCBS 239 From the European Central Bank RDARR Guide

How Data Lineage Supports Regulations Such as BCBS 239, DORA, GDPR and the EU AI Act

Why is Advanced Data Lineage Fundamental for Financial Services Organizations?

Continuing Innovation in Advanced Data Lineage to Help Answer Business Questions

Unveiling the Path: Why Data Lineage is Crucial for Building Effective AI Products

The ECB is “Intensifying its Supervisory Approach” on BCBS239 Compliance – and Basic Lineage is Not Sufficient

Solidatus & Microsoft Purview: Elevating Data Governance in the AI Era

How Solidatus is Enabling Everyone to be a Data Expert

Blasting Off: Why Proactive Data Governance is Propelling Innovation

Live Demo: Explore our All-new Interface

Visualize Snowflake Horizon and Enhance its Impact

Advanced Data Lineage: The Cornerstone of Modern Data Governance

Achieving Basel III Compliance: A 3-Step Action Plan

Building a Data Community

Data Lineage for Better Planning

Data Distress: Data Leaders on Brink of Quitting Jobs

Supercharging your Snowflake Governance with Solidatus

The Value of Data: Reflections from Attending Gartner

Douze Points for New Way to Visualize Eurovision Data

Quick Answer: What is Active Metadata?

The Amazing World of Active Metadata

5 Ways Great Metadata Connectors are Game-Changers

Combining Software to Ensure your Regulatory Compliance

5 Things we Can’t Wait to Do at the Gartner Summit, Orlando

Newsletter