Lineage tools forget what your AI model saw

Bi-temporal data lineage is the foundation of forensic AI investigation

Most enterprise lineage tools record some history. Edits leave change logs, and many catalogs version the metadata they scan. What almost none of them can do is let an investigator query the lineage as it existed three weeks ago, diff that view against today, and model a proposed fix in a sandbox before touching production.

Point-in-time query, version comparison, and sandboxed change modeling sit atop a bi-temporal record. Every fact in your data lineage carries two timestamps. Valid time is when the fact was true in the world; transaction time is when your systems recorded it. Those timestamps must be recorded in the metadata layer when the change occurs. Reconstructing them later from change logs leaves the comparison and sandbox with nothing to compare or sandbox against.

When an AI or machine learning model produces a bad outcome, say a declined loan or a missed fraud signal, the first question an investigator has to answer is what data the model saw at decision time. The second is what has changed since. The third is what would happen if the upstream feed were corrected. Answering any of the three takes a platform that can run those queries on demand, against a record that has not been overwritten between the decision and the question.

Why retrospective reconstruction breaks at most enterprises

Anyone who runs credit models at a large bank will recognize the scene that follows, even though the specifics here are illustrative. A model risk team flags a credit decisioning system that started behaving differently three weeks ago. Customer complaints have surfaced, fair-lending counsel wants a written account in 48 hours, and the validator opens the current lineage view to reconstruct what the model saw. The view is useless. Three reasons it breaks at most enterprises:

  1. The pipeline has moved: The model consumed a third-party vendor file alongside two internal sources three weeks ago. Since then, the vendor restructured its feed, a platform consolidation migrated the schema, and an internal team swapped the golden source for one input without notifying the model team. The validator opens the current lineage view and sees today’s pipeline, not the one that fed the disputed decisions.
  2. The model of the pipeline has moved: Lineage records describing those flows have been edited many times since the decisions were made. The edits were a mix of refinements, schema reconciliations after last month’s platform change, and classification updates. Each one overwrote what the lineage said when the model ran. The validator can read the current model of the pipeline, but not the version the decisioning system was consulting three weeks ago.
  3. Operational logs don’t carry business context: Pipeline logs confirm that rows from the vendor file arrived on the days in question. They do not record that the file’s classification changed between the day the decisions were made and the day the validator opens the lineage view. A validator who starts there cannot reconstruct what the model believed about its inputs.

Model teams can detect data drift in the present, when monitoring tools flag the change as it happens. Reconstructing the data state at the moment of a specific decision three weeks ago is a different problem. Most platforms don’t solve it, which is why retrospective investigations stall before they reach the data.

The U.S. Government Accountability Office reports that financial institutions “may find it difficult to evaluate the data sources used to train AI models, especially if the sources are opaque or unavailable.” The Office of the Comptroller of the Currency (OCC) has issued 17 matters requiring attention (MRAs) related to AI use since fiscal year 2020. The Securities and Exchange Commission (SEC) brought at least eight AI-related enforcement actions in 2023 and 2024.1

The validator’s real question has shifted from “did the model drift?” to “what did it drift away from, and when?”

The forensic playbook: four questions

In the illustrative case, the credit decisioning model has been declining 30% more applications from one metropolitan area than its baseline for the past three weeks. The validation team has 48 hours to produce a written account before fair-lending counsel takes over the conversation. Four sequential questions take the investigator from “the model behaved oddly” to “here is the data state at decision time and here is what changed.” Each question maps to a regulatory expectation and a lineage capability that catalog-level tools cannot answer.

Q1. What data did this specific decision actually see?

The investigator needs column-level provenance for the seven features that drove the declined applications, not just the dataset name. Three came from a third-party vendor’s monthly file, two from internal deposit history, and two were derived from the loan origination system. Catalog-style lineage, which tags datasets at the table- or object-level, does not answer this. Physical-only tools map technical flows; they do not carry the business meaning, policy classification, or process ownership tied to each column. A scan-only tool that covers a selected set of technologies and stops at the first system it cannot connect to leaves the trace back to source incomplete. The Federal Reserve’s revised model risk guidance (SR 26-2) expects this depth. Section IV calls out “the selection of data” as a fundamental piece of conceptual soundness, and validators are expected to document it.2

Q2. What state was that data in three weeks ago?

The bi-temporal question is made concrete. The vendor’s monthly file was classified as an “authoritative source, golden record” on the day the decisions in question were made. Five days later, an internal review reclassified it to “secondary, pending validation.” A current-state lineage view shows “secondary.” The model treated the file as authoritative. Without both classifications recorded, the only one the investigator can see is the current one. The European Central Bank’s May 2024 guide on effective risk data aggregation requires “complete and up-to-date data lineages on data attribute level,” with footnote 24 clarifying that an attribute is “often stored as a column in a table.”3 Attribute-level lineage that loses its history at every overwrite cannot meet that bar.

Q3. What changed upstream between training and the bad decisions?

The vendor restructured the FICO score field encoding on March 8. The model was last retrained on March 1. Three weeks of inferences happened against a feed that no longer matched training assumptions. A point-in-time diff between two lineage snapshots, one from March 1 and one from March 8, shows the change, the affected columns, and the downstream consumers. SR 26-2 Section V names this directly, requiring “ongoing model monitoring” of “data relevance.”4

Q4. What else drank from the same pipe?

The vendor file feeds four other models, two regulatory reports, and a customer-facing dashboard. Three of those consumers have been operating on the same stale assumption. SR 26-2 Section III addresses this as “aggregate risk,” describing the situation where models share “reliance on common assumptions, data, or methodologies” and degrade in concert.5 The same impact analysis query that prevents damage before deployment can cause damage after a failure. The capability is identical; only the direction through the lineage graph changes.

“Why wouldn’t I just build this with AI?”

— The architect’s question, 2026

Why an LLM cannot reconstruct this for you

Building it with AI sounds like a reasonable answer until you ask what “it” refers to. The natural-language interface is the easy part. An LLM, a vector store, and a copy of the current metadata can mimic the conversation. That stack cannot invent a history that was never recorded. If your platform overwrites prior lineage states every time someone edits the model, no LLM can reconstruct what those states were three weeks ago. The bi-temporal property has to live in the platform’s metadata write path itself. An assistant cannot retrofit it from above. An LLM running on single-timestamp lineage will produce confident answers about a past that has already been erased, and those answers will not survive examiner review.

What forensic lineage requires of your architecture

Answering any of the playbook questions in minutes rather than weeks requires four properties of the system underneath the assistant, the dashboards, and the model risk paperwork:

  1. Append-only metadata: Every change to the lineage adds a new fact and preserves the prior version. Most catalog platforms violate this by design. Their default view shows the current model of the data and erases the trail behind it. Architects own this decision at design time. Once a platform is overwritten by default, retrofitting it to be append-only is closer to a rebuild than a refactor. Platforms designed bi-temporally from the start, like Solidatus, treat this as the default behavior.
  2. Every reference carries a snapshot ID: Every inference, every retrain, and every audit-relevant decision should carry an immutable lineage snapshot identifier rather than a dataset name. The snapshot is what makes Q2 in the playbook answerable in minutes instead of months. MLOps teams enforce this at the model serving layer. Data engineers enforce it inside the pipeline.
  3. Point-in-time comparison and propose-change sandbox: Append-only storage is the precondition. Without querying, diffing, and sandboxing on top, the record is documentation no one can act on. The validator needs to pull up the lineage as it existed on a specific date and diff it against today, visually and technically, without exporting CSVs or stitching change logs together. Before correcting the upstream feed, the architect needs to model the proposed change in a sandbox, see which models, reports, and dashboards depend on the affected columns, and circulate the plan for review before anyone touches production.
  4. Investigation as a first-class workflow: Monitoring tells you a model has drifted. Your lineage platform has to tell you what it drifted from, and those two systems have to share a vocabulary. Without that wiring, you have observability without reproducibility, which is the trap most banks are sitting in right now. Model validators should require this from infrastructure before approving any production deployment.

This is a shared foundation. Build it once, and every team downstream inherits the same auditable record.

From investigation to assurance

Regardless of jurisdiction, the next question carries the same demand. Show me what the model saw at decision time, prove the lineage I am reading today is the one that existed then, and show me what the fix will affect before you deploy it. Answering on demand takes a bi-temporal record at the metadata layer, point-in-time version comparison, and a sandbox for proposed changes. Most platforms today lack all three. The question is becoming routine.

Solidatus is a bi-temporal data lineage platform for regulated enterprises. Its foundation records both timestamps, and the workspace on top lets teams compare any two points in time, model proposed changes in a sandbox, and circulate plans for review before anyone touches production. The Solidatus AI Lineage Assistant reconstructs decision-time state in seconds rather than weeks. The natural-language interface is the surface; the bi-temporal record and the sandbox underneath are what the examiner reads. Request a demo.

1U.S. Government Accountability Office. “Artificial Intelligence: Use and Oversight in Financial Services.” GAO-25-107197, 2025.
https://www.gao.gov/products/gao-25-107197

2Board of Governors of the Federal Reserve System, FDIC, OCC. “Supervisory Guidance on Model Risk Management.” SR Letter 26-2 Attachment, April 17, 2026.
https://www.federalreserve.gov/supervisionreg/srletters/SR2602a1.pdf

3European Central Bank. “Guide on Effective Risk Data Aggregation and Risk Reporting.” Banking Supervision, May 3, 2024, Section 3.4, footnote 24.
https://www.bankingsupervision.europa.eu/ecb/pub/pdf/ssm.supervisory_guides240503_riskreporting.en.pdf

4Board of Governors of the Federal Reserve System, FDIC, OCC. “Supervisory Guidance on Model Risk Management.” SR Letter 26-2 Attachment, April 17, 2026.
https://www.federalreserve.gov/supervisionreg/srletters/SR2602a1.pdf

5Board of Governors of the Federal Reserve System, FDIC, OCC. “Supervisory Guidance on Model Risk Management.” SR Letter 26-2 Attachment, April 17, 2026.
https://www.federalreserve.gov/supervisionreg/srletters/SR2602a1.pdf

Frequently asked questions

01.

What is AI forensics, and how is it different from AI observability?

AI forensics is the practice of reconstructing what data an AI or machine learning model consumed at the moment it made a specific decision. Observability tells you that a model has drifted; forensics tells you what the data state was at decision time and what has changed since. The discipline draws on accident investigation, where the goal is to reconstruct the operating conditions at the moment of failure rather than to characterize current behavior. Bi-temporal data lineage is the foundation that makes AI forensics practical at enterprise scale.

02.

What does bi-temporal mean in the context of data lineage?

A bi-temporal record carries two timestamps for every fact. Valid time marks when the fact was true in the world; transaction time marks when the system recorded it. A bi-temporal data lineage platform preserves both and lets you query the lineage as it existed at any point in the past. Most lineage tools record only transaction time and overwrite the past with each update, leaving retrospective queries dependent on manual archaeology through change logs.

03.

Which regulations require this kind of retrospective lineage?

Several regulatory frameworks now require the capability to reconstruct what data fed a model at a specific point in time. SR 26-2, the Federal Reserve’s April 2026 model risk guidance, requires documentation of data selection, ongoing monitoring of data relevance, and assessment of aggregate risk across models sharing common data dependencies. The European Central Bank’s May 2024 guide on risk data aggregation requires complete and up-to-date attribute-level data lineages. Both frameworks treat retrospective lineage as a regulatory expectation rather than an architectural nice-to-have.

04.

Can a data catalog be retrofitted to support forensic investigation?

Generally not, at least not natively. Most data catalogs treat lineage as a current-state view, overwriting earlier states with each update. Retrofitting a catalog to support bi-temporal queries means rebuilding the metadata write path so that every change appends a new fact rather than overwriting the previous one. Catalogs that were not built bi-temporally from the start can sometimes mine their change logs to reconstruct past states, but the process is manual, slow, and rarely defensible to a regulator.

05.

Where should an enterprise architecture team start?

Before evaluating tools, audit the metadata write path of your current lineage platform. Ask whether every change to a lineage record creates a new fact or overwrites the prior one. If the answer is overwrite, no amount of AI tooling on top will deliver forensic reconstruction. From there, the priorities are append-only metadata storage, immutable lineage snapshot identifiers attached to every model inference, and a workflow that treats post-incident investigation as a first-class capability rather than an emergency procedure.

Published on: July 2, 2026

Contents

Related articles

Blog

Shadow AI is a sovereignty problem

Why an AI use policy cannot enforce itself.

Blog

Data lineage vs metadata management: The architecture behind AI governance

Why catalog-first tools fall short when regulators ask how your training data moved

Blog

Model Risk Starts in the Data Supply Chain

When an AI model starts producing unexplainable results, the first instinct is to blame the model. Teams often rush to...

Blog

Four AI governance questions your data catalog cannot answer

What regulators are already asking about your AI, and what it takes to respond

Blog

The Engineering of Trust: Why Metadata isn’t enough for AI

Three questions your AI governance approach must answer

Blog

The 48-Hour Test: Does Your AI Have Complete Data Lineage?

Three institutions receive the same question during Model Risk Management reviews: “Walk us through the complete data lineage for your...

Blog

How data lineage prevents AI failures in financial services

Solidatus’ Tina Chace and fellow experts reveal why 90% of AI model failures trace back to upstream data changes

Blog

Why Data Lineage is Essential for AI: 7 Governance Challenges Solved by AI-Ready Lineage

AI-ready data lineage is a comprehensive, auditable record of how data flows through your organization, designed to support AI governance...

Blog

Continuing Innovation in Advanced Data Lineage to Help Answer Business Questions

An update on some recent developments in our latest product releases

Blog

Unveiling the Path: Why Data Lineage is Crucial for Building Effective AI Products

Read more about data lineage and its business impact, including on AI, BCBS 239 and more

Blog

Solidatus & Microsoft Purview: Elevating Data Governance in the AI Era

Solidatus data lineage partners with Microsoft Purview to help enterprises trust their data

Blog

The Value of Data: Reflections from Attending Gartner

VP Product, Tina Chace, reflects on the Gartner conference, covering data governance and AI