Lineage tools forget what your AI model saw
Bi-temporal data lineage is the foundation of forensic AI investigation
Most enterprise lineage tools record some history. Edits leave change logs, and many catalogs version the metadata they scan. What almost none of them can do is let an investigator query the lineage as it existed three weeks ago, diff that view against today, and model a proposed fix in a sandbox before touching production.
Point-in-time query, version comparison, and sandboxed change modeling sit atop a bi-temporal record. Every fact in your data lineage carries two timestamps. Valid time is when the fact was true in the world; transaction time is when your systems recorded it. Those timestamps must be recorded in the metadata layer when the change occurs. Reconstructing them later from change logs leaves the comparison and sandbox with nothing to compare or sandbox against.
When an AI or machine learning model produces a bad outcome, say a declined loan or a missed fraud signal, the first question an investigator has to answer is what data the model saw at decision time. The second is what has changed since. The third is what would happen if the upstream feed were corrected. Answering any of the three takes a platform that can run those queries on demand, against a record that has not been overwritten between the decision and the question.
Anyone who runs credit models at a large bank will recognize the scene that follows, even though the specifics here are illustrative. A model risk team flags a credit decisioning system that started behaving differently three weeks ago. Customer complaints have surfaced, fair-lending counsel wants a written account in 48 hours, and the validator opens the current lineage view to reconstruct what the model saw. The view is useless. Three reasons it breaks at most enterprises:
Model teams can detect data drift in the present, when monitoring tools flag the change as it happens. Reconstructing the data state at the moment of a specific decision three weeks ago is a different problem. Most platforms don’t solve it, which is why retrospective investigations stall before they reach the data.
The U.S. Government Accountability Office reports that financial institutions “may find it difficult to evaluate the data sources used to train AI models, especially if the sources are opaque or unavailable.” The Office of the Comptroller of the Currency (OCC) has issued 17 matters requiring attention (MRAs) related to AI use since fiscal year 2020. The Securities and Exchange Commission (SEC) brought at least eight AI-related enforcement actions in 2023 and 2024.1
The validator’s real question has shifted from “did the model drift?” to “what did it drift away from, and when?”
In the illustrative case, the credit decisioning model has been declining 30% more applications from one metropolitan area than its baseline for the past three weeks. The validation team has 48 hours to produce a written account before fair-lending counsel takes over the conversation. Four sequential questions take the investigator from “the model behaved oddly” to “here is the data state at decision time and here is what changed.” Each question maps to a regulatory expectation and a lineage capability that catalog-level tools cannot answer.
The investigator needs column-level provenance for the seven features that drove the declined applications, not just the dataset name. Three came from a third-party vendor’s monthly file, two from internal deposit history, and two were derived from the loan origination system. Catalog-style lineage, which tags datasets at the table- or object-level, does not answer this. Physical-only tools map technical flows; they do not carry the business meaning, policy classification, or process ownership tied to each column. A scan-only tool that covers a selected set of technologies and stops at the first system it cannot connect to leaves the trace back to source incomplete. The Federal Reserve’s revised model risk guidance (SR 26-2) expects this depth. Section IV calls out “the selection of data” as a fundamental piece of conceptual soundness, and validators are expected to document it.2
The bi-temporal question is made concrete. The vendor’s monthly file was classified as an “authoritative source, golden record” on the day the decisions in question were made. Five days later, an internal review reclassified it to “secondary, pending validation.” A current-state lineage view shows “secondary.” The model treated the file as authoritative. Without both classifications recorded, the only one the investigator can see is the current one. The European Central Bank’s May 2024 guide on effective risk data aggregation requires “complete and up-to-date data lineages on data attribute level,” with footnote 24 clarifying that an attribute is “often stored as a column in a table.”3 Attribute-level lineage that loses its history at every overwrite cannot meet that bar.
The vendor restructured the FICO score field encoding on March 8. The model was last retrained on March 1. Three weeks of inferences happened against a feed that no longer matched training assumptions. A point-in-time diff between two lineage snapshots, one from March 1 and one from March 8, shows the change, the affected columns, and the downstream consumers. SR 26-2 Section V names this directly, requiring “ongoing model monitoring” of “data relevance.”4
The vendor file feeds four other models, two regulatory reports, and a customer-facing dashboard. Three of those consumers have been operating on the same stale assumption. SR 26-2 Section III addresses this as “aggregate risk,” describing the situation where models share “reliance on common assumptions, data, or methodologies” and degrade in concert.5 The same impact analysis query that prevents damage before deployment can cause damage after a failure. The capability is identical; only the direction through the lineage graph changes.
“Why wouldn’t I just build this with AI?”
— The architect’s question, 2026
Building it with AI sounds like a reasonable answer until you ask what “it” refers to. The natural-language interface is the easy part. An LLM, a vector store, and a copy of the current metadata can mimic the conversation. That stack cannot invent a history that was never recorded. If your platform overwrites prior lineage states every time someone edits the model, no LLM can reconstruct what those states were three weeks ago. The bi-temporal property has to live in the platform’s metadata write path itself. An assistant cannot retrofit it from above. An LLM running on single-timestamp lineage will produce confident answers about a past that has already been erased, and those answers will not survive examiner review.
Answering any of the playbook questions in minutes rather than weeks requires four properties of the system underneath the assistant, the dashboards, and the model risk paperwork:
This is a shared foundation. Build it once, and every team downstream inherits the same auditable record.
Regardless of jurisdiction, the next question carries the same demand. Show me what the model saw at decision time, prove the lineage I am reading today is the one that existed then, and show me what the fix will affect before you deploy it. Answering on demand takes a bi-temporal record at the metadata layer, point-in-time version comparison, and a sandbox for proposed changes. Most platforms today lack all three. The question is becoming routine.
Solidatus is a bi-temporal data lineage platform for regulated enterprises. Its foundation records both timestamps, and the workspace on top lets teams compare any two points in time, model proposed changes in a sandbox, and circulate plans for review before anyone touches production. The Solidatus AI Lineage Assistant reconstructs decision-time state in seconds rather than weeks. The natural-language interface is the surface; the bi-temporal record and the sandbox underneath are what the examiner reads. Request a demo.
1U.S. Government Accountability Office. “Artificial Intelligence: Use and Oversight in Financial Services.” GAO-25-107197, 2025.
https://www.gao.gov/products/gao-25-107197
2Board of Governors of the Federal Reserve System, FDIC, OCC. “Supervisory Guidance on Model Risk Management.” SR Letter 26-2 Attachment, April 17, 2026.
https://www.federalreserve.gov/supervisionreg/srletters/SR2602a1.pdf
3European Central Bank. “Guide on Effective Risk Data Aggregation and Risk Reporting.” Banking Supervision, May 3, 2024, Section 3.4, footnote 24.
https://www.bankingsupervision.europa.eu/ecb/pub/pdf/ssm.supervisory_guides240503_riskreporting.en.pdf
4Board of Governors of the Federal Reserve System, FDIC, OCC. “Supervisory Guidance on Model Risk Management.” SR Letter 26-2 Attachment, April 17, 2026.
https://www.federalreserve.gov/supervisionreg/srletters/SR2602a1.pdf
5Board of Governors of the Federal Reserve System, FDIC, OCC. “Supervisory Guidance on Model Risk Management.” SR Letter 26-2 Attachment, April 17, 2026.
https://www.federalreserve.gov/supervisionreg/srletters/SR2602a1.pdf
01.
AI forensics is the practice of reconstructing what data an AI or machine learning model consumed at the moment it made a specific decision. Observability tells you that a model has drifted; forensics tells you what the data state was at decision time and what has changed since. The discipline draws on accident investigation, where the goal is to reconstruct the operating conditions at the moment of failure rather than to characterize current behavior. Bi-temporal data lineage is the foundation that makes AI forensics practical at enterprise scale.
02.
A bi-temporal record carries two timestamps for every fact. Valid time marks when the fact was true in the world; transaction time marks when the system recorded it. A bi-temporal data lineage platform preserves both and lets you query the lineage as it existed at any point in the past. Most lineage tools record only transaction time and overwrite the past with each update, leaving retrospective queries dependent on manual archaeology through change logs.
03.
Several regulatory frameworks now require the capability to reconstruct what data fed a model at a specific point in time. SR 26-2, the Federal Reserve’s April 2026 model risk guidance, requires documentation of data selection, ongoing monitoring of data relevance, and assessment of aggregate risk across models sharing common data dependencies. The European Central Bank’s May 2024 guide on risk data aggregation requires complete and up-to-date attribute-level data lineages. Both frameworks treat retrospective lineage as a regulatory expectation rather than an architectural nice-to-have.
04.
Generally not, at least not natively. Most data catalogs treat lineage as a current-state view, overwriting earlier states with each update. Retrofitting a catalog to support bi-temporal queries means rebuilding the metadata write path so that every change appends a new fact rather than overwriting the previous one. Catalogs that were not built bi-temporally from the start can sometimes mine their change logs to reconstruct past states, but the process is manual, slow, and rarely defensible to a regulator.
05.
Before evaluating tools, audit the metadata write path of your current lineage platform. Ask whether every change to a lineage record creates a new fact or overwrites the prior one. If the answer is overwrite, no amount of AI tooling on top will deliver forensic reconstruction. From there, the priorities are append-only metadata storage, immutable lineage snapshot identifiers attached to every model inference, and a workflow that treats post-incident investigation as a first-class capability rather than an emergency procedure.
Published on: July 2, 2026