Model Risk Starts in the Data Supply Chain
When an AI model starts producing unexplainable results, the first instinct is to blame the model. Teams often rush to retrain it, fine-tune the parameters, and rebuild the inputs. But in most enterprise environments, the model was never the problem. The data underneath it changed, and nobody told the data scientists.
Consider a common scenario at a large bank. A third-party vendor restructures a credit risk data feed. Fields shift, column formats change, and the data keeps arriving on schedule without a single error flag. Three weeks later, customers start reporting that legitimate transactions are being blocked while fraudulent ones slip through. The model team spends days investigating the algorithm before someone traces the issue back to the upstream data feed change. The model was working exactly as designed. It was the inputs that moved.
This kind of undetected drift repeats across financial services, insurance, and every regulated industry where AI models depend on complex, multi-source data pipelines. Model risk, more often than not, originates in the data supply chain. And the organizations that recognize this are approaching the problem differently.
KPMG defines model risk as “the potential for adverse outcomes stemming from models producing incorrect or misleading results,” and lists data inaccuracies alongside design flaws and misuse as root causes.1 When a model fails, organizations tend to focus on the code, not the data feeding it. When was the last time a model failure investigation at your organization started with the data pipeline instead of the algorithm?
The numbers bear this out. Gartner predicts that through 2026, organizations will abandon 60% of AI projects unsupported by AI-ready data.2 That statistic is usually read as a data quality problem. The assumption is that the data was bad from the start. In most enterprise environments, the data was fine when the model was built. Something upstream changed after deployment, and the model never knew about it.
Tina Chace, VP of Product at Solidatus, spent six years deploying AI and ML models in regulated banking environments before joining the company. She saw these failures firsthand.
“They wouldn’t tell people, and then it would break the model, or the model would start coming out with different responses that were unexplainable.”
— Tina Chace, VP of Product, Solidatus
While generative AI and agentic systems get most of the attention, traditional ML models remain the majority of production AI in financial services, and they are particularly vulnerable to upstream data changes. They are trained on fixed assumptions about input structure and data distribution. When those assumptions break, the model’s outputs degrade without any error message or alert. As Chace noted, teams build models, and within a year, they go stale because the underlying data shifts and nobody prioritizes or knows to update them.
How many of your production models could you trace back to their source data in under an hour? If you’re not sure, the next section will explain why that matters.
Model monitoring and observability tools can effectively detect symptoms. They flag changes in accuracy, distribution shifts, and statistical anomalies in model inputs. But detecting that something changed is not the same as knowing where the change came from, why it happened, or what else it affected. Monitoring tells you the patient has a fever. Data lineage tells you the cause and whether other patients were exposed to the same thing.
There is also a timing problem. Monitoring is inherently reactive. It runs after the data has already changed and the model has already made an inference. Impact analysis through data lineage works upstream. It lets you simulate the blast radius of a proposed change before it reaches production, so you can intervene at design time rather than during an incident response.
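At its core, simulating a blast radius is a graph-traversal problem: start at the changed asset and walk the lineage graph downstream. A minimal sketch in Python illustrates the idea; the node names are hypothetical, and a production lineage platform would traverse far richer metadata than a bare edge list:

```python
from collections import deque

def blast_radius(edges, changed_node):
    """Return every downstream asset reachable from a changed upstream node.

    edges: list of (upstream, downstream) pairs describing the lineage graph.
    """
    # Build an adjacency map: node -> set of direct downstream consumers.
    downstream = {}
    for src, dst in edges:
        downstream.setdefault(src, set()).add(dst)

    # Breadth-first traversal from the changed node.
    impacted, queue = set(), deque([changed_node])
    while queue:
        node = queue.popleft()
        for consumer in downstream.get(node, ()):
            if consumer not in impacted:
                impacted.add(consumer)
                queue.append(consumer)
    return impacted

# Hypothetical lineage: a vendor feed flows through a staging table
# into a feature pipeline that serves two models.
edges = [
    ("vendor_feed", "staging.credit"),
    ("staging.credit", "features.tenure"),
    ("features.tenure", "model.fraud_v3"),
    ("features.tenure", "model.pricing_v1"),
]
print(blast_radius(edges, "vendor_feed"))
# Everything downstream of the feed is in scope for review.
```

Run before a change ships, this traversal turns "what might this break?" from a guess into a list of assets to review.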
The most common model failures originate upstream and tend to fall into three categories.
1. Data feed substitution. A procurement team renegotiates a contract with a market data provider and switches to a cheaper alternative. The new data feed covers the same instruments but samples at different intervals, uses different rounding conventions, and classifies certain derivatives under a different taxonomy. The schema validates and the data flows on schedule. But the pricing models consuming that feed start producing subtly different valuations, and nobody connects the drift back to the vendor swap. Chace described this dynamic from her experience in financial services. Data feed purchasers would switch sources or renegotiate contracts, and nobody would notify the teams whose models depended on those feeds.
2. Schema drift. Internal teams modify a database schema during a routine migration or platform upgrade. Columns get renamed, data types change, or fields get dropped. The migration team tests their own systems and moves on. Weeks later, a model team discovers that their feature engineering pipeline has been ingesting null values where customer tenure data used to be. At enterprise scale, with 30-plus years of legacy technology and tens of thousands of entities spread across homegrown banking applications, these dependencies are nearly impossible to trace by hand.
3. Business context loss. A data source is marked as authoritative in one governance system but deprecated in another. Or a team retires a golden source and replaces it with a new one, updating their own documentation but not the metadata that other teams rely on. The model pulls from the wrong source, and no technical error surfaces because the connection still works. The data is just no longer what the model was trained to expect.
In each case, the root cause is the same. The dependency between the upstream change and the downstream model was invisible, and no one knew the connection existed, so no one thought to check. End-to-end data lineage makes those dependencies visible, turning an unknowable risk into a manageable one.
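The schema-drift failure mode in particular can be caught mechanically. A minimal sketch, assuming a column-to-type contract is captured at model training time (the column names here are illustrative), compares that contract against each incoming batch:

```python
def schema_diff(expected, actual):
    """Compare an expected column->type contract against an incoming batch.

    Returns the drift findings; an empty dict means the contract still holds.
    """
    findings = {}
    missing = set(expected) - set(actual)
    added = set(actual) - set(expected)
    retyped = {c: (expected[c], actual[c])
               for c in set(expected) & set(actual)
               if expected[c] != actual[c]}
    if missing:
        findings["dropped_columns"] = sorted(missing)
    if added:
        findings["new_columns"] = sorted(added)
    if retyped:
        findings["type_changes"] = retyped
    return findings

# Contract captured at training time vs. what arrived after a migration
# silently renamed the customer tenure column.
expected = {"customer_id": "int", "tenure_months": "int", "segment": "str"}
actual   = {"customer_id": "int", "tenure_mo": "int", "segment": "str"}
print(schema_diff(expected, actual))
# {'dropped_columns': ['tenure_months'], 'new_columns': ['tenure_mo']}
```

A check like this only works if someone knows which models to run it for, which is exactly the dependency map that lineage provides.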
So how do you catch these failures before they reach production?
KPMG’s model risk management framework outlines six lifecycle stages, from planning and development through validation, monitoring, and adjustment.3 Most organizations invest heavily in planning, development, and validation. Where things fall apart is in monitoring and adjustment, when the model is live and the data environment around it keeps changing. Without visibility into upstream dependencies, teams have no systematic way to assess whether a proposed change will break something downstream.
“It’s about context, understanding, meaning, purpose, trust, and data quality. That metadata sits in a graph.”
— Danny Waddington, CTO, Solidatus
This is where data lineage transforms model risk management from reactive to proactive. Instead of waiting for monitoring alerts to tell you a model has degraded, you run impact analysis at design time. As Chace described, the key is integrating lineage checks into the change management process itself. When someone proposes a change request, the system can detect downstream effects in real time rather than after something breaks in production. Solidatus can integrate with pre-merge and pre-implementation workflows so that impact analysis happens automatically as part of the approval process.
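Wiring impact analysis into change approval can be sketched as a gate in the change-request workflow. The sketch below is an illustration of the pattern, not Solidatus’s API; the asset names and the `critical_assets` set are hypothetical:

```python
def review_change_request(change, lineage, critical_assets):
    """Gate a proposed change: auto-approve only if nothing critical
    sits downstream of the assets it touches.

    lineage: dict mapping each asset to its direct downstream consumers.
    """
    # Walk downstream from every asset the change touches.
    impacted, stack = set(), list(change["touches"])
    while stack:
        node = stack.pop()
        for consumer in lineage.get(node, ()):
            if consumer not in impacted:
                impacted.add(consumer)
                stack.append(consumer)

    blocked = impacted & critical_assets
    if blocked:
        return {"status": "needs_review", "impacted_models": sorted(blocked)}
    return {"status": "auto_approved", "impacted_models": []}

# A schema change to the customers table reaches an AML model two hops away.
lineage = {
    "db.customers": ["features.kyc"],
    "features.kyc": ["model.aml_screening"],
}
decision = review_change_request(
    {"id": "CR-1042", "touches": ["db.customers"]},
    lineage,
    critical_assets={"model.aml_screening"},
)
print(decision["status"])  # needs_review
```

The design point is that the gate runs at approval time, before the change merges, so the model team is in the conversation instead of in the incident channel.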
The challenge has always been scale. Running impact analysis across tens of thousands of entities, spanning decades of legacy systems, is not something a governance team can do manually for every change request. This is where the Solidatus AI Lineage Assistant changes the equation. Business users and data engineers can ask a question in natural language: “What breaks if we deprecate this database?” The AI agent analyzes the complete lineage model, including technical metadata, business relationships, and regulatory frameworks, and returns confidence-scored recommendations staged for human review. Hundreds of thousands of entities can be assessed in minutes rather than the weeks or months that manual analysis would require.4
That combination of end-to-end lineage and AI-powered impact analysis makes pre-deployment change assessment practical at enterprise scale for the first time. The model risk management lifecycle no longer has to break down at stages five and six because the tooling has finally caught up to the complexity of the problem.
And this problem is about to get more complex. As organizations deploy agentic AI systems that consume data, create new data flows, call APIs, and trigger downstream processes on their own, the dependency map becomes dynamic and far harder to trace manually. The same lineage-driven impact analysis that protects traditional ML models today will become essential infrastructure for governing agentic systems tomorrow.
The regulatory environment is now codifying what practitioners have known for years. You cannot govern models without governing the data that feeds them.
In February 2026, the U.S. Department of the Treasury released the Financial Services AI Risk Management Framework, laying out 230 control objectives for AI use in financial services. Data lineage, encryption, and security validation must be integrated into all model risk management processes. The framework is not legally binding, but it signals the direction regulators are heading and gives compliance teams a concrete benchmark to measure against.5
The investment numbers reflect this shift. Gartner projects that AI governance spending will reach $492 million in 2026 and surpass $1 billion by 2030. Organizations that adopt AI governance platforms are 3.4 times more likely to govern AI effectively.6 Meanwhile, KPMG reports that half of executives plan to allocate $10 to $50 million to improve data lineage and harden model governance over the next budget cycle.7
For regulated industries, the question is no longer whether to invest in data lineage for model risk. It is when and how fast. The organizations that treat lineage as infrastructure today will be better positioned to meet both current compliance requirements, like BCBS 239 and the EU AI Act, and the emerging standards that follow.
Your models are probably fine. Your data supply chain is the risk you are not managing. Here are three places to start.
1. Map your model dependencies. For each production model, identify all upstream data sources, transformations, and handoffs it depends on. If your team cannot produce that map in under an hour, you have a lineage problem.
This exercise alone will reveal how much of your model risk is actually data supply chain risk.
2. Shift impact analysis to design time. Integrate lineage-driven impact checks into your change management process. Every proposed schema change, vendor migration, or data feed modification should trigger an automated assessment of downstream effects before it reaches production. Catching a breaking change during a change request review costs a fraction of what it costs during an incident response.
3. Treat lineage as infrastructure, not a compliance exercise. Data lineage is not a box you check during an audit. It is the connective tissue between your data estate and every model, report, and decision that depends on it. Organizations that build lineage into their operational workflows, rather than bolting it on for regulators, are the ones that will scale AI with confidence.
The data beneath your models will continue to flow. Vendors will restructure feeds. Internal teams will migrate systems, and schemas will evolve. The question is whether you will see those changes before your models feel them.
See how the Solidatus AI Lineage Assistant turns impact analysis into a natural language question.
1KPMG. “Model Risk Management: A Prudent Blueprint Outside of Financial Services.” KPMG LLP, 2025.
https://kpmg.com/us/en/articles/2024/artificial-intelligence-and-model-risk-management.html
2Gartner. “Lack of AI-Ready Data Puts AI Projects at Risk.” Gartner Newsroom, February 26, 2025.
https://www.gartner.com/en/newsroom/press-releases/2025-02-26-lack-of-ai-ready-data-puts-ai-projects-at-risk
3KPMG. “Model Risk Management: A Prudent Blueprint Outside of Financial Services.” KPMG LLP, 2025.
https://kpmg.com/us/en/articles/2024/artificial-intelligence-and-model-risk-management.html
4Solidatus. “Solidatus Launches AI Lineage Assistant to Deliver Regulatory Data Lineage in Hours, Not Weeks.” March 10, 2026.
https://www.solidatus.com/news/solidatus-launches-ai-lineage-assistant/
5U.S. Department of the Treasury. “Treasury Releases Two New Resources to Guide AI Use in the Financial Sector.” February 2026.
https://home.treasury.gov/news/press-releases/sb0401
6Gartner. “Global AI Regulations Fuel Billion-Dollar Market for AI Governance Platforms.” Gartner Newsroom, February 17, 2026.
https://www.gartner.com/en/newsroom/press-releases/2026-02-17-gartner-global-ai-regulations-fuel-billion-dollar-market-for-ai-governance-platforms
7KPMG. “How AI Is Changing Model Risk Management.” KPMG, 2026.
https://kpmg.com/us/en/articles/2026/ai-model-risk.html
01.
Model risk refers to the potential for adverse outcomes when models produce incorrect or misleading results. KPMG identifies data inaccuracies as a primary root cause alongside design flaws and misuse. Data lineage addresses this by mapping every upstream data source, transformation, and dependency that feeds a model, enabling traceability of failures back to their source and assessment of the blast radius of upstream changes before they reach production.
02.
Models are trained on fixed assumptions about input structure and data distribution. When upstream data changes after deployment, those assumptions break, and the model’s outputs degrade without generating errors. The three most common failure modes are data feed substitution, schema drift, and loss of business context. Gartner predicts that through 2026, organizations will abandon 60% of AI projects that lack AI-ready data. The data was often fine at build time. Something upstream shifted, and no one told the model team. For a deeper look at this dynamic, see how data lineage prevents AI failures in financial services.
03.
Model monitoring detects symptoms after the fact. It can flag drops in accuracy and shifts in distribution in model inputs. Data lineage works upstream, mapping the dependencies between data sources and downstream consumers. Combined with impact analysis, lineage lets teams simulate the effects of a proposed change before deployment rather than diagnosing failures after they occur.
04.
Instead of waiting for a model to degrade in production, teams integrate lineage-driven impact checks into their change management workflows. When someone proposes a schema change, vendor migration, or data feed modification, the system flags every downstream model, report, and dashboard that depends on the affected data. This shifts model risk management from reactive incident response to proactive change assessment.
05.
The Solidatus AI Lineage Assistant is an agentic AI capability that automates impact analysis across complex enterprise data estates. Users can ask questions in natural language, such as “What breaks if we deprecate this database?” The AI agent analyzes the complete lineage model, returns confidence-scored recommendations, and stages all changes for human review. It can assess hundreds of thousands of entities in minutes rather than the weeks or months required by manual analysis.
Published on: April 17, 2026