How data lineage prevents AI failures in financial services
Solidatus’ Tina Chace and fellow experts reveal why 90% of AI model failures trace back to upstream data changes
The pressure to deploy AI has caught many organizations off guard, leaving teams that assumed they had more time to prepare scrambling to demonstrate capabilities. During a recent Inside Analysis and DMRadio show, Meeting of the Minds: Getting AI Ready, Eric Kavanagh encapsulated this urgency:
“AI in the enterprise is kind of like a flash final exam halfway through the semester. All of a sudden, boom—final exam. Organizations are scrambling.”— Eric Kavanagh, Inside Analysis
The panel brought together data governance practitioners and industry analysts to tackle a question that has become urgent across financial services and beyond. How do you know if your enterprise is AI-ready? The conversation explored data architecture, governance frameworks, and lessons from enterprise AI projects. A central theme surfaced throughout the discussion: without data lineage, AI readiness remains out of reach.
Organizations are responding to the pressure by deploying chatbots, generative AI tools, and machine learning models into production. Some of these deployments fail almost immediately. Kavanagh referenced stories of chatbots that “went wacky out of the box” on their very first day in production. Tina Chace, who joined the panel from Solidatus, explained why this happens. “When we’re talking about a chatbot going rogue on day one, while potentially the chatbot’s fault, it’s likely the data that they have access to.”
The principle hasn’t changed—bad data in, bad data out. But in the age of AI, the consequences are amplified and far more visible. The organizations succeeding in this moment aren’t distinguished by the sophistication of their algorithms. They’re the ones who can answer a fundamental question about every AI initiative they deploy. Where did that data come from, and where is it going?
When AI/ML models in production start producing unreliable outputs, the instinct is often to examine the model itself. Teams review the algorithm, inspect the training process, and search for signs of model drift. But according to Tina Chace, who spent more than 5 years as an AI practitioner building and maintaining models at large organizations, the real culprit is usually elsewhere.
“Every time we came in to troubleshoot, in 90% of troubleshooting cases, something upstream of the dataset had changed—and we couldn’t really identify it easily because we didn’t have data lineage.”— Tina Chace, Solidatus
Data science and machine learning teams have traditionally focused their attention on validated, attested datasets. As long as a dataset was approved for experimentation, model building, and training, that was where their thinking ended. But once a model moves into production, the assumptions underlying that dataset become critical. Understanding the provenance, quality, and ownership of the data feeding the model is no longer optional.
Many organizations worry about data drift, the gradual or sudden changes in data distributions that can degrade model performance over time. Chace pointed out a simpler, often undetected problem: intentional upstream changes. A field gets reformatted, a table dropped, a calculation updated without notice. These changes happen routinely, and without lineage, the teams downstream have no way to connect the dots when their models start behaving unexpectedly.
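To make that failure mode concrete, here is a minimal sketch, not drawn from the panel, of how a downstream team might compare a live dataset against the schema its model was trained on. The column names and the expected_schema mapping are illustrative assumptions.

```python
import pandas as pd

# Schema captured when the model was last validated (field names are illustrative).
expected_schema = {
    "customer_id": "int64",
    "account_balance": "float64",
    "region_code": "object",
}

def detect_upstream_changes(df: pd.DataFrame, expected: dict) -> list[str]:
    """Compare a live dataset against the schema the model was trained on."""
    issues = []
    for column, dtype in expected.items():
        if column not in df.columns:
            issues.append(f"missing column: {column}")            # e.g. a dropped field
        elif str(df[column].dtype) != dtype:
            issues.append(f"type changed: {column} ({dtype} -> {df[column].dtype})")
    issues += [f"unexpected new column: {c}" for c in df.columns if c not in expected]
    return issues

# A live extract where someone upstream reformatted account_balance as text.
live = pd.DataFrame({
    "customer_id": [101, 102],
    "account_balance": ["1,204.50", "88.00"],   # reformatted without notice
    "region_code": ["EU", "US"],
})
for issue in detect_upstream_changes(live, expected_schema):
    print("upstream change:", issue)            # flags the type change on account_balance
```

A check like this catches structural breakage, but a calculation that is silently updated upstream leaves the schema intact, which is exactly the gap that end-to-end lineage closes.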
The visibility gap runs in both directions. Malcolm Chisholm, founder of Data Millennium, raised a scenario familiar to anyone who has managed enterprise data infrastructure. Imagine a database administrator who decides to delete a table, assuming nobody important is using it. “I don’t have anything in terms of data lineage to look at,” Chisholm explained. “I don’t know if this is going into an AI solution, and I blow them up.”
The problem extends beyond inputs. Chace emphasized that data lineage is equally critical for understanding where AI model outputs flow after they leave the model. “So you know if there’s an issue with this model, it is actually contributing to these five key reports, or that the customer-facing team is using data from this dataset to answer customer questions.”
Malcolm Chisholm added that AI outputs often circle back into operational databases, feeding customer propensity scores or preference data that chatbots then use to personalize interactions. Many people still think of data as a linear flow from source to output, but in reality, enterprise systems are far more interconnected.
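One way to picture that interconnectedness is as a directed graph rather than a pipeline. The sketch below is purely illustrative (the node names are invented and this is not a description of Solidatus functionality); it records lineage edges, including a model output that loops back into an operational table, and answers both questions the panel raised: what a dropped table would break downstream, and what feeds a given model upstream.

```python
from collections import defaultdict

# Directed lineage edges: source -> consumer (illustrative node names).
edges = [
    ("crm.customers", "feature_store.customer_features"),
    ("billing.invoices", "feature_store.customer_features"),
    ("feature_store.customer_features", "model.churn_propensity"),
    ("model.churn_propensity", "reports.exec_churn_dashboard"),
    ("model.churn_propensity", "crm.customers"),  # output written back to an operational table
]

downstream, upstream = defaultdict(set), defaultdict(set)
for src, dst in edges:
    downstream[src].add(dst)
    upstream[dst].add(src)

def reachable(start: str, graph: dict) -> set:
    """Walk the graph from `start`, collecting every node it can reach."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

# Impact analysis: what is affected if someone drops billing.invoices?
print(reachable("billing.invoices", downstream))
# Provenance: what feeds the churn model?
print(reachable("model.churn_propensity", upstream))
```

Because the model's output writes back into crm.customers, the graph contains a cycle, which is precisely the kind of feedback loop that stays invisible to anyone who assumes data flows only one way.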
Operational efficiency alone justifies investment in data lineage. But when regulators, lawyers, and auditors enter the picture, insufficient lineage can mean failed audits, legal exposure, and regulatory penalties. Financial services organizations in the United States and the European Union face growing model risk management requirements that demand clear documentation of data flows. As Chace explained, regulators now expect organizations to “understand data flows and any upstream transformations that are going into critical models.”
This is not unprecedented. Malcolm Chisholm noted that regulations such as BCBS 239 have required banks to prove that data flows correctly from operational systems into regulatory reports for years. What has changed is the complexity introduced by AI and the scrutiny that accompanies it. The questions regulators ask have expanded, and the consequences of inadequate answers have grown more severe.
Legal exposure adds another dimension. Chisholm described projects in which lineage documentation was presented directly to legal counsel to answer fundamental questions about data use. “Is it legal? Is this licensed data from vendor X? Are we allowed to use it in this context? Is this confidential? Is this the trade secrets of our company that we’re putting out there to the world?” These questions demand precise answers, not educated guesses. Chisholm acknowledged that compiling this information manually was exhausting and left him uncertain about the results. “It was a lot of work to do that manually. And then it was like, I’m not sure I did it correctly?”
Generative AI introduces yet another accountability challenge. When models produce confident-sounding but factually wrong outputs, organizations need a way to verify what happened. Chace framed lineage as the solution. “If you fear that there’s a hallucination, you can go check the facts.”
“From a legal perspective, from a regulatory perspective, even from an operational resilience perspective—I want to be able to point back to actual facts. The data lineage provides those facts.” — Tina Chace, Solidatus
Meeting these requirements through manual processes was difficult in a pre-AI world. Doing so at the speed and scale that AI demands requires automation.
For many organizations, data lineage has been a compliance artifact. Teams compile documentation when auditors request it, creating static snapshots that capture a moment in time and then gather dust until the next review cycle. That approach cannot keep pace with AI.
Chace described the shift AI demands: lineage must evolve from “a retroactive document” to “a living, breathing document” that organizations can rely on as they scale their AI initiatives. AI/ML models don’t operate in isolation. They depend on continuously changing data pipelines, and they feed their outputs into systems that other processes depend on. A lineage snapshot from six months ago tells you very little about what is happening today.
The challenge is scale. Enterprise data infrastructure has grown so complex that manual documentation cannot keep pace. Malcolm Chisholm put it bluntly.
“The data infrastructure we have is incredibly complicated. We’re asking human beings to go out and document it—the scale is just too immense. It has to be automated.” — Malcolm Chisholm, Data Millennium
Automation is now making comprehensive lineage achievable in ways that weren’t feasible even a few years ago. Chace described how modern lineage tools can scan more than 100 different types of systems in a matter of weeks, mapping data flows across an enterprise and surfacing “systems that you didn’t know were actually part of this process.” Often the scan itself is what exposes those blind spots.
Generative AI is also expanding the scope of what lineage can cover. Unstructured data, including documents, emails, and other content that resists traditional classification, has historically been difficult to incorporate into lineage efforts. Chace noted that AI can now “generate metadata for unstructured data programmatically, whereas previously you might have had to do that manually.” Sources that were once invisible to governance efforts are now available for proper tracking and documentation.
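What programmatic metadata generation might look like in practice is sketched below. This is an assumption-laden illustration rather than a description of any specific product: the complete() function is a stand-in for whichever model endpoint an organization uses, and the metadata fields are invented for the example.

```python
import json

def complete(prompt: str) -> str:
    """Stand-in for a call to the organization's chosen LLM endpoint.
    A canned response keeps the sketch runnable; replace with a real call."""
    return ('{"summary": "Vendor data licence agreement", "contains_pii": false, '
            '"suggested_classification": "confidential", "data_subjects": ["customers"]}')

PROMPT = """Read the document below and return JSON with the keys
"summary", "contains_pii" (true/false), "suggested_classification", "data_subjects".

Document:
{text}
"""

def generate_metadata(text: str) -> dict:
    """Ask the model for governance metadata so the document can be catalogued."""
    raw = complete(PROMPT.format(text=text[:8000]))   # truncate very long documents
    return json.loads(raw)

metadata = generate_metadata("This licence agreement between Vendor X and ...")
print(metadata["suggested_classification"])           # feeds the catalogue and lineage graph
```

Once a document carries metadata like this, it can be registered as a node in the lineage graph and governed alongside structured sources.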
These same capabilities can make lineage accessible to stakeholders who aren’t technical specialists. Rather than presenting lawyers or compliance officers with dense visualizations they struggle to interpret, organizations can use AI to summarize lineage in ways that answer their specific questions. A lawyer asking whether data usage complies with licensing terms needs a different view than an engineer troubleshooting model performance, and both views can now be generated from the same underlying lineage.
The conversation about AI readiness often assumes that readiness is a state organizations can achieve before they begin deploying AI. Adrian Astala of Starburst offered a more honest assessment during the panel. “I don’t think anybody’s AI-ready,” he said. “When you sit down, and the bosses are out of the room, you look at each other, we’re like, we’re not ready.”
This admission resonates with governance teams, architects, and business leaders alike. Everyone recognizes the gap between where they are and where they need to be. Astala suggested that the framing itself might be the problem.
“The question about being AI-ready is probably the wrong question. Maybe a different way to think about it is: are we ready to start?” — Adrian Astala, Starburst
Starting doesn’t mean mapping every data flow across the entire enterprise before launching a single AI initiative. That approach leads to paralysis. Chace recommended a use-case-driven strategy instead. “It would be extremely challenging to boil the ocean to try and just map your entire ecosystem. But if you’re very intentional about data flows that are going to a critical report, or an AI model—that is actually what we recommend.”
One practical framework involves tiered data certification. Organizations can classify datasets as gold, silver, or bronze based on the level of governance and documentation they have achieved. Gold-standard datasets, suitable for production AI models, require the most rigorous lineage. “The amount of data lineage you need for your gold standard datasets—that can be used in AI—should let you trace all the way from the original source with every single calculation,” Chace explained.
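A hedged sketch of how that tiering could be expressed in code follows. The tier names come from the panel, but the specific evidence fields and thresholds are assumptions made for illustration.

```python
from dataclasses import dataclass

# Minimum lineage evidence per tier (thresholds are illustrative, not a standard).
TIER_REQUIREMENTS = {
    "gold":   {"traced_to_source": True,  "calculations_documented": 1.0},
    "silver": {"traced_to_source": True,  "calculations_documented": 0.5},
    "bronze": {"traced_to_source": False, "calculations_documented": 0.0},
}

@dataclass
class DatasetLineage:
    name: str
    traced_to_source: bool           # every hop back to the original system is mapped
    calculations_documented: float   # share of transformations with documented logic

def certify(ds: DatasetLineage) -> str | None:
    """Return the highest tier this dataset qualifies for, or None."""
    for tier, req in TIER_REQUIREMENTS.items():  # dicts keep insertion order: gold first
        if ((ds.traced_to_source or not req["traced_to_source"])
                and ds.calculations_documented >= req["calculations_documented"]):
            return tier
    return None

features = DatasetLineage("customer_features", traced_to_source=True,
                          calculations_documented=0.8)
print(certify(features))  # "silver": documented enough to experiment, not for production AI
```

The point is not the particular thresholds but the discipline: production AI models draw only from datasets whose lineage can be traced all the way back to the original source, calculation by calculation.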
Malcolm Chisholm asked whether organizations should establish enterprise-wide lineage policies specifically for AI initiatives. Chace agreed. “Absolutely, especially for your most critical applications and processes, or most highly visible models.”
The goal extends beyond technical compliance. Astala described what happens when business teams participate in building AI agents and see their input translate into working solutions. “What I’ve created for you and in you is confidence,” he said. “Then you walk out of the room like John Travolta with a paint can, you just built an agent.” That confidence transforms a cautious pilot into organizational momentum.
Before your next AI initiative moves to production, consider whether you can answer the questions this discussion kept circling back to: where the data feeding the model comes from, whether you would know if something upstream changed, where the model’s outputs flow, and whether you are permitted to use that data in the first place.
AI readiness isn’t a box to check. It’s a discipline that must be built and maintained, and data lineage is the foundation. The question worth asking isn’t whether your organization is ready. It’s whether you’re ready to start. Begin by mapping lineage for your single highest-risk AI model. If you can’t complete that exercise in a week, you’ve found your gap.
Published on: December 9, 2025