Four AI governance questions your data catalog cannot answer
At a regulated financial institution, an employee used an AI agent to generate a summary for a regulatory attestation. The agent pulled from a corpus of internal documents, produced a plausible answer, and the employee copied it into the filing. No one verified which documents the agent accessed, whether any were draft or deprecated, or whether the output contained information derived from restricted datasets. Tina Chace, VP of Product Management at Solidatus, has seen this pattern across multiple institutions. “The result is generated by an agent. It should be double-checked, but they then insert the result into their attestation, bypassing the necessary double-checking. That’s kind of the scary stuff that happens now.”1 According to a Delinea survey, 44% of organizations struggle with business units deploying AI solutions without involving IT or security teams, which means scenarios like this are playing out in environments where no one is monitoring the data supply chain.2
The scale of AI adoption makes that governance gap consequential. Gallup found that 45% of U.S. employees now use AI at work, rising to 58% in financial services.3 IBM’s 2025 Cost of a Data Breach Report found that 97% of organizations reporting AI-related security incidents lacked proper AI access controls, and 63% had no AI governance policies in place.4 The tools most organizations rely on to manage their data were not designed for this. Data catalogs solved a genuine problem by enabling enterprises to discover, classify, and assign ownership to data assets. Those investments continue to deliver value for search, documentation, and stewardship. But a catalog tells you what data you have. It was never built to answer what an AI system did with that data, which version it retrieved, or where the output ended up.
In January 2026, Gartner predicted that by 2028, 50% of organizations will adopt zero-trust data governance because unverified AI-generated data is proliferating across enterprise systems.5 The reasoning is straightforward. When AI-generated content enters your data supply chain and gets recycled into training sets, you can no longer take data provenance at face value. You need lineage that proves where the data originated and how it was transformed before any model consumed it.
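To make “lineage that proves” concrete, here is a minimal sketch of a zero-trust provenance check, assuming each pipeline step appends a hash-chained lineage record. The record structure and every name in it are illustrative, not any vendor’s format; the point is the mechanism, not the API.

```python
import hashlib
from dataclasses import dataclass

@dataclass
class LineageRecord:
    """One step in a dataset's history: where it came from and what was done."""
    source: str          # e.g. "core-banking.transactions" (invented name)
    transformation: str  # e.g. "pii-masking-v3" (invented name)
    parent_hash: str     # hash of the previous record; "" for the origin step

def record_hash(rec: LineageRecord) -> str:
    payload = f"{rec.source}|{rec.transformation}|{rec.parent_hash}"
    return hashlib.sha256(payload.encode()).hexdigest()

def verify_provenance(chain: list[LineageRecord]) -> bool:
    """Zero-trust check: every link must reference the hash of the link before it."""
    expected_parent = ""
    for rec in chain:
        if rec.parent_hash != expected_parent:
            return False  # a step was altered, inserted, or removed
        expected_parent = record_hash(rec)
    return True

# A dataset with an unbroken chain is admitted to training; anything else is rejected.
origin = LineageRecord("core-banking.transactions", "extract", "")
masked = LineageRecord("staging.transactions", "pii-masking-v3", record_hash(origin))
assert verify_provenance([origin, masked])
```

Under a scheme like this, mutating any earlier step invalidates every hash downstream of it, which is what lets provenance be verified rather than taken at face value.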
The enforcement timelines are converging from multiple directions. The EU AI Act’s obligations for high-risk AI systems, including those used in credit scoring, insurance underwriting, and hiring decisions, take full effect on August 2, 2026, with penalties under the Act reaching up to €35 million or 7% of global annual revenue.6 In the United States, a May 2025 Government Accountability Office report found that AI tools in financial services can “amplify” existing risks to fair lending, privacy, and data accuracy, and that regulatory oversight has not kept pace with the rate of adoption.7 Regulators on both sides of the Atlantic are asking for documentation that most organizations cannot produce with their current tooling, and the window for voluntary preparation is narrowing.
The most practical way to evaluate whether your tools are ready for AI governance is to ask them a set of questions that regulators and risk officers are already asking. Each one below reflects an active regulatory requirement or a documented operational failure pattern at major financial institutions.
Can you prove what data an AI system accessed for a specific decision, and where the output went?
Chace describes the governance process at one large banking customer as entirely manual. “You have to permission every single data set that you would allow an LLM to use for a specific use case, and that’s currently manual.” The permissioning challenge is only half the equation. Organizations also need to trace what happens after the model produces its output, because a summary generated by an AI agent that ends up in a regulatory filing carries a different risk profile than one shared in an internal meeting. A catalog can classify data at rest, but it cannot capture what a model or agent retrieves at runtime, nor can it track the output as it moves into downstream reports, internal systems, or customer-facing documents. End-to-end lineage makes both the input and output chains visible in a single view.
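As a hypothetical sketch of what runtime capture could look like, assume a retrieval-augmented agent whose retriever and output destinations you instrument yourself; every function, file, and event name below is invented for illustration. Each retrieval and each destination becomes a recorded lineage edge tied to a single run.

```python
import hashlib, json, time, uuid

LINEAGE_LOG = "ai_lineage_events.jsonl"  # append-only audit log (illustrative)

def log_event(kind: str, run_id: str, **details) -> None:
    """Append one lineage edge: what a run read, or where its output went."""
    event = {"kind": kind, "run_id": run_id, "ts": time.time(), **details}
    with open(LINEAGE_LOG, "a") as f:
        f.write(json.dumps(event) + "\n")

def answer_with_lineage(question: str, retriever, llm) -> tuple[str, str]:
    """Wrap an agent call so both its inputs and its output are recorded per run."""
    run_id = str(uuid.uuid4())
    docs = retriever(question)  # whatever retrieval layer the agent uses
    for doc in docs:
        # Record which document *version* the model saw, not just the dataset name.
        log_event("retrieval", run_id, doc_id=doc["id"], version=doc["version"],
                  classification=doc.get("classification", "unknown"))
    output = llm(question, docs)
    log_event("generation", run_id,
              output_sha=hashlib.sha256(output.encode()).hexdigest())
    return run_id, output

def register_destination(run_id: str, destination: str) -> None:
    """Call this when the output lands somewhere: a filing, a report, an email."""
    log_event("destination", run_id, destination=destination)
```

With the run ID threaded through, a summary that lands in a regulatory filing can be tied back to the exact document versions it drew on, which is the audit trail the attestation scenario above was missing.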
Can you detect when an upstream change has silently degraded a model?
Chace spent six years deploying machine learning models in production at large financial institutions before joining Solidatus. She points to a problem that governance teams consistently underestimate. “People are really concerned about data drift or concept drift. But one of the easiest problems to solve is, did someone upstream make a potentially intentional change?”8 When a data engineering team modifies a schema, adjusts a transformation, or updates a vendor feed for their own valid reasons, every downstream model consuming that data is affected. Without dependency mapping across systems, teams discover the damage weeks later when outputs have already degraded, and no one can identify the root cause. Lineage across the full data supply chain surfaces those dependencies before a change reaches production, allowing teams to assess impact and adjust.
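The dependency-mapping idea reduces to something quite simple. Assuming you keep a schema snapshot per feed and a map of downstream consumers (both invented here), a field-level diff against the last snapshot is enough to name every model affected by an upstream change before it ships:

```python
# A minimal sketch of upstream-change detection. Feed names, field types,
# and the dependency graph are all illustrative.

DEPENDENCIES = {
    "vendor.fx_rates": ["pricing_model_v2", "risk_dashboard"],
    "core.transactions": ["fraud_model", "pricing_model_v2"],
}

def schema_diff(old: dict, new: dict) -> list[str]:
    """Field-level diff: additions, removals, and type changes."""
    changes = [f"removed: {f}" for f in old.keys() - new.keys()]
    changes += [f"added: {f}" for f in new.keys() - old.keys()]
    changes += [f"type changed: {f} {old[f]} -> {new[f]}"
                for f in old.keys() & new.keys() if old[f] != new[f]]
    return changes

def impacted_models(feed: str, old: dict, new: dict) -> list[str]:
    """If the feed's schema changed, return every downstream consumer."""
    return DEPENDENCIES.get(feed, []) if schema_diff(old, new) else []

old = {"rate": "decimal(10,4)", "currency": "char(3)"}
new = {"rate": "decimal(18,8)", "currency": "char(3)"}  # a "valid" upstream change
print(impacted_models("vendor.fx_rates", old, new))
# ['pricing_model_v2', 'risk_dashboard'] -- flagged at the point of change
```

A production system would walk transitive dependencies rather than a flat map, but the principle holds: the change is caught when it is made, not weeks later in degraded outputs.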
Can you trace the full provenance of your AI’s training data to its original source?
Consider a model trained on customer transaction data that has passed through three internal systems, two vendor transformations, and a data quality layer before reaching the training pipeline. If a regulator asks you to demonstrate how a specific input feature was sourced, transformed, and validated at each step, you need more than a catalog entry showing dataset ownership and a sensitivity tag. That requires a map of every transformation in the chain at the individual field level. With the EU AI Act requiring provenance documentation for high-risk AI systems and U.S. regulators now integrating AI into their own supervisory examinations, the ability to reconstruct that chain on demand is moving from best practice to regulatory expectation.
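A toy illustration of what answering that question on demand requires, assuming each pipeline step has recorded a field-level lineage edge; the fields and transformations below are invented:

```python
# Each entry maps an output field to (upstream field, transformation applied).
FIELD_EDGES = {
    "train.avg_monthly_spend": ("dq.monthly_spend",        "30-day rolling mean"),
    "dq.monthly_spend":        ("vendor.spend_normalized", "null-check + range validation"),
    "vendor.spend_normalized": ("core.txn_amount",         "currency normalization (vendor)"),
}

def provenance(field: str) -> list[str]:
    """Walk upstream edge by edge until the original source field is reached."""
    steps = []
    while field in FIELD_EDGES:
        upstream, transform = FIELD_EDGES[field]
        steps.append(f"{field} <- {upstream}  [{transform}]")
        field = upstream
    steps.append(f"{field}  [original source]")
    return steps

for step in provenance("train.avg_monthly_spend"):
    print(step)
```

Each line names the upstream field and the transformation applied, ending at the original source, which is the sourced, transformed, and validated chain a regulator would ask to see for a single input feature.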
Can you enforce data policy against actual data flows, not just metadata tags?
Eric Hirschhorn, former Chief Data Officer at BNY, frames this in operational terms. “I have to make sure that the security and the privacy and the ethics follow the information regardless of the transformation.”9 A policy that says “this dataset is restricted from external AI systems” is necessary, but it is only enforceable when you can verify that the data is not flowing into one. PwC’s 2025 Responsible AI survey found that 50% of organizations cite “difficulty translating principles into scaled and operational processes” as their primary barrier to effective AI governance.10 Policies written in governance documents and policies reflected in actual data flows are often two different realities, and the distance between them is where exposure accumulates. Lineage closes that distance by verifying whether the rules on paper match the movement of data through production systems.
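As a minimal sketch of that verification, assuming flow edges can be harvested from logs or lineage scans (every dataset and destination name here is invented), reconciling written policy against observed movement reduces to a set check:

```python
# Declared policy: where each sensitive dataset must never flow.
POLICIES = {
    "customer_pii": {"forbidden_destinations": {"external_llm_api", "vendor_export"}},
}

# Flow edges observed in production, e.g. harvested from a lineage scan.
OBSERVED_FLOWS = [
    {"dataset": "customer_pii", "destination": "internal_reporting"},
    {"dataset": "customer_pii", "destination": "external_llm_api"},  # violation
]

def policy_violations(flows: list[dict], policies: dict) -> list[dict]:
    """Return every observed flow that contradicts a written policy."""
    violations = []
    for flow in flows:
        rule = policies.get(flow["dataset"])
        if rule and flow["destination"] in rule["forbidden_destinations"]:
            violations.append(flow)
    return violations

for v in policy_violations(OBSERVED_FLOWS, POLICIES):
    print(f"policy breach: {v['dataset']} -> {v['destination']}")
```

Run continuously against fresh lineage scans, a check like this is the difference between a policy document and an enforced control.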
BNY offers a practical model for how governance and deployment speed can coexist. The bank established what Hirschhorn calls a “data risk usage board,” a cross-functional body spanning ethics, privacy, the CDO office, AI teams, and legal counsel that reviews every AI model before it reaches the proof-of-concept stage. “It’s a large cohort of folks that sit in that steer co. and look at things before we start to experiment, before we put hands to keyboard,” Hirschhorn explains. Because the review catches problems during design rather than after deployment, the models that reach production have already cleared the governance bar, which means fewer incidents downstream and a faster time to production.
The governance infrastructure did not start with AI. BNY built it to meet existing regulatory requirements, and the existing lineage maps became the foundation for governing AI data flows. For organizations evaluating where to begin, that sequence matters. You don’t need to solve AI governance as a separate initiative if you build the lineage layer first.
Run the four questions above against your current tooling. Ask your team how long it takes to produce a complete answer for each one. Where the response is silence, partial answers, or a promise to get back to you in a few weeks, you have identified your exposure. With the EU AI Act’s August 2026 enforcement deadline approaching and U.S. regulators expanding their AI oversight capabilities, the distance between your response time and what regulators expect is the measure of your governance gap.
Catalogs remain valuable for what they were designed to do. But AI governance requires what catalogs were never built to provide. It requires a living map of how data actually moves, what depends on what, and whether the policies you wrote are reflected in the flows you run. Organizations that want to see what that looks like in practice, including how the Solidatus AI Lineage Assistant maps training data provenance and downstream impact, can request a demo at solidatus.com. The next article in this series will explore how leading banks are using BCBS 239 compliance as the on-ramp for building the lineage infrastructure that AI demands.
[1] Tina Chace, interview on Data Faces Podcast, hosted by David Sweenor, November 2025.
[2] Delinea, “2025 AI in Identity Security Demands a New Playbook,” as reported by Help Net Security, November 2025.
https://www.helpnetsecurity.com/2025/11/12/delinea-shadow-ai-governance/
[3] Gallup, “AI Use at Work Rises,” August 2025 (n=23,068 U.S. adults, ±1.0 pp margin of error).
https://www.gallup.com/workplace/699689/ai-use-at-work-rises.aspx
[4] IBM, “Cost of a Data Breach Report 2025,” 2025.
https://www.ibm.com/reports/data-breach
[5] Gartner, “Gartner Predicts by 2028, 50% of Organizations Will Adopt Zero-Trust Data Governance as Unverified AI-Generated Data Grows,” January 21, 2026.
https://www.gartner.com/en/newsroom/press-releases/2026-01-21-gartner-predicts-by-2028-50-percent-of-organizations-will-adopt-zero-trust-data-governance-as-unverified-ai-generated-data-grows
[6] European Commission, “AI Act: Shaping Europe’s Digital Future,” 2024.
https://digital-strategy.ec.europa.eu/en/policies/regulatory-framework-ai
[7] U.S. Government Accountability Office, “Artificial Intelligence: Use and Oversight in Financial Services,” GAO-25-107197, May 2025.
https://www.gao.gov/products/gao-25-107197
[8] Tina Chace, DMRadio: Meeting of the Minds — Getting AI Ready, hosted by Eric Kavanagh, Inside Analysis, November 2025.
[9] Eric Hirschhorn, “Pioneering Data Strategies: How Bank of New York Is Shaping Business Success in the Age of AI,” Solidatus webinar, 2025.
https://www.solidatus.com/resource/pioneering-data-strategies-how-bank-of-new-york-is-shaping-business-success-in-the-age-of-ai-webinar-recording/
[10] PwC, “2025 Responsible AI Survey: From Policy to Practice,” 2025.
https://www.pwc.com/us/en/tech-effect/ai-analytics/responsible-ai-survey.html
Published on: March 2, 2026