Shadow AI is a sovereignty problem

Why an AI use policy cannot enforce itself.

Every chief risk officer and data privacy officer at a regulated bank already knows the answer to one question. Yes, their employees are putting company data into public AI tools, especially large language models (LLMs). The harder question is whether their governance can see it happening.

Across industries, IBM’s 2025 Cost of a Data Breach Report found that 1 in 5 organizations had a breach tied to shadow AI, with high-shadow-AI environments paying an average of $670,000 more per incident than firms with stronger oversight.1

Eric Hirschhorn, former Chief Data Officer at Bank of New York, has stated the principle plainly. Proprietary data does not enter public LLMs, because once it does, the firm no longer controls it.2 That stance is a starting commitment. Holding it in practice is the harder problem, and it is where most AI use policies stop being useful.

An AI use policy states the rule. Enforcement is a separate capability, and holding the rule across thousands of undocumented data movements, performed by employees acting in good faith, is a sovereignty problem the policy itself cannot solve.

What sovereignty means for enterprise data

Two definitions of data sovereignty are in active use in 2026, and they describe different problems. Most policy-level coverage treats them as a single term, which obscures what a regulated firm should actually do about each.

The narrow definition is jurisdictional. Data is subject to the laws of the country where it sits, and the work of sovereignty is to keep it within the right borders. Compliance teams have lived with this version for a decade.

The broader definition is operational. Sovereignty is your firm’s ability to enforce its own rules on its own data, wherever it sits. Gartner frames sovereignty as a three-part requirement: data sovereignty, operational independence, and technological autonomy.3 A chief risk officer at a regulated firm has direct authority over operational sovereignty, and the work aligns with every regulatory enforcement mechanism the firm already has.

The architecture matters as much as the tooling. A data catalog tags assets at rest with classification, ownership, and purpose, while a technical lineage tool traces the flow of data through systems. Both capabilities are necessary inputs to operational sovereignty. Neither one carries meaning alongside the data into the systems that employees and AI agents use. Operational sovereignty needs flow and meaning to be together in a single layer that stays with the data.
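One way to picture "flow and meaning together" is a lineage graph in which classification tags travel downstream with the data instead of staying on the source asset. The sketch below is a minimal illustration, not any vendor's implementation. The node names, the tag values, and the `propagate` helper are all hypothetical, and it assumes the simple rule that a downstream asset inherits the most restrictive tag of anything feeding it.

```python
from collections import deque

# Hypothetical lineage edges: each source maps to its downstream targets.
edges = {
    "customer_table": ["analyst_view"],
    "market_feed": ["analyst_view"],
    "analyst_view": ["spreadsheet_export"],
    "spreadsheet_export": ["prompt_window"],
}

# Hypothetical catalog tags, recorded only on the assets at rest.
catalog_tags = {
    "customer_table": "restricted",
    "market_feed": "public",
}

# Assumed ordering of tags, from least to most restrictive.
RANK = {"public": 0, "internal": 1, "restricted": 2}


def propagate(edges, tags):
    """Push each tag downstream, keeping the most restrictive tag seen."""
    effective = dict(tags)
    queue = deque(tags)
    while queue:
        node = queue.popleft()
        for target in edges.get(node, []):
            current = effective.get(target)
            # Update only when the incoming tag is stricter, so loops terminate.
            if current is None or RANK[effective[node]] > RANK[current]:
                effective[target] = effective[node]
                queue.append(target)
    return effective


effective_tags = propagate(edges, catalog_tags)
print(effective_tags["prompt_window"])
```

The catalog alone has no entry for `prompt_window`. Once the tag is carried along the flow, the prompt window is marked with the same classification as the customer table it ultimately draws from.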

Why policy on paper cannot enforce sovereignty

A working control has three parts: the stated rule, the observation that detects whether the rule is being followed, and the action that responds when it is broken. Most AI use policies stop at the first.
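The three parts can be made concrete with a small sketch. Everything here is illustrative: the `DataMovement` event shape, the `Control` class, and the example rule are assumptions made for this example, not a description of any real product. The point is that the same rule finds nothing when no observation is wired in underneath it.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class DataMovement:
    """One observed movement of data (hypothetical event shape)."""
    classification: str  # e.g. "restricted" or "public"
    destination: str     # e.g. "public_llm" or "internal_model"


@dataclass
class Control:
    """A control needs all three parts: rule, observation, action."""
    rule: Callable[[DataMovement], bool]        # True when the movement is allowed
    observe: Callable[[], list[DataMovement]]   # movements actually seen
    act: Callable[[DataMovement], None]         # response to a violation

    def run(self) -> int:
        """Check every observed movement and return the number of violations."""
        violations = 0
        for movement in self.observe():
            if not self.rule(movement):
                self.act(movement)
                violations += 1
        return violations


def no_restricted_data_in_public_llms(m: DataMovement) -> bool:
    return not (m.classification == "restricted" and m.destination == "public_llm")


blocked: list[DataMovement] = []
seen = [
    DataMovement("restricted", "public_llm"),
    DataMovement("public", "public_llm"),
    DataMovement("restricted", "internal_model"),
]

# Rule, observation, and action all present.
full_control = Control(no_restricted_data_in_public_llms, lambda: seen, blocked.append)

# A policy on paper: the same rule, but nothing is observed.
paper_policy = Control(no_restricted_data_in_public_llms, lambda: [], blocked.append)

violations_found = full_control.run()
paper_violations = paper_policy.run()
```

In this toy setup the full control flags the one restricted movement to a public LLM, while the paper policy reports zero violations. That is not because none occurred. It has no way to see them.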

Hirschhorn names the principle that connects governance to the data itself. “I have to make sure that the security and the privacy and the ethics follow the information regardless of the transformation.”4 A rule that does not follow the data into the systems where it is used reduces to a statement of intent.

The numbers across industries match the framing. IBM’s 2025 Cost of a Data Breach Report found that 97% of organizations reporting an AI-related security incident lacked proper AI access controls, and that 63% had no AI governance policies in place at all.5 These numbers describe the typical state of enterprise AI governance in 2025, where the rule exists on paper without the layer of observation underneath it.

Bank of New York’s posture, set out in Hirschhorn’s webinar, works because the observation layer existed before the policy. That order is hard to reverse. BNY’s experience is the model for what operational sovereignty looks like when a firm builds it rather than declares it.

How BNY operationalizes sovereignty

Bank of New York treats AI sovereignty as a continuation of its existing data governance program. The central mechanism, which Hirschhorn calls a data risk usage board, convenes before any AI model reaches the proof-of-concept stage. The question on the table is whether the firm has the authority to use the proposed data for the proposed purpose.

The board works because the lineage infrastructure underneath it predates generative AI. BNY built its lineage maps on Solidatus during earlier regulatory work, including the data quality and risk reporting requirements of BCBS 239 and DORA. By the time generative AI arrived, the bank had the machinery to ask where data originated and who owned it. It could also trace which jurisdiction governed the data and which downstream systems consumed it. AI sovereignty, in this view, continues the work on data sovereignty that came before it.

“We do not put proprietary data into public LLMs… we no longer have control over it.”

— Eric Hirschhorn, Chief Data Officer, Bank of New York6

The board’s review remit operationalizes that stance. Each proposed AI model is mapped onto the lineage infrastructure before it reaches production. The board can see whether the data flowing into the model was permitted for that purpose, whether the training corpus crosses any jurisdictional or contractual lines, and what downstream systems will consume the model’s outputs.
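The kind of check described here can be sketched as a comparison between what each dataset is permitted for and what the model proposes to do with it. This is a hedged illustration only, not a description of BNY's actual process. The `Dataset` and `ProposedUse` types, the field names, and the `review` function are hypothetical. A jurisdiction mismatch is treated as a finding for human review rather than an automatic violation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    name: str
    permitted_purposes: frozenset[str]  # purposes the firm has authority for
    jurisdiction: str                   # where the data is governed


@dataclass(frozen=True)
class ProposedUse:
    model: str
    purpose: str
    jurisdiction: str                   # where the model will train or serve
    datasets: tuple[Dataset, ...]


def review(use: ProposedUse) -> list[str]:
    """Return findings for the board; an empty list means nothing was flagged."""
    findings = []
    for ds in use.datasets:
        if use.purpose not in ds.permitted_purposes:
            findings.append(f"{ds.name}: purpose '{use.purpose}' not permitted")
        if ds.jurisdiction != use.jurisdiction:
            findings.append(
                f"{ds.name}: crosses from {ds.jurisdiction} to {use.jurisdiction}"
            )
    return findings


# Hypothetical proposal: a marketing model trained in the US on two datasets.
eu_customers = Dataset("eu_customer_records", frozenset({"fraud_detection"}), "EU")
us_trades = Dataset("us_trade_history", frozenset({"fraud_detection", "marketing"}), "US")
proposal = ProposedUse("churn_model", "marketing", "US", (eu_customers, us_trades))

findings = review(proposal)
for finding in findings:
    print(finding)
```

The check is only as good as the permitted-purpose and jurisdiction metadata behind it, which is why it depends on lineage that already records where each dataset came from.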

What sits on a data risk usage board

The board includes ethics and privacy specialists, AI engineering, and legal, all convened by the chief data officer’s office before any AI model reaches proof of concept. The remit covers whether the firm has authority to use the proposed data for the proposed purpose, in the proposed context, with the proposed downstream effects.

BNY built the observation layer first, then wrote policy on top of it. A firm reversing the order, with policy first and lineage retrofitted later, ends up in the same place most enterprises already are. There is a rule on paper, but no enforcement layer underneath it.

Three places where sovereignty breaks

Operational sovereignty (data into public LLMs)

Operational sovereignty breaks first at the user level. Customer records, internal financial models, and draft regulatory filings enter public models through accounts that look ordinary. A data catalog may have tagged the source dataset as restricted, but it cannot track what an analyst pulls from a downstream view, reshapes in a spreadsheet, and then pastes into a prompt window. By the time the data reaches the model, the classification, the owner, and the regulatory tag are gone. IBM’s 2025 study found that 65% of shadow-AI breaches compromised personal information and that 40% compromised intellectual property.7 What the policy cannot observe, the policy cannot stop.

Jurisdictional sovereignty (training data crossing borders)

Jurisdictional sovereignty breaks when training data crosses a border that the firm cannot trace. A model trained on EU customer records for one purpose may make inferences for US users under a different purpose. The technical lineage tool shows the pipe, but it cannot tell the model whether the data carried permission for the new purpose. Article 44 of the General Data Protection Regulation governs the transfer; the model itself becomes the transfer mechanism. The EU-US Data Privacy Framework, the current adequacy mechanism, remains in force pending further legal challenge. Gartner forecasts that 75% of European and Middle Eastern enterprises will geopatriate workloads by 2030, up from less than 5% in 2025.8 Geopatriation means moving cloud workloads into in-country infrastructure to reduce geopolitical risk. The migration is happening with or without your AI policy.

Output sovereignty (AI agents in regulated artifacts)

Output sovereignty breaks when an AI agent’s output lands in a regulated artifact. An earlier post on AI governance described the pattern. An employee asks an agent to draft a regulatory attestation, the agent pulls from internal documents, and the output goes into the filing without anyone reconstructing where each statement came from. Tina Chace, VP of Product Management at Solidatus, has called this “the scary stuff that happens now.”9

Catalog-and-lineage tooling on its own runs out of evidence at this point. The Solidatus AI Lineage Assistant is built to map the training-data provenance and downstream impact of every AI output, which is the missing layer for agent-derived artifacts.

Figure: Three sovereignty crossings

A governed warehouse sits at the center, with three outbound data flows. One reaches a sanctioned internal model and tests operational sovereignty. Another crosses an EU-US border to a downstream inference service, where jurisdictional sovereignty is at stake. The third leaves an AI agent and lands in a regulated filing, putting output sovereignty at risk. Each crossing point is annotated with the sovereignty layer that breaks when the firm cannot see what the data is doing at that moment.

Three sovereignty questions for your data team

Three questions to run against your own organization this quarter.

  1. Can we show, on a single map, every system where regulated data flows and which AI tools are connected to those systems?
  2. When an AI model is trained, can we prove every dataset in its training corpus was permitted for that purpose and that jurisdiction?
  3. When an AI agent produces an output that ends up in a regulated artifact, can we reconstruct the data path that produced it?

The team should be able to answer each question in minutes from a single map of data flows, model inputs, and agent outputs. Where the answers come back as partial information, week-long investigations, or “we are working on it,” that response is the sovereignty exposure.
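The third question, reconstructing the data path behind an agent's output, reduces to an upstream walk over recorded lineage. The sketch below assumes the lineage is available as a simple mapping from each asset to the assets it was derived from. The node names and the `trace_upstream` helper are hypothetical and stand in for whatever lineage store a firm actually has.

```python
# Hypothetical lineage: each asset maps to the assets it was derived from.
derived_from = {
    "regulatory_attestation": ["agent_draft"],
    "agent_draft": ["policy_document", "risk_report"],
    "risk_report": ["risk_warehouse"],
    "risk_warehouse": ["trading_system", "customer_table"],
}


def trace_upstream(artifact, derived_from):
    """Return every path from the artifact back to an asset with no recorded parent."""
    paths = []
    stack = [[artifact]]
    while stack:
        path = stack.pop()
        parents = derived_from.get(path[-1], [])
        if not parents:
            paths.append(path)
            continue
        for parent in parents:
            if parent not in path:  # skip assets already on this path (cycle guard)
                stack.append(path + [parent])
    return paths


paths = trace_upstream("regulatory_attestation", derived_from)
sources = sorted({path[-1] for path in paths})
print(sources)
```

If the mapping is complete, the answer comes back in one query. If an edge is missing, for example because the agent's inputs were never recorded, the walk stops early and the gap itself is the finding.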

These are governance questions, and the data team is the right place to answer them. All three depend on the same capability. The firm needs lineage that carries business context alongside flow, end-to-end, in a single layer. A platform that does both delivers the substrate that every question on the list presumes.

Operational sovereignty in practice

Operational sovereignty arises when the rule and the observation operate within the same layer. A regulated firm gets there by combining a complete lineage that traces the data with the business context that accompanies it. Solidatus is the platform that regulated firms use to put both into one operational layer, and the AI Lineage Assistant extends that layer to AI training-data provenance and downstream impact. McKinsey forecasts the sovereign AI market at $600 billion by 2030, with regulated enterprises serving as the demand signal that defines the category.10 The next post in this series turns to AI forensics, focused on reconstructing an agent-derived output when a regulator asks for it after the fact.

1IBM, “Cost of a Data Breach Report 2025,” July 30, 2025. https://www.ibm.com/reports/data-breach

2Eric Hirschhorn, “Pioneering Data Strategies: How Bank of New York Is Shaping Business Success in the Age of AI,” Solidatus webinar, 2025. https://www.solidatus.com/resource/pioneering-data-strategies-how-bank-of-new-york-is-shaping-business-success-in-the-age-of-ai-webinar-recording/

3Gartner, “Sovereign Cloud: Understand and Address Sovereignty Requirements,” 2025. https://www.gartner.com/en/documents/5417163

4Eric Hirschhorn, “Pioneering Data Strategies: How Bank of New York Is Shaping Business Success in the Age of AI,” Solidatus webinar, 2025. https://www.solidatus.com/resource/pioneering-data-strategies-how-bank-of-new-york-is-shaping-business-success-in-the-age-of-ai-webinar-recording/

5IBM, “Cost of a Data Breach Report 2025,” July 30, 2025. https://www.ibm.com/reports/data-breach

6Eric Hirschhorn, “Pioneering Data Strategies: How Bank of New York Is Shaping Business Success in the Age of AI,” Solidatus webinar, 2025. https://www.solidatus.com/resource/pioneering-data-strategies-how-bank-of-new-york-is-shaping-business-success-in-the-age-of-ai-webinar-recording/

7IBM, “Cost of a Data Breach Report 2025,” July 30, 2025. https://www.ibm.com/reports/data-breach

8Gartner, “Gartner Identifies the Top 10 Strategic Technology Trends for 2026,” October 20, 2025. https://www.gartner.com/en/newsroom/press-releases/2025-10-20-gartner-identifies-the-top-strategic-technology-trends-for-2026

9Tina Chace, interview on DMRadio: Meeting of the Minds — Getting AI Ready, hosted by Eric Kavanagh, Inside Analysis, November 2025.

10McKinsey & Company, “Sovereign AI: Building Ecosystems for Strategic Resilience and Impact,” 2025. https://www.mckinsey.com/industries/technology-media-and-telecommunications/our-insights/sovereign-ai-building-ecosystems-for-strategic-resilience-and-impact

Published on: June 25, 2026
