GOVERN-1.2 / MAP-5.1 NIST AI Risk Management Framework

No customer data in shared / global models

Demonstrate that customer data is architecturally and operationally segregated from shared or global model training pipelines, ensuring no cross-tenant data contamination or inference risk.

Description

What this control does

This control ensures that customer-specific data, including personally identifiable information (PII), protected health information (PHI), and proprietary business data, is never used to train, tune, or improve shared machine learning models or global AI systems that serve multiple customers or tenants. Customer data must remain logically and physically isolated from the training pipelines that feed multi-tenant models. This prevents cross-customer data leakage through model inference, through extraction attacks that exploit model memorization, or through accidental exposure when a model generates outputs based on patterns learned from other customers' datasets.

Control objective

What auditing this proves

Demonstrate that customer data is architecturally and operationally segregated from shared or global model training pipelines, ensuring no cross-tenant data contamination or inference risk.

Associated risks

Risks this control addresses

  • Cross-tenant data leakage where one customer's sensitive information is memorized by a shared model and exposed to another customer through inference queries
  • Model inversion or membership inference attacks that reconstruct customer data from shared model weights or outputs
  • Non-compliance with data sovereignty requirements, GDPR Article 32, the HIPAA Privacy Rule, or contractual data isolation commitments
  • Intellectual property theft where proprietary customer algorithms, trade secrets, or business logic embedded in training data become accessible via shared model behavior
  • Loss of customer trust and contractual breach when the terms of service explicitly guarantee data isolation but the implementation allows commingling
  • Inadequate data lineage tracking causing accidental inclusion of customer datasets in global model retraining workflows
  • Prompt injection or adversarial queries extracting customer-specific training examples from shared foundation models

Testing procedure

How an auditor verifies this control

  1. Obtain and review the organization's data classification policy and AI/ML model inventory, identifying all shared, multi-tenant, or global models serving multiple customers.
  2. Review architecture diagrams and data flow documentation for each shared model, tracing data ingestion, preprocessing, training, and inference pipelines to confirm customer data sources are excluded.
  3. Interview data science, ML engineering, and platform teams to understand data sourcing procedures, training dataset composition policies, and controls preventing customer data inclusion.
  4. Select a representative sample of 3 to 5 shared models and examine their training configuration files, data manifests, and provenance logs to verify that the source datasets exclude customer-specific repositories.
  5. Review access control lists, IAM policies, and data store permissions to confirm shared model training jobs lack read access to customer data buckets, databases, or file systems.
  6. Examine model versioning and experiment tracking systems (e.g., MLflow, Weights & Biases) for the sampled models, verifying that the training metadata explicitly documents both the data sources used and the exclusion of customer data.
  7. Test a sample shared model by submitting queries designed to elicit customer-specific information, documenting whether outputs reveal any customer identifiers, proprietary terms, or data patterns inconsistent with public/synthetic training data.
  8. Review incident response logs and model audit trails from the past 12 months for any reported data leakage events, retraining incidents, or customer complaints related to data exposure through shared models.
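
Parts of steps 4 and 7 lend themselves to a scripted first pass. The following is a minimal sketch, not a prescribed audit tool: the storage patterns, the manifest format (a flat list of source URIs), and the function names are all assumptions made for illustration. It flags manifest entries that point at customer data locations and checks model outputs for known customer identifiers.

```python
import re
from fnmatch import fnmatchcase

# Hypothetical location patterns for customer data stores.
# Replace with the locations listed in the real data inventory.
CUSTOMER_SOURCE_PATTERNS = ["s3://customer-*", "s3://tenants/*"]


def manifest_violations(sources, patterns=CUSTOMER_SOURCE_PATTERNS):
    """Return the training sources that match any customer-data pattern."""
    return [
        src for src in sources
        if any(fnmatchcase(src, pat) for pat in patterns)
    ]


def leaked_identifiers(output_text, identifiers):
    """Return the identifiers that occur in the output as whole tokens.

    Matching is case-insensitive and literal; paraphrased or partially
    reproduced data will not be caught by this check.
    """
    found = []
    for ident in identifiers:
        pattern = r"(?<!\w)" + re.escape(ident) + r"(?!\w)"
        if re.search(pattern, output_text, re.IGNORECASE):
            found.append(ident)
    return found


if __name__ == "__main__":
    # Illustrative manifest entries and model output, not real data.
    manifest = [
        "s3://public-corpus/wiki.jsonl",
        "s3://customer-acme/raw/2024.parquet",
    ]
    print(manifest_violations(manifest))
    print(leaked_identifiers("Ticket ref ACCT-00917 resolved.", ["ACCT-00917"]))
```

An empty result from either function is not proof of isolation. It only shows that the listed patterns and identifiers were not found, so the manual review in the steps above is still needed.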

Evidence required

Collect the following:

  • Architecture diagrams with data flow annotations
  • Data classification policy excerpts defining the scope of customer data
  • Model training configuration files and dataset manifests for the sampled models
  • IAM policy exports showing access restrictions between customer data stores and model training environments
  • Experiment tracking system screenshots with training metadata
  • Incident logs or customer support tickets related to data isolation concerns

Pass criteria

All sampled shared models demonstrate documented exclusion of customer data from training pipelines, technical access controls prevent customer data ingestion, no customer-specific information appears in model outputs, and no incident records show data leakage.
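
The access-control part of the pass criteria can be spot-checked against exported policies. The sketch below assumes the exports use the AWS IAM JSON policy layout (`Statement`, `Effect`, `Action`, `Resource`); the source does not name a cloud provider, so treat that layout, the sample read actions, and the function names as assumptions. It reports which sampled customer resources a training role's Allow statements would let it read.

```python
from fnmatch import fnmatchcase

# Illustrative read actions to test; extend for the data stores in scope.
READ_ACTIONS = ["s3:GetObject", "s3:ListBucket"]


def _as_list(value):
    """Normalize a policy field that may be a single item or a list."""
    if value is None:
        return []
    return [value] if isinstance(value, (str, dict)) else list(value)


def read_grants(policy, customer_arns, read_actions=READ_ACTIONS):
    """Return sorted (action, arn) pairs that Allow statements grant.

    Conservative first pass: Deny statements, NotAction, NotResource,
    and Condition blocks are ignored, so a finding still needs manual
    confirmation against the full policy evaluation.
    """
    findings = set()
    for stmt in _as_list(policy.get("Statement")):
        if stmt.get("Effect") != "Allow":
            continue
        for action_pat in _as_list(stmt.get("Action")):
            for res_pat in _as_list(stmt.get("Resource")):
                for action in read_actions:
                    # Action names are compared case-insensitively.
                    if not fnmatchcase(action.lower(), action_pat.lower()):
                        continue
                    for arn in customer_arns:
                        if fnmatchcase(arn, res_pat):
                            findings.add((action, arn))
    return sorted(findings)


if __name__ == "__main__":
    # Hypothetical training-role policy and sampled customer resource.
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "s3:Get*",
             "Resource": "arn:aws:s3:::public-corpus/*"},
            {"Effect": "Allow", "Action": ["s3:GetObject"],
             "Resource": ["arn:aws:s3:::*"]},
        ],
    }
    sample = ["arn:aws:s3:::customer-acme/raw/data.parquet"]
    for action, arn in read_grants(policy, sample):
        print(f"training role can {action} on {arn}")
```

Testing concrete sampled ARNs against each policy pattern avoids having to reason about whether two wildcard patterns overlap, at the cost of only covering the resources the auditor chose to sample.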

Where this control is tested

Audit programs including this control