Enterprise tier — no training on customer code
Demonstrate that enterprise-tier AI/ML models are architecturally and procedurally prevented from training on customer code, production data, or proprietary client content.
Description
What this control does
This control ensures that AI/ML models deployed at the enterprise tier are trained exclusively on sanitized, anonymized, or synthetic datasets and are explicitly prohibited from training on live customer data, production code repositories, or proprietary client intellectual property. Organizations implement technical guardrails such as data classification enforcement, training pipeline access restrictions, and automated scanning to keep customer code and PII out of model training workflows. This separation protects customer confidentiality, reduces the risk of model leakage, and supports compliance with data sovereignty requirements and contractual obligations.
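As an illustration of the data classification guardrail described above, the following is a minimal sketch of a default-deny admission check that a training pipeline could run before ingesting any dataset. The label names, the `Dataset` record, and the `admit_to_training` function are hypothetical and not part of any specific product; a real deployment would take its labels from the organization's own classification scheme.

```python
from dataclasses import dataclass

# Hypothetical allowlist of classification labels considered safe for training.
# Assumption: the real labels come from the organization's classification policy.
TRAINING_ELIGIBLE = {"synthetic", "anonymized", "sanitized"}


@dataclass(frozen=True)
class Dataset:
    name: str
    classification: str  # e.g. "synthetic", "customer_code", "production"


class TrainingDataRejected(Exception):
    """Raised when one or more datasets are not eligible for model training."""


def admit_to_training(datasets):
    """Return the datasets as a list if all are eligible; otherwise raise.

    Default-deny: any label that is not explicitly on the allowlist is
    rejected, including empty or unknown labels.
    """
    rejected = [d for d in datasets if d.classification not in TRAINING_ELIGIBLE]
    if rejected:
        names = ", ".join(d.name for d in rejected)
        raise TrainingDataRejected(f"not training-eligible: {names}")
    return list(datasets)
```

An allowlist is used here rather than a blocklist so that an untagged or mislabeled dataset fails closed instead of slipping into a training run.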
Control objective
What auditing this proves
Demonstrate that enterprise-tier AI/ML models are architecturally and procedurally prevented from training on customer code, production data, or proprietary client content.
Associated risks
Risks this control addresses
- Unauthorized exposure of customer proprietary source code or algorithms through model outputs or inference attacks
- Model memorization of sensitive customer data leading to inadvertent disclosure via generative responses or embeddings
- Violation of contractual data processing agreements or regulatory requirements (e.g., GDPR, CCPA, HIPAA) due to unauthorized training on customer PII
- Intellectual property theft or competitive harm if customer trade secrets are encoded into shared or multi-tenant model weights
- Loss of customer trust and legal liability from unintended reproduction of customer-specific code patterns or business logic
- Supply chain contamination where poisoned or malicious customer code influences model behavior across other customers
- Regulatory penalties and breach notification obligations if customer data is used without explicit consent or lawful basis
Testing procedure
How an auditor verifies this control
- Obtain and review the enterprise AI/ML training policy document, identifying explicit prohibitions on customer code or production data ingestion.
- Inventory all enterprise-tier AI/ML models, including training data sources, pipeline configurations, and data lineage documentation.
- Examine data classification and labeling procedures to verify that customer data is tagged and segregated from training-eligible datasets.
- Inspect training pipeline access controls, reviewing role-based permissions and verifying that training service accounts have no access to customer data repositories.
- Review automated scanning or data loss prevention (DLP) tool configurations that detect and block customer code or PII from entering training workflows.
- Select a sample of recent model training runs and trace input datasets back to origin, confirming no customer production environments or repositories were accessed.
- Interview ML engineers and data scientists to confirm awareness of customer data prohibitions and verify adherence to approved data sourcing procedures.
- Test technical enforcement by simulating an attempt to add a customer data source to a training pipeline and verifying that the system rejects the attempt or raises an alert.
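The lineage-tracing step in the procedure above (sampling training runs and tracing their inputs back to origin) could be supported by a small script along these lines. This is a sketch only: the `TrainingRun` record, the `find_violations` function, and the URI prefixes are hypothetical, and it assumes that run metadata exported from the organization's lineage tooling already lists each run's input URIs.

```python
from dataclasses import dataclass, field

# Hypothetical URI prefixes that mark customer or production origins.
# Assumption: the real list would be derived from the data inventory.
PROHIBITED_PREFIXES = ("s3://customer-", "git://customer-repos/", "prod-db://")


@dataclass
class TrainingRun:
    run_id: str
    input_uris: list = field(default_factory=list)


def find_violations(runs, prohibited=PROHIBITED_PREFIXES):
    """Map each offending run_id to its input URIs from prohibited origins."""
    violations = {}
    for run in runs:
        # str.startswith accepts a tuple and matches any of its prefixes.
        bad = [uri for uri in run.input_uris if uri.startswith(prohibited)]
        if bad:
            violations[run.run_id] = bad
    return violations


# Example sample of runs, using made-up identifiers.
sampled_runs = [
    TrainingRun("run-001", ["s3://synthetic-corpus/v3"]),
    TrainingRun("run-002", ["s3://synthetic-corpus/v3", "git://customer-repos/acme/app"]),
]
violations = find_violations(sampled_runs)
```

With the sample above, `violations` contains only `run-002`, together with the customer repository URI that should not have been reachable. Prefix matching is a simplification for evidence review; it does not replace the enforcement test in the final step, which should exercise the real pipeline.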
Where this control is tested