Synthetic / anonymised data preferred where viable
Description
What this control does
This control mandates the preferential use of synthetic, anonymised, or pseudonymised datasets in place of production data containing sensitive or personally identifiable information, particularly in non-production environments such as development, testing, staging, and analytics. Synthetic data is artificially generated to mimic the statistical properties and structure of real data without exposing actual individuals or confidential business information. Anonymisation irreversibly removes or transforms identifiers so that individuals cannot be re-identified. Pseudonymisation, by contrast, replaces identifiers with artificial tokens that anyone holding the key or mapping table can link back to individuals; pseudonymised data therefore remains personal data under regimes such as the GDPR and offers weaker protection than true anonymisation. By reducing exposure of genuine sensitive data, organisations lower the risk of breaches, insider misuse, and regulatory violations while keeping datasets realistic enough for operational and testing needs.
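To make the difference between pseudonymisation and synthetic data concrete, here is a minimal Python sketch. It assumes a keyed HMAC-SHA256 scheme for pseudonymous tokens and a toy generator for synthetic records; the key value, the `pseudonymise` and `synthetic_customer` functions, and the record fields are hypothetical illustrations, not a prescribed implementation. The generator draws from arbitrary uniform ranges, whereas a real synthetic data tool would model the distributions and correlations of the source data.

```python
import hashlib
import hmac
import random
import string

# Hypothetical key for illustration only; in practice it would be held in a
# key-management system, never alongside the non-production data it protects.
PSEUDONYM_KEY = b"example-key-held-outside-non-prod"


def pseudonymise(identifier: str, key: bytes = PSEUDONYM_KEY) -> str:
    """Replace an identifier with a deterministic keyed token (HMAC-SHA256).

    The same input always yields the same token, so joins across tables still
    work, but anyone holding the key can re-link tokens to known identifiers.
    This is pseudonymisation, not anonymisation.
    """
    digest = hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()
    return "tok_" + digest[:16]


def synthetic_customer(rng: random.Random) -> dict:
    """Generate a fabricated customer record with a realistic shape.

    No field is derived from a real person. Values come from arbitrary uniform
    ranges chosen for illustration, not from a model of real data.
    """
    name = "".join(rng.choices(string.ascii_uppercase, k=1)) + "".join(
        rng.choices(string.ascii_lowercase, k=7)
    )
    return {
        "customer_id": f"SYN-{rng.randrange(10**6):06d}",
        "name": name,
        # .test is a reserved top-level domain, so these addresses cannot be real.
        "email": f"{name.lower()}@example.test",
        "age": rng.randint(18, 90),
    }


if __name__ == "__main__":
    print(pseudonymise("alice@example.com"))
    rng = random.Random(42)  # seeded so test fixtures are reproducible
    for _ in range(3):
        print(synthetic_customer(rng))
```

Because `pseudonymise` is deterministic, the same customer maps to the same token in every table. That preserves referential integrity for testing, but it also means the output must still be protected as personal data.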
Control objective
What auditing this proves
Demonstrate that the organisation systematically identifies opportunities to replace production sensitive data with synthetic or anonymised alternatives, implements technical controls to enforce this preference, and validates that non-production environments do not contain unmasked sensitive data unless a justified exception exists.
Associated risks
Risks this control addresses
- Unauthorised access to production data copied into less-secure non-production environments leads to data exfiltration or breach
- Developers or testers with excessive access to real customer data use it for unauthorised purposes or inadvertently disclose it
- Third-party vendors or contractors gain access to genuine sensitive data through testing or analytics environments
- Ransomware or malware infections in development environments expose production datasets that were copied for testing
- Regulatory penalties arise from processing real personal data without lawful basis in testing, development, or analytics workflows
- Data leakage through inadequate disposal or poor handling of test datasets containing real sensitive information
- Re-identification attacks succeed when anonymisation techniques are insufficient or reversible, compromising privacy assurances
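One common way to quantify the re-identification risk described in the last item is k-anonymity: each record should share its quasi-identifier values (for example age band and postcode district) with at least k-1 other records. The Python sketch below computes k for a dataset and lists the quasi-identifier combinations that fall below a chosen threshold. The function names, field names, and sample values are illustrative assumptions. k-anonymity is only one measure and does not by itself prevent attribute disclosure when every record in a group shares the same sensitive value.

```python
from collections import Counter
from typing import Iterable, List, Mapping, Sequence, Tuple

Record = Mapping[str, object]


def equivalence_class_sizes(
    records: Iterable[Record], quasi_identifiers: Sequence[str]
) -> Counter:
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records: Iterable[Record], quasi_identifiers: Sequence[str]) -> int:
    """Return k, the size of the smallest equivalence class (0 for no records)."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return min(sizes.values()) if sizes else 0


def risky_classes(
    records: Iterable[Record], quasi_identifiers: Sequence[str], k_threshold: int
) -> List[Tuple[object, ...]]:
    """List quasi-identifier combinations shared by fewer than k_threshold records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return [combo for combo, n in sizes.items() if n < k_threshold]


if __name__ == "__main__":
    # Hypothetical "anonymised" extract: the N1 record is unique, so anyone who
    # knows a 40-something in N1 is in the dataset can single it out.
    sample = [
        {"age_band": "30-39", "postcode_district": "SW1", "diagnosis": "A"},
        {"age_band": "30-39", "postcode_district": "SW1", "diagnosis": "B"},
        {"age_band": "40-49", "postcode_district": "N1", "diagnosis": "C"},
    ]
    qi = ["age_band", "postcode_district"]
    print(k_anonymity(sample, qi))       # → 1
    print(risky_classes(sample, qi, 2))  # → [('40-49', 'N1')]
```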
Testing procedure
How an auditor verifies this control
- Review the organisation's data classification and handling policy to identify requirements and guidance on the use of synthetic or anonymised data in non-production contexts.
- Obtain an inventory of all non-production environments (development, test, staging, QA, analytics, training) and the datasets used within each.
- Select a representative sample of non-production environments and examine database schemas, file stores, and data pipelines to determine whether the data in use is synthetic, anonymised, or unmasked production data.
- Interview data owners, developers, and data engineers to understand data provisioning workflows and criteria for deciding whether to use synthetic data versus production data.
- Review technical documentation and configuration of any data anonymisation or synthetic data generation tools in use, including libraries, scripts, and commercial platforms.
- Examine access control logs and data lineage records to confirm that sensitive production data is not copied directly to non-production environments without transformation.
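- Test a sample of anonymised or synthetic datasets to verify that re-identification is not feasible (for example, by checking that no combination of quasi-identifiers such as age, postcode, and gender isolates a single record, and by attempting linkage against available auxiliary data) and that the data remains useful for its intended testing or analytics purpose.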
- Review exception requests and approvals for cases where production data is used in non-production environments, confirming documented business justification, compensating controls, and time-bound access.
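For the sampling and lineage steps above, an auditor or data engineer may want a quick check of whether rows sampled from a non-production store contain values that look like real sensitive data. The following Python sketch is deliberately minimal and rests on stated assumptions: it detects only email addresses and Luhn-valid payment card numbers, it treats addresses at domains reserved for testing by RFC 2606 (such as `example.com` and `.test`) as synthetic, and the function names are hypothetical. Dedicated data discovery tools cover far more data types and use context to reduce false positives.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Mapping, Tuple

# Illustrative detectors only (an assumption, not a complete PII catalogue).
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_RE = re.compile(r"\b(?:\d[ -]?){12,18}\d\b")
# Domains reserved for testing by RFC 2606; addresses here are treated as synthetic.
SYNTHETIC_EMAIL_SUFFIXES = (
    "@example.com", "@example.net", "@example.org",
    ".test", ".example", ".invalid",
)


def luhn_valid(digits: str) -> bool:
    """Luhn checksum used by payment card numbers; filters out random digit runs."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def scan_record(record: Mapping[str, object]) -> List[Tuple[str, str]]:
    """Return (field, detector) pairs for values that look like real sensitive data."""
    findings = []
    for field_name, value in record.items():
        if not isinstance(value, str):
            continue
        for m in EMAIL_RE.finditer(value):
            if not m.group(0).lower().endswith(SYNTHETIC_EMAIL_SUFFIXES):
                findings.append((field_name, "email"))
        for m in CARD_RE.finditer(value):
            if luhn_valid(re.sub(r"\D", "", m.group(0))):
                findings.append((field_name, "card_number"))
    return findings


def scan_sample(records: Iterable[Mapping[str, object]]) -> Dict[Tuple[str, str], int]:
    """Aggregate findings across a sample of rows from a non-production store."""
    counts: Counter = Counter()
    for record in records:
        counts.update(scan_record(record))
    return dict(counts)
```

A non-empty result for an environment that is supposed to hold only synthetic or anonymised data would prompt the auditor to trace the data lineage or look for an approved exception.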
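The exception review in the final step can be partly automated against an exception register. The Python sketch below assumes a hypothetical register entry with justification, compensating controls, approver, and expiry fields; real registers will differ in structure, and any finding is a prompt for follow-up rather than an automatic control failure.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List, Optional


@dataclass
class DataUseException:
    """Hypothetical register entry for production data used outside production."""
    exception_id: str
    environment: str
    justification: str = ""
    compensating_controls: List[str] = field(default_factory=list)
    approved_by: Optional[str] = None
    expires_on: Optional[date] = None


def review_exception(exc: DataUseException, today: date) -> List[str]:
    """Return audit findings for one exception; an empty list means no issues."""
    findings = []
    if not exc.justification.strip():
        findings.append("missing business justification")
    if not exc.compensating_controls:
        findings.append("no compensating controls recorded")
    if not exc.approved_by:
        findings.append("no recorded approver")
    # The control requires time-bound access, so a missing expiry is itself a finding.
    if exc.expires_on is None:
        findings.append("not time-bound (no expiry date)")
    elif exc.expires_on < today:
        findings.append(f"expired on {exc.expires_on.isoformat()}")
    return findings
```

For an expired but otherwise complete exception, the auditor would then confirm whether the production data was actually removed from the environment on or before the expiry date.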
Where this control is tested