GDPR Articles 6 & 9 / ISO/IEC 27701:2019 7.2.2 / NIST AI RMF GOVERN 1.5

Lawful basis recorded for training / fine-tune data

Description

What this control does

This control requires the organization to document and maintain a record of the lawful basis (e.g., consent, legitimate interests, contractual necessity, legal obligation) for collecting, processing, and using data in AI model training and fine-tuning. Each dataset or data source must be mapped to its legal justification under the applicable privacy regulations (GDPR, CCPA, etc.), and the organization must retain evidence that its right to use the data was assessed before processing began. The control matters because unlawful data processing exposes the organization to regulatory fines, litigation, reputational damage, and potential model takedown orders.
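
The dataset-to-justification mapping described above can be pictured as a small register plus a coverage check. The following is a minimal sketch, not a prescribed schema: `RegisterEntry`, `unmapped_datasets`, the field names, and the sample dataset and document IDs are all hypothetical. The enum lists the six lawful bases of GDPR Art. 6(1); non-personal data (for example licensed or synthetic datasets) would need a different justification type, which this sketch leaves out.

```python
from dataclasses import dataclass, field
from enum import Enum


class LawfulBasis(Enum):
    """The six lawful bases in GDPR Art. 6(1)(a)-(f)."""
    CONSENT = "consent"
    CONTRACT = "contract"
    LEGAL_OBLIGATION = "legal obligation"
    VITAL_INTERESTS = "vital interests"
    PUBLIC_TASK = "public task"
    LEGITIMATE_INTERESTS = "legitimate interests"


@dataclass
class RegisterEntry:
    """One row of a hypothetical lawful-basis register."""
    dataset_id: str
    lawful_basis: LawfulBasis
    regulation: str  # e.g. "GDPR"
    evidence_refs: list = field(default_factory=list)  # IDs of supporting documents


def unmapped_datasets(inventory_ids, register):
    """Return the inventory dataset IDs that have no register entry, sorted."""
    mapped = {entry.dataset_id for entry in register}
    return sorted(set(inventory_ids) - mapped)


# Illustrative data only; these IDs are invented for the example.
register = [
    RegisterEntry("support-tickets-2023", LawfulBasis.LEGITIMATE_INTERESTS,
                  "GDPR", ["LIA-017"]),
]
inventory = ["support-tickets-2023", "forum-scrape-v2"]
print(unmapped_datasets(inventory, register))  # prints ['forum-scrape-v2']
```

Any dataset the check returns is being used for training without a recorded justification and would be a gap under this control.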

Control objective

What auditing this proves

Demonstrate that the organization has identified, documented, and can substantiate the lawful basis for all personal or proprietary data used in AI training and fine-tuning activities, in compliance with applicable data protection regulations.

Associated risks

Risks this control addresses

  • Unauthorized processing of personal data leading to regulatory enforcement actions and financial penalties under GDPR, CCPA, or other privacy laws
  • Use of copyrighted, proprietary, or confidential data without legal right, resulting in intellectual property infringement claims and litigation
  • Inability to respond to data subject access requests (DSARs) or erasure requests due to lack of data provenance records
  • Training data obtained through terms of service violations or web scraping in violation of anti-circumvention laws, exposing the organization to civil and criminal liability
  • Reputational harm and loss of customer trust from public disclosure of unlawful data processing practices in AI development
  • Deployment of AI models trained on unlawfully obtained data, leading to injunctions, mandatory model retraining, or service suspension
  • Inadequate documentation that prevents the organization from taking a defensible legal position during regulatory investigations or third-party audits

Testing procedure

How an auditor verifies this control

  1. Obtain the inventory of all datasets and data sources used for AI model training and fine-tuning activities within the audit scope period
  2. Review the organization's data processing register or data lineage documentation linking each training dataset to its documented lawful basis
  3. Select a representative sample of 10-15 training datasets spanning different data types (personal data, public data, licensed data, synthetic data)
  4. For each sampled dataset, verify the presence of lawful basis documentation such as consent records, legitimate interest assessments (LIAs), data processing agreements, licensing contracts, or public domain certifications
  5. Examine evidence that legal or privacy teams reviewed and approved the lawful basis determination prior to data ingestion for training purposes
  6. Review any Data Protection Impact Assessments (DPIAs) or privacy reviews conducted for high-risk training data processing activities
  7. Verify that records include data retention periods, processing purposes, and rights fulfillment mechanisms aligned with the claimed lawful basis
  8. Test a sample of data subject requests (access, erasure, objection) related to training data to confirm the organization can identify and action requests based on recorded lawful basis
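
Step 3 asks for a sample that spans data types. One way to draw it, sketched here under assumptions, is to group the inventory by data type and pick round-robin across the groups with a fixed random seed so the selection can be reproduced in the workpapers. The function name `stratified_sample`, the inventory shape (dataset ID mapped to a type label), and the default sample size of 12 are illustrative choices, not part of the control.

```python
import random
from collections import defaultdict


def stratified_sample(inventory, sample_size=12, seed=0):
    """Pick up to sample_size dataset IDs, cycling across data types.

    inventory: dict mapping dataset ID -> data type label (hypothetical schema).
    """
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    by_type = defaultdict(list)
    for dataset_id, data_type in sorted(inventory.items()):
        by_type[data_type].append(dataset_id)
    for ids in by_type.values():
        rng.shuffle(ids)

    sample = []
    # Round-robin over the types so each type is represented
    # before any type contributes a second dataset.
    while len(sample) < sample_size and any(by_type.values()):
        for data_type in sorted(by_type):
            if by_type[data_type] and len(sample) < sample_size:
                sample.append(by_type[data_type].pop())
    return sample
```

If the inventory holds fewer datasets than the requested sample size, the function returns all of them, which amounts to testing the full population.
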

Evidence required

Collect the following:

  • Data processing registers or data lineage documentation mapping training datasets to lawful bases
  • Copies of consent management records, legitimate interest assessments, data processing agreements, or software licenses
  • Dated approvals from legal/privacy teams for training data use
  • Completed Data Protection Impact Assessments for high-risk processing
  • Evidence of data subject request handling procedures and logs demonstrating traceability from requests to training datasets

Pass criteria

All sampled training and fine-tuning datasets have documented lawful bases recorded prior to processing, supported by appropriate evidence artifacts, with approval from legal or privacy functions and mechanisms in place to honor data subject rights.
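
The pass criteria are all-or-nothing across the sample, which a short check can make explicit. This is a rough sketch: `evaluate_sample` and the dictionary keys are hypothetical names for whatever the auditor's workpapers record. Two assumptions are built in: a basis recorded on the same day processing started is accepted, because date granularity cannot prove the order, and an empty sample is rejected rather than passed.

```python
from datetime import date


def evaluate_sample(findings):
    """Apply the pass criteria to audit findings for the sampled datasets.

    findings: list of dicts with hypothetical keys
      dataset_id, basis_recorded_on (date or None), first_processed_on (date),
      evidence_refs (list), approved_by_legal_or_privacy (bool),
      rights_mechanism_in_place (bool).
    Returns (passed, failures), where failures maps dataset_id -> list of reasons.
    """
    if not findings:
        raise ValueError("empty sample: nothing to evaluate")

    failures = {}
    for f in findings:
        reasons = []
        recorded = f["basis_recorded_on"]
        if recorded is None:
            reasons.append("no lawful basis recorded")
        elif recorded > f["first_processed_on"]:
            # Assumption: a same-day record counts as "prior to processing".
            reasons.append("lawful basis recorded after processing began")
        if not f["evidence_refs"]:
            reasons.append("no supporting evidence")
        if not f["approved_by_legal_or_privacy"]:
            reasons.append("no legal/privacy approval")
        if not f["rights_mechanism_in_place"]:
            reasons.append("no data subject rights mechanism")
        if reasons:
            failures[f["dataset_id"]] = reasons

    # The control passes only if every sampled dataset is clean.
    return (not failures, failures)
```

A single failing dataset fails the control; the returned reasons give the auditor the exception text for the report.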
