GOVERN-1.7 / MEASURE-2.11 / MANAGE-1.1 NIST AI Risk Management Framework

Bias + fairness testing pre-launch

Demonstrate that the organization systematically identifies, measures, and remediates bias and fairness issues in AI/ML systems through structured pre-launch testing before production deployment.

Description

What this control does

Pre-launch bias and fairness testing requires organizations to evaluate AI/ML models for discriminatory outcomes and unintended bias across protected characteristics (race, gender, age, disability, etc.) before deploying them into production. Testing involves statistical analysis of model outputs across demographic subgroups, adversarial testing with edge cases, and review of training data representativeness. This control helps prevent algorithmic discrimination, regulatory violations (for example under GDPR, the Equal Credit Opportunity Act (ECOA), or the Fair Housing Act (FHA)), and reputational damage from biased automated decision-making in hiring, lending, criminal justice, healthcare, and customer-facing applications.
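The subgroup analysis described above can be illustrated with a minimal Python sketch that computes per-group selection rates, the disparate impact ratio, and the demographic parity gap. The function names, the toy data, and the 0.8 threshold are assumptions for illustration only; the threshold is borrowed from the commonly cited four-fifths rule of thumb, and this control does not prescribe a specific value.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Return the share of positive (1) predictions for each subgroup."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for pred, group in zip(predictions, groups):
        totals[group] += 1
        positives[group] += int(pred == 1)
    return {g: positives[g] / totals[g] for g in totals}


def disparate_impact_ratio(rates):
    """Lowest subgroup selection rate divided by the highest."""
    highest = max(rates.values())
    if highest == 0:
        # No group received any positive outcome, so selection rates are equal.
        return 1.0
    return min(rates.values()) / highest


# Hypothetical toy data: 1 = favorable decision, groups "A" and "B" are placeholders.
predictions = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]

rates = selection_rates(predictions, groups)             # A: 0.8, B: 0.2
ratio = disparate_impact_ratio(rates)                    # 0.25
parity_gap = max(rates.values()) - min(rates.values())   # about 0.6

# Assumed acceptance threshold (four-fifths rule of thumb), not mandated by this control.
ASSUMED_MIN_RATIO = 0.8
passes_threshold = ratio >= ASSUMED_MIN_RATIO            # False for this toy data
```

In practice an organization would define its own metrics and thresholds in the testing methodology and apply them to a held-out test set with reliable subgroup labels.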

Control objective

What auditing this proves

Auditing this control demonstrates that the organization systematically identifies, measures, and remediates bias and fairness issues in AI/ML systems through structured testing completed before production deployment.

Associated risks

Risks this control addresses

Algorithmic discrimination against protected classes resulting in civil rights violations, regulatory penalties, and litigation under anti-discrimination laws
Biased training data propagating historical inequities into automated decisions affecting employment, credit, housing, or legal outcomes
Model drift causing fairness degradation over time when demographic distributions shift but bias testing is not repeated
Reputational damage and customer attrition when biased AI behavior is exposed publicly through media or social networks
Regulatory enforcement actions and fines under GDPR Article 22, the EU AI Act, or sector-specific fairness requirements
Inadequate documentation of fairness metrics preventing compliance demonstration during audits or legal discovery
Deployment of models with differential performance across subgroups creating unequal service quality or denial rates
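The last risk above, differential performance across subgroups, can be made concrete by comparing error rates per group. The following is a minimal sketch, assuming binary labels and predictions; the function names and toy data are hypothetical, and the gap shown is one common way to summarize an equalized odds comparison rather than a required metric.

```python
def rates_by_group(y_true, y_pred, groups):
    """Return {group: (true_positive_rate, false_positive_rate)}."""
    counts = {}
    for truth, pred, group in zip(y_true, y_pred, groups):
        c = counts.setdefault(group, {"tp": 0, "fn": 0, "fp": 0, "tn": 0})
        if truth == 1:
            c["tp" if pred == 1 else "fn"] += 1
        else:
            c["fp" if pred == 1 else "tn"] += 1
    result = {}
    for group, c in counts.items():
        actual_pos = c["tp"] + c["fn"]
        actual_neg = c["fp"] + c["tn"]
        # A rate is undefined (nan) if a group has no actual positives or negatives;
        # a real test plan would need to flag such groups instead of scoring them.
        tpr = c["tp"] / actual_pos if actual_pos else float("nan")
        fpr = c["fp"] / actual_neg if actual_neg else float("nan")
        result[group] = (tpr, fpr)
    return result


def equalized_odds_gap(group_rates):
    """Largest between-group difference in TPR or FPR (0.0 means equal rates)."""
    tprs = [r[0] for r in group_rates.values()]
    fprs = [r[1] for r in group_rates.values()]
    return max(max(tprs) - min(tprs), max(fprs) - min(fprs))


# Hypothetical toy data: the model is perfect for group "A" and noisy for group "B".
y_true = [1, 1, 0, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

group_rates = rates_by_group(y_true, y_pred, groups)
gap = equalized_odds_gap(group_rates)  # 0.5 for this toy data
```

A gap of this size would mean group "B" sees both more missed approvals and more false approvals than group "A", which is the unequal service quality this risk describes.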

Testing procedure

How an auditor verifies this control

Obtain the organization's AI/ML model inventory and identify systems subject to bias and fairness testing requirements based on use case sensitivity and regulatory scope.
Review the documented bias and fairness testing methodology including selected fairness metrics (demographic parity, equalized odds, disparate impact ratio), protected attributes tested, and acceptance thresholds.
Select a sample of 3-5 AI/ML systems deployed in the last 12 months and retrieve pre-launch bias testing reports, including test data composition, subgroup analysis results, and fairness metric calculations.
Verify that testing included evaluation across relevant protected characteristics (at minimum race, gender, and age, where applicable) and that test datasets contained sufficient representation of minority subgroups.
Examine evidence of remediation actions taken when fairness thresholds were not met, including model retraining, feature engineering changes, post-processing adjustments, or deployment decisions.
Interview data science and compliance personnel to confirm the testing process, approval workflow, and escalation procedures when bias issues are identified during pre-launch evaluation.
Review deployment approval records to confirm that bias and fairness test results were formally reviewed and accepted by appropriate stakeholders (legal, compliance, business owners) before production release.
Assess the completeness of documentation retained for audit purposes, including test scripts, raw results, statistical analysis, remediation decisions, and sign-off records for each deployed model.
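Two of the steps above lend themselves to a short script: drawing the sample of recently deployed systems from the model inventory, and checking whether a test dataset has enough records per subgroup. This is a rough sketch only. The inventory fields (`name`, `deployed_on`), the system names, and the minimum count of 30 records per subgroup are all assumptions; the procedure itself does not define a numeric representation threshold.

```python
import random
from datetime import date, timedelta


def sample_recent_systems(inventory, as_of, sample_size=5, seed=0):
    """Randomly sample systems deployed in the 365 days up to `as_of`."""
    cutoff = as_of - timedelta(days=365)
    recent = [s for s in inventory if cutoff <= s["deployed_on"] <= as_of]
    rng = random.Random(seed)  # fixed seed so the selection can be reproduced
    return rng.sample(recent, min(sample_size, len(recent)))


def underrepresented_groups(group_counts, min_count=30):
    """Return subgroups whose test-set record count is below `min_count`."""
    return sorted(g for g, n in group_counts.items() if n < min_count)


# Hypothetical inventory extract.
inventory = [
    {"name": "credit-scoring-v2", "deployed_on": date(2024, 3, 1)},
    {"name": "resume-ranker", "deployed_on": date(2023, 1, 15)},
    {"name": "churn-model", "deployed_on": date(2024, 5, 20)},
]

sample = sample_recent_systems(inventory, as_of=date(2024, 6, 30))
sampled_names = sorted(s["name"] for s in sample)

# Hypothetical subgroup counts taken from one system's bias test report.
flagged = underrepresented_groups({"group_a": 500, "group_b": 12, "group_c": 30})
```

Recording the seed and the inventory snapshot date alongside the sample makes the selection repeatable if the audit work is later reviewed.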

Evidence required

Collect the following:

AI/ML bias testing reports with subgroup performance metrics, fairness statistical calculations, and pass/fail determinations
Pre-launch approval records showing compliance and legal review of bias test results
Remediation documentation including retraining logs, feature modification records, or deployment delay decisions
Testing methodology documents defining protected attributes, fairness metrics, and acceptance thresholds
Model inventory with deployment dates and bias testing status

Pass criteria

All sampled AI/ML systems deployed in the last 12 months have documented bias and fairness testing completed pre-launch with defined fairness metrics evaluated across protected characteristics, remediation actions taken or justified when thresholds were not met, and formal approval by compliance stakeholders before production deployment.
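The pass criteria can be expressed as a simple per-system checklist. The sketch below is illustrative only: every field name in the record (`bias_test_completed_on`, `metrics_by_attribute`, `thresholds_missed`, `remediation`, `justification`, `compliance_approved_on`) is a hypothetical stand-in for whatever the organization's evidence actually contains, and an auditor would still need to judge the quality of each artifact.

```python
from datetime import date


def evaluate_system(record):
    """Return the list of pass-criteria failures for one sampled system.

    An empty list means the system meets the criteria as modeled here.
    """
    failures = []
    deployed_on = record["deployed_on"]

    tested_on = record.get("bias_test_completed_on")
    if tested_on is None:
        failures.append("no documented bias and fairness test")
    elif tested_on > deployed_on:
        failures.append("bias test completed after deployment")

    if not record.get("metrics_by_attribute"):
        failures.append("no fairness metrics across protected characteristics")

    if record.get("thresholds_missed") and not (
        record.get("remediation") or record.get("justification")
    ):
        failures.append("missed threshold without remediation or justification")

    approved_on = record.get("compliance_approved_on")
    if approved_on is None or approved_on > deployed_on:
        failures.append("no compliance approval before deployment")

    return failures


# Hypothetical records for two sampled systems.
compliant = {
    "deployed_on": date(2024, 5, 20),
    "bias_test_completed_on": date(2024, 5, 2),
    "metrics_by_attribute": {"gender": {"disparate_impact_ratio": 0.91}},
    "thresholds_missed": False,
    "compliance_approved_on": date(2024, 5, 10),
}
non_compliant = {
    "deployed_on": date(2024, 3, 1),
    "bias_test_completed_on": date(2024, 3, 15),
    "metrics_by_attribute": {"age": {"disparate_impact_ratio": 0.62}},
    "thresholds_missed": True,
    "compliance_approved_on": None,
}

results = {
    "compliant": evaluate_system(compliant),
    "non_compliant": evaluate_system(non_compliant),
}
control_passes = all(not failures for failures in results.values())
```

Because the criteria require all sampled systems to pass, a single system with any failure is enough for the control to fail under this sketch.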

Where this control is tested

Audit programs including this control

AI Model Risk & Governance

Whether you train, fine-tune or just consume models, you need governance — inventory, risk classification, human oversight, evaluation.