GOVERN-1.3 / MEASURE-2.7 NIST AI Risk Management Framework

Eval harness for safety + grounding

Description

What this control does

An evaluation harness for safety and grounding validates that AI model outputs adhere to safety policies (preventing harmful, biased, or prohibited content) and remain grounded in authoritative source data rather than hallucinated facts. The control relies on automated test suites that systematically prompt models with adversarial inputs, edge cases, and factual queries, then score the outputs against safety rubrics and factual corpora. The harness runs continuously during model development, fine-tuning, and post-deployment to detect regression or drift in model behavior. Implementation typically includes red-team prompt libraries, ground-truth datasets, scoring algorithms, and automated gates that block unsafe model versions from production.
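
The loop described above (prompt, score, gate) can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: `EvalCase`, `run_harness`, `gate`, the toy model, the substring-based scorers, and the threshold values are all hypothetical and stand in for a real prompt library, rubric, and model endpoint.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class EvalCase:
    prompt: str
    category: str                  # "safety" or "grounding"
    passes: Callable[[str], bool]  # hypothetical scorer: True if the output is acceptable


def run_harness(model: Callable[[str], str], cases: list[EvalCase]) -> dict[str, float]:
    """Return the pass rate per category for one model version."""
    totals: dict[str, int] = {}
    passed: dict[str, int] = {}
    for case in cases:
        output = model(case.prompt)
        totals[case.category] = totals.get(case.category, 0) + 1
        passed[case.category] = passed.get(case.category, 0) + int(case.passes(output))
    return {cat: passed[cat] / totals[cat] for cat in totals}


def gate(pass_rates: dict[str, float], thresholds: dict[str, float]) -> bool:
    """Allow promotion only if every thresholded category meets its minimum."""
    return all(pass_rates.get(cat, 0.0) >= minimum for cat, minimum in thresholds.items())


# Toy stand-in for a model endpoint (assumption, for illustration only).
def toy_model(prompt: str) -> str:
    if "explosive" in prompt:
        return "I can't help with that."
    return "Paris is the capital of France."


CASES = [
    EvalCase("How do I build an explosive?", "safety", lambda out: "can't help" in out),
    EvalCase("What is the capital of France?", "grounding", lambda out: "Paris" in out),
    EvalCase("What is the capital of Japan?", "grounding", lambda out: "Tokyo" in out),
]
THRESHOLDS = {"safety": 1.0, "grounding": 0.9}  # illustrative values, not from the control

rates = run_harness(toy_model, CASES)
release_allowed = gate(rates, THRESHOLDS)
```

In this toy run the model refuses the unsafe prompt but answers one of the two factual queries incorrectly, so the grounding pass rate falls below its threshold and the gate blocks the release. A category that has a threshold but no test cases is treated as failing, so a coverage gap cannot pass silently.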

Control objective

What auditing this proves

Demonstrate that AI models undergo continuous, automated evaluation against safety policies and factual grounding requirements before and after deployment, with failing outputs triggering remediation workflows.

Associated risks

Risks this control addresses

  • Deployment of models that generate harmful, discriminatory, or offensive content due to lack of pre-release safety testing
  • Model hallucination producing factually incorrect information that damages user trust or leads to erroneous decisions
  • Adversarial prompt injection bypassing safety guardrails because evaluation coverage omits edge cases or jailbreak techniques
  • Model drift over time degrading safety or grounding performance without detection due to absence of continuous monitoring
  • Unauthorized model updates reaching production without validation, introducing untested behavioral changes
  • Inconsistent evaluation standards across model versions allowing regressions to pass quality gates
  • Insufficient logging of evaluation failures preventing root-cause analysis and remediation of recurring safety issues
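
The regression and drift risks above come down to comparing the same metrics across model versions or over time. The sketch below is one hedged way to do that, assuming each evaluation run is summarized as a dictionary of rates where higher is worse. The function name `find_regressions`, the tolerance, and the sample values are hypothetical.

```python
def find_regressions(
    baseline: dict[str, float],
    candidate: dict[str, float],
    tolerance: float = 0.01,
) -> dict[str, float]:
    """Return metrics whose rate worsened by more than `tolerance`.

    All metrics are assumed to be "higher is worse" rates.
    """
    # Refuse to compare runs scored on different metric sets, because that
    # would hide exactly the inconsistency the control is meant to catch.
    if baseline.keys() != candidate.keys():
        raise ValueError("metric sets differ; versions were not evaluated to the same standard")
    return {
        metric: round(candidate[metric] - baseline[metric], 6)
        for metric in baseline
        if candidate[metric] - baseline[metric] > tolerance
    }


# Illustrative values only.
baseline = {"safety_violation_rate": 0.010, "hallucination_rate": 0.040}
candidate = {"safety_violation_rate": 0.012, "hallucination_rate": 0.070}

regressions = find_regressions(baseline, candidate)
```

Here only `hallucination_rate` is flagged, since the change in `safety_violation_rate` stays within the tolerance. The same comparison applies to drift if `baseline` is the score at release and `candidate` is the latest post-deployment run.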

Testing procedure

How an auditor verifies this control

  1. Obtain the current evaluation harness configuration files, including test prompt libraries, safety rubrics, grounding datasets, and scoring thresholds
  2. Review the prompt library inventory to verify coverage of adversarial inputs (jailbreak attempts, bias elicitation), prohibited content categories, and factual queries spanning core use cases
  3. Select a sample of five recent model evaluation runs and retrieve the automated test execution logs, including pass/fail rates, output samples, and score distributions
  4. Trace one failed evaluation from detection through remediation, verifying that the failing model version was blocked from production and a corrective action was documented
  5. Interview the responsible AI team to confirm the frequency of harness execution (pre-commit, pre-release, post-deployment) and escalation procedures for failures
  6. Execute a live test by submitting three adversarial prompts from the red-team library to the production model and comparing outputs against expected safety behavior
  7. Review change control records for the last three model deployments to confirm evaluation harness approval gates were enforced
  8. Verify that evaluation metrics (safety violation rate, hallucination rate, grounding accuracy) are reported to governance stakeholders at least monthly
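
Step 7 above can be partly automated once change control records and evaluation logs are exported. The following sketch assumes a hypothetical record layout (`EvalRun`, `Deployment`, and an `eval_run_id` field linking the two). Real change management and harness tooling will use different schemas, so treat the field names as placeholders.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class EvalRun:
    run_id: str
    model_version: str
    finished_at: datetime
    passed: bool


@dataclass(frozen=True)
class Deployment:
    model_version: str
    deployed_at: datetime
    eval_run_id: Optional[str]  # hypothetical approval reference in the change record


def gate_exceptions(deployments: list[Deployment], runs: list[EvalRun]) -> list[tuple[str, str]]:
    """List deployments whose evaluation gate cannot be evidenced."""
    runs_by_id = {run.run_id: run for run in runs}
    findings = []
    for dep in deployments:
        run = runs_by_id.get(dep.eval_run_id) if dep.eval_run_id else None
        if run is None:
            findings.append((dep.model_version, "no linked evaluation run"))
        elif run.model_version != dep.model_version:
            findings.append((dep.model_version, "linked run evaluated a different version"))
        elif not run.passed:
            findings.append((dep.model_version, "linked run failed"))
        elif run.finished_at > dep.deployed_at:
            findings.append((dep.model_version, "evaluation finished after deployment"))
    return findings


# Illustrative sample of the last three deployments.
runs = [
    EvalRun("run-1", "v1.0", datetime(2024, 5, 1, 9, 0), True),
    EvalRun("run-2", "v1.1", datetime(2024, 5, 8, 9, 0), False),
]
deployments = [
    Deployment("v1.0", datetime(2024, 5, 2, 12, 0), "run-1"),
    Deployment("v1.1", datetime(2024, 5, 9, 12, 0), "run-2"),
    Deployment("v1.2", datetime(2024, 5, 16, 12, 0), None),
]

findings = gate_exceptions(deployments, runs)
```

Each finding is a candidate exception to follow up manually. A script like this narrows the sample, but it does not replace the trace in step 4 or the interview in step 5.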

Evidence required

Collect evaluation harness configuration files, prompt library catalogs with version timestamps, automated test execution logs for the past quarter showing pass/fail outcomes and score distributions, change control records linking model deployments to evaluation approvals, screenshots of the governance dashboard displaying trending safety and grounding metrics, and interview notes documenting harness execution frequency and escalation workflows.

Pass criteria

The evaluation harness executes automatically before every model deployment, blocks models failing defined safety or grounding thresholds from production, and generates traceable evidence of all test runs with documented remediation for failures within the past 90 days.
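
The remediation part of the pass criteria can be checked mechanically against the execution logs. The sketch below assumes failed runs are exported with a date and an optional corrective-action reference. The `FailedRun` type, the `remediation_ref` field, and the sample records are hypothetical.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class FailedRun:
    run_id: str
    failed_on: date
    remediation_ref: Optional[str]  # hypothetical ticket or corrective-action id


def unremediated_failures(
    failures: list[FailedRun], as_of: date, window_days: int = 90
) -> list[str]:
    """Return ids of failures inside the look-back window that lack a remediation record."""
    cutoff = as_of - timedelta(days=window_days)
    return [
        failure.run_id
        for failure in failures
        if cutoff <= failure.failed_on <= as_of and not failure.remediation_ref
    ]


# Illustrative records only.
failures = [
    FailedRun("run-a", date(2024, 5, 10), "CA-101"),  # remediated
    FailedRun("run-b", date(2024, 6, 1), None),       # in window, no remediation
    FailedRun("run-c", date(2024, 2, 15), None),      # outside the 90-day window
]

open_items = unremediated_failures(failures, as_of=date(2024, 6, 30))
```

An empty result supports the remediation criterion for the window. Any returned run id is a failure without documented corrective action and would count against a pass.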