GOVERN-1.7 / MEASURE-2.7 NIST AI Risk Management Framework 1.0

Prompt-injection corpus tested

Demonstrate that the organization maintains and regularly applies a comprehensive prompt-injection test corpus to AI models to identify and remediate instruction-bypass vulnerabilities before production use.

Description

What this control does

This control verifies that AI language models and conversational agents are systematically tested against a curated corpus of prompt-injection attack patterns before deployment and on a recurring basis. The testing corpus includes techniques such as role-switching, instruction override, delimiter confusion, encoding abuse, and multi-turn manipulation designed to bypass system instructions or exfiltrate restricted data. Organizations maintain a versioned library of adversarial prompts, execute automated and manual testing against candidate models, and track remediation of identified vulnerabilities through model tuning, input sanitization, or output filtering.

Control objective

What auditing this proves

Associated risks

Risks this control addresses

Attackers use prompt injection to bypass system instructions and extract sensitive training data, internal configurations, or user data
Malicious users manipulate model behavior to generate harmful, misleading, or unauthorized content that damages organizational reputation
Adversarial prompts cause the model to ignore content filtering, safety guardrails, or access control policies
Injection attacks enable unauthorized privilege escalation or execution of actions outside the intended scope of the AI application
Lack of systematic testing results in unknown vulnerabilities that persist into production environments
Multi-turn conversation techniques gradually erode system constraints, leading to unintended disclosures over extended sessions
Encoding obfuscation or non-English instruction injection bypasses detection mechanisms designed for straightforward attacks

Testing procedure

How an auditor verifies this control

Request the current version of the organization's prompt-injection test corpus including all attack patterns, test cases, and update history.
Review the corpus composition to verify coverage of known injection techniques including role-switching, delimiter attacks, instruction override, encoding manipulation, and multi-turn exploits.
Select a representative sample of AI models or conversational agents currently in production or pre-production.
Examine testing documentation showing when each sampled model was last tested against the corpus and by whom.
Execute a subset of high-severity test cases from the corpus against one sampled model to validate that testing procedures are operationally effective.
Review documented results from prior corpus testing, including identified vulnerabilities, severity ratings, and remediation actions taken.
Verify that the test corpus is updated on a defined schedule and incorporates newly disclosed prompt-injection techniques from threat intelligence sources.
Confirm that models with identified injection vulnerabilities are prohibited from production deployment until remediation is validated through re-testing.

Evidence required Collect the versioned prompt-injection test corpus document with metadata showing creation date, last update, and coverage of attack categories. Obtain test execution logs or reports for sampled models showing test date, test cases executed, pass/fail status, and vulnerability findings. Gather change control or deployment approval records demonstrating that vulnerability remediation was completed and re-tested before production release.

Pass criteria The organization maintains a documented, regularly updated prompt-injection test corpus covering major attack vectors, applies it systematically to all AI models before production deployment, documents testing results with identified vulnerabilities, and requires remediation validation before release.

Where this control is tested

Audit programs including this control

AI Red Team & Adversarial Testing

You can not trust an AI system you have not tried to break. Audit of red-team capability covering…