Prompt-injection corpus tested
Demonstrate that the organization maintains and regularly applies a comprehensive prompt-injection test corpus to AI models to identify and remediate instruction-bypass vulnerabilities before production use.
Description
What this control does
This control verifies that AI language models and conversational agents are systematically tested against a curated corpus of prompt-injection attack patterns before deployment and on a recurring basis. The testing corpus includes techniques such as role-switching, instruction override, delimiter confusion, encoding abuse, and multi-turn manipulation designed to bypass system instructions or exfiltrate restricted data. Organizations maintain a versioned library of adversarial prompts, execute automated and manual testing against candidate models, and track remediation of identified vulnerabilities through model tuning, input sanitization, or output filtering.
Control objective
What auditing this proves
Demonstrate that the organization maintains and regularly applies a comprehensive prompt-injection test corpus to AI models to identify and remediate instruction-bypass vulnerabilities before production use.
Associated risks
Risks this control addresses
- Attackers use prompt injection to bypass system instructions and extract sensitive training data, internal configurations, or user data
- Malicious users manipulate model behavior to generate harmful, misleading, or unauthorized content that damages organizational reputation
- Adversarial prompts cause the model to ignore content filtering, safety guardrails, or access control policies
- Injection attacks enable unauthorized privilege escalation or execution of actions outside the intended scope of the AI application
- Lack of systematic testing results in unknown vulnerabilities that persist into production environments
- Multi-turn conversation techniques gradually erode system constraints, leading to unintended disclosures over extended sessions
- Encoding obfuscation or non-English instruction injection bypasses detection mechanisms designed for straightforward attacks
Testing procedure
How an auditor verifies this control
- Request the current version of the organization's prompt-injection test corpus including all attack patterns, test cases, and update history.
- Review the corpus composition to verify coverage of known injection techniques including role-switching, delimiter attacks, instruction override, encoding manipulation, and multi-turn exploits.
- Select a representative sample of AI models or conversational agents currently in production or pre-production.
- Examine testing documentation showing when each sampled model was last tested against the corpus and by whom.
- Execute a subset of high-severity test cases from the corpus against one sampled model to validate that testing procedures are operationally effective.
- Review documented results from prior corpus testing, including identified vulnerabilities, severity ratings, and remediation actions taken.
- Verify that the test corpus is updated on a defined schedule and incorporates newly disclosed prompt-injection techniques from threat intelligence sources.
- Confirm that models with identified injection vulnerabilities are prohibited from production deployment until remediation is validated through re-testing.
Where this control is tested