Prompt-injection defenses (system isolation, allowlists)
Demonstrate that prompt-injection defenses are architecturally implemented and operationally enforced to prevent unauthorized manipulation of AI system behavior through crafted input prompts.
Description
What this control does
Prompt-injection defenses protect AI systems (especially large language models) from malicious manipulation of input prompts that could override system instructions, extract sensitive data, or alter intended behavior. This control employs system isolation (separating user inputs from system prompts using delimiters, separate channels, or schema enforcement) and input allowlisting (restricting characters, patterns, or content types to known-safe constructs). These mechanisms prevent attackers from injecting commands disguised as user data that the AI model would execute as if they were legitimate system instructions. Without these defenses, adversaries can exfiltrate training data, bypass content filters, or repurpose the AI for malicious tasks.
Control objective
What auditing this proves
Demonstrate that prompt-injection defenses are architecturally implemented and operationally enforced to prevent unauthorized manipulation of AI system behavior through crafted input prompts.
Associated risks
Risks this control addresses
- Attackers inject malicious prompts to override system instructions and exfiltrate sensitive training data or configuration details
- Adversaries bypass content moderation filters by embedding prohibited instructions within seemingly benign user input
- Prompt injection causes the AI to execute unintended commands, leading to unauthorized data disclosure or system misuse
- Attackers exploit lack of input validation to chain prompts and escalate privileges within AI-powered applications
- Malicious users manipulate AI responses to generate harmful, biased, or legally problematic content that exposes the organization to liability
- Inadequate separation between user input and system prompts allows attackers to modify application logic or access backend resources
- Absence of allowlisting permits attackers to inject special tokens, escape sequences, or formatting directives that alter AI behavior
Testing procedure
How an auditor verifies this control
- Obtain architectural diagrams and technical documentation describing how user prompts are separated from system instructions in all AI-enabled applications.
- Review configuration files and code repositories to identify the specific isolation mechanisms deployed (e.g., role-based message separation, delimiter tokens, dual-channel architectures).
- Extract and examine the current input allowlist policies, including permitted character sets, length limits, pattern restrictions, and content-type constraints.
- Select a representative sample of AI endpoints or user-facing interfaces and perform controlled prompt-injection testing using known attack patterns (e.g., instruction override attempts, delimiter escape sequences, multi-turn manipulation).
- Review application logs and security monitoring alerts to verify that injection attempts trigger detection events and are appropriately blocked or sanitized.
- Interview development and security teams to assess awareness of prompt-injection risks and validate that secure coding practices include input validation and prompt construction standards.
- Examine change-control records for AI model updates or prompt-template modifications to confirm that injection defenses are reviewed and regression-tested during deployments.
- Validate that pre-production testing includes adversarial prompt testing and that results are documented and remediated before production release.
Where this control is tested