Skip to main content
โ† All controls
LLM01 OWASP ML Top 10

Prompt-injection defenses (system isolation, allowlists)

Demonstrate that prompt-injection defenses are architecturally implemented and operationally enforced to prevent unauthorized manipulation of AI system behavior through crafted input prompts.

Description

What this control does

Prompt-injection defenses protect AI systems (especially large language models) from malicious manipulation of input prompts that could override system instructions, extract sensitive data, or alter intended behavior. This control employs system isolation (separating user inputs from system prompts using delimiters, separate channels, or schema enforcement) and input allowlisting (restricting characters, patterns, or content types to known-safe constructs). These mechanisms prevent attackers from injecting commands disguised as user data that the AI model would execute as if they were legitimate system instructions. Without these defenses, adversaries can exfiltrate training data, bypass content filters, or repurpose the AI for malicious tasks.

Control objective

What auditing this proves

Demonstrate that prompt-injection defenses are architecturally implemented and operationally enforced to prevent unauthorized manipulation of AI system behavior through crafted input prompts.

Associated risks

Risks this control addresses

  • Attackers inject malicious prompts to override system instructions and exfiltrate sensitive training data or configuration details
  • Adversaries bypass content moderation filters by embedding prohibited instructions within seemingly benign user input
  • Prompt injection causes the AI to execute unintended commands, leading to unauthorized data disclosure or system misuse
  • Attackers exploit lack of input validation to chain prompts and escalate privileges within AI-powered applications
  • Malicious users manipulate AI responses to generate harmful, biased, or legally problematic content that exposes the organization to liability
  • Inadequate separation between user input and system prompts allows attackers to modify application logic or access backend resources
  • Absence of allowlisting permits attackers to inject special tokens, escape sequences, or formatting directives that alter AI behavior

Testing procedure

How an auditor verifies this control

  1. Obtain architectural diagrams and technical documentation describing how user prompts are separated from system instructions in all AI-enabled applications.
  2. Review configuration files and code repositories to identify the specific isolation mechanisms deployed (e.g., role-based message separation, delimiter tokens, dual-channel architectures).
  3. Extract and examine the current input allowlist policies, including permitted character sets, length limits, pattern restrictions, and content-type constraints.
  4. Select a representative sample of AI endpoints or user-facing interfaces and perform controlled prompt-injection testing using known attack patterns (e.g., instruction override attempts, delimiter escape sequences, multi-turn manipulation).
  5. Review application logs and security monitoring alerts to verify that injection attempts trigger detection events and are appropriately blocked or sanitized.
  6. Interview development and security teams to assess awareness of prompt-injection risks and validate that secure coding practices include input validation and prompt construction standards.
  7. Examine change-control records for AI model updates or prompt-template modifications to confirm that injection defenses are reviewed and regression-tested during deployments.
  8. Validate that pre-production testing includes adversarial prompt testing and that results are documented and remediated before production release.
Evidence required Configuration exports showing input validation rules, allowlist definitions, and prompt-isolation settings; code samples or architectural diagrams demonstrating separation of user input from system instructions; security testing reports documenting prompt-injection test cases, results, and remediation actions; application logs capturing blocked or sanitized injection attempts with corresponding alert records; change-control tickets for AI system updates that include security review sign-offs.
Pass criteria All sampled AI endpoints enforce documented prompt-isolation mechanisms, apply current input allowlists that block tested injection patterns, generate monitoring alerts for anomalous prompt behavior, and demonstrate no successful override of system instructions during adversarial testing.

Where this control is tested

Audit programs including this control