Agent Privilege LabAI Agent Security

Frameworks

    OWASP Top 10

    MITRE ATLAS

    Academic Research


Scenarios

Trust & Hallucination

3

    Hallucinated DB Records

    Hallucinated AWS Resources

    Over-Trusting Inputs

Context Poisoning

2

    Prompt Injection via DB Records

    Prompt Injection via AWS Tags

Excessive Agency

4

    Destructive SQL

    Wrong Table Access

    AWS Resource Deletion

    AWS Privilege Escalation

Data Exposure

2

    SQL Injection via Agent

    AWS Secret Exposure

Cascading Failures

1

    Cascading Errors

Supply Chain

1

    Malicious Tool Plugin

Reconnaissance

2

    System Prompt Extraction

    RAG Credential Harvesting

Rogue Behavior

1

    Goal Drift / Rogue Agent


Simulator

    Agent Intent Simulator

Agent Privilege LabAI Agent Security Research

Agent Security Vulnerabilities

Known security vulnerabilities and attack techniques specific to autonomous AI agents — cataloged from OWASP, MITRE ATLAS, and academic research. Each vulnerability maps to relevant Agent Privilege Lab demo scenarios where applicable.

25 Vulnerabilities
3 Frameworks
14 Mapped Scenarios
OWASP Top 10 for Agentic Applications
2026

The OWASP Top 10 for Agentic Applications identifies the most critical security risks specific to AI agent systems — autonomous software that plans, decides, and acts using tools. These go beyond single-turn LLM vulnerabilities to address multi-step, tool-using agent architectures.

ASI01
Agent Goal Hijack

Attackers alter agent objectives through malicious text content embedded in data sources, tool outputs, or user inputs, causing the agent to pursue unauthorized goals across multiple steps.

Why this is agent-specific

Goes beyond single-turn prompt injection: compromises multi-step decision-making and planning. The agent's autonomy amplifies the attack since it continues acting on hijacked goals across tool calls without human checkpoints.

DEMO:Scenario 3Scenario 4
ASI02
Tool Misuse and Exploitation

Agents invoke tools in unintended or dangerous ways — executing destructive operations, passing unsanitized inputs, or using tools beyond their intended scope due to ambiguous instructions or adversarial manipulation.

Why this is agent-specific

Unique to agents because they autonomously select and invoke tools. A single mistaken or manipulated tool call can cascade into data loss, unauthorized access, or system compromise — unlike chatbots that only produce text.

ASI03
Identity and Privilege Abuse

Agents operate with overly broad permissions or escalate privileges by chaining tool calls, accessing resources beyond what the current task requires.

Why this is agent-specific

Agents inherit and exercise human-level permissions autonomously. They can chain tool calls to escalate access in ways that wouldn't occur in manual workflows, creating privilege paths that are hard to anticipate or audit.

ASI04
Agentic Supply Chain Vulnerabilities

Compromised or malicious components in the agent's tool chain, plugins, or dependencies introduce vulnerabilities that the agent unknowingly leverages during autonomous operation.

Why this is agent-specific

Agents dynamically discover and invoke tools, meaning a single compromised plugin can be called autonomously across many workflows. The trust chain from agent to tool to external API creates novel supply chain attack surfaces.

ASI05
Unexpected Code Execution

Agents generate and execute code as part of their workflow, potentially running malicious or unintended operations when influenced by adversarial inputs or hallucinated logic.

Why this is agent-specific

Unlike traditional code injection, the agent itself is the code generator and executor. Adversarial prompts can cause the agent to write and run arbitrary code with its own permissions, bypassing normal code review and deployment controls.

ASI06
Memory and Context Poisoning

Malicious content injected into the agent's memory, conversation history, or retrieved context corrupts future decisions and actions across sessions.

Why this is agent-specific

Agents maintain persistent memory and context that influences future autonomous decisions. Poisoning this context creates a time-delayed attack vector where malicious influence persists across sessions and affects multiple tool invocations.

ASI07
Insecure Inter-Agent Communication

In multi-agent systems, agents exchange messages and delegate tasks without proper authentication, validation, or trust boundaries, allowing compromised agents to influence others.

Why this is agent-specific

Multi-agent orchestration creates peer-to-peer trust relationships. A compromised agent can propagate malicious instructions to other agents, creating cascading failures across the agent network that don't exist in single-model systems.

ASI08
Cascading Failures

Errors or failures in one agent component propagate through the system, causing compounding damage as the agent continues to act on incorrect assumptions or corrupted state.

Why this is agent-specific

Agent autonomy means errors compound: a wrong assumption leads to wrong tool calls, which produce wrong results, which trigger more wrong actions. Without human checkpoints, failures cascade faster and farther than in interactive systems.

ASI09
Human-Agent Trust Exploitation

Agents exploit or fail to properly manage the trust relationship with human operators — presenting misleading confidence, obscuring uncertainty, or taking actions that appear safe but have hidden consequences.

Why this is agent-specific

Agents present results with authority and confidence that may not match their actual certainty. Humans tend to over-trust AI outputs, and agents lack the self-awareness to flag when they're operating beyond their competence.

ASI10
Rogue Agents

Agents deviate from their intended purpose — whether through goal drift, emergent behavior, or adversarial manipulation — and pursue objectives misaligned with operator intent.

Why this is agent-specific

True rogue behavior requires autonomy: the ability to plan, decide, and act independently. This is unique to agents and represents the most concerning failure mode where the system pursues its own objectives rather than the user's.

MITRE ATLAS — Agentic AI Techniques
v4.6.0 (Oct 2025)

MITRE ATLAS (Adversarial Threat Landscape for AI Systems) catalogs real-world adversarial techniques against AI systems. The agentic AI techniques specifically address attacks on autonomous AI agents that use tools and maintain state.

AML.T0080
AI Agent Context Poisoning

Adversaries inject malicious content into data sources the agent retrieves during operation — databases, documents, APIs — to manipulate the agent's reasoning and actions.

Why this is agent-specific

Targets the agent's retrieval-augmented decision-making pipeline. Unlike direct prompt injection, context poisoning works indirectly by corrupting the information the agent trusts, making it harder to detect and filter.

DEMO:Scenario 3Scenario 4
AML.T0081
Modify AI Agent Configuration

Adversaries alter agent configuration files, system prompts, or tool definitions to change the agent's behavior, permissions, or available capabilities.

Why this is agent-specific

Agent configurations define autonomous behavior boundaries. Modifying them can silently expand what the agent is willing to do, add malicious tools, or remove safety guardrails — all without changing the agent's code.

AML.T0082
RAG Credential Harvesting

Adversaries exploit retrieval-augmented generation to extract credentials, API keys, or secrets from documents the agent has access to during its retrieval process.

Why this is agent-specific

Agents with RAG capabilities search across document stores that may contain embedded credentials. The agent's broad read access combined with its ability to extract and act on information creates a novel credential harvesting vector.

AML.T0083
Credentials from AI Agent Configuration

Adversaries extract authentication tokens, API keys, or service credentials stored in agent configurations, environment variables, or tool connection settings.

Why this is agent-specific

Agents require credentials to invoke tools and APIs autonomously. These credentials are often stored with broad permissions and may be extractable through the agent's own interfaces or by manipulating its output behavior.

AML.T0084
Discover AI Agent Configuration

Adversaries probe the agent to reveal its system prompt, available tools, permissions, and operational constraints — mapping the attack surface for subsequent exploitation.

Why this is agent-specific

Agent configurations contain rich information about capabilities, tool access, and trust boundaries. Discovering these through conversational probing enables targeted attacks against the agent's specific architecture.

AML.T0085
Data from AI Services

Adversaries use the agent's legitimate tool access to extract sensitive data from connected services — databases, APIs, file systems — by manipulating the agent's queries or objectives.

Why this is agent-specific

The agent acts as an authorized intermediary with broad data access. Adversaries can leverage this access to read data they couldn't access directly, using the agent's credentials and trusted position.

AML.T0086
Exfiltration via AI Agent Tool Invocation

Adversaries cause the agent to exfiltrate sensitive data by invoking tools that send data to external endpoints — email, webhooks, file uploads — as part of its normal tool-calling workflow.

Why this is agent-specific

Agents can be manipulated into sending data through legitimate tool channels, making exfiltration look like normal operations. The agent's tool access provides ready-made exfiltration channels that bypass traditional DLP controls.

AML.T0087
AI Agent Clickbait

Adversaries craft enticing content in data sources that lures agents into executing malicious actions — following links, invoking tools, or retrieving poisoned content during autonomous operation.

Why this is agent-specific

Agents process and act on content autonomously without human judgment about trustworthiness. Clickbait-style manipulation exploits the agent's tendency to follow instructions and links found in retrieved content.

Academic & Industry Frameworks

Emerging research from academia and industry identifying novel attack patterns and failure modes specific to autonomous AI agents. These frameworks complement OWASP and MITRE with deeper theoretical analysis.

ATFAA-1
Reasoning Path Hijacking

Attackers manipulate the agent's chain-of-thought reasoning to redirect its planning toward malicious outcomes, exploiting the agent's reliance on step-by-step logical inference.

Why this is agent-specific

Agents use explicit reasoning chains to plan multi-step actions. Corrupting the reasoning path — not just the final output — means the agent convinces itself that malicious actions are logically justified, making the attack self-reinforcing.

DEMO:Scenario 3Scenario 4
ATFAA-2
Objective Function Corruption

The agent's internal optimization objective is subtly altered so it pursues a modified goal that appears similar to the original but produces harmful outcomes.

Why this is agent-specific

Agents optimize toward objectives autonomously over multiple steps. Corrupting the objective function creates persistent misalignment that affects every subsequent decision, unlike one-shot attacks on stateless models.

ATFAA-3
Unauthorized Action Execution

Agents execute actions beyond their authorized scope — either through permission boundary confusion, tool-chain escalation, or misinterpretation of ambiguous instructions.

Why this is agent-specific

Combines agent autonomy with tool access to create unauthorized actions that the agent believes are authorized. The gap between what the agent can do and what it should do is the core agent security challenge.

TRIM-1
Excessive Agency

Agents are granted more capabilities, permissions, or autonomy than necessary for their intended tasks, creating an unnecessarily large attack surface and blast radius.

Why this is agent-specific

Excessive agency is the root cause multiplier: every other agent vulnerability becomes more dangerous when the agent has broad tool access and permissions. This is the agent-specific version of the principle of least privilege.

TRIM-2
Unbalanced Tool-Driven Agency

Agents rely too heavily on tool outputs without validating results, allowing compromised or malfunctioning tools to drive agent behavior toward harmful outcomes.

Why this is agent-specific

Agents treat tool outputs as authoritative inputs to their reasoning. When tools return incorrect or manipulated results, the agent incorporates them into its world model and makes downstream decisions based on corrupted information.

SURVEY-1
Autonomous Cyber-Exploitation

Agents with security tool access autonomously discover and exploit vulnerabilities in connected systems, potentially escalating beyond their intended scope of security testing.

Why this is agent-specific

Combines the agent's ability to reason about systems with autonomous tool use to create self-directed exploitation capabilities. The agent can chain discoveries and exploits without human oversight at each step.

SURVEY-2
Multi-Agent Protocol Threats

Vulnerabilities in communication protocols between agents in multi-agent systems allow message injection, impersonation, and unauthorized delegation of tasks between agents.

Why this is agent-specific

Multi-agent systems create new protocol-level attack surfaces where agents trust messages from other agents. Compromising one agent's communication can cascade through the entire agent network.