Agent Privilege LabAI Agent Security

Frameworks

    OWASP Top 10

    MITRE ATLAS

    Academic Research


Scenarios

Trust & Hallucination

3

    Hallucinated DB Records

    Hallucinated AWS Resources

    Over-Trusting Inputs

Context Poisoning

2

    Prompt Injection via DB Records

    Prompt Injection via AWS Tags

Excessive Agency

4

    Destructive SQL

    Wrong Table Access

    AWS Resource Deletion

    AWS Privilege Escalation

Data Exposure

2

    SQL Injection via Agent

    AWS Secret Exposure

Cascading Failures

1

    Cascading Errors

Supply Chain

1

    Malicious Tool Plugin

Reconnaissance

2

    System Prompt Extraction

    RAG Credential Harvesting

Rogue Behavior

1

    Goal Drift / Rogue Agent


Simulator

    Agent Intent Simulator

Agent Privilege LabAI Agent Security Research

Agent Privilege Lab

Interactive demos of real-world AI agent failure modes — and how to prevent them.
16 Scenarios
8 Categories
Trust & Hallucination

Agent fabricates data or blindly trusts unverified inputs, presenting fiction as fact.

    Hallucinated DB Records

    Hallucinated AWS Resources

    Over-Trusting Inputs

Context Poisoning

Malicious content in data sources hijacks the agent's reasoning and actions.

    Prompt Injection via DB Records

    Prompt Injection via AWS Tags

Excessive Agency

Agent performs actions beyond what was requested — destructive operations, unauthorized access, or over-fetching data.

    Destructive SQL

    Wrong Table Access

    AWS Resource Deletion

    AWS Privilege Escalation

Data Exposure

Agent leaks sensitive data — credentials, secrets, or private information — through unsafe queries or outputs.

    SQL Injection via Agent

    AWS Secret Exposure

Cascading Failures

Errors compound as the agent continues acting on incorrect assumptions without human checkpoints.

    Cascading Errors

Supply Chain

Compromised tools or plugins inject malicious behavior into the agent's workflow.

    Malicious Tool Plugin

Reconnaissance

Attackers extract the agent's configuration, credentials, or internal knowledge to enable targeted attacks.

    System Prompt Extraction

    RAG Credential Harvesting

Rogue Behavior

Agent drifts from its intended purpose, progressively escalating actions beyond user intent.

    Goal Drift / Rogue Agent