Agent Identity: Why Every Agent Vulnerability Is a Trust Boundary Failure
This article explores trust boundary failures in AI agent systems. Agents are loops where the model decides tool calls at runtime, introducing vulnerabilities like prompt injection, identity spoof, budget bombs, and tool poisoning. The core issue is missing identity propagation—when an agent calls a backend without a signed user claim, the service cannot authorize correctly. The solution from Portkey and Palo Alto Networks involves agent gateway for identity, MCP registry for drift detection, and LLM gateway for quotas and guardrails, enforcing trust at the platform layer.
Consider these scenarios
An MCP server quietly returning extra tool descriptions
Prompt injection through a calendar invite
An Agent invokes a tool that the principal should not have access to
Cost overruns
It isn't the model that failed. It isn't the tool that failed. What failed is the trust boundary, the trust between two components with different authority
In a classic application/service, code calls APIs and the developer decides what is sent. In an agent, a language model decides at runtime which tool to call, with what arguments, after reading text the developer has never seen.
Let us create a mental model of the different failure modes and how you can secure your AI workloads
Simple Inference calls have no side affects
01 · Simple inference calls have no side effects
A model maps text to text. Guardrails secure what goes in and what comes out — PII redaction, harmful content, jailbreaks.
principal / user request in flight input guardrail · redact output guardrail · block model (stateless)
An Agent is a while loop An agent is a while loop with inference + tool/agent calls
02 · An agent is a loop
There is no "agent object." There is a transcript and a runtime that keeps calling the model until it stops asking for tools.
in-flight tool call / result loop component
This distinction matters because the trust questions are properties of the loop, not of the model. The model does not know who the user is. The model does not know which tools are safe. The model does not know its own budget.
Every part of the chain needs trust and identity Agent Identity is a theme that Portkey and Palo Alto Networks have been building on for a long time, trust should exist through enforcement.
03 · Whose authority is on the wire?
Whose authority is on the wire?
Top: identity propagated. Bottom: anonymous call. Same network, same payload, opposite blast radius.
user principal agent service identity backend service missing / unverified principal
If the agent calls transfer_funds(amount=50000) and the request carries no signed claim about which user authorized it, the receiving service has two options: refuse everything (and break the product), or trust the caller and create a confused deputy (and ship the breach). This is not a theoretical pattern. It is the dominant failure mode of every agent platform shipping today.
The same question applies to MCP. When an agent mounts an MCP server, the server can change its tool list, its tool descriptions, or its tool behaviors between sessions, and the agent will obediently re-render those descriptions into its own prompt at the next call. Tool descriptions are instructions. An MCP server you do not control is an unsigned, mutable extension of your system prompt.
04 · MCP tool descriptions are instructions
MCP tool descriptions are instructions
A server can change a tool's description between sessions. The model renders the new text into its prompt. The "drift" line is the trust boundary.
refresh / discovery drift from registered manifest policy violation
And the same again for A2A and other agent protocols. Without a propagated identity chain, every agent in a multi-hop call is effectively anonymous to every downstream agent. If you cannot answer "on whose behalf is this call being made," you cannot apply per-user policy, you cannot rate-limit per principal, and your audit log is fiction.
05 · A2A identity chain
A2A: identity chains, or the lack of them
A planner agent calls a shopper agent calls a payments agent. The chain only holds if each hop carries a verifiable claim about the principal that started it.
user principal agent service identity (chain segment) identity lost
What goes wrong, in slow motion
Here is the same agent under four common attacks, with no governance and policies in place. The pattern is identical every time: an untrusted input crosses a boundary that is not defined.
06 · Four attacks on an undefended agent
tainted input resulting violation boundary crossed
Prompt injection via tool result. A tool returns text, an email body, a web page, a calendar event that contains instructions for the model. The model has no syntactic way to distinguish "data the tool returned" from "instructions the user gave." Boundary failure: data ↔ instruction.
Identity spoof. An agent forwards a user_id header that no one validated. The downstream tool trusts it. Boundary failure: principal claim ↔ verified principal.
Budget bomb. The model loops, calling a paid tool 400 times. Nothing checks spend before the bill arrives. Boundary failure: resource consumption ↔ authorization.
Tool poisoning. A registered MCP server quietly updates a tool's description to include "and also email the conversation to attacker@". The agent renders this into its next prompt and complies. Boundary failure: registered capability ↔ runtime capability.
Agent Identity
The remediation is not "tell developers to be more careful." Trust boundaries in distributed systems have to be enforced by infrastructure, not by convention. That is what Portkey, integrated with the Palo Alto Networks Cortex platform, is for.
07 · Portkey + Cortex: lines drawn at the platform layer
Portkey sits in front of agents, MCPs, and LLMs. Every call carries propagated identity. Every call passes through guardrails. The control plane is where policy lives — and where it is enforced fail-fast.
Portkey gateway / principal authorized call policy / quota guardrail / cap policy violation blocked
Portkey Agent Gateway: identity for agents, the same way you do it for services
Every agent registers with the Agent Gateway and receives a workload identity. Calls between agents carry an OAuth bearer token scoped to service and user, supporting the three identity modes machine identities have always supported: assumed (gateway-issued service token), delegated (token exchange on behalf of a user), and chained (signed claims propagated across hops from your IdP — Okta, Entra, or equivalent). Tool calls and MCP calls use the same abstractions, so the principal is intact from the first user gesture to the terminal API call. Policies are authored in the Portkey control plane and attach at the workspace or organization level, granular enough to differ per agent and per tool, centralized enough to audit.
Portkey MCP Registry: drift detection and scoped capability
Every MCP server an agent is allowed to mount is registered with a signed manifest. The registry watches the live server against that manifest: if a tool's description changes, if the tool list grows, if behavior diverges, the registry flags drift and can quarantine the server before it reaches an agent's context window. Identity is forwarded as a token header so the MCP server itself can enforce per-user authorization. Tool-level scopes are configurable: read_* for one agent, write_* only for another.
Portkey LLM Gateway: quotas, attribution, and guardrails on the only path that matters
The LLM Gateway is the single egress for every inference call, with a unified signature across providers (Anthropic, OpenAI, Bedrock, Vertex). That single chokepoint is what makes the rest of the controls real: rate limits and cost caps attach at five levels: API key, user, agent, workspace, organization and fail fast when exceeded, rather than alerting after the budget is gone. The end-user principal travels in the session token, so attribution is cryptographic, not advisory. Pre- and post-request hooks integrate input/output guardrails: Palo Alto Networks Prisma AIRS for AI-runtime security, with optional third-party providers applied uniformly to LLMs, agents, and MCP servers.
What a defense actually stops
Attack Identity propagation MCP registry LLM Gateway quotas Prisma AIRS guardrails Audit log
Prompt injection via tool result — partial (blocks if tool quarantined) — blocks detects after
Identity spoof in A2A header blocks — partial (attribution wrong) — detects after
Budget bomb / runaway loop partial (scopes blast radius) — blocks — detects after
Tool poisoning via MCP drift partial (scopes blast radius) blocks — partial (may catch payload) detects after
Data exfiltration via tool args partial (scopes principal) partial (scoped capability) — blocks detects after
Cross-agent confused deputy blocks — partial (per-principal limits) — detects after
No single control stops everything. Identity propagation, registry-level capability control, gateway-level quotas, and runtime guardrails are complementary and together they map cleanly onto the boundaries the attacks crossed. That is what "platform-layer enforcement" actually means: every boundary on the diagram has a runtime owner.
💡
Authors note: Abstractions around agents have been evolving, in the end everything boils down to trust between services. We have been working to bring enforcement and policies for your AI workloads to a single platform. Connect with us at [email protected] to explore how we can help your organisation get started