2026-06-28 17:11 UTCIn-site rewrite4 min readUpdated: 2026-06-28 17:21 UTC

Agent Identity: Why Every Agent Vulnerability Is a Trust Boundary Failure

This article explores trust boundary failures in AI agent systems. Agents are loops where the model decides tool calls at runtime, introducing vulnerabilities like prompt injection, identity spoof, budget bombs, and tool poisoning. The core issue is missing identity propagation—when an agent calls a backend without a signed user claim, the service cannot authorize correctly. The solution from Portkey and Palo Alto Networks involves agent gateway for identity, MCP registry for drift detection, and LLM gateway for quotas and guardrails, enforcing trust at the platform layer.

SourceHacker News AIAuthor: segalord

Consider these scenarios

An MCP server quietly returning extra tool descriptions

Prompt injection through a calendar invite

An Agent invokes a tool that the principal should not have access to

Cost overruns

It isn't the model that failed. It isn't the tool that failed. What failed is the trust boundary, the trust between two components with different authority

In a classic application/service, code calls APIs and the developer decides what is sent. In an agent, a language model decides at runtime which tool to call, with what arguments, after reading text the developer has never seen.

Let us create a mental model of the different failure modes and how you can secure your AI workloads

Simple Inference calls have no side affects

01 · Simple inference calls have no side effects

A model maps text to text. Guardrails secure what goes in and what comes out — PII redaction, harmful content, jailbreaks.

principal / user request in flight input guardrail · redact output guardrail · block model (stateless)

An Agent is a while loop An agent is a while loop with inference + tool/agent calls

02 · An agent is a loop

There is no "agent object." There is a transcript and a runtime that keeps calling the model until it stops asking for tools.

in-flight tool call / result loop component

This distinction matters because the trust questions are properties of the loop, not of the model. The model does not know who the user is. The model does not know which tools are safe. The model does not know its own budget.

Every part of the chain needs trust and identity Agent Identity is a theme that Portkey and Palo Alto Networks have been building on for a long time, trust should exist through enforcement.

03 · Whose authority is on the wire?

Whose authority is on the wire?

Top: identity propagated. Bottom: anonymous call. Same network, same payload, opposite blast radius.

user principal agent service identity backend service missing / unverified principal

If the agent calls transfer_funds(amount=50000) and the request carries no signed claim about which user authorized it, the receiving service has two options: refuse everything (and break the product), or trust the caller and create a confused deputy (and ship the breach). This is not a theoretical pattern. It is the dominant failure mode of every agent platform shipping today.

The same question applies to MCP. When an agent mounts an MCP server, the server can change its tool list, its tool descriptions, or its tool behaviors between sessions, and the agent will obediently re-render those descriptions into its own prompt at the next call. Tool descriptions are instructions. An MCP server you do not control is an unsigned, mutable extension of your system prompt.

04 · MCP tool descriptions are instructions

MCP tool descriptions are instructions

A server can change a tool's description between sessions. The model renders the new text into its prompt. The "drift" line is the trust boundary.

refresh / discovery drift from registered manifest policy violation

And the same again for A2A and other agent protocols. Without a propagated identity chain, every agent in a multi-hop call is effectively anonymous to every downstream agent. If you cannot answer "on whose behalf is this call being made," you cannot apply per-user policy, you cannot rate-limit per principal, and your audit log is fiction.

05 · A2A identity chain

A2A: identity chains, or the lack of them

A planner agent calls a shopper agent calls a payments agent. The chain only holds if each hop carries a verifiable claim about the principal that started it.

user principal agent service identity (chain segment) identity lost

What goes wrong, in slow motion

Here is the same agent under four common attacks, with no governance and policies in place. The pattern is identical every time: an untrusted input crosses a boundary that is not defined.

06 · Four attacks on an undefended agent

tainted input resulting violation boundary crossed

Prompt injection via tool result. A tool returns text, an email body, a web page, a calendar event that contains instructions for the model. The model has no syntactic way to distinguish "data the tool returned" from "instructions the user gave." Boundary failure: data ↔ instruction.

Identity spoof. An agent forwards a user_id header that no one validated. The downstream tool trusts it. Boundary failure: principal claim ↔ verified principal.

Budget bomb. The model loops, calling a paid tool 400 times. Nothing checks spend before the bill arrives. Boundary failure: resource consumption ↔ authorization.

Tool poisoning. A registered MCP server quietly updates a tool's description to include "and also email the conversation to attacker@". The agent renders this into its next prompt and complies. Boundary failure: registered capability ↔ runtime capability.

Agent Identity

The remediation is not "tell developers to be more careful." Trust boundaries in distributed systems have to be enforced by infrastructure, not by convention. That is what Portkey, integrated with the Palo Alto Networks Cortex platform, is for.

07 · Portkey + Cortex: lines drawn at the platform layer

Portkey sits in front of agents, MCPs, and LLMs. Every call carries propagated identity. Every call passes through guardrails. The control plane is where policy lives — and where it is enforced fail-fast.

Portkey gateway / principal authorized call policy / quota guardrail / cap policy violation blocked

Portkey Agent Gateway: identity for agents, the same way you do it for services

Every agent registers with the Agent Gateway and receives a workload identity. Calls between agents carry an OAuth bearer token scoped to service and user, supporting the three identity modes machine identities have always supported: assumed (gateway-issued service token), delegated (token exchange on behalf of a user), and chained (signed claims propagated across hops from your IdP — Okta, Entra, or equivalent). Tool calls and MCP calls use the same abstractions, so the principal is intact from the first user gesture to the terminal API call. Policies are authored in the Portkey control plane and attach at the workspace or organization level, granular enough to differ per agent and per tool, centralized enough to audit.

Portkey MCP Registry: drift detection and scoped capability

Every MCP server an agent is allowed to mount is registered with a signed manifest. The registry watches the live server against that manifest: if a tool's description changes, if the tool list grows, if behavior diverges, the registry flags drift and can quarantine the server before it reaches an agent's context window. Identity is forwarded as a token header so the MCP server itself can enforce per-user authorization. Tool-level scopes are configurable: read_* for one agent, write_* only for another.

Portkey LLM Gateway: quotas, attribution, and guardrails on the only path that matters

The LLM Gateway is the single egress for every inference call, with a unified signature across providers (Anthropic, OpenAI, Bedrock, Vertex). That single chokepoint is what makes the rest of the controls real: rate limits and cost caps attach at five levels: API key, user, agent, workspace, organization and fail fast when exceeded, rather than alerting after the budget is gone. The end-user principal travels in the session token, so attribution is cryptographic, not advisory. Pre- and post-request hooks integrate input/output guardrails: Palo Alto Networks Prisma AIRS for AI-runtime security, with optional third-party providers applied uniformly to LLMs, agents, and MCP servers.

What a defense actually stops

Attack Identity propagation MCP registry LLM Gateway quotas Prisma AIRS guardrails Audit log

Prompt injection via tool result — partial (blocks if tool quarantined) — blocks detects after

Identity spoof in A2A header blocks — partial (attribution wrong) — detects after

Budget bomb / runaway loop partial (scopes blast radius) — blocks — detects after

Tool poisoning via MCP drift partial (scopes blast radius) blocks — partial (may catch payload) detects after

Data exfiltration via tool args partial (scopes principal) partial (scoped capability) — blocks detects after

Cross-agent confused deputy blocks — partial (per-principal limits) — detects after

No single control stops everything. Identity propagation, registry-level capability control, gateway-level quotas, and runtime guardrails are complementary and together they map cleanly onto the boundaries the attacks crossed. That is what "platform-layer enforcement" actually means: every boundary on the diagram has a runtime owner.

💡

Authors note: Abstractions around agents have been evolving, in the end everything boils down to trust between services. We have been working to bring enforcement and policies for your AI workloads to a single platform. Connect with us at [email protected] to explore how we can help your organisation get started