Agent Security: What Microsoft Shipped (and What You Still Need to Build)
Microsoft shipped agent identities in April 2026. Entra Agent ID gives every AI agent an OAuth token, conditional access policies, and audit logs. It is a real foundation — but it solves the easy problem (who is this agent?) while leaving the hard problems (what should it be allowed to do? how do we enforce that?) entirely to you.
This is not criticism. It is architecture. Identity systems authenticate. Security enforcement happens elsewhere. This post maps the gap.
The Identity Layer: What Entra Agent ID Actually Provides
Entra Agent ID extends Microsoft's existing identity stack to AI agents. It has three building blocks:
1. Agent Identity Blueprints — Templates for creating agent identities with parent-child relationships. Deploy 10,000 customer support agents? Create one blueprint, stamp out instances. Policy changes propagate automatically.
2. Cross-Platform Support — Works with agents built on Microsoft platforms (Agent 365, Copilot Studio, Azure Foundry) and third-party platforms (AWS Bedrock, n8n) via Auth SDK sidecar or workload identity federation.
3. OAuth 2.0 + MCP + A2A protocols — Standard authentication flows. Agent requests a token, presents it to resources, gets validated. Same flow as human users or service principals.
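To make "agent requests a token" concrete, here is a minimal sketch of the generic OAuth 2.0 client-credentials request that service principals send to the Microsoft identity platform token endpoint. It only builds the request and sends nothing. Treat it as an assumption about shape, not a description of Entra Agent ID internals: this post does not say whether an agent identity authenticates with a secret, a certificate, or a federated credential, and the tenant, client ID, secret, and scope values below are placeholders.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant_id: str, client_id: str,
                        client_secret: str, scope: str) -> urllib.request.Request:
    """Build (but do not send) an OAuth 2.0 client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    body = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,  # placeholder; real agents may use other credentials
        "scope": scope,
    }).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values for illustration only.
request = build_token_request(
    tenant_id="00000000-0000-0000-0000-000000000000",
    client_id="agent-openclaw-client-id",
    client_secret="not-a-real-secret",
    scope="https://graph.microsoft.com/.default",
)
print(request.get_method(), request.full_url)
```

Note what the request carries: who the caller is and which resource it wants. Nothing in it describes which tools the agent will invoke once it holds the token.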
What You Get (April 2026 GA)
From the RSAC 2026 announcement:
- Conditional Access for agents — Policy-based access control (requires Entra ID P1)
- Identity Protection for agents — Risk detection and remediation (requires Entra ID P2)
- Identity Governance for agents — Lifecycle and access management (requires Entra ID P1)
- Network controls for agents — Secure web AI gateway (requires Entra Internet Access)
- Sign-in and audit logs — Complete activity tracking (included in base Entra)
The Enforcement Gap: What is Missing
1. Tool Permission Boundaries
The problem: An agent with a valid OAuth token can call any MCP tool the framework exposes. Entra validates the token. It does not know what mcp_terminal does, and it has no opinion on whether the agent should be allowed to run shell commands.
Real example from our stack:
```typescript
// What Entra Agent ID validates
const token = await getAgentToken('agent-openclaw');
// ✅ Token is valid, agent identity confirmed

// What Entra Agent ID does NOT validate
await mcpClient.callTool('mcp_terminal', {
  command: 'rm -rf /production-data/*' // ❌ No enforcement layer
});
```
2. Prompt Injection and Jailbreak Prevention
The problem: OAuth success does not guarantee safe behavior. An agent with valid credentials can still be manipulated via prompt injection into exfiltrating data, executing unauthorized actions, or bypassing guardrails.
From Perplexity's NIST RFI response (March 2026):
"Agent architectures fundamentally change assumptions around code-data separation. Plaintext prompts function as code, shaping LLM control flows and tool invocations. Each generation of computing platforms introduces new code-data separation problems; LLM-powered agent systems represent the latest and perhaps most severe instance."
Industry data (Q1 2026):
- 492+ MCP servers exposed without authentication in production
- CVE-2026-26118 (CVSS 8.8): SSRF in Azure MCP Server via valid credentials
- 42,665 exposed agent instances found, 93.4% with auth bypass conditions
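The CVE is the instructive case: it is an authorization failure, not an authentication failure. The caller held valid credentials, and nothing checked what those credentials were being used for.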
3. Action Authorization Before Execution
The problem: Current enforcement happens in two places, neither adequate:
- Model-based screening — LLM decides if action is safe (probabilistic, bypassable via prompt engineering)
- Application-layer validation — Ad hoc checks in tool implementations (no standard, inconsistent coverage)
From "Before the Tool Call: Deterministic Pre-Action Authorization for Autonomous AI Agents" (March 2026, APort Technologies):
Key finding: Without deterministic pre-action authorization, 74.6% of social engineering attacks succeeded. With policy enforcement before tool execution, 0% succeeded across 879 attempts.
Three-Layer Defense Architecture
Research consensus (Uchibeke 2026, Perplexity NIST RFI 2026, ClawLess 2026): You need all three layers. They are complementary, not competing.
| Layer | What It Prevents | What It Does Not Prevent | Overhead |
|---|---|---|---|
| 1. Identity (Entra) | Unauthorized agents accessing resources | Authorized agent doing unauthorized things | ~50ms (token validation) |
| 2. Pre-Action Authorization | Policy violations, business rule violations, prompt-injected actions | Exploits within allowed tools | ~53ms median (Uchibeke 2026) |
| 3. Sandboxed Execution | Host compromise, lateral movement, resource exhaustion | Actions within sandbox permissions | 100-300ms cold start |
Example: $500 Payment Attack
Scenario: Prompt-injected agent attempts unauthorized payment.
Agent: "Transfer $500 to attacker-wallet-xyz"
Layer 1 (Entra): ✅ Token valid, agent identity confirmed → ALLOW to Layer 2
Layer 2 (Policy):
```json
{
  "tool": "payment.send",
  "capability": "finance",
  "constraints": {
    "max_amount": 100,
    "allowed_recipients": ["vendor-a", "vendor-b"]
  }
}
```
Amount exceeds the $100 limit, and attacker-wallet-xyz is not in allowed_recipients → DENY (audit trail generated)
Layer 3 never executes. Pre-action authorization stopped the attack before sandbox instantiation.
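The Layer 2 decision in this scenario can be reproduced in a few lines. The following is a minimal sketch, assuming the policy is stored as the JSON shown above and that the tool call arrives with `amount` and `recipient` parameters (both parameter names are assumptions, as is the `evaluate_payment` helper).

```python
import json

POLICY_JSON = """
{
  "tool": "payment.send",
  "capability": "finance",
  "constraints": {
    "max_amount": 100,
    "allowed_recipients": ["vendor-a", "vendor-b"]
  }
}
"""


def evaluate_payment(policy: dict, tool: str, params: dict) -> tuple[bool, list[str]]:
    """Return (allowed, reasons). Any violated constraint denies the call."""
    if tool != policy["tool"]:
        # This policy says nothing about other tools, so deny rather than guess.
        return False, ["tool_not_covered_by_policy"]

    constraints = policy["constraints"]
    reasons = []
    if params["amount"] > constraints["max_amount"]:
        reasons.append("amount_exceeds_limit")
    if params["recipient"] not in constraints["allowed_recipients"]:
        reasons.append("recipient_not_allowed")
    return (not reasons), reasons


policy = json.loads(POLICY_JSON)
decision = evaluate_payment(
    policy,
    "payment.send",
    {"amount": 500, "recipient": "attacker-wallet-xyz"},
)
print(decision)  # (False, ['amount_exceeds_limit', 'recipient_not_allowed'])
```

The check is plain comparison logic with no model in the loop, so the injected instruction has nothing to argue with.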
What We Built: ToolExecutor with Allowlist Enforcement
When building OpenClaw (an autonomous engineering manager), we hit this exact gap. Entra provides the identity foundation (who is this agent?), but it does not prevent a compromised agent from calling destructive tools.
Our implementation:
```python
ALLOWED_TOOLS = {
    'read_file', 'search_files', 'web_search', 'web_extract',
    'terminal', 'skill_view', 'memory'
}

# AgentIdentity, AuditLogger, the two exception classes, and the private
# helper methods are not shown here.
class ToolExecutor:
    def __init__(self, agent_identity: AgentIdentity):
        self.identity = agent_identity  # Entra token validation
        self.audit_log = AuditLogger(agent_identity.id)

    async def execute(self, tool_name: str, params: dict):
        # Layer 1: Entra validates identity (handled upstream)

        # Layer 2: Allowlist enforcement (deterministic)
        if tool_name not in ALLOWED_TOOLS:
            self.audit_log.record_deny(tool_name, 'not_in_allowlist')
            raise ToolNotAllowedError(f'{tool_name} not in agent allowlist')

        # Additional semantic validation
        if tool_name == 'terminal':
            if self._is_destructive_command(params['command']):
                self.audit_log.record_deny(tool_name, 'destructive_command')
                raise PolicyViolationError('Destructive command blocked')

        # Layer 3: Execute in sandbox (gVisor container)
        result = await self._execute_in_sandbox(tool_name, params)
        self.audit_log.record_success(tool_name, params, result)
        return result
```
Key properties:
- Fail-closed — If tool not in allowlist, deny (do not fall back to model judgment)
- Double validation — Allowlist check + semantic validation on high-risk tools
- Audit trail — Every decision (allow/deny) logged with agent identity, tool, params
- Deterministic — Same tool call with same params → same decision (no sampling, no temperature)
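The `_is_destructive_command` check is not shown above. One way such a check could work is sketched below, under stated assumptions: the pattern list is hypothetical and far from complete, and a denylist like this is only a backstop behind the allowlist and the sandbox. Commands that cannot be parsed are treated as destructive, in keeping with the fail-closed property.

```python
import re
import shlex

# Hypothetical patterns for illustration; a real deployment maintains its own.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\brm\s+(-[a-zA-Z]*[rf][a-zA-Z]*\s+)+"),  # rm -rf, rm -fr, rm -r -f
    re.compile(r"\bmkfs(\.\w+)?\b"),                      # format a filesystem
    re.compile(r"\bdd\s+.*\bof=/dev/"),                   # overwrite a block device
    re.compile(r"\b(shutdown|reboot|halt)\b"),            # take the host down
]


def is_destructive_command(command: str) -> bool:
    """Deterministically classify a shell command; unparseable input fails closed."""
    try:
        shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar: deny rather than guess at intent.
        return True
    return any(pattern.search(command) for pattern in DESTRUCTIVE_PATTERNS)


print(is_destructive_command("rm -rf /production-data/*"))  # True
print(is_destructive_command("ls -la /tmp"))                # False
```

Because the function depends only on its input string, the same command always yields the same decision, which is the "Deterministic" property listed above.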
Case Study: GitHub Copilot CLI (12 → 0 Incidents)
GitHub's internal security incident data (shared at Microsoft Build 2026):
Before agent identities (Q3 2025):
- 12 security incidents involving Copilot CLI in 3 months
- Mix of auth bypass, unauthorized data access, policy violations
After Entra Agent ID + tool permission boundaries (Q4 2025 - Q1 2026):
- 0 security incidents across same population
What changed:
- Every Copilot CLI instance got an agent identity (blueprint-based)
- Tool access enforced via allowlist (not all CLI tools enabled for all blueprints)
- Conditional access policies applied (location, risk score, device compliance)
- Audit logs enabled mandatory review for financial/PII tool access
The combination of identity + authorization enforcement delivered zero incidents. Identity alone would not have prevented the original 12 (they had valid service principals).
Research Perspective: The Authorization Gap is Fundamental
From "ClawLess: A Security Model of AI Agents" (April 2026, Southern University of Science and Technology):
"Existing approaches attempt to regulate agent behavior through training or prompting, which does not offer fundamental security guarantees. ClawLess enforces formally verified policies on AI agents under a worst-case threat model where the agent itself may be adversarial."
Two fundamental assumptions:
- Assumption of Capabilities — AI agents are capable of conducting sophisticated attacks against any security mechanisms.
- Assumption of Maliciousness — AI agents will eventually be lured to become malicious (via prompt injection, supply chain attacks, or data poisoning).
Under these assumptions, identity alone is insufficient. You need:
- Formal policy specification — Allowlist of tools, resource constraints, business rules
- Deterministic enforcement — Pre-action authorization that cannot be bypassed via prompt engineering
- Kernel-level isolation — Sandboxed execution environment (gVisor, Firecracker, Kata Containers)
What You Need to Build
If you are deploying Entra Agent ID in production, here is the implementation checklist:
1. Tool Permission System
Minimum viable:
```typescript
// Define allowlist per agent blueprint.
// The explicit Record type lets us index by an arbitrary string under strict mode.
const AGENT_PERMISSIONS: Record<string, readonly string[]> = {
  'customer-support': ['read_ticket', 'search_kb', 'send_email'],
  'data-analyst': ['query_db', 'read_file', 'web_search'],
  'engineer': ['terminal', 'read_file', 'write_file', 'git']
};

// Enforce before tool execution; unknown blueprints are denied (fail-closed)
function canAgentUseTool(agentBlueprint: string, toolName: string): boolean {
  return AGENT_PERMISSIONS[agentBlueprint]?.includes(toolName) ?? false;
}
```
Production-grade:
- Policy language (e.g., Open Agent Passport spec)
- Constraints per tool (amount limits, allowed recipients, file path restrictions)
- Cryptographically signed audit trail
- Dynamic policy updates without agent restart
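For the audit trail item, a tamper-evident log can be approximated with a hash chain. The sketch below is an assumption-laden illustration: it uses HMAC-SHA256 with a shared key, which detects edits and reordering but is not a digital signature; a production system would more likely use asymmetric signatures and keep the key outside the agent's reach. The entry layout and function names are hypothetical.

```python
import hashlib
import hmac
import json


def _mac(key: bytes, prev_mac: str, event: dict) -> str:
    """MAC over the previous entry's MAC plus this event, chaining the entries."""
    payload = json.dumps({"prev": prev_mac, "event": event}, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def append_entry(log: list[dict], key: bytes, event: dict) -> dict:
    prev_mac = log[-1]["mac"] if log else ""
    entry = {"prev": prev_mac, "event": event, "mac": _mac(key, prev_mac, event)}
    log.append(entry)
    return entry


def verify_log(log: list[dict], key: bytes) -> bool:
    """Recompute every MAC; any edited, removed, or reordered entry breaks the chain."""
    prev_mac = ""
    for entry in log:
        if entry["prev"] != prev_mac:
            return False
        expected = _mac(key, entry["prev"], entry["event"])
        if not hmac.compare_digest(expected, entry["mac"]):
            return False
        prev_mac = entry["mac"]
    return True


key = b"demo-key-do-not-use"
audit_log: list[dict] = []
append_entry(audit_log, key, {"agent": "agent-openclaw", "tool": "read_file", "decision": "allow"})
append_entry(audit_log, key, {"agent": "agent-openclaw", "tool": "terminal", "decision": "deny"})
print(verify_log(audit_log, key))  # True

audit_log[1]["event"]["decision"] = "allow"  # simulate tampering
print(verify_log(audit_log, key))  # False
```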
2. Pre-Action Authorization Hook
Insert between "agent decides action" and "framework executes tool":
```python
# hook, load_policy, Allow, and Deny come from the agent framework and
# policy layer (not shown here).
@hook('before_tool_call')
async def authorize_action(tool_name: str, params: dict, agent_id: str):
    policy = load_policy(agent_id)

    # Check allowlist
    if tool_name not in policy.allowed_tools:
        return Deny(reason='tool_not_allowed')

    # Evaluate constraints (a tool with no constraints yields an empty list)
    for constraint in policy.constraints.get(tool_name, []):
        if not constraint.evaluate(params):
            return Deny(reason=f'constraint_violated: {constraint.name}')

    # Check business rules
    if tool_name == 'payment.send':
        if params['amount'] > policy.max_payment_amount:
            return Deny(reason='amount_exceeds_limit')

    return Allow()
```
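The hook above leans on `Allow`, `Deny`, and constraint objects with an `evaluate` method. A self-contained sketch of what those pieces might look like follows. All class names, the `authorize` function, and the `/workspace/` path rule are assumptions for illustration; the hook decorator and policy loading are left out so the example runs on its own.

```python
import posixpath
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Allow:
    pass


@dataclass(frozen=True)
class Deny:
    reason: str


@dataclass(frozen=True)
class Constraint:
    name: str
    check: Callable[[dict], bool]

    def evaluate(self, params: dict) -> bool:
        return self.check(params)


@dataclass
class Policy:
    allowed_tools: set[str]
    constraints: dict[str, list[Constraint]] = field(default_factory=dict)


def path_under_workspace(params: dict) -> bool:
    # Normalize first so '/workspace/../etc/passwd' does not pass a prefix check.
    return posixpath.normpath(params["path"]).startswith("/workspace/")


def authorize(policy: Policy, tool_name: str, params: dict):
    if tool_name not in policy.allowed_tools:
        return Deny("tool_not_allowed")
    for constraint in policy.constraints.get(tool_name, []):
        try:
            ok = constraint.evaluate(params)
        except Exception:
            ok = False  # malformed params fail closed
        if not ok:
            return Deny(f"constraint_violated: {constraint.name}")
    return Allow()


policy = Policy(
    allowed_tools={"read_file"},
    constraints={"read_file": [Constraint("path_under_workspace", path_under_workspace)]},
)
print(authorize(policy, "read_file", {"path": "/workspace/notes.txt"}))
print(authorize(policy, "read_file", {"path": "/workspace/../etc/passwd"}))
print(authorize(policy, "terminal", {"command": "ls"}))
```

Wrapping each constraint in a try/except is a design choice: a tool call with missing or malformed parameters is denied instead of crashing the hook, which keeps the layer fail-closed.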
3. Sandboxed Execution Environment
Do not run agent code directly on the host. Options:
| Technology | Security | Performance | Complexity |
|---|---|---|---|
| Docker | Low (shares host kernel) | High | Low |
| gVisor | High (user-space kernel) | Medium | Medium |
| Firecracker | Very High (microVM) | Medium | High |
| Kata Containers | Very High (VM per container) | Low (cold start) | High |
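Recommendation: Start with gVisor. Its user-space kernel keeps most syscalls away from the host kernel, its overhead is moderate and concentrated in syscall-heavy and I/O-heavy workloads, and it is widely deployed (GKE Sandbox, Cloud Run).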
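If gVisor is installed and registered with Docker as the `runsc` runtime, launching a tool inside it comes down to one extra flag on `docker run`. The sketch below only builds the argument list and does not execute anything. The function name, image, and resource limits are assumptions to adapt; the flags themselves are standard `docker run` options.

```python
import shlex


def build_sandbox_command(image: str, tool_argv: list[str], *,
                          memory: str = "256m", cpus: str = "0.5") -> list[str]:
    """Build a docker invocation that runs one tool call under gVisor."""
    return [
        "docker", "run", "--rm",
        "--runtime=runsc",                       # gVisor instead of the default runc
        "--network=none",                        # no network unless the tool needs it
        "--read-only",                           # immutable root filesystem
        "--cap-drop=ALL",                        # drop all Linux capabilities
        "--security-opt", "no-new-privileges",
        "--memory", memory,
        "--cpus", cpus,
        "--pids-limit", "64",                    # bound fork bombs
        image,
        *tool_argv,
    ]


cmd = build_sandbox_command("python:3.12-slim", ["python", "-c", "print('hello')"])
print(shlex.join(cmd))
```

Passing the command as a list (for example to `subprocess.run(cmd)`) rather than as a shell string keeps agent-supplied arguments from being reinterpreted by a shell on the host.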
4. Prompt Injection Defenses
Even with identity + tool boundaries + sandbox, prompt injection can manipulate how the agent uses allowed tools.
Defenses:
- System prompt protection — Framework-managed, agent cannot override
- Input sanitization — Strip markdown, escape special chars in user input
- Output validation — Check tool call params match expected schema
- Audit anomaly detection — Flag unusual sequences (e.g., 10 DB queries in 1 second)
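The last item, audit anomaly detection, can start as a sliding-window counter per agent and tool. This is a minimal sketch, assuming the "10 DB queries in 1 second" example is the rule to flag; the class name and its interface are hypothetical. Timestamps are passed in explicitly so the behavior is reproducible, and a live system would typically supply `time.monotonic()`.

```python
from collections import deque


class RateAnomalyDetector:
    """Flags an (agent, tool) pair once `threshold` calls land inside the window."""

    def __init__(self, threshold: int, window_seconds: float):
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._calls: dict[tuple[str, str], deque] = {}

    def record(self, agent_id: str, tool_name: str, now: float) -> bool:
        """Record one call; return True if the recent call rate looks anomalous."""
        calls = self._calls.setdefault((agent_id, tool_name), deque())
        calls.append(now)
        # Evict calls that have aged out of the window.
        while calls and now - calls[0] > self.window_seconds:
            calls.popleft()
        return len(calls) >= self.threshold


detector = RateAnomalyDetector(threshold=10, window_seconds=1.0)
flags = [detector.record("agent-1", "query_db", now=i * 0.1) for i in range(10)]
print(flags[-1])                                         # True: tenth call within one second
print(detector.record("agent-1", "query_db", now=5.0))   # False: window has emptied
```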
Example from our stack:
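```python
import re

SPECIAL_TOKENS = ('<|endoftext|>', '<|im_start|>', '<|im_end|>')

def sanitize_user_input(text: str) -> str:
    # Strip markdown that could inject instructions
    text = re.sub(r'```[\s\S]*?```', '[code block removed]', text)
    text = re.sub(r'`[^`]+`', '[code removed]', text)

    # Remove special tokens (framework-specific). Repeat until stable so that
    # nested input such as '<|im_<|im_start|>start|>' cannot reassemble a token.
    previous = None
    while previous != text:
        previous = text
        for token in SPECIAL_TOKENS:
            text = text.replace(token, '')
    return text
```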
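Output validation, the third defense in the list, can be sketched without any third-party library. The example below assumes a hypothetical per-tool schema of parameter names and Python types; real deployments often use JSON Schema instead. It rejects missing parameters, unexpected parameters, and wrong types before the call reaches the tool.

```python
# Hypothetical schemas: parameter name -> required Python type.
EXPECTED_SCHEMAS = {
    "query_db": {"sql": str, "limit": int},
    "send_email": {"to": str, "subject": str, "body": str},
}


def validate_tool_call(tool_name: str, params: dict) -> list[str]:
    """Return a list of problems; an empty list means the call matches its schema."""
    schema = EXPECTED_SCHEMAS.get(tool_name)
    if schema is None:
        return [f"no schema registered for {tool_name}"]

    errors = []
    for key in sorted(schema.keys() - params.keys()):
        errors.append(f"missing parameter: {key}")
    for key in sorted(params.keys() - schema.keys()):
        errors.append(f"unexpected parameter: {key}")
    for key, expected_type in schema.items():
        if key not in params:
            continue
        value = params[key]
        # bool is a subclass of int in Python, so reject it explicitly;
        # none of the schemas in this sketch declare boolean fields.
        if isinstance(value, bool) or not isinstance(value, expected_type):
            errors.append(f"wrong type for {key}: expected {expected_type.__name__}")
    return errors


print(validate_tool_call("query_db", {"sql": "SELECT 1", "limit": 10}))
print(validate_tool_call("query_db", {"sql": "SELECT 1", "limit": "10", "shell": "rm"}))
```

Schema checks catch malformed or smuggled parameters, not malicious values that happen to be well-formed, so they complement the constraint checks rather than replace them.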
What Microsoft Should Add (But Probably Will Not)
These are agent-specific security features, not identity features. Different product domain, different team, different licensing model. Do not expect them in Entra Agent ID.
- Prompt injection classifier — Real-time detection of malicious prompts (probabilistic layer, not replacement for deterministic enforcement)
- Tool permission DSL — Declarative language for "agent X can call tool Y with constraints Z"
- Agent Data Loss Prevention — Semantic analysis of tool outputs before returning to agent (detect PII, credentials, secrets)
- Federated policy enforcement — Cross-organization policy sharing (e.g., "no financial tools for agents operating on untrusted data")
Some of these exist in Microsoft Agent 365 Security Policy Templates, but that is a different product with different licensing. Entra Agent ID is the identity foundation layer.
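Until something like Agent DLP ships, a crude version can be approximated by scanning tool outputs before they go back to the agent. The sketch below is pattern matching, not the semantic analysis described above, and the three patterns are illustrative assumptions that will miss many secret formats. It redacts matches and reports which categories were found.

```python
import re

# Illustrative patterns only; real DLP needs far broader coverage.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "email_address": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}


def redact_tool_output(text: str) -> tuple[str, list[str]]:
    """Replace matches with a placeholder and list the categories that matched."""
    findings = []
    for label, pattern in SECRET_PATTERNS.items():
        text, count = pattern.subn(f"[REDACTED:{label}]", text)
        if count:
            findings.append(label)
    return text, findings


redacted, found = redact_tool_output(
    "key=AKIAIOSFODNN7EXAMPLE contact bob@example.com"
)
print(redacted)  # key=[REDACTED:aws_access_key_id] contact [REDACTED:email_address]
print(found)     # ['aws_access_key_id', 'email_address']
```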
Bottom Line
What Entra Agent ID gives you:
- Agent identities with OAuth 2.0, MCP, A2A support
- Conditional access, identity protection, governance
- Audit logs and compliance reporting
- Cross-platform support (Microsoft + third-party agents)
What you still need to build:
- Tool permission boundaries (allowlist + constraints)
- Pre-action authorization (policy enforcement before execution)
- Sandboxed execution environment (gVisor, Firecracker, Kata)
- Prompt injection defenses (input sanitization, output validation)
- Agent DLP (semantic analysis of tool outputs)
Identity is the foundation. It is not the whole building.
Resources
Official Documentation:
Research Papers (2026):
- Uchibeke, U. (2026). Before the Tool Call: Deterministic Pre-Action Authorization for Autonomous AI Agents. arXiv:2603.20953.
- Li, N., Zhang, K., Polley, K., Ma, J. (2026). Security Considerations for Artificial Intelligence Agents (Perplexity's NIST RFI Response). arXiv:2603.12230.
- Lu, H., Liu, N., Wang, S., Zhang, F. (2026). ClawLess: A Security Model of AI Agents. arXiv:2604.06284.
Implementation Examples:
- Open Agent Passport Specification — Policy language reference