The Missing Layer in AI Infrastructure
The AI infrastructure stack has compute, orchestration, memory, and observability. But there's one critical layer that's still missing: identity and permission control for autonomous agents.
The modern AI infrastructure stack has never been richer. We have:
- Compute: GPU clouds at every price point
- Orchestration: LangGraph, CrewAI, AutoGen for multi-agent workflows
- Memory: Vector databases (Pinecone, Weaviate, Chroma) for long-term agent context
- Observability: LangSmith, Langfuse for tracing agent decisions
- Guardrails: Prompt injection detection, output filtering
Companies have raised hundreds of millions to fill each of these layers. The tooling is mature and battle-tested.
But there's one layer that nobody has built yet. The layer that sits between the agent and the action.
The Stack Has a Gap
Here's the modern agentic workflow:
```
User Intent
     ↓
LLM (reasoning)
     ↓
Orchestration (LangGraph)
     ↓
Tool Call
     ↓
[MISSING: Identity & Permission Gate]
     ↓
Production System (Stripe, CRM, APIs)
```
We've built everything around the agent. We haven't built the layer between the agent and the world.
When your agent decides to call `stripe.create_charge(amount=5000)`, nothing checks:
- Is this agent authorized to make charges at all?
- Is €5000 within its per-transaction policy?
- Is this action within the agent's current capability scope?
- Will this action be logged in a tamper-evident way?
The answer today is: none of these checks exist. The tool call goes straight through.
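To make the gap concrete, here is a sketch of the guard that is missing today. The names (`AGENT_POLICIES`, `guarded_charge`) are illustrative, not part of any real API; each branch corresponds to one of the checks listed above.

```python
# Hypothetical guard showing the checks that no layer performs today.
AGENT_POLICIES = {
    "agt_payment_processor": {
        "allowed_tools": {"charge_payment"},
        "max_per_tx": 100,  # EUR
    },
}

def guarded_charge(agent_id: str, amount: int) -> str:
    policy = AGENT_POLICIES.get(agent_id)
    if policy is None:
        # Check 1: is this agent authorized to act at all?
        return "DENY: unknown agent"
    if "charge_payment" not in policy["allowed_tools"]:
        # Check 3: is this action within the agent's capability scope?
        return "DENY: tool not allowed"
    if amount > policy["max_per_tx"]:
        # Check 2: is the amount within the per-transaction policy?
        return "DENY: over per-tx limit"
    # Check 4 (tamper-evident logging) would happen here before returning.
    return "ALLOW"
```

With this guard in place, the €5000 charge from the example above is rejected instead of going straight through.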
Why This Layer Is Hard
The reason this gap exists isn't laziness — it's that building identity infrastructure for agents is genuinely hard. There are several unsolved problems:
Problem 1: Agents Aren't Humans
Human identity systems assume a login flow. A human authenticates once, gets a session, and that session persists.
Agents don't log in. They spawn, execute, and terminate — sometimes thousands of times per day. Their "session" is a single task completion. Traditional IAM doesn't map onto this model.
Problem 2: Actions Are Dynamic
With humans, you can define roles: "admin", "editor", "viewer". The role doesn't change mid-session.
With agents, the required permissions are dynamic. An agent might need to:
- Read data (low risk)
- Process a refund under €100 (medium risk)
- Trigger a production deploy (high risk)
All in the same workflow. Static roles don't capture this granularity.
Problem 3: Speed
Agents operate at machine speed. An authorization check that adds 200ms to a human login is unnoticeable. An authorization check that adds 200ms to every agent action in a 1000-action workflow adds 3+ minutes of latency.
The identity layer needs to be fast — p99 under 20ms — or it won't be used.
Problem 4: Auditability at Scale
A human makes 50 decisions per day. An agent makes 50 decisions per minute. Traditional audit logging systems aren't designed for this volume, and most don't provide the tamper-evident guarantees required for compliance.
What the Missing Layer Looks Like
After working on this problem, we believe the missing layer needs five primitives:
1. Agent Identity (Who is acting?)
Each agent gets a cryptographic identity — an Ed25519 keypair. The public key is registered with the identity layer. Every action the agent takes is signed with the private key.
This gives you verifiable attribution. You know, with mathematical certainty, which agent performed which action.
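A minimal sketch of the sign-then-verify flow. Production systems would use real Ed25519 keypairs via a cryptography library; to keep this example dependency-free, HMAC-SHA256 with a registered shared secret stands in for the keypair. The attribution logic is the same: the agent signs each action, and the identity layer verifies the signature against the key registered for that agent.

```python
import hashlib
import hmac

# Stand-in for a registered Ed25519 public key: in production the agent
# signs with its private key and the identity layer verifies with the
# registered public key; a shared secret plays both roles here.
REGISTERED_KEYS = {"agt_01J": b"demo-secret"}

def sign_action(agent_id: str, action: str, payload: str) -> str:
    """Agent side: sign (agent, action, payload) before calling a tool."""
    key = REGISTERED_KEYS[agent_id]
    msg = f"{agent_id}|{action}|{payload}".encode()
    return hmac.new(key, msg, hashlib.sha256).hexdigest()

def verify_signature(agent_id: str, action: str, payload: str, sig: str) -> bool:
    """Identity layer side: verify attribution against the registered key."""
    key = REGISTERED_KEYS.get(agent_id)
    if key is None:
        return False
    msg = f"{agent_id}|{action}|{payload}".encode()
    expected = hmac.new(key, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sig)
```

Any change to the payload after signing, or any unregistered agent id, fails verification.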
2. Policy Engine (What is it allowed to do?)
Policies define the permission boundary for each agent:
```json
{
  "agent": "agt_payment_processor",
  "rules": {
    "allowed_tools": ["charge_payment", "issue_refund"],
    "spend_limits": { "max_per_tx": 100, "max_per_day": 1000 },
    "rate_limits": { "actions_per_minute": 20 }
  }
}
```
Not a role. A precise, agent-specific policy that can be versioned, audited, and updated without touching the agent code.
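A sketch of how an engine might evaluate that policy. The field names follow the JSON above; the `evaluate` function and its reason codes are hypothetical.

```python
POLICY = {
    "agent": "agt_payment_processor",
    "rules": {
        "allowed_tools": ["charge_payment", "issue_refund"],
        "spend_limits": {"max_per_tx": 100, "max_per_day": 1000},
        "rate_limits": {"actions_per_minute": 20},
    },
}

def evaluate(policy: dict, action: str, amount: int, spent_today: int) -> tuple:
    """Return (decision, reason_code) for one proposed action."""
    rules = policy["rules"]
    if action not in rules["allowed_tools"]:
        return ("DENY", "TOOL_NOT_ALLOWED")
    limits = rules["spend_limits"]
    if amount > limits["max_per_tx"]:
        return ("DENY", "PER_TX_LIMIT_EXCEEDED")
    if spent_today + amount > limits["max_per_day"]:
        return ("DENY", "DAILY_LIMIT_EXCEEDED")
    return ("ALLOW", "OK")
```

Because the policy is data, not code, it can be versioned and updated without redeploying the agent.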
3. Capability Tokens (What can it do right now?)
Inspired by capability-based security, agents operate with short-lived, scoped tokens:
```
Agent requests capability for "charge_payment" (TTL: 5min)
     ↓
Identity layer issues: cap_01J... (valid until 10:05am)
     ↓
Agent uses cap_01J... for all charge_payment calls until expiry
```
This is fundamentally different from API keys. The token:
- Expires in 5 minutes (not 5 years)
- Is scoped to one action type
- Carries the policy constraints inline
- Can be revoked individually
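The token lifecycle above can be sketched in a few lines. The `mint_capability` and `capability_allows` helpers are illustrative, assuming an in-memory revocation set:

```python
import secrets
import time

# Illustrative revocation list; a real system would back this with storage.
REVOKED = set()

def mint_capability(agent_id: str, action: str, ttl_seconds: int = 300) -> dict:
    """Issue a short-lived token scoped to a single action type."""
    return {
        "token_id": "cap_" + secrets.token_hex(8),
        "agent_id": agent_id,
        "action": action,                         # scoped to one action type
        "expires_at": time.time() + ttl_seconds,  # 5-minute TTL by default
    }

def capability_allows(cap: dict, agent_id: str, action: str) -> bool:
    """Check scope, expiry, and revocation before honoring the token."""
    if cap["token_id"] in REVOKED:
        return False  # individually revocable
    if time.time() >= cap["expires_at"]:
        return False  # expired
    return cap["agent_id"] == agent_id and cap["action"] == action
```

Note what an API key cannot do here: the token dies on its own in minutes, and revoking one token does not disturb any other agent or action type.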
4. Verification Gate (Should this action proceed?)
Before execution, every sensitive action passes through a verification gate:
```
POST /verify
{
  "agent_id": "agt_01J...",
  "action": "charge_payment",
  "payload": { "amount": 45 },
  "capability_token": "eyJ...",
  "signature": "base64..."
}

→ { "decision": "ALLOW", "audit_event_id": "evt_01J..." }
```
The gate checks the full chain: identity → capability → policy → quotas. It returns ALLOW, DENY, or PENDING_APPROVAL with a machine-readable reason code.
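The check chain the gate runs can be sketched as a single function. The parameters and reason codes here are hypothetical simplifications of what a real gate would load from its identity, capability, and policy stores:

```python
def verify(request: dict, *, known_agents: set, valid_caps: set,
           allowed_tools: set, tx_limit: int, quota_left: int) -> dict:
    """Sketch of the gate's chain: identity -> capability -> policy -> quotas."""
    if request["agent_id"] not in known_agents:
        return {"decision": "DENY", "reason_code": "UNKNOWN_AGENT"}
    if request["capability_token"] not in valid_caps:
        return {"decision": "DENY", "reason_code": "INVALID_CAPABILITY"}
    if request["action"] not in allowed_tools:
        return {"decision": "DENY", "reason_code": "TOOL_NOT_ALLOWED"}
    if request["payload"]["amount"] > tx_limit:
        return {"decision": "DENY", "reason_code": "PER_TX_LIMIT_EXCEEDED"}
    if quota_left <= 0:
        # An exhausted quota might escalate rather than hard-fail.
        return {"decision": "PENDING_APPROVAL", "reason_code": "QUOTA_EXHAUSTED"}
    return {"decision": "ALLOW", "reason_code": "OK"}
```

The ordering matters: cheap identity checks run first, so most bad requests are rejected before any policy evaluation happens.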
5. Audit Trail (What happened and why?)
Every verification decision is logged with hash-chain integrity. You get:
- Who acted (agent identity)
- What they did (action + payload hash)
- Whether it was allowed (decision + reason code)
- When (timestamp)
- Why it was allowed (policy version that approved it)
And because of the hash chain, you can prove that the log hasn't been tampered with.
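A hash chain of this kind can be sketched with nothing but a standard hashing library. Each entry's hash covers the previous entry's hash, so editing any past record invalidates everything after it:

```python
import hashlib
import json

def append_event(chain: list, event: dict) -> list:
    """Append an audit event whose hash covers the previous entry's hash."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"prev_hash": prev_hash, **event}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})
    return chain

def chain_is_intact(chain: list) -> bool:
    """Recompute every hash; any edit to a past entry breaks the chain."""
    prev_hash = "0" * 64
    for entry in chain:
        body = {k: v for k, v in entry.items() if k != "hash"}
        if body.get("prev_hash") != prev_hash:
            return False
        digest = hashlib.sha256(
            json.dumps(body, sort_keys=True).encode()
        ).hexdigest()
        if digest != entry["hash"]:
            return False
        prev_hash = entry["hash"]
    return True
```

Flipping a single decision in an old entry, even without touching its hash field, makes `chain_is_intact` return `False`: that is the tamper evidence.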
The Reference Architecture
Here's what the stack looks like with the missing layer in place:
```
User Intent
     ↓
LLM (reasoning)
     ↓
Orchestration (LangGraph)
     ↓
Tool Call requested
     ↓
┌─────────────────────────────────┐
│ KYA Identity & Permission Layer │
│                                 │
│ 1. Verify agent identity        │
│ 2. Check capability token       │
│ 3. Evaluate policy              │
│ 4. Check quotas                 │
│ 5. Log to audit trail           │
│                                 │
│        → ALLOW / DENY           │
└─────────────────────────────────┘
     ↓ (only if ALLOW)
Production System
```
The identity layer is not in the critical path of the LLM reasoning — it only activates when a tool call is about to execute.
Integration Is One Function Call
The barrier to adoption needs to be as low as possible. In practice, adding the identity layer looks like this:
```python
from kya_sdk import KyaClient

kya = KyaClient(workspace_id="ws_01J...")

# Wrap your tool execution
async def execute_tool(agent_id, tool_name, payload, capability_token, signature):
    result = await kya.verify(
        agent_id=agent_id,
        action=tool_name,
        payload=payload,
        capability_token=capability_token,
        signature=signature,
    )
    if result.decision != "ALLOW":
        raise PermissionError(f"{tool_name} denied: {result.reason_code}")

    # Execute the actual tool
    return await tools[tool_name](payload)
```
One function. One check. The missing layer is now present.
Who Needs This Now
The teams that need this most urgently are building:
Fintech copilots: Agents that can initiate transactions, process refunds, or move funds. The blast radius of an unauthorized action is immediate and financial.
RevOps automation: Agents writing to CRMs, triggering outbound sequences, managing customer data. GDPR exposure without audit trails.
DevOps agents: Agents that can deploy, scale, or reconfigure infrastructure. A single unconstrained agent in prod is a nightmare scenario.
Healthcare workflows: Agents accessing patient records or triggering clinical workflows. HIPAA compliance requires audit evidence.
In all these cases, the question isn't "do we need identity and permission control for our agents?" The answer is obviously yes. The question is "why hasn't anyone built this yet?"
We're building it.
KYA is the open-source identity & permission layer for AI agents. Get started in 5 minutes →
Further Reading
- AI Agents Are the New Root Users — why running agents without identity control is catastrophic
- KYA Quickstart: Add the missing layer in 5 minutes
- KYA API Reference — full REST API documentation
- Securing LangChain agents with KYA
- OpenAI function calling security with KYA