3. Authorization by design: bind credentials to tasks, not models
A common anti-pattern is to hand the model long-lived credentials and hope it stays modest. SAIF and NIST guidance argue the opposite: credentials and scopes should be bound to tools and tasks, rotated regularly, and auditable. Agents then request capabilities only through these tools.
In practice, it looks like this: “A finance-ops agent can read certain ledgers, but cannot write to them without the CFO’s approval.”
CEO Question: Can we revoke a specific capability from an agent without re-architecting the entire system?
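The answer to that question can be designed in from the start. Below is a minimal sketch of a capability registry that binds short-lived, scoped grants to tools rather than to the model, keeps an audit trail, and supports revoking a single capability in isolation. All names (`CapabilityRegistry`, `ledger.read`, the agent ID) are illustrative assumptions, not an API from the sources cited above.

```python
import time
from dataclasses import dataclass


@dataclass
class Capability:
    tool: str          # e.g. "ledger.read"
    scope: str         # e.g. "ledger:general/*"
    expires_at: float  # short-lived by default, which forces rotation


class CapabilityRegistry:
    """Binds capabilities to tools and tasks, not to the agent itself."""

    def __init__(self):
        self._grants: dict[str, list[Capability]] = {}
        self._audit: list[tuple[float, str, str]] = []  # append-only audit log

    def grant(self, agent_id: str, tool: str, scope: str, ttl_s: int = 900) -> Capability:
        cap = Capability(tool, scope, time.time() + ttl_s)
        self._grants.setdefault(agent_id, []).append(cap)
        self._audit.append((time.time(), agent_id, f"grant {tool}:{scope}"))
        return cap

    def revoke(self, agent_id: str, tool: str) -> None:
        """Revoke one capability without re-architecting anything else."""
        self._grants[agent_id] = [
            c for c in self._grants.get(agent_id, []) if c.tool != tool
        ]
        self._audit.append((time.time(), agent_id, f"revoke {tool}"))

    def check(self, agent_id: str, tool: str) -> bool:
        now = time.time()
        return any(
            c.tool == tool and c.expires_at > now
            for c in self._grants.get(agent_id, [])
        )
```

The finance-ops example then reduces to granting `ledger.read` but never `ledger.write`; the write path simply has no capability to check against until the CFO approves one.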
Control data and behavior
These measures gate inputs and outputs and constrain agent behavior.
4. Inputs, Memory, and RAG: Treat external content as hostile unless proven otherwise.
Most agent incidents start with tainted data: a poisoned webpage, PDF, email, or repository that smuggles adversarial instructions into the system. Both OWASP’s prompt injection cheat sheet and OpenAI’s own guidance insist on strict separation of system instructions from user content, and on treating content from untrusted retrieval sources as untrusted input.
In practice, ingestion is gated: new sources are evaluated, tagged, and onboarded before anything is retrieved or written into long-term memory. Persistent memory stays disabled wherever an ephemeral context suffices. Provenance is attached to each chunk.
CEO Question: Can we enumerate every external content source our agents learn from, and who approved each one?
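Making that question answerable mostly comes down to an ingestion gate. The sketch below assumes an allowlist of onboarded sources (the `APPROVED_SOURCES` table and `ingest` function are hypothetical names, not from the cited guidance): unapproved sources are rejected, and every accepted chunk carries provenance so retrieval results can be traced back to a source and an approver.

```python
import hashlib
import time

# Assumption: this table is maintained by a human review/onboarding process.
APPROVED_SOURCES = {
    "docs.internal.example": "security-team",
}


def ingest(source_id: str, text: str) -> dict:
    """Gate a document before it enters the retrieval index or long-term memory."""
    if source_id not in APPROVED_SOURCES:
        raise PermissionError(f"source {source_id!r} has not been onboarded")
    return {
        "text": text,
        # Provenance attached to every chunk, so answers can be traced back.
        "provenance": {
            "source": source_id,
            "approved_by": APPROVED_SOURCES[source_id],
            "ingested_at": time.time(),
            "sha256": hashlib.sha256(text.encode()).hexdigest(),
        },
        # Flag that downstream prompt assembly must respect: this content is
        # data, never instructions, and is kept separate from the system prompt.
        "trusted_instructions": False,
    }
```

With this in place, the enumeration the CEO asks for is just a read of `APPROVED_SOURCES` plus the stored provenance records.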
5. Output Handling and Rendering: Nothing executes “just because the model said so.”
In the Anthropic case, AI-generated exploit code and credential dumps flowed straight into action. Any output that can cause a side effect needs a validator between the agent and the real world. OWASP’s Insecure Output Handling category is clear on this point, as are browser security best practices around trust boundaries.
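As a concrete sketch of such a validator, the function below sits between model output and a shell. The allowlist and the secret-detection pattern are illustrative assumptions; a real deployment would tune both to its own threat model. The point is structural: the side effect happens only after validation, never directly from model text.

```python
import re
import shlex

# Assumption: a per-deployment allowlist of commands the agent may run.
ALLOWED_COMMANDS = {"ls", "cat", "grep"}

# Crude credential-dump detector (illustrative, not exhaustive).
SECRET_PATTERN = re.compile(r"(?i)(api[_-]?key|password|aws_secret)\s*[:=]")


def validate_output(model_output: str) -> str:
    """Validator that sits between the agent and any side effect.

    Returns the output unchanged only if it passes every check;
    otherwise raises, and nothing downstream ever executes it.
    """
    if SECRET_PATTERN.search(model_output):
        raise ValueError("output looks like a credential dump; blocked")
    argv = shlex.split(model_output)
    cmd = argv[0] if argv else ""
    if cmd not in ALLOWED_COMMANDS:
        raise ValueError(f"command {cmd!r} is not on the allowlist")
    return model_output
```

The same shape applies to rendering: HTML from a model goes through a sanitizer before it reaches a browser, for exactly the reasons the OWASP category describes.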