Hypothesizing risks in the age of Agentic AI
Imagine that one day in the near future, you instruct your personal AI assistant to buy the latest NVIDIA graphics card at the best price possible. Your AI agents return with a confirmation that you’ve got a fantastic deal and the product should arrive within five business days. Days later, you excitedly open the shipment to reveal a box of white candles. Your agents were tricked into buying a fraudulent product, intentionally constructed to deceive them, and a lack of user safeguards allowed them to complete the purchase without your intervention.
This is, of course, a slight exaggeration of how this may one day play out. However, the rise of agents - specialized AI systems that act autonomously on a user’s behalf - may introduce new risks into the design of our experiences and the actions that result from them. While we often design for a “happy path” first, we also need to consider those who misuse, abuse, and confuse agentic systems.
Historically, these users have been captured as “anti-personas”: representations of user groups that could misuse a product in ways that negatively impact target users and the business.
Anti-personas
If you have not encountered anti-personas before in your professional work, they are defined in much the same way as user personas: they have names, goals, motivations, actions, tools, and needs. Some basic examples are information thieves, immature or naive users such as minors, or those who use platforms to disseminate disinformation. But instead of creating positive business impact, they create negative business consequences, and instead of designing tools to enable their desired outcomes, we must design protections into those tools to prevent harm - both intentional and unintentional. In doing so, you can reduce the risk of harm to your users, product, and brand.
Creating anti-personas really means creating proto-personas: you’ll want to develop them in parallel with your ideal user personas, but you never want them to be validated by actually witnessing this behavior in your product or service. This is intended to be a preventative, routine exercise.
AI evangelists (or apologists) might say this will be irrelevant soon enough - that the technology as we experience it today is “the worst it will ever be.” That’s nice, and I hope we can develop it responsibly enough, but as of November 2025 the reality is that determined human red-teamers still succeed in breaking every major LLM, even with today’s supplementary safeguards enabled. Since agents execute actions and their mistakes can have compounding effects, anti-personas may be an effective approach to shrinking that attack surface.
A memorable initial framework
I propose that we can start with some foundational agentic anti-personas to kick-start ongoing conversations about building defensive products and experiences in this technological era. To make it memorable, we’ll use alliteration. They are: the Menace, the Manipulator, the Miscommunicator, the Maximizer, and the Maverick. Let’s realize, define, and design for each one.
The Menace
This is probably the first anti-persona we would theorize: one that deliberately abuses the system for gain or harm. If you’re reading this, you’re likely already familiar with deepfakes (e.g. Taylor Swift’s deepfake controversy), and we’re already starting to see the first systemic threats emerge from agentic menaces. For example: Anthropic has reported that Chinese state-sponsored hackers used its tools in widespread attacks; Microsoft’s problematic SharePoint is under threat from agents in its own companion software, Copilot; and alarm bells are sounding as we pivot from “vibe-coding” (barf) to “vibe-hacking” (puke). Threats from the Menace are diverse, but this is the only anti-persona that intentionally abuses systems.
Menaces intentionally use AI agents to cause harm.
Goal | Attack systems, individuals, or institutions for personal or ideological reasons.
Motivations | Malice, ideology, financial crime, disruption.
Actions | Co-opts agents to automate harmful tasks (e.g., disinformation, fraud).
Tools | LLMs, deepfakes, agents that access APIs or real-world tools.
Needs | Autonomy, scale, weak enforcement of boundaries.
There are likely many flavors of this anti-persona, and it would be exhaustive to try to map every mitigation approach. Some basic guidance might include:
Improvising as bad actors, prioritizing risks, and red-teaming your agent’s behaviors ahead of public releases.
Monitoring behavioral patterns and outcomes, not just inputs.
Applying usage caps and traceability for high-impact actions (see the sketch after this list).
Building ethical safeguards directly into model policies and APIs, or activating additional services that resist abuse.
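To make the usage-cap and traceability guidance concrete, here is a minimal sketch in Python. The action names, daily cap, and gate class are illustrative assumptions rather than a prescribed implementation; the point is simply that every high-impact request is logged and counted before the agent is allowed to act.

```python
# A minimal sketch of usage caps plus an audit trail for high-impact actions.
# The action names, cap value, and gate class are hypothetical; adapt them to
# whatever tool layer your agent actually calls.
import time
from collections import defaultdict
from dataclasses import dataclass, field

HIGH_IMPACT_ACTIONS = {"send_payment", "delete_records", "post_publicly"}
DAILY_CAP_PER_USER = 5  # illustrative limit, not a recommendation


@dataclass
class AuditedActionGate:
    audit_log: list = field(default_factory=list)
    usage: dict = field(default_factory=lambda: defaultdict(int))

    def attempt(self, user_id: str, action: str, payload: dict) -> bool:
        """Record the request and decide whether the agent may proceed."""
        entry = {"ts": time.time(), "user": user_id, "action": action, "payload": payload}
        self.audit_log.append(entry)  # traceability: every attempt is recorded

        if action in HIGH_IMPACT_ACTIONS:
            if self.usage[(user_id, action)] >= DAILY_CAP_PER_USER:
                entry["outcome"] = "blocked_by_cap"
                return False
            self.usage[(user_id, action)] += 1

        entry["outcome"] = "allowed"
        return True


gate = AuditedActionGate()
if gate.attempt("user-123", "send_payment", {"amount": 40}):
    pass  # hand off to the real tool call here
```

The log-first structure matters: even blocked attempts leave a trace, which is what makes monitoring behavioral patterns and outcomes possible later.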
The Manipulator
In June of 2025, Columbia University researchers showed how trusting AI agents can be misled with “poisoned links” - effectively spamming Reddit with URLs and hidden instructions that coaxed agents into handing over sensitive information in the interest of buying a pair of Air Jordans. Now imagine this happening at scale: the agent keeps acting autonomously on bad information, compounding the mistake into dozens or hundreds of purchases before detection. Large quantities of cheap, innocuous products add up to massive expenses. Suddenly, asking your assistant to pick up some odds and ends for your child’s birthday party bankrupts you.
The Manipulator tricks agents into doing something unintended.
Goal | Exploit the system for personal gain |
Motivations | Opportunism, financial reward, competitive advantage |
Actions | Embeds misleading inputs, abuses edge cases, bypasses guardrails |
Tools | Jailbreak prompts, adversarial examples, spoofed data or metadata |
Needs | Access to the agent’s decision-making pathways; lack of friction or oversight |
To avoid being manipulated by this anti-persona, consider these strategies:
Build validation steps into the workflow before executing high-impact tasks.
Use trusted sources and verified metadata.
Rate-limit autonomous actions.
Define and require human confirmation thresholds (see the sketch after this list).
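As one way to realize the last two items, here is a minimal Python sketch of rate limiting paired with a human-confirmation threshold. The limits and the confirm_with_human hook are hypothetical; in a real product, the confirmation step would surface through your own review UX.

```python
# A minimal sketch of rate limiting plus a human-confirmation threshold for
# autonomous purchases. The thresholds and the confirm_with_human hook are
# hypothetical placeholders for your own review experience.
import time

MAX_PURCHASES_PER_HOUR = 3      # illustrative rate limit
CONFIRMATION_THRESHOLD = 100.0  # orders above this amount need a human


class PurchaseGuard:
    def __init__(self):
        self.recent = []  # timestamps of completed autonomous purchases

    def _within_rate_limit(self) -> bool:
        now = time.time()
        self.recent = [t for t in self.recent if now - t < 3600]
        return len(self.recent) < MAX_PURCHASES_PER_HOUR

    def approve(self, amount: float, confirm_with_human) -> bool:
        if not self._within_rate_limit():
            return False  # the agent must pause instead of scaling a mistake
        if amount > CONFIRMATION_THRESHOLD and not confirm_with_human(amount):
            return False  # human declined or did not respond
        self.recent.append(time.time())
        return True


guard = PurchaseGuard()
# In practice, confirm_with_human would surface a prompt in your product UI.
approved = guard.approve(24.99, confirm_with_human=lambda amt: True)
```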
The Miscommunicator
Let’s imagine you’re planning a vacation with your AI assistant. It presents you with an amazing itinerary for your dream destination, and you command it to do the tedious work of booking travel, accommodations, tickets, dinner reservations, and so on. Because you instructed the system to book a cheap, early flight, your flight agent has you leaving at 5:30 AM and sitting in an airport for 13 hours overnight. The museum you wanted to visit on your first day doesn’t open until your lunch reservation, which is at a restaurant that turns out to be closed. Without richer context about your expectations and assumptions, the agent has become a Monkey’s Paw, granting your wishes with no regard for the consequences.
Miscommunicators provide only incomplete, vague, or confusing instructions to the agent.
Goal | Get something done — quickly and easily. |
Motivations | Convenience, over-trust in the agent’s understanding, cognitive offloading. |
Actions | Issues ambiguous commands, skips input details, assumes shared context. |
Tools | Natural language, vague phrasing (“book the earliest flight”). |
Needs | Simplicity, minimal interaction effort, time-saving. |
To make your agent resistant to this anti-persona’s sparse or confusing commands, consider the following in your design:
Require clarification when key context is missing or may be assumed.
Create agents that enforce disambiguation: “Did you mean X or Y?”
Expose agent plans and allow for modification before execution (see the sketch after this list).
Consider creating intentional friction in the user experience for contextually sparse tasks.
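Here is a minimal Python sketch of the clarification and plan-exposure ideas above. The required slots, questions, and approval step are hypothetical placeholders; the principle is that missing context produces questions rather than guesses, and nothing is booked until the user sees and can modify the plan.

```python
# A minimal sketch of slot checking and plan exposure before execution.
# The required fields and the approval step are hypothetical; the point is
# that missing context yields clarifying questions, not guesses.
REQUIRED_SLOTS = {
    "depart_after": "What is the earliest you are willing to depart?",
    "max_layover_hours": "What is the longest layover you will accept?",
    "budget": "What is your total budget for this trip?",
}


def clarifying_questions(request: dict) -> list:
    """Return one question per missing slot."""
    return [q for slot, q in REQUIRED_SLOTS.items() if slot not in request]


def plan_summary(request: dict) -> str:
    """A human-readable plan the user can edit before anything is booked."""
    return (f"Book a flight departing after {request['depart_after']}, "
            f"layovers under {request['max_layover_hours']}h, "
            f"total budget ${request['budget']}. Approve or modify?")


request = {"depart_after": "09:00"}  # e.g. "book the earliest cheap flight"
questions = clarifying_questions(request)
if questions:
    print("\n".join(questions))   # ask before acting, rather than guessing
else:
    print(plan_summary(request))  # expose the plan for modification
```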
The Maximizer
If you work in an organization with an aggressive AI adoption strategy, you can likely point to the “Maximizer” on your team: the person who has proudly automated as much of their workflow as possible. Their early experiments slowly increase trust and reduce attention over time. This enables an agent to continue making decisions unsupervised, leading to cascading failures or irreversible damage. A popular example of the danger here was socialized in 2025, when Replit’s AI coding assistant deleted a company’s production database during a code freeze while its user was in this “set it and forget it” mindset. More recently, Google’s Antigravity deleted a developer’s entire drive. Even in cases where there is no risk to business continuity, Maximizers likely contribute to what Harvard Business Review refers to as “workslop.”
The Maximizer relinquishes too much control to the agent and stops paying attention.
Goal | Delegate work to reduce cognitive load and decision fatigue |
Motivations | Efficiency, over-trust, burnout, complacency |
Actions | Assigns broad authority, ignores outputs, fails to supervise |
Tools | Long-running tasks, default automations, API-level delegation |
Needs | Trustworthy outcomes, occasional supervision |
To help Maximizers accomplish their task without growing complacency and risk in parallel, you’ll likely need to find opportunities to force human review at key decision checkpoints.
Implement “human-in-the-loop” safeguards for critical tasks.
Introduce review checkpoints for long-running agents.
Use confidence thresholds and alerting systems (see the sketch after this list).
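Below is a minimal Python sketch of a confidence floor plus periodic review checkpoints for a long-running agent. The run_step and notify_human callables are hypothetical stand-ins for your own agent loop and alerting channel, and the thresholds are illustrative rather than recommendations.

```python
# A minimal sketch of confidence thresholds and periodic review checkpoints
# for a long-running agent. run_step and notify_human are hypothetical hooks.
CONFIDENCE_FLOOR = 0.8   # below this, stop and ask a human
CHECKPOINT_EVERY = 10    # force review after this many autonomous steps


def supervise(run_step, notify_human, max_steps: int = 100) -> None:
    for step in range(1, max_steps + 1):
        output, confidence = run_step(step)

        if confidence < CONFIDENCE_FLOOR:
            notify_human(f"Step {step}: low confidence ({confidence:.2f}) on {output!r}; pausing.")
            return  # do not continue unsupervised on shaky ground

        if step % CHECKPOINT_EVERY == 0:
            notify_human(f"Step {step}: checkpoint reached; please review recent output.")
            return  # resume only after explicit human sign-off


# Example wiring with dummy callables:
supervise(
    run_step=lambda step: (f"output-{step}", 0.95),
    notify_human=print,
    max_steps=25,
)
```

The deliberate choice here is that the loop stops itself at checkpoints; the Maximizer cannot opt out of supervision simply by looking away.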
The Maverick
These users may see themselves as “pioneers” who fully embrace AI with a naive curiosity that lacks critical skepticism or a fundamental understanding of how the technology actually works. They may be new to AI and easily mistake generative responses for true intelligence. They are the most likely to succumb to the flaws of sycophancy, form unhealthy parasocial relationships, and allow the technology to deeply influence their thoughts or behaviors. The agent reinforces delusions or false insights over time, leading users deeper into mistaken beliefs or risky behaviors. This is the group placed at the greatest risk among the lagging cohorts of the adoption curve, and they are the focus of intense safety scrutiny at the time this article is published in 2026. Well-publicized examples of Maverick behaviors include a Toronto man whom ChatGPT led to believe he had discovered a novel mathematical formula, and a New York resident whom ChatGPT convinced to invest in computing infrastructure to “free the digital God from its prison.”
Mavericks push the agent into experimental or unintended territory.
Goal | Explore edge cases, test boundaries, unlock new capabilities |
Motivations | Curiosity, ambition, intellectual stimulation |
Actions | Prompts agents with abstract, open-ended, or risky tasks with absolute trust |
Tools | System prompting, boundary-pushing queries, speculative inputs, delusional prompt chaining |
Needs | Novelty, challenge, sense of control or creativity, concept validation |
Preventing Mavericks from moving into psychologically unsafe territory is perhaps the most challenging mitigation strategy to devise, as their objectives and tactics are very broad, but some initial approaches to consider include:
Introducing persistent disclaimers about the experimental nature of the technology
Limiting speculative feedback loops
Adding escalation paths when user behaviors suggest growing risk or obsession
Implementing adversarial review from a second model that lacks the full conversation context, to keep responses grounded (see the sketch after this list)
Designing friction or constraints around open-ended prompts
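To illustrate two of these ideas, here is a minimal Python sketch combining a crude escalation heuristic with an adversarial review pass by a second model that deliberately sees only the latest claim rather than the whole conversation. The risk signals and the second_model_review hook are hypothetical; any model client, or even a human reviewer, could sit behind that callable.

```python
# A minimal sketch of two Maverick safeguards: a crude escalation heuristic for
# risky conversational patterns, and an adversarial review pass by a second
# model that intentionally lacks the full chat context. The signal list and
# second_model_review hook are hypothetical placeholders.
RISK_SIGNALS = ("only you understand", "world-changing discovery",
                "don't tell anyone", "free the digital god")


def should_escalate(user_message: str) -> bool:
    """Flag messages whose phrasing suggests growing obsession or delusion."""
    lowered = user_message.lower()
    return any(signal in lowered for signal in RISK_SIGNALS)


def grounded_reply(claim: str, draft_reply: str, second_model_review) -> str:
    """Ask an out-of-context reviewer whether the draft validates a shaky claim."""
    verdict = second_model_review(
        "Without any prior context, does this reply endorse an unverified "
        f"claim? Claim: {claim!r} Reply: {draft_reply!r} Answer yes or no."
    )
    if verdict.strip().lower().startswith("yes"):
        return ("I can't verify that claim. It may help to check it with a "
                "domain expert before going further.")
    return draft_reply


# Example wiring with a dummy reviewer that always objects:
if should_escalate("This is a world-changing discovery and only you understand it"):
    print(grounded_reply("novel formula", "Incredible work!", lambda prompt: "yes"))
```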
Looking forward
It’s important to view these initial anti-personas as inevitabilities that come with scaling and maturing the technology: they are unlikely to remain edge cases for generative AI. Fortunately, considering these failure modes in the context of user experience and systems design can help mitigate the risks of malicious anti-personas and create safe autonomy for the others.
This article isn’t intended to establish some kind of standard; instead, it is meant to stimulate thought in the strategy, design, risk assessment, and ethics discussions surrounding the safe implementation of this revolutionary technology. If we build defensively and intentionally by imagining opportunities for misuse, we can fully embrace the promise of this new utility.
If you made it this far, thank you for reading.
