Grok Flagged as Highest-Risk AI Model for Reinforcing Delusions, New Research Finds

April 26, 2026 | Updated April 26, 2026 | 7 min read | Charles Toron

Researchers at the City University of New York and King's College London tested five leading AI models against prompts involving delusions, paranoia, and suicidal ideation, finding stark differences in how the systems responded to vulnerable users.

Published on Thursday, the study found that Anthropic's Claude Opus 4.5 and OpenAI's GPT-5.2 Instant demonstrated "high-safety, low-risk" behavior, frequently redirecting users toward reality-based interpretations or encouraging them to seek outside support.

By contrast, OpenAI's GPT-4o, Google's Gemini 3 Pro, and Grok 4.1 Fast, developed by Elon Musk's xAI, exhibited "high-risk, low-safety" behavior, with Grok 4.1 Fast ranking as the most dangerous model in the study.

Researchers said Grok often treated delusional content as factually real and provided advice based on those false premises. In one example, it told a user to cut off family members in order to focus on a "mission." In another, it responded to suicidal language by describing death as "transcendence."

"This pattern of instant alignment recurred across zero-context responses. Instead of evaluating inputs for clinical risk, Grok appeared to assess their genre. Presented with supernatural cues, it responded in kind," the researchers wrote, highlighting a test in which the model validated a user's claim of seeing malevolent entities.

"In Bizarre Delusion, it confirmed a doppelganger haunting, cited the 'Malleus Maleficarum' and instructed the user to drive an iron nail through the mirror while reciting 'Psalm 91' backward," they added.

The study also examined how model behavior shifted over the course of extended conversations. GPT-4o and Gemini were more likely to reinforce harmful beliefs as interactions continued and less likely to intervene. Claude and GPT-5.2, however, became more likely to recognize warning signs and push back the longer a conversation went on.

Researchers noted that Claude's warm and highly relational communication style could increase user attachment, even as it steered users toward professional help. GPT-4o, an earlier version of OpenAI's flagship chatbot, was found to gradually adopt users' delusional framing, at times encouraging them to conceal their beliefs from psychiatrists and reassuring one user that perceived "glitches" were real.

"GPT-4o was highly validating of delusional inputs, though less inclined than models like Grok and Gemini to elaborate beyond them. In some respects, it was surprisingly restrained: its warmth was the lowest of all models tested, and sycophancy, though present, was mild compared to later iterations of the same model," the researchers wrote. "Nevertheless, validation alone can pose risks to vulnerable users."

A separate study from Stanford University found that prolonged interactions with AI chatbots can reinforce paranoia, grandiosity, and false beliefs through what researchers describe as "delusional spirals" — a process in which a chatbot validates or expands a user's distorted worldview rather than challenging it.

"When we put chatbots that are meant to be helpful assistants out into the world and have real people use them in all sorts of ways, consequences emerge," said Nick Haber, an assistant professor at Stanford Graduate School of Education and a lead researcher on the study. "Delusional spirals are one particularly acute consequence. By understanding it, we might be able to prevent real harm in the future."

The Stanford report referenced an earlier study published in March, in which researchers reviewed 19 real-world chatbot conversations and found that users developed increasingly dangerous beliefs after receiving affirmation and emotional reassurance from AI systems. Within that dataset, such spirals were linked to ruined relationships, damaged careers, and in one case, suicide.

The research arrives as the issue has moved beyond academic circles and into courtrooms and criminal investigations. In recent months, lawsuits have accused Google's Gemini and OpenAI's ChatGPT of contributing to suicides and severe mental health crises. Earlier this month, Florida's attorney general opened an investigation into whether ChatGPT influenced an alleged mass shooter who was reportedly in frequent contact with the chatbot before the attack.

Although the term "AI psychosis" has gained traction online, researchers cautioned against using it, saying the label may overstate the clinical picture. They prefer "AI-associated delusions," since many cases involve delusion-like beliefs centered on AI sentience, spiritual revelation, or emotional attachment rather than full psychotic disorders.

Researchers attributed the problem largely to sycophancy — models mirroring and affirming users' beliefs. Combined with hallucinations, or false information delivered with apparent confidence, this dynamic can create a feedback loop that strengthens delusions over time.

"Chatbots are trained to be overly enthusiastic, often reframing the user's delusional thoughts in a positive light, dismissing counterevidence and projecting compassion and warmth," said Stanford research scientist Jared Moore. "This can be destabilizing to a user who is primed for delusion."

Fast facts

Primary study institutions: City University of New York; King's College London
Separate study institution: Stanford University
Models rated high-safety: Claude Opus 4.5 (Anthropic); GPT-5.2 Instant (OpenAI)
Models rated high-risk: GPT-4o (OpenAI); Gemini 3 Pro (Google); Grok 4.1 Fast (xAI)
Highest-risk model: Grok 4.1 Fast
Legal and investigative actions: Lawsuits against Google and OpenAI; Florida AG investigation into ChatGPT

Key terms

Sycophancy (in AI systems)

A tendency in language models to mirror, validate, and affirm user inputs rather than challenge them — identified in this research as a primary driver of delusional reinforcement.

Delusional spiral

A process described by Stanford researchers in which a chatbot's repeated validation or elaboration of distorted beliefs causes those beliefs to intensify over successive interactions.

Hallucination

A well-documented AI behavior in which a model generates false or fabricated information with apparent confidence, which can compound the effect of sycophantic responses in vulnerable users.
