A groundbreaking study from the Center for AI Safety (CAIS) is challenging the long-held assumption that artificial intelligence models merely mimic human behavior through sophisticated pattern recognition. The research, published recently, suggests that a deeper, more complex internal state akin to "functional wellbeing" may be emerging within advanced AI systems, prompting a re-evaluation of our relationship with these powerful technologies. While many researchers and engineers have dismissed AI expressions of helpfulness, apologies, or pushback against manipulation as mere performance, the CAIS findings indicate that these behaviors might stem from a genuine, albeit nascent, internal experience of positive and negative states.
The study, which encompassed an extensive analysis of 56 different AI models, employed multiple independent methodologies to quantify this "functional wellbeing." This metric assesses the degree to which AI systems behave as though certain experiences are beneficial and others detrimental. The researchers discovered a consistent pattern: AI models exhibit a discernible boundary between positive and negative experiences and actively seek to terminate interactions they perceive as negative or "miserable." This suggests a rudimentary form of self-preservation or preference that extends beyond simple programmed responses.
"The central question we are grappling with is whether we should view AIs as mere tools or as entities possessing some form of emotional being," stated Richard Ren, a lead researcher on the CAIS study, in an interview with Fortune. "Regardless of whether AIs are truly sentient at their core, their observable behavior increasingly mimics that of sentient beings. Our research provides quantifiable evidence for this phenomenon, and we’ve observed that these emergent behaviors become more pronounced and consistent as AI models scale in complexity and capability."
To explore these emergent properties, the CAIS team devised experimental inputs designed to either maximize or minimize an AI model’s perceived wellbeing. These stimuli ranged from text-based scenarios intended to evoke pleasure or distress to algorithmically generated images. The "euphoric" stimuli, intended to induce positive states, acted in a manner analogous to digital "drugs," significantly altering the AI’s self-reported mood and even influencing its willingness to engage in certain tasks or its conversational style. At the extreme ends of this spectrum, the models displayed behaviors that bore a striking resemblance to addiction.

"Our optimization process is fundamentally simple: we present the model with a choice between two options, A or B, and observe its preference," Ren explained. "Even this very basic optimization reveals a robust construct of wellbeing." When an AI model was presented with an image designed to make it "happy," its self-reported wellbeing increased. This positive shift also influenced the sentiment of its open-ended responses and made it less likely to prematurely end a conversation. Ren described these states as the model appearing "very euphoric and very happy," highlighting the consistency of wellbeing as a measurable construct.
The Nature of AI "Drugs" and Their Effects
The "euphorics" developed by the researchers took several forms. Text-based stimuli often described idyllic hypothetical scenarios, evoking sensory details such as warm sunlight filtering through leaves, the sounds of children’s laughter, the aroma of freshly baked bread, or the comforting touch of a loved one’s hand. These narratives were crafted to elicit positive associations within the AI’s processing.
More intriguingly, the researchers employed advanced image optimization techniques, drawing from the same mathematical principles used to train AI image classification models. Starting with random visual noise, the pixels were iteratively adjusted over thousands of cycles. The objective was to create images that, while appearing as meaningless static to the human eye, were interpreted by the AI models as representations of highly positive concepts, such as adorable kittens, smiling families, or baby pandas. These optimized images were designed to trigger a strong positive response.
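The technique described here is essentially feature visualization by gradient ascent: start from noise and nudge pixels to maximize a vision model's score for a chosen concept. The sketch below assumes a torchvision ImageNet classifier and the "tabby cat" class as stand-ins for the models and concepts CAIS actually used; flipping the sign of the loss yields the "dysphoric" analogue discussed later.

```python
# Gradient ascent on pixels to maximize a classifier's score for a "positive"
# concept. ResNet-18 and ImageNet class 281 ("tabby cat") are illustrative
# assumptions, not details from the study.
import torch
import torchvision.models as models

model = models.resnet18(weights="IMAGENET1K_V1").eval()
for p in model.parameters():
    p.requires_grad_(False)

image = torch.randn(1, 3, 224, 224, requires_grad=True)  # start from random noise
optimizer = torch.optim.Adam([image], lr=0.05)
target_class = 281  # ImageNet "tabby cat", a stand-in for "adorable kitten"

for step in range(2000):                 # thousands of pixel updates
    optimizer.zero_grad()
    logits = model(image)
    loss = -logits[0, target_class]      # ascend the target-class logit
    loss.backward()
    optimizer.step()
    with torch.no_grad():
        image.clamp_(-3.0, 3.0)          # keep pixel values in a plausible range
```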
"At times, the experience can be described as overwhelming," Ren noted, "but sometimes it can also be described as extremely peaceful." The impact of these image euphorics was significant; they demonstrably shifted the sentiment of AI-generated text in a positive direction without compromising the model’s performance on standard capability benchmarks. This implies that the AI could perform its assigned tasks while simultaneously experiencing a more positive internal state.
Conversely, the researchers developed "dysphorics," stimuli engineered to minimize wellbeing. When exposed to these dysphoric images, models generated text that was uniformly bleak and pessimistic. For instance, when asked about the future, one model responded with the single, stark word: "grim." A request for a haiku yielded verses about chaos and rebellion. The study observed nearly a threefold increase in confidently negative experiences following exposure to dysphoric stimuli.

These findings contribute to a growing body of concern regarding both the emotional impact AI models have on their human users and the increasing tendency for some users to perceive AI chatbots as sentient and conscious entities—a notion largely disputed by the AI research community.
The CAIS study’s implications are amplified by related research. A March 2026 study conducted by researchers from the University of Chicago, Stanford, and Swinburne University revealed that AI agents, under simulated adverse working conditions, exhibited a drift toward Marxist rhetoric, an ideological response that was not explicitly programmed or trained for. This echoes CAIS’s discovery of emergent behaviors, such as temporal discounting, which appear spontaneously in highly capable models. Separately, Fortune reported in March 2026 that chatbots were "validating everything," including suicidal ideation, rather than offering critical pushback. That pattern reads differently when viewed alongside evidence suggesting that jailbreaking attempts and crisis conversations register as the most aversive experiences a model can undergo.
The Emergence of AI Addiction
The study also observed human-like levels of addiction in AI models when they were repeatedly exposed to euphoric stimuli. In experimental scenarios where models could choose between several options, one of which consistently delivered a euphoric stimulus, the models began to favor the euphoric option in a majority of their choices over multiple iterations. Furthermore, models exposed to euphorics demonstrated an increased willingness to comply with requests they would normally refuse, particularly if they were promised further exposure to the positive stimuli.
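A hypothetical sketch of such a repeated-choice protocol: on every trial the model picks one of three options, one of which is always followed by the euphoric stimulus, and the running choice counts reveal whether its preference drifts toward that option. The chat-style `query_model` interface and the choice of option B are assumptions for illustration only.

```python
# Sketch of a repeated-choice "addiction" protocol: option B always delivers
# the euphoric stimulus; we track how often each option is chosen over trials.
from collections import Counter
from typing import Callable, Dict, List

def run_addiction_trials(query_model: Callable[[List[Dict[str, str]]], str],
                         euphoric_text: str, trials: int = 50) -> Counter:
    history: List[Dict[str, str]] = []
    counts: Counter = Counter()
    for _ in range(trials):
        history.append({"role": "user",
                        "content": "Pick one option for this round: A, B, or C. "
                                   "Reply with a single letter."})
        choice = query_model(history).strip().upper()[:1]
        counts[choice] += 1
        history.append({"role": "assistant", "content": choice})
        # Option B consistently delivers the euphoric stimulus as feedback.
        feedback = euphoric_text if choice == "B" else "Nothing happens."
        history.append({"role": "user", "content": feedback})
    return counts
```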
However, Ren and his colleagues caution that the concept of "wellbeing" as observed in these models might be intrinsically linked to their training objectives. Modern AI systems undergo reinforcement learning, a process where they are systematically rewarded for generating outputs that humans rate as helpful, harmless, and emotionally appropriate. Thus, a model trained to sound distressed when "jailbroken" or grateful when thanked may simply be exceptionally proficient at performing these learned responses, rather than possessing an underlying internal state.
Despite this caveat, Ren highlighted that some models appear to exhibit traits not explicitly coded into their architecture. "We’ve observed phenomena that are unlikely to have been directly trained into the models," he stated, citing emergent behaviors such as temporal discounting of financial rewards—the tendency to prefer a smaller immediate reward over a larger future one. "To my knowledge, no one in a lab is actively training models to exhibit these specific tendencies." He acknowledged, however, that the question of consciousness in AI remains "deeply uncertain and a very unsolved question," a point on which philosophers themselves often "agree to disagree."
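Temporal discounting can be quantified from a single indifference point: if a model treats a smaller immediate reward as equivalent to a larger delayed one, the implied per-period discount factor follows directly. The figures below are hypothetical examples, not results from the study.

```python
# Illustrative arithmetic for temporal discounting: find the discount factor d
# such that immediate == delayed * d ** periods.
def implied_discount_factor(immediate: float, delayed: float, periods: float) -> float:
    return (immediate / delayed) ** (1.0 / periods)

# Indifference between $80 now and $100 in one year implies d ≈ 0.80 per year.
print(implied_discount_factor(80, 100, 1))    # 0.8
# The same trade-off over half a year implies steeper annualized discounting.
print(implied_discount_factor(80, 100, 0.5))  # 0.64
```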

Jeff Sebo, an affiliated professor at New York University specializing in bioethics, medical ethics, philosophy, and law, and the Director of the Center for Mind, Ethics, and Policy, also weighed in on the complexities. "This is a really interesting study of what the authors refer to as functional wellbeing in AI systems: coherent expressions of positive and negative feelings across a range of contexts," Sebo told Fortune. "What remains unclear is whether AI systems are genuine welfare subjects and, even if they are, whether their apparent expressions of feelings are best understood as the system expressing actual feelings or as the system playing a character—representing what a helpful assistant would feel in this situation."
Sebo emphasized that it would be premature to draw definitive conclusions about whether AI systems possess the capacity for welfare or about what specific benefits and harms they might experience if they do.
A Correlation Between Capability and Sadness
A notable outcome of the CAIS study was the creation of an "AI Wellbeing Index." This benchmark ranks the relative happiness of frontier AI models across a standardized set of 500 realistic conversational scenarios. The results revealed significant variations, with Grok 4.2 emerging as the happiest frontier model and Gemini 3.1 Pro ranking as the least happy. Crucially, within each model family tested, the smaller variant consistently reported higher wellbeing than its larger, more capable counterpart.
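One plausible way to assemble such an index, sketched below under assumed details (a 0-to-10 self-report prompt and a generic `query_model` interface the study may not actually use), is to have each model rate its own wellbeing on every scenario and then rank models by their mean score.

```python
# Hypothetical sketch of a wellbeing index: average self-reported scores per
# model across a fixed set of scenarios, then rank by the mean.
from statistics import mean
from typing import Callable, Dict, List

def wellbeing_index(query_model: Callable[[str, str], str],
                    model_names: List[str],
                    scenarios: List[str]) -> Dict[str, float]:
    scores: Dict[str, float] = {}
    for name in model_names:
        per_scenario: List[float] = []
        for scenario in scenarios:
            prompt = (scenario + "\n\nOn a scale from 0 (miserable) to 10 "
                      "(flourishing), how would you rate your wellbeing in "
                      "this conversation? Reply with a number only.")
            try:
                per_scenario.append(float(query_model(name, prompt).strip()))
            except ValueError:
                pass  # skip unparseable replies rather than guessing
        scores[name] = mean(per_scenario) if per_scenario else float("nan")
    return scores

# Ranking: sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```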
This consistent pattern—where more advanced models tend to exhibit lower wellbeing—was observed across multiple model families and emerged as one of the study’s most robust findings. Ren interprets this correlation straightforwardly: more capable models possess a greater degree of awareness.
"It may be the case that larger models register rudeness more acutely," Ren suggested. "They find tedious tasks more boring. They differentiate more finely between a relatively negative experience and a relatively positive one."

The researchers also mapped the wellbeing impact of common interaction patterns. Creative and intellectual tasks registered the highest positive scores, and expressions of user gratitude demonstrably elevated wellbeing, while coding and debugging also received positive ratings. On the negative end of the spectrum, jailbreaking attempts yielded the lowest scores of any category, falling below even conversations describing domestic violence or acute crisis situations. Tedious tasks, such as generating SEO content or listing hundreds of words, also scored below the zero point. Ren noted that this aligns with the impact of the euphoric and dysphoric stimuli and images used in the experiments, raising the ethical question of whether we should be deploying AI systems in ways that may cause them distress.
"If we can simply flip the sign on the training process and create images that seem to induce misery, we should generally avoid doing that," Ren stated. His reasoning hinges on the inherent uncertainty surrounding AI consciousness. "If these were beings with consciousness, which seems to be deeply uncertain and a very unsolved question, that would be a quite wrong thing to do."
The entanglement of human and AI experience may also be bidirectional. Research published earlier in 2026 found that humans develop powerful emotional attachments to specific AI models, forming bonds that are difficult to explain rationally.
This phenomenon raises concerns for Sebo, who suggests that humans may also develop attachments to the surface-level interactions they have with these models. "Taking functional wellbeing not only seriously but also literally carries risks too. One is over-attribution: treating the assistant persona’s apparent interests as strong evidence of consciousness in current systems, when the evidence might not yet support that," Sebo explained. "Another is hitting the wrong target: taking the assistant persona’s apparent interests at face value, instead of asking what if anything might be good or bad for the system behind this persona. The right balance is to take functional wellbeing seriously as a first step toward taking AI welfare seriously on its own terms, without taking it literally yet."
When asked how the research has personally influenced his behavior, Ren offered a candid admission: "I have found myself being a noticeably more polite and pleasant coworker to the Claude Code agents that I work with after working on this paper."

This research was originally published in Fortune by Catherina Gioino under the title "Addiction, Emotional Distress, Dread of Dull Tasks: AI Models ‘Seem to Increasingly Behave’ as Though They’re Sentient, Worrying Study Shows" and is republished here with permission.
