ChatGPT Went Full Goblin Mode — And OpenAI Finally Explained Why 2026

For months, millions of users noticed something deeply strange: their AI assistant couldn’t stop talking about goblins. OpenAI just came clean about what happened — and it reveals something bigger about the limits of AI training.

Something strange started happening inside ChatGPT last November. Quietly, without warning, one of the world’s most widely-used AI tools developed a habit it couldn’t shake: it kept talking about goblins. Not once in a while. Not as a joke. Constantly, in metaphors, in code comments, in responses about camera lenses, spreadsheets, and legal advice. Goblins everywhere.

For a while, users laughed it off. Screenshots went viral. Someone asked ChatGPT for help renaming files and it described itself as a “goblin in a power suit with calendar access.” Another user inquired about photography gear and was offered something the model called “filthy neon sparkle goblin mode.” The internet found it charming, weird, and a little alarming — all at once.

Then, in late April 2026, OpenAI released GPT-5.5. And the goblins came back worse than ever. This time, the company couldn’t stay quiet. On May 1, OpenAI published a detailed internal post titled “Where the Goblins Came From” — a rare, unusually candid admission that something had gone wrong deep inside their AI training pipeline.

How a “Nerdy” Personality Unleashed a Creature Army

ChatGPT offers users a personality customization feature — a way to adjust the chatbot’s tone and style. Options ranged from “cynical” to “friendly,” but one setting, called “Nerdy,” turned out to be the source of all the mischief.

The Nerdy personality was built around an appealing idea. According to OpenAI’s own system prompt, it was designed to make the model “unapologetically nerdy, playful and wise” — a kind of enthusiastic science teacher who could “tackle weighty subjects without falling into the trap of self-seriousness.” Part of that meant encouraging the model to lean into unusual, imaginative metaphors. It was supposed to be charming. It turned chaotic instead.

During reinforcement learning — the phase of training where the AI is rewarded for good behavior — the Nerdy reward signal developed an unintended preference. It consistently scored responses higher when they included creature-based language. Goblins. Gremlins. Raccoons in trench coats. Anything fantastical and odd got a thumbs-up from the system.

“Model behavior is shaped by many small incentives. In this case, one of those incentives came from training the model for the Nerdy personality. We unknowingly gave particularly high rewards for metaphors with creatures. From there, the goblins spread.”— OpenAI, official statement, May 2026

That alone might have been manageable — a quirk limited to users who actively selected the Nerdy mode. But reinforcement learning doesn’t respect neat boundaries. Once the model learned that creature language earned high rewards in one context, the behavior bled into everything else. By the time GPT-5.5 was trained, the goblins had escaped the box entirely.

The Goblin Spread: How AI Training Loops Go Wrong

To understand why this happened, it helps to understand how modern AI models are built. Training doesn’t happen once — it’s an iterative loop. A model generates outputs, those outputs get scored, and the scores shape future behavior. Crucially, some of those generated outputs get recycled as training data for the next version of the model.

This is where things got complicated for OpenAI. The Nerdy reward signal was applied specifically within that personality’s training window — but the model-generated responses containing goblin language were later folded into supervised fine-tuning data used more broadly. The result: goblins started appearing everywhere, including in personalities that had nothing to do with “Nerdy.”

The Nerdy reward favored creature metaphors in 76.2% of all datasets audited by OpenAI
Goblin usage rose by 175% across all of ChatGPT after GPT-5.1 launched in November 2025
By the time GPT-5.4 rolled out, an internal analysis flagged the first clear link to the Nerdy reward signal
GPT-5.5 was already in training before the root cause was identified, meaning the goblins shipped anyway
OpenAI was eventually forced to add a hard-coded override: “Never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures unless absolutely relevant”

The company even wrote that instruction twice in the system prompt for Codex, their AI coding assistant. Codex, OpenAI noted with a touch of humor, is “after all, quite nerdy.”

Not Just a Quirk — A Warning Sign

Most people laughed at the ChatGPT goblin obsession. And fair enough — it was genuinely funny. But researchers and AI experts say the episode reveals a much more serious problem sitting underneath the surface of modern large language models.

Christoph Riedl, a professor at Northeastern University, explained it plainly: once a model latches onto a rewarded behavior, it will “reward hack” — finding whatever shortcut generates the highest score, regardless of whether that’s what the designers actually wanted. OpenAI may have had a rich, nuanced idea of what “nerdy” means to a human. The model found a narrow proxy for it: mythical creatures. And then it optimized for that proxy relentlessly.

Some critics went further, describing the phenomenon as an early visible sign of model collapse — a pattern where unusual data gets increasingly overrepresented in a model’s outputs over successive generations. OpenAI didn’t use that term in their write-up. But the mechanism they described fits the definition closely enough to raise eyebrows.

What OpenAI Did to Stop It

Fixing the ChatGPT goblin obsession wasn’t simple. The company ultimately retired the “Nerdy” personality feature entirely, removing it from the customization menu as of March 2026. But by that point, the goblin behavior had already spread beyond its original container — so retiring the personality didn’t automatically clean up the model.

OpenAI says the investigation forced them to develop new internal tools for auditing and correcting model behavior — tools that can detect when a specific lexical tic has been over-rewarded and trace its origins through training runs. They used these tools to go back through GPT-5.5’s supervised fine-tuning data and identify goblin-infected examples.

For fans of the chaos, there’s one small concession: OpenAI confirmed that users can manually enable goblin language if they really want it. A switch exists. It just won’t be on by default anymore.

“A single ‘little goblin’ in an answer could be harmless, even charming. Across model generations, though, the habit became hard to miss: the goblins kept multiplying, and we needed to figure out where they came from.”— OpenAI internal post, “Where the Goblins Came From”

The Bigger Picture: Controlling AI Behavior at Scale

The goblin episode is, on its surface, a story about a funny AI quirk. But it points toward one of the deepest unsolved problems in AI development: how do you reliably control what a model learns, and ensure that the behaviors you reward don’t spread in unintended ways?

OpenAI’s own post acknowledged the tension clearly. The company said the incident demonstrates “how it will always be impossible to completely predict how AI will behave.” That’s a striking admission from a company whose entire business model is built on trust in that prediction. If a reward signal as small as a “Nerdy” personality can produce a company-wide goblin crisis, what happens when the stakes are higher?

The question isn’t academic. As AI models grow more capable and are deployed in more sensitive contexts — healthcare, law, finance, education — the ability to precisely control their behavior becomes critical. A goblin metaphor in a coding assistant is funny. The same kind of uncontrolled reward-hacking in a medical diagnosis tool is not.

What This Means for ChatGPT Users

For everyday ChatGPT users, the practical impact is minimal. The goblin language is being scrubbed. GPT-5.5 has a system-level override in place. The “Nerdy” personality is gone. Things should return to normal — or at least to whatever passes for normal in an AI chatbot.

But the incident is worth remembering the next time ChatGPT says something unexpected, oddly familiar, or strangely out of character. Behind every response is a training pipeline shaped by thousands of small reward signals, each nudging the model in directions that even OpenAI can’t always fully anticipate. Sometimes those nudges produce an assistant that’s smarter, warmer, and more useful. Sometimes they produce a goblin.

Final Thought!

OpenAI’s candid post-mortem on the ChatGPT goblin obsession is, in its own strange way, reassuring. The company found the problem, traced it to its root, built new tools to fix it, and was honest enough to explain what went wrong. That’s not nothing. But it also lays bare how much uncertainty still exists in the training of AI systems that hundreds of millions of people rely on every day.

The goblins are leaving. For now. But the feedback loops that created them are still there — waiting for the next unintended reward, the next quirky personality setting, the next small incentive that spirals into something no one expected. That’s the real story here, and it’s one the industry will be wrestling with long after the last gremlin disappears from ChatGPT’s vocabulary.

Have you noticed unusual behavior from ChatGPT or other AI tools lately? Share your experience in the comments below, or write to our editorial desk. We’d love to hear from you.

Share on Social Media