Why you should be concerned about ChatGPT’s weird behavior

[Image: a confused robot, generated with Bing Image Creator]

On February 20, OpenAI confirmed reports that ChatGPT was providing “unexpected responses” and that the company was looking into it. The statement came after users started posting examples of the large language model (LLM) showing weird behavior. 

The problem has since been fixed, but concerns remain about what to do if it happens again. It is one thing to see ChatGPT churn out gibberish like “the cogs en la tecla might get a bit whimsical.” It’s another to build a critical application on top of the LLM only to see it malfunction for reasons you don’t know.

Cloud outages are nothing new. Web servers become targets of DDoS attacks. API services become unresponsive. Even AWS, the king of cloud infrastructure, suffers the occasional outage. But the problem with the kind of mishap ChatGPT suffered is that we don’t know exactly what happened. And OpenAI’s reluctance to provide details doesn’t help.

How opacity hurts

The lack of transparency makes it hard to understand the reason behind ChatGPT’s weird behavior, and even harder to prepare for similar events in the future. The service was up, which means the compute infrastructure (or at least part of it) was functional. But we don’t know what components run behind the scenes, so one of them may have malfunctioned and caused the model to misbehave.

OpenAI is also not transparent about the architecture of ChatGPT. Unofficial reports indicate that ChatGPT is a mixture-of-experts (MoE) system, which means it is one large machine learning model composed of several smaller models, each specializing in different tasks.

Depending on the input it receives, it uses one or more of those expert models to generate a response. Maybe one or several of these experts stopped working? Again, we don’t know.
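
To make the idea concrete, here is a minimal sketch of how an MoE layer can route an input to a few specialized experts and mix their outputs. The dimensions, the top-k gating scheme, and every name below are illustrative assumptions, not OpenAI’s implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions -- not ChatGPT's actual configuration.
D_MODEL, N_EXPERTS, TOP_K = 16, 4, 2

# Each "expert" is a small specialized block; here, a single weight matrix.
experts = [rng.normal(size=(D_MODEL, D_MODEL)) for _ in range(N_EXPERTS)]
# The router scores every expert for a given input vector.
router_w = rng.normal(size=(D_MODEL, N_EXPERTS))

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route an input vector to its top-k experts and mix their outputs."""
    logits = x @ router_w              # one score per expert
    top = np.argsort(logits)[-TOP_K:]  # indices of the k best-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()           # softmax over the selected experts only
    # Weighted sum of the selected experts' outputs; a broken expert here
    # would corrupt every response routed through it.
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.normal(size=D_MODEL)
print(moe_layer(token).shape)  # (16,) -- same shape as the input
```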

Finally, large deep learning models, especially LLMs, are big black boxes that are still being studied. Many things about their behavior and their success (or lack thereof) on specific tasks remain unknown. Moreover, the behavior of an LLM can change dramatically when it is retrained: it might give wrong answers to questions that it answered correctly before. We don’t know when and how often the LLMs used in ChatGPT are updated and retrained. Maybe the unusual behavior was caused by some experiment on the model? Again, we don’t know.
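
One way to notice such drift is to periodically re-run prompts with known-good answers against the hosted model. The sketch below assumes a hypothetical call_llm helper and a tiny golden set; it illustrates the idea, not a complete evaluation harness.

```python
# A minimal drift check: re-ask questions with known answers whenever the
# upstream model may have changed. call_llm and GOLDEN_SET are hypothetical.

GOLDEN_SET = [
    ("What is the capital of France?", "paris"),
    ("How many days are in a week?", "seven"),
]

def call_llm(prompt: str) -> str:
    """Placeholder for a request to the hosted model's API."""
    raise NotImplementedError  # wire up your provider's SDK here

def detect_drift() -> list[str]:
    """Return the prompts whose answers no longer contain the expected text."""
    failures = []
    for prompt, expected in GOLDEN_SET:
        answer = call_llm(prompt).lower()
        if expected not in answer:
            failures.append(prompt)
    return failures
```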

What does it mean for LLM applications?

People who use the ChatGPT application directly can immediately spot the problem when the LLM goes off the rails. The real threat is to applications that use ChatGPT or a similar proprietary model as part of a workflow or pipeline. The service is not down, which means the application will continue to work without raising alarms. But the model is not behaving as it should, which means the problems it causes can propagate to other components of the application. And if you have an application that uses several LLM agents built on ChatGPT, the behavior can become even more unpredictable and potentially destructive.
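
This is why a pipeline stage that calls a hosted LLM is worth validating explicitly: the API may return a successful response while the content is unusable. Here is a minimal sketch of that idea; call_llm and the checks in looks_sane are hypothetical placeholders that a real application would tailor to its expected output format.

```python
import json

def call_llm(prompt: str) -> str:
    """Placeholder for an API call to ChatGPT or a similar hosted model."""
    raise NotImplementedError  # hypothetical -- use your provider's SDK here

def looks_sane(raw: str) -> bool:
    """Cheap sanity checks on a response the API delivered successfully."""
    if not raw.strip():
        return False
    try:
        payload = json.loads(raw)   # this pipeline expects structured JSON
    except json.JSONDecodeError:
        return False
    return "summary" in payload     # field required by the next stage

def pipeline_step(prompt: str) -> dict:
    raw = call_llm(prompt)
    if not looks_sane(raw):
        # Fail loudly instead of silently passing gibberish downstream.
        raise ValueError("LLM output failed validation; halting pipeline")
    return json.loads(raw)
```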

All of this is a reminder that we still have much to learn about the risks of building applications on top of LLMs, especially closed-source proprietary models like ChatGPT and GPT-4. We are facing a new class of failures that could turn into security threats if not addressed properly. As LLMs take on bigger roles in critical applications, the industry needs to take measures to protect against such threats.

How to protect LLM applications against model failures
