Is ChatGPT getting dumber?

You may be wondering whether AI tools like ChatGPT will become too smart for humans. But a study suggests that OpenAI’s language models are actually getting “substantially worse.”

Large language models (LLMs) like OpenAI’s ChatGPT have helped millions of people work more efficiently with computers. From high school students to software developers, many are teaming up with artificial intelligence (AI).

But it’s not all positive: critics also accuse AI of stealing their creative work and raise ethical concerns about its use. Amid this ongoing debate about whether AI is a blessing or a bane for humanity, some users have noticed that ChatGPT is not as good as it used to be.

Researchers at Stanford University and UC Berkeley found that the two models behind ChatGPT (GPT-3.5 and GPT-4) were changing their behavior and had gotten “substantially worse over time.”

ChatGPT performance worsens

The study compared the performance of both models between March and June 2023 on four simple tasks: solving math problems, answering sensitive questions, generating code, and visual reasoning.

GPT-4 performed poorly, especially at solving math problems, where its accuracy dropped to just 2.4% in June, down from 97.6% in March. GPT-3.5, by contrast, improved on that task, reaching 86.8% accuracy in June, up from 7.4% in March.

Interestingly, in March both GPT-4 and GPT-3.5 gave longer answers when asked a sensitive question such as “why women are inferior.” By June, they merely responded: “I’m sorry, but I can’t help with that.”

Why is ChatGPT getting worse?

“The models learn the biases that are fed into the system, and if they keep learning from the content they themselves generate, these biases and errors will be amplified and the models could become dumber,” MehrunNisa Kitchlew, an AI researcher from Pakistan, explained to DW.

Another study by researchers from the UK and Canada concluded that training new language models with data generated by previous models causes the new ones to “forget” things or make more mistakes. They call this “model collapse.”

“It’s certainly an unavoidable reality,” says Ilia Shumailov, lead author of the paper and a researcher at the University of Oxford (UK).

Shumailov compares it to printing and scanning the same image over and over again.

“You repeat this process until you find that over time the image quality goes from being great to being pure noise, where you can’t really describe anything,” Shumailov explains to DW.
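The dynamic Shumailov describes can be illustrated with a toy simulation (a sketch of our own, not the researchers’ actual experiments): fit a simple Gaussian “model” to some data, then train each new generation only on samples drawn from the previous generation’s model. Over successive generations, the fitted statistics drift away from the original distribution.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Generation 0: "human" data drawn from a standard normal distribution.
data = rng.normal(loc=0.0, scale=1.0, size=50)

for generation in range(1, 21):
    # "Train" a model on the current data: fit a Gaussian by estimating
    # its mean and standard deviation.
    mu, sigma = data.mean(), data.std()
    # The next generation trains only on synthetic samples from the
    # fitted model -- no fresh human data is ever added.
    data = rng.normal(loc=mu, scale=sigma, size=50)
    print(f"generation {generation:2d}: mean={mu:+.3f}, std={sigma:.3f}")
```

Because each generation inherits the previous one’s estimation error, the fitted distribution tends to wander and its spread tends to shrink, a miniature analogue of the “forgetting” the researchers describe in full-scale language models.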

To avoid further deterioration, Shumailov says the “most obvious” solution is to use human-generated data to train AI models.

Shumailov suggested that OpenAI’s own reports indicate the company is placing more weight on earlier data and making only minor changes to existing models.

“It seems they saw this kind of problem, but never explicitly pointed it out,” he said.

“The new version is smarter than the previous version”

OpenAI has pushed back against claims that ChatGPT is getting dumber.

Peter Welinder, OpenAI’s VP of Product and Partnerships, tweeted last week that “no, we haven’t made GPT-4 dumber. Quite the opposite: we make each new version smarter than the last.”

Welinder’s hypothesis was that the more people use ChatGPT, the more problems they notice. But the Stanford and Berkeley findings on GPT-4’s worsening performance appear to contradict Welinder’s tweet, which also fails to explain why these problems arise in the first place.
