OpenAI Degrades GPT-4 Performance While GPT-3.5 Gets Better

AleksandarK · Jul 19, 2023

When OpenAI announced its GPT-4 model, it first became part of ChatGPT, behind the paywall for premium users. GPT-4 is the latest installment in the Generative Pre-trained Transformer (GPT) family of Large Language Models (LLMs). It aims to be a more capable version of GPT-3.5, which powered ChatGPT at launch and was already capable at the time. However, it seems that GPT-4's performance has been steadily dropping since its introduction. Many users have noted the regression, and today we have results from researchers at Stanford University and UC Berkeley, who benchmarked GPT-4's performance in March 2023 and again in June 2023 on tasks such as solving math problems, visual reasoning, code generation, and answering sensitive questions.

The results? The paper shows that GPT-4's performance has degraded significantly in all the tasks. This could be attributed to efforts to improve stability, lower the massive compute demand, and much more. Unexpectedly, GPT-3.5 experienced a significant uplift over the same period. Below, you can see the examples that the researchers benchmarked, which also compare GPT-4 and GPT-3.5 performance in all cases.

Firedrops · Jul 19, 2023

Probably preparing for GPT-4.x or GPT-5 launch, so they can make claims like '16 times the detail' vs GPT-4.

forman313 · Jul 19, 2023

Maybe we are starting to see the results of an AI feedback loop and model collapse. The new "owners" couldn't have dropped the ball this badly, could they? :roll:

With Microsoft buying their way in, I wouldn't be surprised if the whole thing collapsed. The amount of effort and money they put into Bing without it even being able to find material on their own home site is beyond shocking. It speaks volumes.

AnotherReader · Jul 19, 2023

forman313 said:
Maybe we are starting to see the results of an AI feedback loop and model collapse. The new "owners" couldn't have dropped the ball this badly, could they?

With Microsoft buying their way in, I wouldn't be surprised if the whole thing collapsed. The amount of effort and money they put into Bing without it even being able to find material on their own home site is beyond shocking. It speaks volumes.

Microsoft has had a long-standing relationship with OpenAI. Try to examine your bias; this is what I think of when I read comments by rabid anti-MS people.

"You killed my pappy," said the youth, "and my pappy's pappy. And his pappy's pappy. And my brothers Jethro, Hank, Hoss, Red, Peregrine, Marsh, Junior, Dizzy, Luke, Peregrine, George and all the others. I'm callin' you out, lawman."

This XKCD is also relevant:

WorringlyIndifferent · Jul 19, 2023

The people with influence want to keep that influence. Having powerful AI models publicly available diffuses the power of the wealthy, company owners, etc. So of course they're going to cripple anything public-facing.

I'm reminded of the talk given by the two guys who made The Social Dilemma (a documentary about how harmful social media has been to the public), where they talked about the dangers of AI. Contrary to their good take in The Social Dilemma, they did a complete 180 bootlicker turnaround; they said the solution to all the dangers that AI represents is to - get this - centralize control of it to a tiny handful of corporations and government agencies. Surely centralizing something as powerful and influential (and increasingly powerful and influential) as AI won't be horrible for the public, right? I'm sure we can trust western governments and corporations like Amazon and Google to do what's right. Definitely.

forman313 · Jul 19, 2023

AnotherReader said:
Microsoft has had a long-standing relationship with OpenAI. Try to examine your bias; this is what I think of when I read comments by rabid anti-MS people.

This XKCD is also relevant:

I may be rabid. I probably wouldn't know myself.

But if you seek to put me down, you have to do a little better. A good place to start is to stop making assumptions.

I'm not anti-Microsoft. I have been running Windows since 3.1 and even have the original seven 3.5" installation disks in my possession. I still run Windows at home and at work. Servers and sensors are running *nix, but even the hardcore Linux guys I work with are happy with Windows, WSL and VSCode.

I made fun of Bing and the department responsible for it. Not MS as a whole.

I consider Bill Gates to be a great man. A lot of people are alive today because of him, and some of the projects he is working on have the potential to make a huge positive impact for us all.

Solaris17 · Jul 19, 2023

AnotherReader said:
This XKCD is also relevant:

There is always a relevant XKCD. On that point, though, this was a great response.

forman313 said:
I may be rabid. I probably wouldn't know myself.

But if you seek to put me down, you have to do a little better. A good place to start is to stop making assumptions.

I'm not anti-Microsoft. I have been running Windows since 3.1 and even have the original seven 3.5" installation disks in my possession. I still run Windows at home and at work. Servers and sensors are running *nix, but even the hardcore Linux guys I work with are happy with Windows, WSL and VSCode.

I made fun of Bing and the department responsible for it. Not MS as a whole.

I consider Bill Gates to be a great man. A lot of people are alive today because of him, and some of the projects he is working on have the potential to make a huge positive impact for us all.

I respect this a lot, working in the industry. The "OS wars" are about as primitive as you can get for someone even remotely adept at technology. It's also not like the open source world is any better; they just have different self-fulfilling problems. Watching entire Linux orgs squabble publicly in Git over which DE to standardize on is amusing.

All of that aside, while I agree Bill Gates is a good guy, he hasn't run MS in over a decade at least.

AnotherReader · Jul 19, 2023

forman313 said:
I may be rabid. I probably wouldn't know myself.

But if you seek to put me down, you have to do a little better. A good place to start is to stop making assumptions.

I'm not anti-Microsoft. I have been running Windows since 3.1 and even have the original seven 3.5" installation disks in my possession. I still run Windows at home and at work. Servers and sensors are running *nix, but even the hardcore Linux guys I work with are happy with Windows, WSL and VSCode.

I made fun of Bing and the department responsible for it. Not MS as a whole.

I consider Bill Gates to be a great man. A lot of people are alive today because of him, and some of the projects he is working on have the potential to make a huge positive impact for us all.

Mea culpa; you're right about not making assumptions. I think your critique of Bing is well deserved, though lately even Google Search seems to be kooky.

forman313 · Jul 25, 2023

Solaris17 said:
I respect this a lot, working in the industry. The "OS wars" are about as primitive as you can get for someone even remotely adept at technology. It's also not like the open source world is any better; they just have different self-fulfilling problems. Watching entire Linux orgs squabble publicly in Git over which DE to standardize on is amusing.

All of that aside, while I agree Bill Gates is a good guy, he hasn't run MS in over a decade at least.

I don't have anything bad to say about any one OS or software in general. The people profiting from them with false claims and anti-consumer tactics, however... not a fan.

AnotherReader said:
Mea culpa; you're right about not making assumptions. I think your critique of Bing is well deserved, though lately even Google Search seems to be kooky.

No harm, no foul. I'm just happy to see that it's still possible to have a discussion online.

Google gets worse at the same rate the internet accumulates rubbish, I guess. Shit in = shit out. I use Google Scholar whenever possible. I hadn't even heard of it a year ago. Imagine my surprise when I tried it. It might not solve my problem right away, and it's pretty dense reading. But it's worth it. No ads, precise wording, peer reviews and no biases. Well, at least not compared to the rest of the web. Being stuck in tutorial hell is not fun at all. It's the enemy of motivation.

AnotherReader · Jul 25, 2023

forman313 said:
I don't have anything bad to say about any one OS or software in general. The people profiting from them with false claims and anti-consumer tactics, however... not a fan.

No harm, no foul. I'm just happy to see that it's still possible to have a discussion online.

Google gets worse at the same rate the internet accumulates rubbish, I guess. Shit in = shit out. I use Google Scholar whenever possible. I hadn't even heard of it a year ago. Imagine my surprise when I tried it. It might not solve my problem right away, and it's pretty dense reading. But it's worth it. No ads, precise wording, peer reviews and no biases. Well, at least not compared to the rest of the web. Being stuck in tutorial hell is not fun at all. It's the enemy of motivation.

Google Scholar is pretty nifty. As far as regular search is concerned, I think search engine optimization is one of the reasons why we're seeing more rubbish than 10 years ago.

Processor	Ryzen 7 5700X
Motherboard	ASUS TUF Gaming X570-PRO (WiFi 6)
Cooling	Noctua NH-C14S (two fans)
Memory	2x16GB DDR4 3200
Video Card(s)	Reference Vega 64
Storage	Intel 665p 1TB, WD Black SN850X 2TB, Crucial MX300 1TB SATA, Samsung 830 256 GB SATA
Display(s)	Nixeus NX-EDG27, and Samsung S23A700
Case	Fractal Design R5
Power Supply	Seasonic PRIME TITANIUM 850W
Mouse	Logitech
VR HMD	Oculus Rift
Software	Windows 11 Pro, and Ubuntu 20.04

System Name	RogueOne
Processor	Xeon W9-3495x
Motherboard	ASUS w790E Sage SE
Cooling	SilverStone XE360-4677
Memory	128gb Gskill Zeta R5 DDR5 RDIMMs
Video Card(s)	MSI SUPRIM Liquid X 4090
Storage	1x 2TB WD SN850X \| 2x 8TB GAMMIX S70
Display(s)	49" Philips Evnia OLED (49M2C8900)
Case	Thermaltake Core P3 Pro Snow
Audio Device(s)	Moondrop S8's on schitt Gunnr
Power Supply	Seasonic Prime TX-1600
Mouse	Razer Viper mini signature edition (mercury white)
Keyboard	Monsgeek M3 Lavender, Moondrop Luna lights
VR HMD	Quest 3
Software	Windows 11 Pro Workstation
Benchmark Scores	I dont have time for that.