
NVidia now HIDING hot spot temperature? A great problem IMO.

Yeah, like that time when 3090 and 3080 FE memory chips easily reached 100°C (on a brand-new card; with time and some dust it would be worse) while the max junction temperature (Tj) was 105°C. That was easily fixable by swapping the thermal pads on the memory chips. Nvidia engineers sure knew what they were doing then.

BTW, I had that issue on my 3080 FE as well, and the hotspot reading was useful then too: I had used pads that were too thick, and my hotspot was a lot higher than it needed to be. I easily checked that and bought thinner pads. Another case where the hotspot temp could be useful.
 
It’s 99.9% certain that any modern chip, CPU or GPU, internally uses resources and info to regulate its functions that the user is completely unaware of.

So there is a very slim chance that a GPU of this size and complexity does not monitor its silicon “health” with dozens of temperature, current and voltage sensors.

Nvidia is already getting criticism about the power increase, the reuse of the old fab node, and the relatively small raw performance increase. Those points cannot be hidden.
They don’t need another point making their GPUs even less appealing to potential buyers. Most likely they decided to hide the hotspot temp, especially on the FE 5090, which has a compact cooler and will probably peak constantly over 100°C.

Sure, it’s one less piece of info for those who used to use it. It had its value.

I don’t think this is a matter of GPU health concern but time will tell as always.

It may be harder for the user to know if the TIM is going bad, but to be honest that is an issue with traditional TIM, not LM. If LM sits between nickel-plated surfaces it will almost never dry out, as long as it can’t drip out either (which would reduce its quantity).

A copper surface can absorb “half” of the LM compound and gradually render it dry (within 2-3 months); it needs to be replaced frequently until the copper is saturated.
But that also depends on the LM’s actual ingredients, for example TG Conductonaut, which is based on indium-gallium.

I will also guess that AIB variants will have that sensor hidden for uniformity across all variants, even though some of them will have huge coolers with much lower temperatures.
 
I learned that, at least on their FE 5090 models, Nvidia is now hiding the chip hot spot temperature. I hope it will not be the rule for all 50-series cards!

Hot spot temperature is an important health status indicator of the graphics card!

For example, my current 4070 started with the hot spot about 10°C above the whole-chip temperature, but it slowly drifted higher and is now 30°C above. My card is probably affected by the crap-paste problem Igor reported, and thanks to the hot spot temperature I know that I should probably fix it.
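That kind of drift check is easy to automate if you log readings over time. A minimal Python sketch, assuming you export (core, hotspot) pairs from whatever monitoring tool you use; the thresholds and the log values here are made up for illustration:

```python
# Flag TIM/paste degradation by watching the hotspot-minus-core delta.
# A healthy card holds a roughly constant delta; a creeping delta hints
# at drying or pumped-out paste. Thresholds below are illustrative only.

def delta_drift(samples, baseline_delta=10.0, warn_margin=10.0):
    """Return the deltas that drifted past baseline_delta + warn_margin."""
    warnings = []
    for core, hotspot in samples:
        delta = hotspot - core
        if delta > baseline_delta + warn_margin:
            warnings.append(delta)
    return warnings

# Hypothetical log: delta creeps from ~10 °C (new card) toward 30 °C.
log = [(65, 75), (66, 77), (67, 88), (68, 98)]
print(delta_drift(log))  # [21, 30] -> paste is probably going bad
```

Obviously you can only run something like this while the card still exposes the hotspot number.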
Hotspot temperature tells you only about paste/TIM issues. The 20-, 30-, and 40-series GPUs manage their own clocks, voltages, core usage, and temperature across hundreds of locations on the die. Distilling readings sampled every few milliseconds down to a "temperature" and a "hotspot temperature" sampled every 1-2 seconds was already pretty silly.

For liquid metal there is nothing you can do to change the hotspot temperature, so why even present the user with a number they can't act on, other than to worry (due to being ill-educated about how GPU temperatures really work)?
 
I watched the GN teardown, and it seems the liquid metal breached one or two of the containment ribbons; they probably overfilled it, or they have a problem with air pressure in the chip area while mounting the cooler.

As I wrote in the thread about the FE cooler, their cooler has about half the surface area of the larger AIB models, and the liquid metal is a desperate measure to lower the temps the chip hits with such a small cooler. The liquid metal was not enough, so they simply need to hide the evidence of what is going on.

I do not think that making a 600W card so small was a good decision. They insisted on carrying this plan out even after finding out that the card overheated, and the disastrous fallout for everybody is that they decided to hide the hot spot temperature.

Well yeah, and rubber decays over time too; damage to that seal could be hard to see, never mind replacing it. And one thing is for sure: I don't want liquid metal all over my PC because a seal broke and splashed everywhere after hitting a fan.

I'll leave the suckers and people with no sense of value to buy them.
 
I'm sure they have a good reason to remove it. Not needed, added complexity, cost, whatever.
If they wanted to hide something, they could always just fudge the number the sensor spits out through the BIOS anyway.
 
I'm sure they have a good reason to remove it. Not needed...
I posted an example of my card with a manufacturing flaw caused by a faulty material, which disintegrates and cannot function as intended.

I need that number to tell me that there is something wrong with my card.
 
Some people who state this is a non-issue would be griping if other companies did the same or relocated it...
Why? It's mostly an irrelevant temperature in the grand scheme of things.

I bet most of you forgot it was even there on your 30- and 40-series cards until reviewers said it was gone on the 5090.
 
Wasn't the high hotspot a meme of the RX 6000 and RX 7000 series? Especially the RX 67xx series?
There's a noticeable difference between "normal" and hotspot temps on my 6700 XT, but nothing to worry about. I blame the aftermarket cooler for not having a flat surface.
 
I'm sure they have a good reason to remove it. Not needed, added complexity, cost, whatever.
If they wanted to hide something, they could always just fudge the number the sensor spits out through the BIOS anyway.

Not only would the above potentially put Nvidia on the hook for card failures (Nvidia would have to service AIB card failures if they are the ones with the faulty temperature readings), it could be used in court to demonstrate intent. After all, if there is no problem, there is no reason to fudge sensor data. The act of hiding the data itself could be considered intent, but it's on a different level than outright fudging the numbers. Fudging the numbers indicates a very clear awareness of an issue.

Hiding the data is far preferable: it gets the same effect without the larger legal ramifications if caught.
 
Yeah, like that time when 3090 and 3080 FE memory chips easily reached 100°C (on a brand-new card; with time and some dust it would be worse) while the max junction temperature (Tj) was 105°C. That was easily fixable by swapping the thermal pads on the memory chips. Nvidia engineers sure knew what they were doing then.

BTW, I had that issue on my 3080 FE as well, and the hotspot reading was useful then too: I had used pads that were too thick, and my hotspot was a lot higher than it needed to be. I easily checked that and bought thinner pads. Another case where the hotspot temp could be useful.

Well, I've seen quite a few dead 3090 FEs, mostly to do with memory - the 3090 FE was absolutely terrible with memory cooling and would throttle whenever the case heated up. It was a design fault, and it wasn't made a big deal because open test benches with new cards wouldn't show the issue. Changing pads helped a lot, but putting mini heatsinks on the back plus a pad swap is what fully solves it.

It was a terrible piece of engineering. When you spend so much money on a decent cooler only to forget (or not care) that the memory is hitting 100°C at stock, it's unacceptable. People don't run open test benches; they run closed systems, which inevitably acquire dust. The card will throttle. Later 3090s didn't have memory on the back, so it eventually became a non-issue.

The 5090 FE cooler seems to have a similar problem, because those memory temps frankly look atrocious. Let's see people benching them in closed systems.

edit: even with my thick backplate + heatsinks + Fujipoly Sarcon GR130A, memory temps hit around 84°C in tests that stress the memory. Want to know what an earlier-model stock 3090 FE would do? Throttle like crazy while being stuck at 105°C.

Why? It's mostly an irrelevant temperature in the grand scheme of things.

I bet most of you forgot it was even there on your 30- and 40-series cards until reviewers said it was gone on the 5090.

If you read the posts, you'll see why it was useful for a lot of us here.
 
mkppo - that is a lot of interesting information; I had no idea the 3090 FE had such problems.

At first I was genuinely enthusiastic about the new 5090 FE cooler, I am posting about it here:


When the fins of a cooler reach 65°C on an open test bench while the RAM is over 90°C and the hotspot temp is hidden from sight, I really do not think anything good will happen in a closed case. The VERY hot air exiting the cooler might do some damage to other components.
 
Geee I wonder who knows more about what constitutes the health status of a GPU. Is it you, or NVIDIA engineers?
The accountants. Don't be naive. It's not shown because it might scare people, as it's probably higher with 500+W going through the chip. It might very well run over 100°C all the time, which is WAY past the comfort zone in people's minds. And frankly, that's not how I would like to run a $2000 piece of kit I cared about, either.

Is this hidden to make sure people don't know it'll fail? No. But we also know that, with current tolerances, Nvidia GPUs last WELL BEYOND any economically practical usage time. You can rest assured that time shortens as TDPs go up and coolers get smaller on top. We've seen this live in effect at Intel, and how it backfires: sooner or later, the limit is reached and overstepped.

Humans can't handle the truth

mkppo - that is a lot of interesting information; I had no idea the 3090 FE had such problems.

At first I was genuinely enthusiastic about the new 5090 FE cooler, I am posting about it here:


When the fins of a cooler reach 65°C on an open test bench while the RAM is over 90°C and the hotspot temp is hidden from sight, I really do not think anything good will happen in a closed case. The VERY hot air exiting the cooler might do some damage to other components.
Well, there is another aspect here. The hotspot is very hard to cool; it's almost a direct translation of voltage to temps, before the cooler has had any chance to dissipate that heat. It's a worst-case way to view temps. And no, a 20-25°C gap isn't problematic. It gets problematic if you have that hotspot and the GPU is throttling.
 
No offence or anything, but if you are going to run a high-performance GPU like a 5090, you had better have some air moving through your chassis.

That should literally go without saying. It is a 600W GPU after all, with spikes that will exceed that.

Silence and high performance almost never go hand in hand.

But you all knew that, I am sure.
 
@BoggledBeagle I know the 3090 FE cooler all too well - I ran a bunch of tests before putting it under water. I also tested four other 3090s, including GB's Waterforce WB version, which I still have. The FE cooler was consistently the worst for memory temps, but some other AIBs were also offenders to a lesser degree (I think the MSI Ventus 3090 was one of them).

Yeah, I mean, I'm not dissing the new 5090 cooler. It's honestly a great piece of engineering; a two-slot cooler dissipating 600W is pretty nuts. They spared no expense either - just look at the bobs and weaves in the fins themselves and you start to wonder how many models they made and how many machines they need. I have no issue with it other than that I wish they had made it 3 slots instead; it would have been even better.

Then there's the memory temp, and that makes me disappointed. Even better pads would go a long way - Steve even asked that Nvidia rep why they're still using that shitty pad. His response was nonsense (see Steve's face too): basically that it's very flexible and easy to apply, and that there are forces going through the die area that these pads handle well, yada yada. Bruh, I changed your thin sheets of white shit to a good pad and got a 10°C drop, so I don't give two shits about forces. Well, if you're going to stick to those pads no matter what, do something else.
 
I'm sure they have a good reason to remove it. Not needed, added complexity, cost, whatever.
If they wanted to hide something, they could always just fudge the number the sensor spits out through the BIOS anyway.
Just because it doesn't show it doesn't mean it's not there.
There's absolutely no chance that a chip this big, consuming 500+W, doesn't have dozens of temp and voltage sensors across the whole die regulating its operation.

There's no question of complexity or cost when you already have the sensors.
Like I said before, Nvidia didn't want another reason for criticism.
If the hotspot were visible, 99% of the time it would be 100+°C, especially on the FE variant with its compact cooler, and that would raise questions and doubts, no matter whether it's harmful or not.

Like others said, it would have been worse if they did show hotspot + offset.

I'm not trying to say that this will lead to silicon degradation. I want to believe that Nvidia is not Intel at this point - not that desperate to deliver raw performance, anyway.

I'm just stating the obvious reasons for hiding the hotspot metric.
 
No offence or anything, but if you are going to run a high-performance GPU like a 5090, you had better have some air moving through your chassis.

That should literally go without saying. It is a 600W GPU after all, with spikes that will exceed that.

Silence and high performance almost never go hand in hand.

But you all knew that, I am sure.

Silence and high performance can indeed go hand in hand if you have a proper cooler, unlike the FE card.
 
Hot spot temperature is an important health status indicator of the graphics card!
No, it's not.
It merely makes noobs overly concerned about nothing.
 
A noticeable difference between "normal" and hotspot temps on my 6700 XT, but nothing to worry about.
Mine has a very noticeable difference, but it wasn't nearly as bad as what some others report, IIRC.
 
Cool story bro.

We have been over this before, and I've shown you that you can have a system like mine running with fans at sub-1000 RPM. But you didn't get it then, and you still don't get it now - hardly surprising...
 
We have been over this before, and I've shown you that you can have a system like mine running with fans at sub-1000 RPM. But you didn't get it then, and you still don't get it now - hardly surprising...
Not really interested in anything you say.
 
Yeah, like that time when 3090 and 3080 FE memory chips easily reached 100°C (on a brand-new card; with time and some dust it would be worse) while the max junction temperature (Tj) was 105°C. That was easily fixable by swapping the thermal pads on the memory chips. Nvidia engineers sure knew what they were doing then.
It wasn't really fixed by doing that; they still ran very hot. It's not that Nvidia engineers don't know what they are doing; they are simply trying to skimp as much as they can on things like materials. That's why the 5090 cooler is smaller: a 4090 FE cooler would actually perform better, but it most certainly costs more to make, so here we are. They didn't hide the hotspot readings by mistake either; the readings are obviously going to be much higher than before, and they don't want people to complain about it.

I'm sure they have a good reason to remove it. Not needed, added complexity, cost, whatever.
It's not removed; the sensors are 100% still there. They're a needed function to ensure everything works fine, just not exposed outside the firmware.
 
It wasn't really fixed by doing that; they still ran very hot. It's not that Nvidia engineers don't know what they are doing; they are simply trying to skimp as much as they can on things like materials. That's why the 5090 cooler is smaller: a 4090 FE cooler would actually perform better, but it most certainly costs more to make, so here we are. They didn't hide the hotspot readings by mistake either; the readings are obviously going to be much higher than before, and they don't want people to complain about it.


It's not removed; the sensors are 100% still there. They're a needed function to ensure everything works fine, just not exposed outside the firmware.
It's the ignorance-is-bliss approach from them.
 
It is exactly ignorance is bliss, and some would rather believe what Nvidia tells them than the valid reasons people have given for why having the sensor is helpful.
 
No offence or anything, but if you are going to run a high-performance GPU like a 5090, you had better have some air moving through your chassis.

That should literally go without saying. It is a 600W GPU after all, with spikes that will exceed that.

Silence and high performance almost never go hand in hand.

But you all knew that, I am sure.
In this situation, I don't think lots of airflow is going to solve much. Imagine a car radiator that is just large enough for cool weather; then you upsize only the fan in order to drive it in the summer. The extra airflow helps, but it can only do so much if the cooling solution gets saturated due to insufficient mass and surface area. It feels like NVIDIA designed the FE to counter all the flak it got for that rumored 4-slot cooler that never even made it out. The 5090 FE is physically smaller than the 4090 FE but consumes over 100W more under load. Yes, the cooler is an innovative design, and I actually like the concept. However, it feels insufficient for this GPU with its increased TDP; at best, the design is barely sufficient. 3-slot+ AIB 5090s performing 20°C better at the same noise level bears that out. Had they made this same cooler a 3-slotter, it would have given them the extra mass needed to properly cool this thing. 2 slots for a 600W card? That's just insane unless it's a waterblock.
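The saturation argument can be put in rough numbers with the usual steady-state model, T_die = T_ambient + P × R_total, where R_total is the junction-to-air thermal resistance. A quick sketch; both resistance values below are assumptions picked for illustration, not measurements of any actual cooler:

```python
# Steady-state die temperature: T_die = T_ambient + P * R_total,
# with R_total the junction-to-air thermal resistance in °C/W.
# Both R_total values below are assumptions for illustration only.

def die_temp(power_w: float, r_total: float, t_ambient: float = 25.0) -> float:
    """Die temperature once the cooler is heat-saturated (steady state)."""
    return t_ambient + power_w * r_total

small_cooler = die_temp(600, r_total=0.095)  # ~2-slot-class resistance (assumed)
large_cooler = die_temp(600, r_total=0.062)  # ~3-slot-class resistance (assumed)

print(f"2-slot class: {small_cooler:.0f} °C")  # 2-slot class: 82 °C
print(f"3-slot class: {large_cooler:.0f} °C")  # 3-slot class: 62 °C
```

At 600W, even a modest drop in thermal resistance (more fin mass and surface area) buys on the order of the 20°C advantage the big AIB cards show, which extra case airflow alone cannot recover once the heatsink itself is saturated.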
 