# Difficulty finding stable undervolt/overclock of Vega 56



## NovaProspekt (Jul 16, 2018)

Hello everyone,

As the title states, I have been having a very hard time getting my card to run stable.  I have the Gigabyte reference model Vega 56, which I have flashed to the Vega 64 BIOS.  I have been scouring forums for months looking at Vega undervolting/overclocking guides, and pretty much everyone seems to be having great success.

My games crash to desktop even at the "stock" (Vega 64 BIOS default) Wattman settings.  It seems like no matter how I go about trying to find a stable voltage/frequency combination I cannot get the card to run stable.  I have been using GPU-Z to monitor the card and the core temp never exceeds 70C (though the "hot spot" temp often reaches 105C).  

Here is the latest screenshot of what my Wattman settings look like.  I have the P6 and P7 states set to the same values because I am trying the most recent guide advice I found: set the P6 and P7 states to the P6 default values, then gradually decrease the voltage by the same amount on both states together to find the optimum P6 undervolt, then raise P7 back to its default values and start lowering the P7 voltage.  However, even with P6 and P7 set to the P6 defaults, Destiny 2 crashes to desktop within minutes.  Have I just gotten really unlucky and ended up with a chip that barely passed AMD quality control?
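The guide procedure above amounts to a two-phase search. This is a minimal sketch, not any guide's or tool's actual method: the 25 mV step, the 800 mV floor, and the `phase1_stable` / `phase2_stable` callbacks (standing in for "apply these values in Wattman and play until it crashes or doesn't") are all hypothetical.

```python
from typing import Callable, Optional, Tuple


def lower_until_unstable(start_mv: int, is_stable: Callable[[int], bool],
                         step_mv: int, floor_mv: int) -> Optional[int]:
    """Step the voltage down from start_mv until a test fails.

    Returns the lowest voltage that passed, or None if start_mv itself fails.
    """
    if not is_stable(start_mv):
        return None
    best = start_mv
    # Only try a lower value if it stays at or above the (hypothetical) floor.
    while best - step_mv >= floor_mv and is_stable(best - step_mv):
        best -= step_mv
    return best


def two_phase_undervolt(p6_default_mv: int, p7_default_mv: int,
                        phase1_stable: Callable[[int], bool],
                        phase2_stable: Callable[[int, int], bool],
                        step_mv: int = 25,
                        floor_mv: int = 800) -> Optional[Tuple[int, int]]:
    """Hypothetical sketch of the P6-then-P7 undervolting procedure.

    phase1_stable(v): P6 and P7 both at the P6 default clock, both at voltage v.
    phase2_stable(p6_mv, p7_mv): P6 at its found voltage, P7 back at its
    default clock with voltage p7_mv.
    """
    # Phase 1: lower P6 and P7 together, starting from the P6 default voltage.
    p6_mv = lower_until_unstable(p6_default_mv, phase1_stable, step_mv, floor_mv)
    if p6_mv is None:
        # Crashing even at the P6 defaults suggests the problem lies elsewhere.
        return None
    # Phase 2: keep P6, restore P7 to its defaults, then lower P7 alone.
    p7_mv = lower_until_unstable(
        p7_default_mv, lambda v: phase2_stable(p6_mv, v), step_mv, floor_mv)
    if p7_mv is None:
        return None
    return p6_mv, p7_mv


# Made-up stability thresholds, using the 1150 mV / 1200 mV P6/P7 default
# voltages quoted later in the thread.
result = two_phase_undervolt(
    1150, 1200,
    phase1_stable=lambda v: v >= 950,
    phase2_stable=lambda p6, p7: p6 >= 950 and p7 >= 1000,
)
print(result)
```

The `None` return at the P6 defaults is the case described in this post: if the card is unstable before any undervolt, the search has nothing to tune, which is why the replies below turn to the CPU overclock, cooling, and BIOS.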

I am starting to wonder if maybe the problem might be my CPU overclock (i5 3570k @ 4.4 GHz), but that's been running solid since 2012 so I'm pretty confident in it.  I tried running some Prime95 tests just to reassure myself, but it's summer and there's no AC in my office, so Prime95 quickly puts the CPU over 90 degrees C (temps monitored via CoreTemp), which I don't want to sustain.  The temps never get much over 60 degrees C under a real-world gaming load.

Any advice would be very welcomed.

Thanks






Probably the answer is to flash back to the stock Vega 56 BIOS and see if the stability problems go away right?


----------



## HD64G (Jul 16, 2018)

I agree that going back to the Vega 56 BIOS that came with the card might make it much more stable. The Vega 64 differs mainly in its HBM2 setup, so you might have problems working that out easily. Some around here might know better though.


----------



## NovaProspekt (Jul 16, 2018)

Well, I flashed back to the stock BIOS that came with the card.  I've been playing for about 2 hours, and suddenly got a CTD.  I had Wattman set to "custom", but I had not adjusted any settings.  So, taking this into account would you say the most likely scenario is that my CPU overclock is to blame?

I guess the next step is to start backing off my CPU multiplier.


----------



## INSTG8R (Jul 16, 2018)

NovaProspekt said:


> Well, I flashed back to the stock BIOS that came with the card.  I've been playing for about 2 hours, and suddenly got a CTD.  I had Wattman set to "custom", but I had not adjusted any settings.  So, taking this into account would you say the most likely scenario is that my CPU overclock is to blame?
> 
> I guess the next step is to start backing off my CPU multiplier


One thing at a time. You’ll never know the actual issue if you're changing multiple variables.


----------



## NovaProspekt (Jul 16, 2018)

Is it possible the card could be thermal throttling and shutting down at the stock voltage?  Or is it a safe assumption that the card should be guaranteed to not crash at stock settings?

I guess my thought process is, if games are crashing with the core and memory clocks at stock settings, is it a safe assumption that the problem lies elsewhere?


----------



## mtcn77 (Jul 16, 2018)

Don't think you will avoid damage by undervolting. That is just what I did. Less voltage = more current demand. You are only making it more resistive to a core temperature load, but it will eventually pull the same wattage at a higher resistive state from the VRM. A higher voltage will make the core more runny and lessen the load on the VRMs, since they won't have to generate more amperage.


----------



## INSTG8R (Jul 16, 2018)

NovaProspekt said:


> Is it possible the card could be thermal throttling and shutting down at the stock voltage?  Or is it a safe assumption that the card should be guaranteed to not crash at stock settings?
> 
> I guess my thought process is, if games are crashing with the core and memory clocks at stock settings, is it a safe assumption that the problem lies elsewhere?


It should most definitely be fine at stock; if not, it’s definitely an issue. You should be doing some form of monitoring while using it to determine if it’s throttling. AMD at least made it easy: Ctrl+Shift+O will bring up the built-in overlay.


----------



## RatusNatus (Jul 16, 2018)

First thing: Wattman is bad.
I think you should go back to the 64 BIOS, since it's the very same hardware on the reference Vega cards, and raise the minimum fan speed.

Without the 64 BIOS you can't overclock the HBM, and you have the very same Samsung HBM chips as the 64.
You need a higher fan speed because the HBM can't keep up above 70 C. The memory runs hotter than the core, so set the max temp to 65 and see how it goes.


----------



## NovaProspekt (Jul 16, 2018)

I was monitoring with GPU-Z the whole time.  Core temp never exceeded 72 degrees C, but the core clock would fall into the low 1400s when it heated up.  It is my understanding that this is why people undervolt Vega: not to save power, but to keep the core from heating up and throttling the clock speed down.  And I did have the max fan speed set to 3500 rpm when it crashed, which is pushing the limits of what I would consider tolerable noise-wise.

Ratus, I would rather be running the 64 BIOS, but I think I should stick with the stock BIOS until I figure out what is causing the instability.


----------



## RatusNatus (Jul 16, 2018)

I'm a miner. I have 6 Vega 64s running at 100% 24/7, but with a lower core speed (1408 MHz) and with the HBM overclocked, both undervolted. I'm running all 6 at core 1408 MHz / 1100 mV, HBM 1100 MHz / 900 mV.

72 C is a no-go temperature for the core; at that temperature my mining goes down. My fan is at 3000 rpm minimum, not maximum. I think you should try MSI Afterburner and set the fan there. Forget about Wattman; it's crap software.
Any core temp above 70 C will drop HBM HBCC... I don't know what this is, but it makes the software hang and sometimes Windows BSODs.

The 56 BIOS will limit your HBM frequency to around 950 MHz. Like I said, it's the same chip, and it will do 1100 MHz fine with the 64 BIOS WITH an undervolt.


----------



## NovaProspekt (Jul 16, 2018)

So you think my issues are caused by overheating and I just need to push the fan speed higher?  I was hoping not to need to do that; 3500 rpm is already pretty intrusive.

Maybe I’ll try maxing the fan speed and see if that improves stability.


----------



## HD64G (Jul 16, 2018)

Test your CPU/RAM combo's stability for now. One hour of Prime95 is enough, IMHO.


----------



## londiste (Jul 16, 2018)

Remember that HBM temps usually run at about core temp +15C, and HBM throttles at 85C. When running HBM at higher clocks (say, Vega 64 clocks), the gap is even larger.
At least that used to be the way this worked around the launch.
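That rule of thumb can be turned into a quick check. A minimal sketch, assuming the +15 C offset and 85 C throttle point quoted above (recalled from around launch, not an AMD specification); the constant and function names are hypothetical, and when GPU-Z reports an HBM temperature directly, that reading should take precedence over this estimate.

```python
# Rule-of-thumb values from the post above; not official AMD figures.
HBM_OFFSET_C = 15.0    # HBM assumed to run about this much hotter than the core
HBM_THROTTLE_C = 85.0  # approximate HBM throttle point


def estimated_hbm_temp(core_temp_c: float) -> float:
    """Estimate HBM temperature from core temperature using the fixed offset."""
    return core_temp_c + HBM_OFFSET_C


def hbm_likely_throttling(core_temp_c: float) -> bool:
    """True if the estimated HBM temperature reaches the throttle point."""
    return estimated_hbm_temp(core_temp_c) >= HBM_THROTTLE_C


# Core temperatures mentioned in this thread: 65 C, the 70 C limit, and 72 C.
for core in (65.0, 70.0, 72.0):
    print(f"core {core:.0f} C -> HBM ~{estimated_hbm_temp(core):.0f} C, "
          f"throttling: {hbm_likely_throttling(core)}")
```

Under this rule, the 72 C core temperature reported earlier would put the HBM at or past its throttle point, which lines up with the advice to keep the core below 70 C.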


----------



## NovaProspekt (Jul 16, 2018)

Shoot, so maybe I rolled back from the v64 BIOS for nothing....


----------



## INSTG8R (Jul 17, 2018)

NovaProspekt said:


> Shoot so maybe I rolled back from the v64 BIOS for nothing....


Again, sort out one issue at a time. Once you know your CPU overclock is stable, then you can sort out your GPU. Also, are you reinstalling drivers between all these changes? I know my PC freaked out just flipping the BIOS switch on my Vega.


----------



## Vario (Jul 17, 2018)

Set your CPU stock until you figure out the card.


----------



## MrGenius (Jul 17, 2018)

mtcn77 said:


> Don't think you will avoid damage by undervolting...Less voltage = more current demand. You are only making it more resistive to a core temperature load, but it will eventually pull the same watt at a higher resistive state from the vrm. A higher voltage will make the core more runny and lessen the load on vrms since they won't have to generate more amperage.


All of that you're saying there = wrong. That is not how it works. Nice job making it sound like you know what you're talking about though.


----------



## Totally (Jul 17, 2018)

@OP If you are undervolting, there is absolutely no reason for the 64 BIOS. The point of that BIOS is the higher limits; with undervolting you're going in the opposite direction.



MrGenius said:


> All of that you're saying there = wrong. That is not how it works. Nice job making it sound like you know what you're talking about though.



It's like saying that if you hit the brakes your car is going to accelerate; he's beyond just wrong. For a given resistance, if voltage is decreasing, current is also proportionately decreasing. It's been a while since I've seen someone so confused about simple V = IR. I guess if resistance magically decreases with increasing temp and it isn't a short, current can go up.
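For reference, the first-order relationships behind this point, using standard textbook approximations rather than measured Vega figures: a fixed resistance obeys Ohm's law, and a CMOS chip's switching power scales with the square of its voltage, so at a given clock the current it draws falls as voltage is lowered. Here $\alpha$ is the activity factor, $C$ the switched capacitance, and $f$ the clock frequency; static leakage is ignored.

$$
I = \frac{V}{R}, \qquad P = V I
$$

$$
P_{\text{dyn}} \approx \alpha C V^{2} f
\quad\Longrightarrow\quad
I_{\text{dyn}} = \frac{P_{\text{dyn}}}{V} \approx \alpha C V f
$$

A chip is not a constant-power load, so lowering $V$ does not force the VRMs to supply more current; in this approximation both power and current drop.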


----------



## NovaProspekt (Jul 17, 2018)

Thanks for all the suggestions; looking forward to giving them a try tomorrow.


----------



## eidairaman1 (Jul 17, 2018)

INSTG8R said:


> It should most definitely be fine at stock; if not, it’s definitely an issue. You should be doing some form of monitoring while using it to determine if it’s throttling. AMD at least made it easy: Ctrl+Shift+O will bring up the built-in overlay.



I say put everything back to stock for now. If there are no crashes, then it was one of his overclocks.


----------



## mtcn77 (Jul 17, 2018)

MrGenius said:


> All of that you're saying there = wrong. That is not how it works. Nice job making it sound like you know what you're talking about though.


Why are they running the fans at 100%? Vega Undervolting.


----------



## NovaProspekt (Jul 17, 2018)

Here is what I've done since yesterday:

The first thing I tried this morning was picking the default "Turbo" profile in Wattman just to see how the card would run at completely stock settings.  It ran at a stable core clock in the low 1300s, which I found to be a bit disappointing since I see most people in forums running these at 1500+ for gaming.  However, it was nice and quiet as the fan speed never exceeded 2500 rpm.

Next, I went back to the "Custom" profile.  I left all the clock and voltage settings alone, but increased the power limit slider to +50%, set the target temp slider to 60 degrees C (I left the max at the default 85), and increased the max fan speed to 4900 rpm (the max Wattman will allow).  I played some Destiny 2 at this point with GPU-Z open on my second monitor and saw that the core was maintaining about 1500-1520 MHz with occasional dips into the upper 1400s.  The fan was running at max.  Core and HBM temps were right around 65-67 degrees C.

But then I noticed that the value GPU-Z calls "core temp (hot spot)" was sometimes getting up to 108 degrees C.  That seemed alarmingly high.  I started searching and came across the Tom's Hardware article about undervolting Vega 64.  They reached out to the developer of GPU-Z for clarification on how this "hot spot" value is measured, and concluded that they are confident it is an accurate value.  They also state that the capacitors used in the reference design Vega boards have a max operating temperature of 105 degrees C, and theorize that it is the "hot spot" temperature that triggers the card to throttle back the core clock.

At this point I started manually decreasing core voltage on the P6 and P7 states in an attempt to get the hot spot temp down.  I now have both states set at 1000 mV (down from the default P6 and P7 values of 1150 mV and 1200 mV, respectively).  At this voltage, the card sustains a core clock of about 1450 MHz while gaming.  Core temperature and HBM temperature stay between 55-60 degrees C, and the hot spot temperature now tops out in the mid to upper 80s.  Power limit slider is still at +50%, fan speed at maximum.  I have not touched any memory settings.
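As a rough sanity check on why the lower voltage cuts the heat so much, a first-order estimate of dynamic (switching) power is proportional to V² · f, and core current I = P / V is proportional to V · f. A minimal sketch, assuming the card was previously sitting at the 1200 mV P7 voltage while holding about 1510 MHz (the midpoint of the 1500-1520 MHz reported above), and ignoring static leakage; the function names are hypothetical.

```python
def relative_dynamic_power(v_new_mv: float, v_old_mv: float,
                           f_new_mhz: float, f_old_mhz: float) -> float:
    """First-order CMOS estimate: P_dyn ~ C * V^2 * f, capacitance unchanged."""
    return (v_new_mv / v_old_mv) ** 2 * (f_new_mhz / f_old_mhz)


def relative_dynamic_current(v_new_mv: float, v_old_mv: float,
                             f_new_mhz: float, f_old_mhz: float) -> float:
    """I = P / V ~ C * V * f, so current falls with voltage at a given clock."""
    return (v_new_mv / v_old_mv) * (f_new_mhz / f_old_mhz)


# Assumed before: 1200 mV at ~1510 MHz. After: 1000 mV at ~1450 MHz.
power = relative_dynamic_power(1000, 1200, 1450, 1510)
current = relative_dynamic_current(1000, 1200, 1450, 1510)
print(f"dynamic power ~{power:.0%} of before, core current ~{current:.0%}")
# prints: dynamic power ~67% of before, core current ~80%
```

A roughly one-third cut in switching power for about a 4% clock loss is consistent with the lower core and hot spot temperatures reported here, though real cards also have leakage and VRM losses that this ignores.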

I have not experienced a crash since increasing max allowable fan speed to 4900 rpm, so maybe that was part of my problem.  Also, I remember seeing hot spot temps above 100 degrees C in GPU-Z when I was running the Vega 64 BIOS too.  I feel much better about 80s.

I think I will see how stability is with these settings for a little while, and then consider trying to push the frequencies a little higher while maintaining the 1000 mV voltage on the core.

One last little aside: I am now noticing that the card's memory frequency is staying at 800 MHz all the time, even after closing all programs and just idling at the desktop.  The core clock comes down, and is currently 30 MHz as I type this, but the memory frequency is still 800 MHz as reported by both GPU-Z and Wattman.  This persists even after a restart.  With the Vega 64 BIOS, the memory clock used to drop down into an idle state as well.  I don't know if it's anything worth worrying about.  Hopefully the BIOS didn't get corrupted during the flashing process.


----------



## TheoneandonlyMrK (Jul 17, 2018)

NovaProspekt said:


> Here is what I've done since yesterday:
> 
> The first thing I tried this morning was picking the default "Turbo" profile in Wattman just to see how the card would run at completely stock settings.  It ran at a stable core clock in the low 1300s, which I found to be a bit disappointing since I see most people in forums running these at 1500+ for gaming.  However, it was nice and quiet as the fan speed never exceeded 2500 rpm.
> 
> ...


The first thing I noticed in post one is the target temp set to 55; the card will throttle to run at that if you're telling it to.
You have some say over noise by turning up the fan.
You have some say on what temp to run it at.
You have some say what speed it runs at.

But you can't lower all of them and push clocks up.
The card will try to run at the noise and temperature level you set above all else including clocks.


Secondly, the card is fine up to 85, and 95 is its real top end before the hardware PROCHOT temperature throttle, so why are you trying for lower anyway?

Third, the hot spot is on the chip. Any caps aren't likely to be at that temp (or on the chip?), or they will be solid caps and fine with it.

To me, the card's heatsink is possibly not seated well, or more likely your case has a restrictive airflow issue and you're not giving the card enough ambient air.


----------



## NovaProspekt (Jul 17, 2018)

I don't think my case airflow should be a problem.  I have 2 front intake fans, 2 side intake fans, 1 bottom intake fan, 2 top exhaust fans, 1 rear exhaust fan all running 100% all the time.


----------



## eidairaman1 (Jul 17, 2018)

Take a picture of the inside of your case, and a picture of your home thermostat's current reading.

Post them here...

This is starting to go around in circles, do you want help or not?


----------



## NovaProspekt (Jul 17, 2018)

The two front intakes are to the right; the side intake fans are in the side panel and blow pretty much right on the card.

Also, I think I need to reflash the BIOS... memory is now defaulting to 700 MHz even after several driver reinstalls.


----------

