
AMD Responds to Ryzen's Lower Than Expected 1080p Performance

It's funny to observe how the narrative changed from extreme hype to full crisis control.

The folks at PC Perspective have shared a statement from AMD in response to their question as to why AMD's Ryzen processors show lower than expected performance at 1080p resolution (despite posting good high-resolution, high-detail frame rates). Essentially, AMD is reinforcing the need for developers to optimize their games' performance for AMD's CPUs (claiming that games have so far only been properly tuned for Intel's architecture).
Seriously? Resorting to conspiracy theories? This is low, AMD!

I've never seen any games optimized for Intel; as a matter of fact, games commonly contain some of the worst CPU code. The reason Intel wins is that its prefetcher is better at handling crappy code.

"As we presented at Ryzen Tech Day, we are supporting 300+ developer kits with game development studios to optimize current and future game releases for the all-new Ryzen CPU. We are on track for 1000+ developer systems in 2017. For example, Bethesda at GDC yesterday announced its strategic relationship with AMD to optimize for Ryzen CPUs, primarily through Vulkan low-level API optimizations, for a new generation of games, DLC and VR experiences.
Imagine if AMD spent these kinds of resources on designing a good prefetcher for their CPU…

AMD Ryzen is 8% behind Kaby Lake in IPC and 12% behind Kaby Lake in clock speed.
During gaming all the CPUs will boost; AMD is not far behind in clock speed, if it's not ahead.
Ryzen 7 1800X boosts beyond 4.0 GHz
i7-7700K boosts to 4.5 GHz
i5-7600K boosts to 4.2 GHz
i7-6950X boosts to 4.0 GHz
i7-6900K boosts to 4.0 GHz
i7-6800K boosts to 3.8 GHz
Yet all of these Intel CPUs show only marginal differences in games, while Ryzen struggles in a number of games. Something tells me it's not just a lack of clock speed. We already know there is little to gain for Intel beyond 4.0 GHz, so AMD would have to do something about their prefetcher.

-----

Well, we all knew this was going to happen. AMD did a decent job by building a more superscalar processor, but they didn't prioritize building a proper front end/prefetcher. Their prefetcher is worse than the one in Sandy Bridge, and considering that most of the improvements from Sandy Bridge to Kaby Lake are in the prefetcher, they have some serious catching up to do.

The efficiency of the prefetcher matters a lot for some workloads, including gaming. And when it comes to cache misses, increasing the clock frequency wouldn't help mitigate the performance penalty.

It's not like this "problem" is going to blow over. It might not matter with a GTX 1060, but when Ryzen is too slow to saturate a GTX 1080, things are only going to get worse with the GTX 1080 Ti, Volta, etc. For buyers of a GTX 1070 or higher, the first-generation Ryzen is simply too slow. For gaming, an i7-6800K is a better deal, even if the Ryzen 7 1800X beats it in some workloads.
 
And another incorrect post by efikkan.
 
I don't think anybody will recode already published titles to utilize more cores, but let's see.
If this dev kit includes a C/C++ compiler for Ryzen, all they should have to do is point it at their code base, compile, and then push out a digital-distribution update. I think the only games that will get that treatment, though, are ones actively getting updates (CS:GO, DOTA 2, PD2, MMOs, etc.).
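Roughly speaking, that would mostly come down to a rebuild with the right target flags. A minimal sketch, assuming a GCC toolchain recent enough to know about Zen (the file name and function here are made up purely for illustration):

```cpp
// hot_path.cpp -- hypothetical hot loop from a game's codebase.
// Generic build most shipped titles use:
//   g++ -O2 -c hot_path.cpp
// Zen-tuned rebuild for a patch (keeps the baseline ISA, schedules for Zen):
//   g++ -O2 -mtune=znver1 -c hot_path.cpp
#include <cstddef>

// Scale a contiguous block of per-entity values; with a specific tuning
// target the compiler can pick instruction selection and scheduling it
// believes suits Zen, without the source changing at all.
void scale_values(float* values, std::size_t count, float factor) {
    for (std::size_t i = 0; i < count; ++i)
        values[i] *= factor;
}
```

Whether a rebuild like that actually moves frame rates is a separate question, of course.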


FX didn't get special compiler treatment because that was putting lipstick on a pig.
 
@Batou1986, you get what you pay for, Intel or AMD; if you buy a cheap low-end motherboard you get low-end performance and will end up a moaner.

I overclocked my friend's 6320 on your board two nights ago because he finally got an Evo 212 like I told him.
Nightmare: his chip throttled all the time at stock settings, and the crapness of his board forced me into BCLK clocking it. I've had his chip easily do 4.5 in my rig, but not in his; 4.3 max.
Point being, your reference and perception have been affected by your purchase choices, and you should have chosen better, IMHO.

My man, I think you need to stop making assumptions. The crapness of my board has nothing to do with the lack of performance from the FX series.
I easily beat the benchmarks for an 8350 because all 8 cores are running 4.2 as a non-turbo-boosted speed, and I have none of the throttling issues you mentioned, even when running Linpack for hours.
Running at 5 GHz is not going to make DCS or Star Citizen or any number of other games that have issues with AMD CPUs run any better for me.

It's great that AMD kinda caught up to Intel with Ryzen.
But if it's going to be the same as the FX series, where certain applications like DCS World and CryEngine games perform worse specifically because of AMD CPUs, that's a major issue that can't just be ignored.

Also, you need to stop repeating that devs are all making games for 8 cores and using that as a reason your "8 core" CPU is still OK. It's a well-known fact that there are 4 full CPU cores and 4 limited CPU cores on the FX series, and this makes a HUGE difference in performance when comparing it with a true 8-core CPU.
 
FX didn't get special compiler treatment because that was putting lipstick on a pig.
Bulldozer got compiler optimizations from major compilers such as GCC and LLVM; in fact, GCC alone has four levels of it (-march=bdver1 through bdver4).

If this dev kit includes a C/C++ compiler for Ryzen, all they should have to do is point it at their code base, compile, and then push out a digital-distribution update. I think the only games that will get that treatment, though, are ones actively getting updates (CS:GO, DOTA 2, PD2, MMOs, etc.).
Compiler optimizations have been available for a long time, ever since the ISA was planned. You can see some of the results from it here. Such optimizations usually help with specific edge cases and with vectorization, which can help a bit in some applications. But games are usually limited by cache misses and branch mispredictions, and compiler optimization wouldn't do much to help with those, so game developers can't just throw a compiler at it.
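To illustrate that split (a toy sketch of my own, not code from any real game): the first loop is the kind of straight-line work a recompile can genuinely speed up, while the second is dominated by data-dependent branching that no compiler flag makes predictable:

```cpp
#include <cstddef>
#include <vector>

// Compiler-friendly: independent iterations over contiguous data.
// Re-targeting or re-tuning the compiler can help here (vectorization,
// scheduling), which is the narrow kind of win described above.
void apply_gravity(std::vector<float>& vy, float g, float dt) {
    for (std::size_t i = 0; i < vy.size(); ++i)
        vy[i] += g * dt;
}

// Compiler-hostile: which path each element takes depends on run-time
// data. If the pattern is unpredictable, the branch predictor misses
// regardless of which compiler or flags produced the binary.
struct Projectile { bool homing; float x, target_x, speed; };

void step_projectiles(std::vector<Projectile>& ps, float dt) {
    for (Projectile& p : ps) {
        if (p.homing)                                  // data-dependent branch
            p.x += (p.target_x - p.x) * p.speed * dt;
        else
            p.x += p.speed * dt;
    }
}
```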
 
My man, I think you need to stop making assumptions. The crapness of my board has nothing to do with the lack of performance from the FX series.
I easily beat the benchmarks for an 8350 because all 8 cores are running 4.2 as a non-turbo-boosted speed, and I have none of the throttling issues you mentioned, even when running Linpack for hours.
Running at 5 GHz is not going to make DCS or Star Citizen or any number of other games that have issues with AMD CPUs run any better for me.

It's great that AMD kinda caught up to Intel with Ryzen.
But if it's going to be the same as the FX series, where certain applications like DCS World and CryEngine games perform worse specifically because of AMD CPUs, that's a major issue that can't just be ignored.

Also, you need to stop repeating that devs are all making games for 8 cores and using that as a reason your "8 core" CPU is still OK. It's a well-known fact that there are 4 full CPU cores and 4 limited CPU cores on the FX series, and this makes a HUGE difference in performance when comparing it with a true 8-core CPU.
And you think an HT core is a full core? You're so, so wrong. The FX series is a closer fit to dual cores, and that's its inherent problem: each core had fewer actual resources and no micro-op cache, so under-utilisation happens.
Intel, on the other hand, had a micro-op cache and could, if it wanted, use a whole core (the worth of two FX cores) on one thread, leveraging a wider execution pipe, micro-ops and better cache, plus being two process nodes ahead. But those are all old advantages, and it's clear AMD has the raw per-core and multi-core performance now, so with a few tweaks here and there on this brand-new uarch I'm sure it will be fine.

Then I might buy one. But, as I said, if I bought one, running two 480s at 4K, I could not do better buying anything Intel by any metric, apparently, so I could happily dodge 1080p my whole life. But alas, I'm skint, so I'm still dreaming.
 
I don't think anybody will recode already published titles to utilize more cores, but let's see.
I don't see why not, when Trion/Rift has.
 
Well, to be fair, games don't need to be recoded to use more cores to benefit Ryzen, but they could do with being compiled with an AMD-friendly compiler. Contrary to popular belief, it's not quite as simple as just using the AMD compiler and off you go, but the work to do it wouldn't be extravagant either.

Depending on how well Ryzen sells, I can see plenty of recentish games getting patches.
 
game developers can't just throw a compiler at it.
To be fair, you are partially right; there are a number of compiler optimizations that can help with prefetch to get fewer cache misses: http://ece-research.unm.edu/jimp/611/slides/chap5_3.html ... I say partially because they either do so at the expense of added instructions or cover edge cases for this particular purpose (gaming) ... it's a win-some-lose-some situation. You can't know for certain until you fire up the CPU profiler on Ryzen for the specific game.
Trouble is, game devs will find little incentive to do so for past projects ... and for the new ones, compilers will get tuned as time goes by because of Zen in the console space.
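As a rough sketch of that trade-off (my own toy code, using GCC/Clang's __builtin_prefetch rather than anything a compiler inserts on its own): hinting a fixed distance ahead in an array can hide some miss latency, but every hint is an extra instruction in the loop, and the right distance has to be found by profiling on the target CPU:

```cpp
#include <cstddef>

// Tuning knob: how far ahead to hint. Too small hides nothing, too far
// ahead evicts useful lines; the sweet spot differs between CPUs, which
// is why you would profile this on Ryzen itself before shipping it.
constexpr std::size_t kPrefetchDistance = 16;

long long sum_with_hints(const int* data, std::size_t count) {
    long long sum = 0;
    for (std::size_t i = 0; i < count; ++i) {
        if (i + kPrefetchDistance < count)
            __builtin_prefetch(&data[i + kPrefetchDistance]);  // extra work every iteration
        sum += data[i];
    }
    return sum;
}
```

On a plain sequential scan like this the hardware prefetcher often covers it anyway, which is part of the win-some-lose-some.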
 
Resorting to conspiracy theories?
CPU optimization is a conspiracy theory?
Needing to take into account that the 8-core chip is actually 4+4 (the OS would handle that), each half with its own L3, is a conspiracy theory?

"lower than expected" is a fact nowadays? Expected by whom?

I have seen StarCraft 2 benchmarks with Ryzen doing a minimum of 16 and an average of 31 FPS (on a 980); are you freaking kidding me?
This is plain and outright bullshit; there is no desktop CPU less than 4 years old that would score like that in that game.

There is an expected single-thread advantage that Intel's 4 cores have, and AMD has actually voiced it.
AMD states they are 6% behind Skylake in IPC; taking the higher clock into account, that's a flat 20% advantage for the 7700K in single-core tasks. Who "expected" something, pretty please?
Haswell was an "unlikely but hopefully" target. It ended up at Broadwell levels, jeez.


/double facepalm
 
To be fair, you are partially right; there are a number of compiler optimizations that can help with prefetch to get fewer cache misses: http://ece-research.unm.edu/jimp/611/slides/chap5_3.html ...
This primarily refers to instruction sets other than x86, since modern x86 architectures have a prefetcher with a large instruction window. If a prefetching hint is to give any help before dereferencing a pointer, the programmer has to insert this hint earlier in the code. Large numbers of cache misses usually occur when traversing a list, but using such hints inside a loop provides no benefit, since the CPU will already see what's inside the loop, and you can't know the memory address of data several iterations ahead without dereferencing pointers, so doing so will probably reduce cache efficiency and cause a performance penalty. For this reason, manual prefetching is discouraged.
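A toy version of the list-traversal case being described (my own sketch, not from any engine): the address of a node two or three hops ahead is simply not known until the intervening nodes have already been loaded, so there is nothing useful to hand to a prefetch hint:

```cpp
// Each iteration must finish loading n->next before the next address even
// exists. A __builtin_prefetch(n->next) only hints at a load that is about
// to happen anyway, and n->next->next cannot be prefetched without first
// paying for that very load.
struct Node { int payload; Node* next; };

int sum_list(const Node* n) {
    int sum = 0;
    while (n != nullptr) {
        sum += n->payload;   // likely a cache miss if nodes are scattered on the heap
        n = n->next;         // serialized pointer chase
    }
    return sum;
}
```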

I say partially because they either do so at the expense of added instructions or cover edge cases for this particular purpose (gaming) ... it's a win-some-lose-some situation. You can't know for certain until you fire up the CPU profiler on Ryzen for the specific game.
Please explain what this means.

Trouble is, game devs will find little incentive to do so for past projects ... and for the new ones, compilers will get tuned as time goes by because of Zen in the console space.
Compilers are already "tuned", so we wouldn't see any major change there, but as I've mentioned, optimizing compilers can't do much about branch mispredictions and cache misses.
For a compiler to eliminate some branching, the CPU has to have some new unique instructions allowing certain conditionals to be converted into branchless code. Otherwise, a compiler can't help here.
Data cache misses usually occur because of traversal of lists, and the only way to eliminate this is to rewrite the whole codebase to keep the data in a contiguous native array; no compiler can ever do this. This is largely a result of how the developer chose to do OOP.
Code cache misses are once again usually a result of the code structure; OOP and lists of arbitrary elements are the greatest challenge here. Once again, the solution is to restructure the code, which is outside the realm of a compiler. One kind of optimization I can think of that would help here is inlining small functions, but compilers already do that, like GCC with -O2, which enables -finline-small-functions.
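To make the data-layout point concrete (a generic sketch of the pattern, not anyone's actual code): the first version below is what a lot of gameplay code looks like, and getting to the second is a by-hand restructuring of the program, not something any compiler pass performs:

```cpp
#include <memory>
#include <vector>

// Typical OOP layout: heap-allocated objects behind pointers. Iterating
// hops around the heap (data cache misses), and the virtual call defeats
// inlining (code cache misses plus indirect branches).
struct Entity {
    virtual ~Entity() = default;
    virtual void update(float dt) = 0;
};

void update_all(std::vector<std::unique_ptr<Entity>>& entities, float dt) {
    for (auto& e : entities)
        e->update(dt);
}

// Cache-friendly layout: plain data packed in a contiguous array, with a
// body small enough for the compiler to inline (the sort of thing
// -finline-small-functions already handles). Reaching this layout means
// rewriting how the code is organized; the compiler cannot do it for you.
struct Particle { float x, y, vx, vy; };

void update_particles(std::vector<Particle>& ps, float dt) {
    for (auto& p : ps) {
        p.x += p.vx * dt;
        p.y += p.vy * dt;
    }
}
```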
 
Please explain what this means.
I was pointing out that only running a CPU profiler while debugging a specific game on Ryzen can show where those nanoseconds are lost inside a frame compared to older CPU architectures. Then critical sections either get rewritten for the specific arch, or those libraries can be compiled with different options. Mostly a combination of both.
Compilers are already "tuned", so we wouldn't see any major change there, but as I've mentioned, optimizing compilers can't do much about branch mispredictions and cache misses.
Let's not forget this is a completely new arch. I'm not saying compilers should suddenly start doing the impossible ... but within the realm of what is possible, it seems there is headroom. I'm guessing here that a CPU with such a large cache shouldn't suffer much from cache misses; branch misprediction is another story, but anyway, both would produce a stuttery experience, and the FPS seems extremely steady, only lower on average.
 
I was pointing out that only running a CPU profiler while debugging a specific game on Ryzen can show where those nanoseconds are lost inside a frame compared to older CPU architectures. Then critical sections either get rewritten for the specific arch, or those libraries can be compiled with different options. Mostly a combination of both.
First of all, profilers will not measure large problems like cache misses accurately. There will hardly be any games that can get substantial benefits from AMD-specific tweaks this way, since compiler optimizations are limited to small patterns of instructions. Unless there are some big "bugs" in the Zen architecture, there is little to gain from this. Almost all larger problems would need a rewrite, and they are not AMD-specific in any way.

Let's not forget this is a completely new arch. I'm not saying compilers should suddenly start doing the impossible ... but within the realm of what is possible, it seems there is headroom.
Why does there seem to be headroom? Do you even know how a compiler works? You clearly don't seem to.

I'm guessing here that a CPU with such a large cache shouldn't suffer much from cache misses;
You know what a kB is, right?
Just rendering a single frame will process several hundred MB, and at 60 FPS there is a lot of data flowing through.
With 512 kB of L2 cache and 8 MB of shared L3 cache, it's not like even 1% of that data is in there at any point.

branch misprediction is another story, but anyway, both would produce a stuttery experience, and the FPS seems extremely steady, only lower on average.
So since the FPS is "stable", there are no branch mispredictions and cache misses? I'm sorry, but you clearly don't even know at what scale these things happen. We are not talking about single stalls causing milliseconds of latency known as stutter; we are talking about clock cycles, which are on the nanosecond scale, and since there are so many thousands of them every second, they add up to a steady performance drop rather than noticeable stutter. A single local branch misprediction causes ~20 clocks of idle; a non-local one adds a cache miss as well (a code cache miss), so +~250 clocks. A data cache miss is ~250 clocks on modern CPUs.
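To put those per-miss numbers into frame-time terms (back-of-the-envelope only; the miss count is an invented round figure, not a measurement of any game):

```cpp
#include <cstdio>

int main() {
    const double clock_hz        = 4.0e9;     // ~4 GHz core clock
    const double cycles_per_miss = 250.0;     // rough cost of a last-level cache miss
    const double miss_ns         = cycles_per_miss / clock_hz * 1e9;   // ~62.5 ns each

    const double misses_per_frame = 100000.0; // invented round number for illustration
    const double lost_ms = misses_per_frame * miss_ns / 1.0e6;         // ~6.25 ms

    std::printf("one miss: %.1f ns; %.0f misses/frame: %.2f ms of a 16.7 ms (60 FPS) budget\n",
                miss_ns, misses_per_frame, lost_ms);
    return 0;
}
```

Spread across tens of thousands of tiny stalls, that loss shows up as a uniformly lower frame rate, not as visible stutter.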
 
There are, for sure, but in general these parts are not used for gaming, so this whole lower-1080p-gaming-performance issue might not be a problem at all. At least not now. We will discuss it once more when the R5/R3 parts come.
Quite frankly, I'm not worried about FHD gaming at all (I game at 1920x1200 atm). In a couple of years I hope 4K will become much more affordable. What's giving me pause is that AMD has only matched an architecture that has been stagnant for years. AMD themselves said Zen is their workhorse for the next four years. And if Intel comes up with something by then (which they probably will), AMD may not get a chance to cash in properly. Then again, I was never that good at predicting things ;)
 
And if Intel comes up with something by then (which they probably will), AMD may not get a chance to cash in properly. Then again, I was never that good at predicting things ;)

On the other hand, if Intel doesn't have some secret weapon, then I think AMD stands to gain ~100% in market share by 2018 (which would put them at ~35%) based on Ryzen and refinements. This is *so* much better than Bulldozer. Ryzen may be down a little on raw speed, but power efficiency is very good, which will bode well for laptops and servers.
 
So since the FPS is "stable", there are no branch mispredictions and cache misses? I'm sorry, but you clearly don't even know at what scale these things happen. We are not talking about single stalls causing milliseconds of latency known as stutter; we are talking about clock cycles, which are on the nanosecond scale, and since there are so many thousands of them every second, they add up to a steady performance drop rather than noticeable stutter. A single local branch misprediction causes ~20 clocks of idle; a non-local one adds a cache miss as well (a code cache miss), so +~250 clocks. A data cache miss is ~250 clocks on modern CPUs.
Of course I'm not talking about a single cache miss ... rather about max frame times ... with each cache miss at 62.5 ns, any excess of cache misses in one frame compared to the previous would show up as a much bigger variation in the maximum frame time ... you say it adds up to a steady performance drop, but I say it should affect the measured frame-time variations in a non-steady manner.
 
The disadvantage does not melt away at higher resolutions due to anything to do with the CPU; higher resolutions simply bring the GPU limit quite a bit lower.
With the 1080 Ti (and hopefully Vega) out soon, Titan XP levels of performance will be more accessible than ever. That performance level is the same at 1440p as GTX 1080 performance is at 1080p.

Not necessarily true... There are actually quite a few 8-12-threaded games on the market.
When there is a falloff in frame rate on the Intel side and the AMD side stays flat at higher resolutions... that shows a CPU bottleneck, plain and clear. If the gap does anything other than stay constant... the difference is more than the GPU.
 
@Patriot - Please list all titles which can utilize 8+ threads. :)

Wondering how many you believe 'quite a few' is...
 
@Patriot - Please list all titles which can utilize 8+ threads. :)

Wondering how many you believe 'quite a few' is...
Rise of the Tomb Raider uses all my 8 cores/threads...
 
@Patriot - Please list all titles which can utilize 8+ threads. :)

Wondering how many you believe 'quite a few' is...

The last two Tomb Raiders, CryEngine games, Frostbite engine games (BC2 used 9 threads, BF3 was up to 12 threads... BF4, Battlefront, BF1... and whatever else uses it).
GTA V; Sniper Elite is a showcase of it...

idk, how many AAA titles does it take? I am sure there are more... and as DX12 and Vulkan become more prevalent, I am sure that is the trend.

Point stands... If the gap does anything other than stay constant when you change resolutions... the difference is more than the GPU.

Even in games that just hit 4-6 threads hard... having spare threads means anything hiccupping in the background doesn't hurt you.
 
Well, there's hope that AMD's push to bring much more affordable multicore to the masses could result in developers making prettier and higher-performance games, even though it won't be easy, just like AMD's push for lower-level APIs with Mantle and, more recently, Vulkan/DX12.
 
Well, there's hope that AMD's push to bring much more affordable multicore to the masses could result in developers making prettier and higher-performance games, even though it won't be easy, just like AMD's push for lower-level APIs with Mantle and, more recently, Vulkan/DX12.
They are also pushing 1000+ dev units out... they are giving Ryzen away to game devs...
 
Why not compare a Ryzen against the i7-7700K at the same clock speed, memory timings, and core/thread count?

For example, because Ryzen won't OC much, clock them both at 3.9-4.1 GHz, 4c/8t. I know we are gimping the i7-7700K, but I'm just curious what the result of an "almost the same" setup would be. Gaming & productivity benches needed.

[Attached benchmark charts: The Division, GTA V, Battlefield 1, Far Cry Primal, Rise of the Tomb Raider]
 
@akumod77 Just wait for AnandTech. They'll do this right.
 