Tuesday, August 23rd 2022

AMD Zen 4 EPYC CPU Benchmarked Showing a 17% Single Thread Performance Increase from Zen 3

The next-generation flagship AMD Genoa EPYC CPU has recently appeared on Geekbench 5 in a dual-socket configuration for a total of 192 cores and 384 threads. The processors were installed in an unknown Suma 65GA24 motherboard, running at 3.51 GHz and paired with 768 GB of DDR5 memory. This setup achieved a single-core score of 1,460 and a multi-core score of 96,535, which places the processor approximately 17% ahead of an equivalently clocked dual-socket, 128-core EPYC 7763 setup in single-threaded performance. The Geekbench listing also includes an OPN code of 100-000000997-01, which most likely corresponds to the flagship AMD EPYC 9664 with a max TDP of 400 W, according to existing leaks.
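For a rough sense of what these numbers imply, the sketch below derives the Zen 3 single-core baseline from the quoted 17% (it is not a measured 7763 score) and checks how far the multi-core result sits from perfect scaling across 192 cores:

```python
# Back-of-the-envelope math from the Geekbench listing above.
zen4_single = 1460
claimed_gain = 0.17

# Implied Zen 3 single-core baseline, derived from the 17% figure only.
implied_zen3_single = zen4_single / (1 + claimed_gain)
print(f"Implied Zen 3 single-core baseline: {implied_zen3_single:.0f}")

# Multi-core scaling across 192 cores is far from linear in Geekbench:
zen4_multi = 96535
scaling = zen4_multi / (zen4_single * 192)
print(f"Multi-core scaling efficiency: {scaling:.0%}")
```

As expected for a 192-core system under Geekbench 5, the multi-core score lands well below a linear 192x of the single-core result.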
Sources: Geekbench (via @moe_v_moe), Wccftech

37 Comments on AMD Zen 4 EPYC CPU Benchmarked Showing a 17% Single Thread Performance Increase from Zen 3

#26
Minus Infinity
ADB1979I am glad you clarified your point.

Zen 4 with 3D V-Cache is not far away, for the desktop parts at least!

IMHO, it is fairly likely that AMD will launch Zen 4 EPYC CPUs with 3D V-Cache models; if so, this gives Intel another kick in the delicates!

FYI: Here is some info.


@admin @moderators if you do not approve of the video link, please just delete the link and not the whole post, thanks :)
What do you mean, likely? It's confirmed they are releasing V-Cache models. And it's now confirmed by sources that the release has been pushed forward to late Q1/early Q2 2023. And reports show much larger gains compared to what the 5800X3D had over the 5800X. Also, the 7950X3D is incoming. I hope the 7900X3D is on the roadmap; 12 cores is my sweet spot this time around.
Posted on Reply
#27
blkspade
HenrySomeoneYup, it truly looks as though a single-thread increase in the realm of 15% is all AMD will muster with Zen 4... That won't be nearly enough to catch up even with Alder Lake, much less Raptor Lake, and it will be hopelessly inferior to Meteor Lake, against which it will most likely compete later in its cycle. Still, they should remain competitive in the server segment for a while due to the really high core count; desktop, though... not so much.
Actually, their server CPUs don't boost even remotely as high as the desktop CPUs. That reported 17% is actually more likely to be at the same clocks. Geekbench is reporting the boost clock as the base, because the 7763 is supposed to have a base of 2.45 GHz. These tests were done on Linux, which seems to always report only the boost clock as if it were the base. More often than not, a server chip's quoted speed is what you're getting. Single-thread throughput removes a lot of other variables that would skew the differences between server and desktop, if the chiplets are otherwise the same. If the Ryzen 7950X can boost to 5.5 GHz, that's getting closer to 30% over the Ryzen 5950X.
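That reasoning can be sketched in a few lines, assuming the ~17% is a same-clock (IPC) gain and single-thread performance scales roughly linearly with clock (the 5.5 GHz figure is the boost assumed in the post, not a confirmed spec):

```python
# Combine the same-clock gain with the clock-speed gain.
ipc_gain = 1.17      # Zen 4 vs Zen 3 at equal clocks (from the article)
zen3_boost = 4.9     # Ryzen 5950X single-core boost, GHz
zen4_boost = 5.5     # Ryzen 7950X boost assumed above, GHz

clock_gain = zen4_boost / zen3_boost   # ~1.12
total_gain = ipc_gain * clock_gain     # ~1.31
print(f"Combined single-thread gain: {total_gain - 1:.0%}")  # ~31%
```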
Posted on Reply
#28
ADB1979
Minus InfinityWhat do you mean, likely? It's confirmed they are releasing V-Cache models. And it's now confirmed by sources that the release has been pushed forward to late Q1/early Q2 2023. And reports show much larger gains compared to what the 5800X3D had over the 5800X. Also, the 7950X3D is incoming. I hope the 7900X3D is on the roadmap; 12 cores is my sweet spot this time around.
Sorry for the confusion, poor choice of wording.

I meant that there is a fair chance that AMD will launch Zen 4 EPYC CPUs with V-Cache "at the same time" as the non-V-Cache EPYC CPUs, as they are (as far as I know) expected to launch sometime in Q1-Q2, and of course they use the same compute dies as the desktop CPUs, so the V-Cache compute dies should be ready at the same time as the EPYC Zen 4 launch. I will edit my original post for clarification.

Having watched all of the above video that I posted, it seems that AMD will (paper) launch Zen 4 with V-Cache when Intel launches (mostly on paper) Raptor Lake. Zen 4 (desktop) CPUs with V-Cache are expected to be on shelves in the first quarter.

:D Exciting times for us tech nerds.
Posted on Reply
#29
InVasMani
Do you think a 4C X3D chip with a larger X3D cache than an 8C X3D chip is something AMD would consider, and an interesting option for gaming? I mean, if they push 5.5 GHz or so across all cores and have a larger X3D cache than the 8C version, it might be better in some cases and worse in others, depending on how multi-threaded the game is and how much performance that leaves on the table.

There are also instances where cache is more important in a game than core count, which the 5800X3D kind of already illustrates: it beats Zen 3 chips that have more of the same cores. We saw a similar scenario with Intel's Broadwell and its eDRAM, in terms of the cache doing really well in certain tasks; gaming in particular can saturate a lot of cache accesses.
Posted on Reply
#30
HenrySomeone
InVasManiDo you think a 4C X3D chip with a larger X3D cache than an 8C X3D chip is something AMD would consider, and an interesting option for gaming? I mean, if they push 5.5 GHz or so across all cores and have a larger X3D cache than the 8C version, it might be better in some cases and worse in others, depending on how multi-threaded the game is and how much performance that leaves on the table.

There are also instances where cache is more important in a game than core count, which the 5800X3D kind of already illustrates: it beats Zen 3 chips that have more of the same cores. We saw a similar scenario with Intel's Broadwell and its eDRAM, in terms of the cache doing really well in certain tasks; gaming in particular can saturate a lot of cache accesses.
That would be a nice ironic twist for a company that's been pushing the "moar coars" propaganda for well over a decade now: that their best gaming chip would be a meager, paltry quad core, lol! I mean, it sort of already happened once with the 3300X, which was the most competitive against its Intel rivals (even though only because there were no unlocked i3s). Of course, they only made about 15 of them to send to reviewers, and soon enough it was permanently MIA, but it was funny nevertheless.
Posted on Reply
#31
ADB1979
InVasManiDo you think a 4C X3D chip with a larger X3D cache than an 8C X3D chip is something AMD would consider, and an interesting option for gaming? I mean, if they push 5.5 GHz or so across all cores and have a larger X3D cache than the 8C version, it might be better in some cases and worse in others, depending on how multi-threaded the game is and how much performance that leaves on the table.

There are also instances where cache is more important in a game than core count, which the 5800X3D kind of already illustrates: it beats Zen 3 chips that have more of the same cores. We saw a similar scenario with Intel's Broadwell and its eDRAM, in terms of the cache doing really well in certain tasks; gaming in particular can saturate a lot of cache accesses.
Not going to happen. The cost of the extra cache would be the same, as is the packaging, and if it uses the same compute dies, those have the same cost as well. There would be no point, except to potentially gain an extra couple hundred MHz from the power freed up by disabling some cores, which can be done on some overclocking motherboards anyway, so that at least can be tested.

The cache dies that are put on top of the compute dies are only made in one size right now. It is possible to stack them, and it is likely that will happen to increase the L3 cache even further. However, it is very unlikely that AMD will make different cache dies for their SoCs, as the manufacturing complexity would be even higher due to the integration layer and the extra processing needed to fit the cache dies. These are CPUs aimed at a lower (and cheaper) market segment, one where they already have HALF the amount of L3, so this is IMHO very unlikely to happen.

Yes, it would be nice to see it happen (just to see the test results; AMD already knows, as it will have accurately simulated all of these possibilities), but it's all down to cost vs. benefit when it comes to actually manufacturing and selling CPUs. Also remember that AMD doesn't have an unlimited amount of silicon production, so it has to choose wisely what that production is used for.

Here is something to chew on: (mostly confirmed) rumours are that Zen 5 has a "quite changed" cache arrangement. They will be drop-in compatible (with a BIOS update) with this generation of AM5 motherboards, which likely means that the I/O die will be either the same or a tweaked/updated version of the 7000 series one, and the compute die(s) will have that "quite changed" cache arrangement. I have no idea what that means right now; we will find out in leaks over the next few months. IMHO, "if" AMD can, they will put the cache dies on the bottom of the compute dies for better thermals, but I have no idea if that is possible, and it is not actually required. I am guessing that the changes will be to the L1, L2 and L3 all at once, and likely L3 sharing between dies (essentially pseudo-L4, as IBM uses on its Power CPUs).

Enjoy postulating :D

Also, here is the latest Zen 4 X3D (3D V-Cache) leak.

Enjoy.


Addendum: This video answers some of your questions about extra cache on certain models. 3D V-Cache will be coming to Zen 4 desktop-replacement types of laptops, code-named "Dragon Range". Now to watch the 2nd half of this video...
Posted on Reply
#32
InVasMani
Perhaps they will put an L4 underneath and still use the top stacked cache, but reduce its size to drop its latency and voltage/heat, while the bottom one could be a bit slower and thicker as L4. Like a sammich with thicker bread on the bottom slice!
Posted on Reply
#33
ADB1979
InVasManiPerhaps they will put an L4 underneath and still use the top stacked cache, but reduce its size to drop its latency and voltage/heat, while the bottom one could be a bit slower and thicker as L4. Like a sammich with thicker bread on the bottom slice!
For some time now I have been thinking that they would put a huge slab of L3/L4 on the I/O die of EPYC/TR CPUs, but that is a pointless idea now that they have such fast interconnects between the compute dies and I/O, and can throw heaps of L3 at the compute dies. My expectation is that they will use a "virtual" L4 by "sharing" the L3 between compute dies. The implications of this are quite unknown, and will IMHO be all about what is being processed, and whether it needs to share that "virtual" L4 between compute dies at all :confused:
Posted on Reply
#34
InVasMani
They might already do L3 sharing for stacked cache on Threadripper Pro/EPYC, not sure. I don't even recall if TR Pro has X3D yet, but they have a lot of cache either way. I can see them making cubes/rectangles that have I/O connectivity on each edge for other chiplets to connect with and share, with a bit of mesh bus access.
Posted on Reply
#35
Panther_Seraphin
ADB1979My expectation is that they will use a "virtual" L4 by "sharing" the L3 between compute dies. The implications of this are quite unknown, and will IMHO be all about what is being processed, and whether it needs to share that "virtual" L4 between compute dies at all :confused:
Latency!!! Lots and lots of latency in comparison to a true L3 or L4, as a job whose data sits in the cache of a separate die, for example, would have to go out across the I/O die to recover it.

Is this faster than going out to system RAM? Yes.
Can this cause more cache misses, etc.? Yes!

It's why I have been sort of surprised this 2nd-gen I/O die hasn't had HBM provisions to act as a large L4 cache. Maybe for the 4th-gen Threadripper Pro/EPYC.
Posted on Reply
#36
Punkenjoy
Each time you do a cache lookup, you add latency to your main memory requests, so the cache must be faster than the actual memory. You also need some mechanism to decide what to cache and what to evict.

HBM as L4 might not be the best fit, since it's not really low-latency memory; it's high-bandwidth memory. And to reach that high bandwidth, you need a lot of concurrent accesses. This is why it's more often considered for high-core-count server CPUs or GPUs.

Would an L4 cache in the I/O die really be useful? That depends, and it would be complicated. If the goal is to save a round trip to memory when the data is on another CCD, that would mean the L4 has to be inclusive of L3. So when you write data into L3, you would also have to write it to L4 at the same time. This could slow down L3 quite a lot. It will be a matter of trade-offs.

The other thing is that the I/O die is made on an older node, and adding a large amount of SRAM would increase the die size. It could maybe be added as 3D V-Cache, but that would be a significant increase in cost too.
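The lookup-latency point can be sketched with a simple average-memory-access-time (AMAT) model. The cycle counts and hit rates below are made-up illustrative values, not real Zen figures; the point is that a slow, low-hit-rate L4 barely moves the needle:

```python
# Average memory access time with and without a hypothetical L4.
# All latencies in cycles; values are illustrative only.
def amat(levels, memory_latency):
    """levels: list of (lookup_latency, hit_rate), checked in order."""
    total, p_miss = 0.0, 1.0
    for lookup, hit_rate in levels:
        total += p_miss * lookup     # every lookup at this level costs time
        p_miss *= (1 - hit_rate)     # only misses continue to the next level
    return total + p_miss * memory_latency

no_l4 = amat([(4, 0.90), (14, 0.60), (50, 0.50)], 300)
with_l4 = amat([(4, 0.90), (14, 0.60), (50, 0.50), (90, 0.40)], 300)
print(f"AMAT without L4: {no_l4:.1f} cycles")
print(f"AMAT with a slow L4: {with_l4:.1f} cycles")
```

With these numbers the L4 helps only marginally; lower its hit rate or raise its lookup cost a little and it becomes a net loss, which is exactly the trade-off described above.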
Posted on Reply
#37
Panther_Seraphin
PunkenjoyEach time you do a cache lookup, you add latency to your main memory requests, so the cache must be faster than the actual memory. You also need some mechanism to decide what to cache and what to evict.

HBM as L4 might not be the best fit, since it's not really low-latency memory; it's high-bandwidth memory. And to reach that high bandwidth, you need a lot of concurrent accesses. This is why it's more often considered for high-core-count server CPUs or GPUs.

Would an L4 cache in the I/O die really be useful? That depends, and it would be complicated. If the goal is to save a round trip to memory when the data is on another CCD, that would mean the L4 has to be inclusive of L3. So when you write data into L3, you would also have to write it to L4 at the same time. This could slow down L3 quite a lot. It will be a matter of trade-offs.

The other thing is that the I/O die is made on an older node, and adding a large amount of SRAM would increase the die size. It could maybe be added as 3D V-Cache, but that would be a significant increase in cost too.
3D V-Cache is most likely going to be the only option in the consumer space, just due to the cost of HBM and also the negligible benefits HBM would have in most situations (just look at the 5800X3D reviews, for example).

But for datacenter uses and feeding 96 cores, I can see HBM being beneficial on top of the 3D V-Cache. Being able to have multiple GB of L4 cache would be beneficial for databases/mathematical simulations etc., especially with HBM2E and theoretically HBM3.

I suspect the I/O die would have to be completely redesigned to take advantage of it, as HBM is extremely pin-dense in order to deliver that bandwidth.
Posted on Reply