
AMD 64-core EPYC "Milan" Based on "Zen 3" Could Ship with 3.00 GHz Clocks

btarunr

Editor & Senior Moderator
Staff member
Joined
Oct 9, 2007
Messages
47,443 (7.50/day)
Location
Hyderabad, India
AMD's 3rd generation EPYC line of enterprise processors, which leverages the "Zen 3" microarchitecture, could innovate in two directions: increasing performance by doing away with the CCX (core complex) multi-core topology, and taking advantage of a newer, refined 7 nm-class node to increase clock speeds. Igor's Lab decoded as many as three OPNs of the upcoming 3rd gen EPYC series, including a 64-core/128-thread part that ships with a frequency of 3.00 GHz. For comparison, the 2nd gen 64-core EPYC 7662 ships with a 2.00 GHz base frequency, a 3.30 GHz boost, and a 225 W TDP. AMD is expected to unveil its "Zen 3" microarchitecture in 2020.



 
Joined
Feb 15, 2019
Messages
1,681 (0.77/day)
Base 1.6 GHz, boost 3.0 GHz?
 
Joined
Jan 8, 2017
Messages
9,642 (3.28/day)
Obviously, since this is stepping A0, it's the first tape-out, so it's a waste of time to read anything into the clock speeds.

Anyway, Zen 3 is clearly on its way to the EPYC line, while there's still no Xeon competitor in sight.
 
Joined
Nov 4, 2005
Messages
12,076 (1.72/day)
The CCX is just going to be 8 cores instead of 4. They cannot "do away with the CCX" unless they go for monolithic dies.


Considering the latency penalty of the CCX and the node refinement, they could put the CCXs on multiple dies and stack them together at a higher speed.
 
Joined
Oct 15, 2019
Messages
588 (0.30/day)
They would still have CCX based architecture unless they unify the L3 cache. CCX = group of processor cores that share L3 cache.
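To make that definition concrete, here is a minimal Python sketch (not AMD code; the `Topology`/`shares_l3` names and the core numbering are hypothetical) that models a chip as a list of CCXs, each one a group of cores sharing a single L3:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    # Each inner tuple is one CCX: the cores that share one L3 cache.
    # Core numbering is hypothetical, purely for illustration.
    ccxs: tuple[tuple[int, ...], ...]

    def ccx_of(self, core: int) -> int:
        """Return the index of the CCX (L3 domain) that owns `core`."""
        for idx, members in enumerate(self.ccxs):
            if core in members:
                return idx
        raise ValueError(f"core {core} not in topology")

    def shares_l3(self, a: int, b: int) -> bool:
        """Two cores share an L3 only if they sit in the same CCX."""
        return self.ccx_of(a) == self.ccx_of(b)


# Zen 2-style 8-core chiplet: two 4-core CCXs, each with its own L3.
zen2_ccd = Topology(ccxs=((0, 1, 2, 3), (4, 5, 6, 7)))
# Rumored Zen 3-style chiplet: one 8-core CCX with a unified L3.
zen3_ccd = Topology(ccxs=((0, 1, 2, 3, 4, 5, 6, 7),))

print(zen2_ccd.shares_l3(0, 5))  # different L3 caches
print(zen3_ccd.shares_l3(0, 5))  # one shared L3
```

Under this definition the 8-core chiplet still has a CCX; it just has one of them instead of two.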
 
Joined
Oct 12, 2005
Messages
724 (0.10/day)

Yes, this is true. But the main thing about the CCX is how inter-core communication is done. Within a CCX, cores can communicate directly, whereas between CCXs the data has to go through the Infinity Fabric to the I/O die and back to the other CCX.

This leads to greatly increased latency. Ignoring thermals, frequency and other factors, a CPU with 2x 4+0 CCDs (chiplets) will behave the same way as a single 4+4 CCD.
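Under a deliberately simplified model, that equivalence comes down to counting the core pairs whose traffic has to leave their CCX. This Python sketch is a toy model, not a measurement (the layout labels are illustrative); it counts cross-CCX pairs for 8 active cores in three layouts and ignores chiplet boundaries on purpose, since only CCX membership decides the path:

```python
from itertools import combinations


def cross_ccx_pairs(ccx_sizes: list[int]) -> tuple[int, int]:
    """Return (cross_pairs, total_pairs) for cores grouped into CCXs.

    Cores in the same CCX talk through their shared L3; cores in different
    CCXs go over Infinity Fabric via the I/O die. Which physical chiplet a
    CCX sits on is deliberately ignored.
    """
    # Label every core with the index of the CCX it belongs to.
    labels = [ccx for ccx, size in enumerate(ccx_sizes) for _ in range(size)]
    pairs = list(combinations(labels, 2))
    cross = sum(1 for a, b in pairs if a != b)
    return cross, len(pairs)


layouts = {
    "2 CCDs, 4+0 cores each": [4, 4],    # one active 4-core CCX per chiplet
    "1 CCD, 4+4 cores": [4, 4],          # two 4-core CCXs on one chiplet
    "1 CCD, unified 8-core CCX": [8],    # the rumored Zen 3 layout
}
for name, sizes in layouts.items():
    cross, total = cross_ccx_pairs(sizes)
    print(f"{name}: {cross}/{total} core pairs cross a CCX boundary")
```

The first two layouts produce identical counts, which is the point above; the unified CCX removes the cross-CCX pairs entirely.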

A full die is considered a CCD. Since you not only remove the need to go through the Infinity Fabric but also merge the L3 cache, you end up removing the concept of the CCX altogether and keeping only an 8-core CCD. Although at that point, it's more a play on words: the whole concept of the CCX is to split a CCD, so if the CCD is no longer split, what is the CCX?

It doesn't really matter in the end; the good thing is that we will get rid of the inter-core latency between CCXs. That is the main thing.

A shared L3 cache will probably help if its latency isn't affected too much. The problem with a bigger cache is that the larger it is, the longer a cache lookup takes, which increases latency.

The rumors are that AMD will use a new hash-based technique to manage the larger cache. We will see.
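What AMD would actually hash is only a rumor here, so this is a generic illustration of the idea and not AMD's design: a Python sketch that hashes an address to pick which slice of a shared cache holds a line. The 64-byte line size, the 8 slices and the XOR-fold hash are all assumptions for the example:

```python
from collections import Counter

LINE_BYTES = 64   # assumed cache-line size
NUM_SLICES = 8    # assumed number of L3 slices (must be a power of two)
SLICE_BITS = NUM_SLICES.bit_length() - 1


def slice_modulo(addr: int) -> int:
    """Naive mapping: the low bits of the line number pick the slice."""
    return (addr // LINE_BYTES) % NUM_SLICES


def slice_xor_hash(addr: int) -> int:
    """Hypothetical hash: XOR-fold all line-number bits down to SLICE_BITS."""
    line = addr // LINE_BYTES
    h = 0
    while line:
        h ^= line & (NUM_SLICES - 1)
        line >>= SLICE_BITS
    return h


# A strided access pattern: every 8th cache line (a 512-byte stride).
stride = LINE_BYTES * NUM_SLICES
addrs = [i * stride for i in range(64)]

print(Counter(slice_modulo(a) for a in addrs))    # all piled onto one slice
print(Counter(slice_xor_hash(a) for a in addrs))  # spread over every slice
```

The sketch only shows why hashing helps: a stride that a simple modulo maps onto a single slice gets spread evenly. A real design also has to keep the hash cheap enough not to add to lookup latency.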


For workloads like Blender rendering, Cinebench or other highly multithreaded applications, inter-core latency probably has little impact because the threads rarely need to exchange data anyway. But for things like video games that use a lot of cores, significant gains from that change alone are very likely.
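That argument can be put as a back-of-the-envelope model: every core-to-core exchange pays a local cost, the cross-CCX share pays the extra Infinity Fabric penalty, and the total impact scales with how often threads exchange data at all. The Python sketch below uses made-up placeholder latencies and exchange rates (not measured Zen 2 or Zen 3 figures) purely to show the shape of the argument:

```python
def comm_overhead_ns(exchanges_per_ms: float, cross_fraction: float,
                     local_ns: float, cross_ns: float) -> float:
    """Nanoseconds spent on core-to-core exchanges per millisecond of work."""
    # Average exchange cost: every exchange pays the local cost, and the
    # cross-CCX share additionally pays the trip through the I/O die.
    avg_ns = local_ns + cross_fraction * (cross_ns - local_ns)
    return exchanges_per_ms * avg_ns


# Hypothetical placeholder latencies, NOT measured Zen 2/Zen 3 figures.
LOCAL_NS, CROSS_NS = 20.0, 80.0
# Assumes exchanges are spread evenly over all core pairs: with 4+4 cores,
# 16 of the 28 possible pairs sit in different CCXs.
SPLIT_FRACTION = 16 / 28

# Hypothetical exchange rates for the two kinds of workload.
for label, rate in [("render-like, rare exchanges", 10.0),
                    ("game-like, frequent exchanges", 1000.0)]:
    split = comm_overhead_ns(rate, SPLIT_FRACTION, LOCAL_NS, CROSS_NS)
    unified = comm_overhead_ns(rate, 0.0, LOCAL_NS, CROSS_NS)
    # 1 ms = 1e6 ns, so ns-per-ms / 1e4 gives a percentage of run time.
    print(f"{label}: {split / 1e4:.2f}% vs {unified / 1e4:.2f}% of run time")
```

With the same penalty, the rare-exchange case loses a negligible share of its time either way, while the frequent-exchange case gains visibly from a unified CCX. The sketch ignores the larger-cache lookup penalty mentioned above.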
 
Joined
Oct 15, 2019
Messages
588 (0.30/day)
CCX is a memory topology concept, while CCD is a physical component. Both can be the same size, and that in no way invalidates the concept of the CCX. Of course, for an 8-core gaming PC this means no more dealing with CCX peculiarities, but for this news item's 64-core EPYCs it makes only a small-ish difference: 8 CCXs vs 16. The change has more to do with effectively having more cache, as less data needs to be copied into multiple L3 caches in some workloads. Also, for single/low-threaded tasks you effectively double the amount of cache available.
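A rough Python sketch of that cache arithmetic, assuming 4 MB of L3 per core as on Zen 2 (16 MB per 4-core CCX, 256 MB on a 64-core part) and assuming the same total carries over to 8-core CCXs. The 8 MB shared working set is an arbitrary example value:

```python
def l3_summary(cores: int, cores_per_ccx: int, l3_per_core_mb: int,
               shared_set_mb: int) -> dict[str, int]:
    """Rough L3 capacity figures for a chip split into equal CCXs.

    shared_set_mb models read-mostly data that every CCX ends up caching,
    so one copy sits in each CCX's L3 (it must fit in a single CCX's L3).
    """
    n_ccx = cores // cores_per_ccx
    per_ccx = cores_per_ccx * l3_per_core_mb
    if shared_set_mb > per_ccx:
        raise ValueError("shared set must fit in one CCX's L3")
    total = n_ccx * per_ccx
    return {
        "ccx_count": n_ccx,
        # A single thread can only use the L3 of its own CCX.
        "l3_visible_to_one_thread_mb": per_ccx,
        "total_l3_mb": total,
        # Every CCX beyond the first holds a duplicate of the shared data.
        "unique_capacity_mb": total - (n_ccx - 1) * shared_set_mb,
    }


for label, per_ccx in [("4-core CCX", 4), ("8-core CCX", 8)]:
    print(label, l3_summary(64, per_ccx, 4, shared_set_mb=8))
```

Under these assumptions the total stays the same, but one thread sees twice the L3 and fewer duplicate copies leave more unique capacity, which is the effect described above.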
 