AMD Zen 2 12-Core, 24-Thread Matisse CPU Spotted in UserBenchmark

TheLaughingMan · Jan 25, 2019

Imsochobo said:
it auto submits.

And they know that. They can test this stuff in house and never show anything, but AMD wants and needs us to be talking about their new CPU. We have months before they are available, so this is what they do. Oh, we just so happened to use some software that posts a score. They will do the same with a real performance test paired with a Radeon VII and top specs close to release.

efikkan · Jan 25, 2019

All "leaks" like this that are genuine are intentional.

Darmok N Jalad · Jan 26, 2019

It’s quite possible it’s intentional. It might be another case of a demo to compare to Intel’s best options. The CES demo was matching a 9900K’s performance for 50W less TDP. This one appears to match a 10C/20T 9900X in multicore for 60W less... on single-channel DDR4, no less. Of course it’s just the one benchmark, but it also doesn’t appear to be AMD’s strongest possible effort based on the design.

efikkan · Jan 26, 2019

I put this Zen 2 12-core up against the i9-9900X (10-core) in my table in post #6 for a reason; there are rumors of a stop-gap "Comet Lake" 10-core socket 1151 CPU until Ice Lake is fully here. If true, this will probably have similar characteristics of i9-9900X with some tweaks, and is probably the CPU the 12-core Zen 2 will be competing against.

While I do expect a final 12-core Zen 2 to be slightly higher clocked and get slightly better single and quad core scores, and the Zen 2 to have the upper hand in energy efficiency, I don't expect there to be a 60W difference in TDP. Let's hope Intel at least ditches the integrated graphics, it has nothing to do in a 10-core.

I do wonder though what place these CPUs deserve in the market. Don't get me wrong, options are fine and while they look compelling, what market demand do they serve?
It's obviously not gaming. And many heavily multithreaded workloads also consume a lot of memory bandwidth. I guess these are relevant for people looking for a "HEDT lite", perhaps image editing or coding, but probably not heavy encoding or simulations. Personally I would probably not consider these high-core "mainstream" CPUs from either company, as I value the flexibility of more memory bandwidth and capacity, and when investing this much money anyway, the expandability of the platform is also something to consider.

Darmok N Jalad · Jan 26, 2019

Skylake-X doesn’t have integrated graphics already. It also uses quad channel memory. Being a different design, I’m not sure how easy it will be for Intel to get it into their desktop socket. And that TDP value is for sustained all-core speed at base clock; turbo will take it way over 165W. I do actually expect there to be a big delta in TDP, as Zen2 is on 7nm and Skylake-X is 14nm. AMD’s individual cores are actually pretty efficient; it’s the InfinityFabric that consumes a fair amount of power, especially when there is more than one CCX on a CPU. I’m curious if the chiplet design of Zen2 will improve this.

efikkan · Jan 26, 2019

Darmok N Jalad said:
Skylake-X doesn’t have integrated graphics already. It also uses quad channel memory. Being a different design, I’m not sure how easy it will be for Intel to get it into their desktop socket. And that TDP value is for sustained all-core speed at base clock; turbo will take it way over 165W. I do actually expect there to be a big delta in TDP, as Zen2 is on 7nm and Skylake-X is 14nm.

A potential "Comet Lake" 10-core on socket 1151 will probably be an extended Skylake (non-X) design like the recent 8-core Coffee Lake refresh. My reasoning behind this is that the Skylake-X/SP 10-core design has a 6-channel memory controller, 2× UPI links and AVX-512, all of which are "wasted" die space compared to the competition. But who knows, they have done strange things in the past.

hat · Jan 27, 2019

One might wonder whether the single stick of abysmally slow DDR4 has to do with showing off the CPU performance by using a terrible stick of RAM, or if they had to use that RAM to get the system to run...

efikkan · Jan 27, 2019

DDR4-2667 isn't "abysmally slow".
I wouldn't read too much into the specifics of the setup as an indicator of problems with the platform. It's highly likely that this is some kind of validation setup, and the BIOS, which was just a few days old, might be an indicator of a BIOS or motherboard testing lab.
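To put a rough number on what single-channel DDR4-2667 means, here is a minimal sketch of the theoretical peak bandwidth. It assumes a standard 64-bit DDR4 channel and treats "2667" as 2667 MT/s; the function name is just for illustration, and real-world throughput will be lower than these peak figures.

```python
def peak_bandwidth_gbs(transfer_rate_mts: float, channels: int = 1,
                       bus_width_bits: int = 64) -> float:
    """Theoretical peak memory bandwidth in GB/s (1 GB = 1e9 bytes).

    Assumes every transfer moves a full bus width on every channel,
    which is an upper bound, not a measured figure.
    """
    bytes_per_transfer = bus_width_bits / 8
    return transfer_rate_mts * 1e6 * bytes_per_transfer * channels / 1e9


# DDR4-2667 in one, two and four channels
single = peak_bandwidth_gbs(2667, channels=1)
dual = peak_bandwidth_gbs(2667, channels=2)
quad = peak_bandwidth_gbs(2667, channels=4)

for label, value in (("single", single), ("dual", dual), ("quad", quad)):
    print(f"{label:>6}-channel DDR4-2667: {value:.1f} GB/s peak")
```

So the leaked system had roughly half the peak bandwidth a normal dual-channel desktop setup would have at the same memory speed, and about a quarter of a quad-channel HEDT platform.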

ghazi · Jan 27, 2019

lexluthermiester said:
That might be possible. Theoretically, that arrangement could have logistical, time and cost savings.

Correct me if I'm wrong, but I don't believe it's possible to have the L3 off-die with the current CCX design. My understanding is that the L3 is a fundamental part of making a CCX a CCX. Each has its own shared pool of L3 victim cache which is fully inclusive of the L2; cores within a CCX use the L3 to communicate with one another, and CCX's communicate with each other by transferring data between each others' L3 pools. Then there's the issue of latency which is already not very good with CCX design. We'll see though, it would make some amount of sense as caches are very costly in terms of die area and don't benefit much from node shrinks, and I suppose having the CCX's all use a single unified L3 could be considered an architectural advancement.

Darmok N Jalad · Jan 27, 2019

ghazi said:
Correct me if I'm wrong, but I don't believe it's possible to have the L3 off-die with the current CCX design. My understanding is that the L3 is a fundamental part of making a CCX a CCX. Each has its own shared pool of L3 victim cache which is fully inclusive of the L2; cores within a CCX use the L3 to communicate with one another, and CCX's communicate with each other by transferring data between each others' L3 pools. Then there's the issue of latency which is already not very good with CCX design. We'll see though, it would make some amount of sense as caches are very costly in terms of die area and don't benefit much from node shrinks, and I suppose having the CCX's all use a single unified L3 could be considered an architectural advancement.

This is correct, at least with current Zen. We don’t really know how Zen 2 works yet. However, I’d be very surprised if Zen 2 deviated from this, as that would not only be a pretty significant design change, but it would also likely add latency to the L3 and intercore communication, resulting in IPC decreases.

efikkan · Jan 27, 2019

ghazi said:
Then there's the issue of latency which is already not very good with CCX design. We'll see though, it would make some amount of sense as caches are very costly in terms of die area and don't benefit much from node shrinks, and I suppose having the CCX's all use a single unified L3 could be considered an architectural advancement.

Well, cache is one of the things that benefits the most from die shrinks. Cache is on the least thermally intensive end of the scale, which means it can be packed tighter, compared to FPUs, ALUs and register files, which are on the opposite end of the scale. When it comes to packing cache tightly, it comes down more to placement, in terms of latency and in relation to the other parts of the design that need to interact with it. So cache has traditionally been challenging to place due to its size and the increasing core complexity, but shrinks should generally help this.

I'm not sure a unified L3 cache for several chiplets is a good idea in general, not primarily because of latency, as Darmok N Jalad mentioned, but because of the way L3 works. As you said, L3 is a victim cache, and in most designs it's an inclusive cache. There is a reason why Skylake-X changed this: it's a very inefficient use of die space, and it also means that increasing L2 decreases the efficiency of L3. As you probably know, modern CPUs typically split the L1 cache into instruction and data caches, while L2 and L3 hold both. And while the L3 cache is shared between multiple cores, the actual sharing is commonly very minimal. The entire cache is overwritten every few microseconds, so the chance of two cores needing data from the same cache line is very small, because when multiple threads are working they have to use separate data, otherwise they would stall all the time. So the only thing that is generally shared between cores is instructions, if the cores are executing the same part of the code, of course. And the few times the L3 victim cache is useful for data, it's usually for the same core that evicted it. So to sum up, L3 is largely wasteful in its current application and gives only minor benefits.
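The point about inclusive caches and growing L2 can be illustrated with simple capacity arithmetic. This is only a sketch: the core count and cache sizes below are hypothetical round numbers, not the specs of any particular CPU, and the model ignores associativity, replacement policy and L1.

```python
def total_l2_kib(l2_per_core_kib: int, cores: int) -> int:
    """Combined private L2 capacity of all cores sharing one L3."""
    return l2_per_core_kib * cores


def l3_non_duplicate_kib(l2_per_core_kib: int, l3_kib: int, cores: int) -> int:
    """In a strictly inclusive L3, every L2 line is mirrored in L3.

    Returns how much L3 is left for data that is NOT already in some L2.
    """
    return max(l3_kib - total_l2_kib(l2_per_core_kib, cores), 0)


def unique_capacity_kib(l2_per_core_kib: int, l3_kib: int, cores: int,
                        inclusive: bool) -> int:
    """Upper bound on distinct data held across L2 + L3.

    Inclusive: L2 contents are a subset of L3, so L2 adds nothing new.
    Exclusive: a line lives in L2 or L3 but not both, so capacities add.
    """
    if inclusive:
        return l3_kib
    return l3_kib + total_l2_kib(l2_per_core_kib, cores)


# Hypothetical 4-core cluster with an 8 MiB shared L3, growing the L2
for l2 in (256, 512, 1024):
    spare = l3_non_duplicate_kib(l2, 8192, 4)
    excl = unique_capacity_kib(l2, 8192, 4, inclusive=False)
    print(f"L2 {l2:>4} KiB/core: inclusive L3 holds {spare} KiB of "
          f"non-duplicate data; exclusive L2+L3 holds up to {excl} KiB")
```

Under these assumptions, every extra KiB of L2 takes a KiB away from what an inclusive L3 can usefully add, while in an exclusive arrangement it adds to the total instead.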

I think it's time to re-evaluate L3 cache's role, and the changes Intel made in Skylake-X are probably just the beginning. Perhaps a split L3 cache, or an instructions-only L3 cache? Perhaps L3 shouldn't be shared and should be data only, with L4 being instructions only and shared?

