"Trounce" is a bit extreme, plus, whatever advantages it has they most likely come from the huge caches and not because of the core architecture itself.
That's just for the SoC? Then the memory and package power can easily add a good chunk on top of that 40-60W. Not that it really matters; I just wonder what the point of having an SoC is at this stage.
They didn't say whether it's SoC or package power, but given that the RAM is on the same package I would expect them to be counted together. Either way, how much can it be? If a 256-bit GDDR5 bus is about 20W, I would expect the equivalent in LPDDR5 to be well under 10W, especially mounted on-package like this.
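Napkin math version of that, working backwards from the ~20W GDDR5 figure. The bandwidth and the LPDDR5 efficiency factor below are guesses on my part, not spec numbers:

```cpp
// Back-of-envelope only: derive an implied pJ/bit from the ~20W GDDR5 figure,
// then assume LPDDR5 is roughly 3x more efficient per bit. All inputs are guesses.
#include <cstdio>

int main() {
    const double gddr5_watts    = 20.0;   // the ~20W figure for a 256-bit GDDR5 bus
    const double bandwidth_GBps = 224.0;  // assumed: ~7 Gbps/pin * 256 bits
    const double bits_per_s     = bandwidth_GBps * 1e9 * 8.0;

    const double gddr5_pj_per_bit  = gddr5_watts / bits_per_s * 1e12;  // ~11 pJ/bit
    const double lpddr5_pj_per_bit = gddr5_pj_per_bit / 3.0;           // assumed ~3x better

    printf("Implied GDDR5: ~%.1f pJ/bit\n", gddr5_pj_per_bit);
    printf("LPDDR5 at the same %.0f GB/s: ~%.1f W\n",
           bandwidth_GBps, lpddr5_pj_per_bit * bits_per_s * 1e-12);
    return 0;
}
```

That comes out around 6-7W for the LPDDR5 case, which is why I'd say "way less than 10".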
And how is "trounce" extreme when we're talking about a >50% IPC advantage? And obviously the caches play a huge part in that, especially how they somehow manage much, much lower latencies than everyone else. That doesn't make it any less impressive though.
The 24MB L2 is the P-core LLC. The LLC isn't shared between P and E cores; they each have their own dedicated L2.
[Attachment 221410: die block diagram]
Oh, no, that's not the LLC. The LLC is the SoC-wide L3(ish) that the cores, ML cores, and likely the GPU all have access to, illustrated by that 3*8 block grid to the lower right of the cores in the diagram. If the L2 is 24MB, I would expect the LLC to be far, far larger than that, given its relative size in the diagram. 128MB? 256?
Edit:
the A15 has a 32MB LLC. I'm thinking 256MB now.
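For what that guess is worth, it's just area scaling, something like this. The area ratio and density factor are eyeballed from the diagram, nothing more:

```cpp
// Pure eyeball estimate: scale the known 24MB L2 by the apparent LLC/L2 area ratio.
// Both factors below are guesses from the die diagram, not measurements.
#include <cstdio>

int main() {
    const double l2_capacity_MB = 24.0;  // the stated P-core L2
    const double llc_to_l2_area = 6.0;   // guess: the LLC blocks look ~6x the L2 area
    const double density_factor = 1.5;   // guess: LLC SRAM is usually packed denser

    printf("LLC guess: ~%.0f MB\n", l2_capacity_MB * llc_to_l2_area * density_factor);
    return 0;
}
```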
They also have massive caches, which negates some of the disadvantages of having IF. Besides, they're designed for workstations & servers, so they're made for different kinds of loads.
Maybe, maybe not. We can't do a truly apples-to-apples comparison here without a monolithic AMD APU with a unified memory subsystem ~ that, IMO, is the biggest gamechanger!
AMD & Intel have been talking about unified memory (CPU+GPU) for nearly half a decade now, even longer in AMD's case, & yet Apple is the one that stole the show.
It's a bit strange for you to bring up the Epyc/TR comparison just to then say it's not a valid comparison once people get into why this is likely to be more efficient. LPDDR5 is much lower power than a heap of IF links - but also much lower bandwidth, of course. Apple makes up for this with huge and fast caches, keeping memory accesses to a minimum, while the monolithic architecture and low core counts let them stick to relatively efficient, low-power on-die interconnects.
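To put some (made-up) numbers on the cache point, the effect on DRAM traffic looks roughly like this. The demand bandwidth and hit rates are purely illustrative assumptions:

```cpp
// Toy model: DRAM traffic = demand traffic * LLC miss rate.
// The demand bandwidth and hit rates are illustrative assumptions, not measurements.
#include <cstdio>

int main() {
    const double demand_GBps    = 400.0;  // assumed traffic generated by CPU + GPU
    const double modest_llc_hit = 0.60;   // assumed hit rate with a small LLC
    const double huge_llc_hit   = 0.90;   // assumed hit rate with a huge, fast LLC

    printf("DRAM traffic, modest LLC: ~%.0f GB/s\n", demand_GBps * (1.0 - modest_llc_hit));
    printf("DRAM traffic, huge LLC:   ~%.0f GB/s\n", demand_GBps * (1.0 - huge_llc_hit));
    return 0;
}
```

Cutting the miss traffic like that is what lets a lower-power, lower-bandwidth memory interface keep up.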
What's the difference? Is the memory "truly unified" only if memory access is governed by a single MMU for both CPU and GPU?
Truly unified means everything can access the same data equally, with no copies needed. That is a major performance benefit and a major power saving.
I mean... it's called the PS5 / Xbox Series X.
I'm pretty sure they have unified memory. Hell, CUDA + CPU / OpenCL + CPU have unified memory; it's just emulated over PCIe. The PS5 / Xbox Series X actually have the same literal RAM serving both the iGPU side and the CPU side.
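On the CUDA side, the "unified" programming model looks like this - real API (cudaMallocManaged), but just a toy sketch, and on a discrete GPU it's still page migration over PCIe under the hood, whereas on a shared-memory design there's only ever one copy:

```cpp
// Minimal CUDA managed-memory sketch: one pointer visible to both CPU and GPU.
// On a PCIe dGPU the driver migrates pages behind the scenes ("emulated" unification);
// on a truly unified design the data never moves at all.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void addOne(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1;
}

int main() {
    const int n = 1 << 20;

    int *data = nullptr;
    cudaMallocManaged(&data, n * sizeof(int));   // one allocation, no host/device split
    for (int i = 0; i < n; ++i) data[i] = i;     // CPU writes directly

    addOne<<<(n + 255) / 256, 256>>>(data, n);   // GPU works on the same pointer
    cudaDeviceSynchronize();

    printf("data[42] = %d\n", data[42]);         // CPU reads the result, no cudaMemcpy
    cudaFree(data);
    return 0;
}
```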
No, at least MS has explicitly stated how their memory is split between OS/CPU software/GPU.