Tuesday, July 9th 2024

AMD "Strix Halo" a Large Rectangular BGA Package the Size of an LGA1700 Processor

Apparently the AMD "Strix Halo" processor is real, and it's large. The chip is designed to square off against the likes of the Apple M3 Pro and M3 Max by giving ultraportable notebooks powerful graphics performance. A chiplet-based processor, not unlike the socketed desktop "Raphael" and the mobile BGA "Dragon Range," "Strix Halo" consists of one or two CCDs containing the CPU cores, wired to a large die that is technically the cIOD (client I/O die), but which packs an oversized iGPU and an NPU. The point of "Strix Halo" is to eliminate the need for a performance-segment discrete GPU and to conserve the PCB footprint it would occupy.

According to Harukaze5719, a reliable source for AMD leaks, "Strix Halo" comes in a BGA package dubbed FP11, measuring 37.5 mm x 45 mm, which is significantly larger than the 25 mm x 40 mm FP8 BGA package that the regular "Strix Point," "Hawk Point," and "Phoenix" mobile processors are built on. It is even larger in area than the 40 mm x 40 mm FL1 BGA package of the "Dragon Range" and upcoming "Fire Range" gaming notebook processors. "Strix Halo" features one or two of the same 4 nm "Zen 5" CCDs found on the "Granite Ridge" desktop and "Fire Range" mobile processors, but connected to a much larger I/O die, as we mentioned.
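
For a sense of scale, here is a quick back-of-the-envelope comparison of those package footprints. LGA1700 is included because Intel's desktop package measures the same 37.5 mm x 45 mm, which is what the headline comparison refers to:

```python
# Back-of-the-envelope footprint comparison (dimensions in mm, as reported above).
packages = {
    "FP11 (Strix Halo)":         (37.5, 45.0),
    "FL1 (Dragon Range)":        (40.0, 40.0),
    "FP8 (Strix Point/Phoenix)": (25.0, 40.0),
    "LGA1700 (Intel desktop)":   (37.5, 45.0),
}

for name, (width, height) in packages.items():
    print(f"{name:28} {width:5.1f} x {height:5.1f} mm = {width * height:7.1f} mm^2")
```

That works out to roughly 1,687.5 mm² for FP11, versus 1,600 mm² for FL1 and 1,000 mm² for FP8.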
At this point, the foundry node of the "Strix Halo" I/O die is not known, but it's unlikely to be the same 6 nm node as the cIOD that AMD has been using on its other client processors based on "Zen 4" and "Zen 5." It wouldn't surprise us if AMD used the same 4 nm node it did for "Phoenix" for this I/O die. The main reason an advanced node is warranted is the oversized iGPU, which features a whopping 20 workgroup processors (WGPs), or 40 compute units (CU), worth 2,560 stream processors, 80 AI accelerators, and 40 Ray accelerators. This iGPU is based on the latest RDNA 3.5 graphics architecture.
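
Those shader counts follow from RDNA 3's building blocks, assuming RDNA 3.5 keeps the same ratios of 2 CUs per WGP, with 64 stream processors, 2 AI accelerators, and 1 Ray accelerator per CU. A minimal sketch of the arithmetic:

```python
# Shader math for the "Strix Halo" iGPU, assuming RDNA 3-style ratios carry over to RDNA 3.5.
wgps = 20

compute_units     = wgps * 2              # 2 CUs per WGP         -> 40 CUs
stream_processors = compute_units * 64    # 64 SPs per CU         -> 2,560 stream processors
ai_accelerators   = compute_units * 2     # 2 AI accelerators/CU  -> 80 AI accelerators
ray_accelerators  = compute_units * 1     # 1 Ray accelerator/CU  -> 40 Ray accelerators

print(compute_units, stream_processors, ai_accelerators, ray_accelerators)
```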

For perspective, the iGPU of the regular 4 nm "Strix Point" processor has 8 WGPs (16 CU, 1,024 stream processors). Then there's the NPU. AMD is expected to carry over the same 50 TOPS-capable XDNA 2 NPU it uses on the regular "Strix Point" to the I/O die of "Strix Halo," giving the processor Microsoft Copilot+ capabilities.

The memory interface of "Strix Halo" has long been a mystery. Logic dictates that it would be a terrible idea to have 16 "Zen 5" CPU cores and a 40-compute unit GPU share even a regular dual-channel DDR5 memory interface running at the highest possible speeds, as both the CPU and the iGPU would be severely bandwidth-starved. Then there's also the NPU to consider, as AI inferencing is a memory-sensitive application.

We have a theory that, besides an LPDDR5X interface for the CPU cores, the "Strix Halo" package has wiring for discrete GDDR6 memory. Even a relatively narrow 128-bit GDDR6 memory interface running at 20 Gbps would give the iGPU 320 GB/s of memory bandwidth, which is plenty for performance-segment graphics. This would mean that besides the LPDDR5X chips, there would be four GDDR6 chips on the PCB. The iGPU even has 32 MB of on-die Infinity Cache memory, which fits our theory of a 128-bit GDDR6 interface exclusively for the iGPU.
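
The arithmetic behind those bandwidth figures is simple: bus width times per-pin data rate, divided by eight. A minimal sketch for comparison (the 256-bit LPDDR5X-8000 configuration is from the rumor mill, not something AMD has confirmed):

```python
# Peak bandwidth (GB/s) = bus width (bits) x per-pin data rate (Gbps) / 8 bits per byte.
def bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits * data_rate_gbps / 8

print(bandwidth_gb_s(128, 5.6))   # ~89.6 GB/s -- a regular dual-channel DDR5-5600 interface
print(bandwidth_gb_s(256, 8.0))   # 256.0 GB/s -- the rumored 256-bit LPDDR5X-8000 configuration
print(bandwidth_gb_s(128, 20.0))  # 320.0 GB/s -- the 128-bit, 20 Gbps GDDR6 interface theorized above
```

Whether the extra bandwidth over a rumored all-LPDDR5X setup would justify the added power and board complexity is exactly what the comments below debate.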
Sources: Harukaze_5719 (Twitter), Olrak29 (Twitter), VideoCardz

40 Comments on AMD "Strix Halo" is a Large Rectangular BGA Package the Size of an LGA1700 Processor

#1
ymdhis
Man I want this in a desktop socket.

Also earlier rumors said the memory was DDR5 256bit.
Posted on Reply
#2
Wirko
This is a GPU with a CPU hanging off the side. Could this concept be usable for building discrete GPUs too? "Reverse accelerated processing unit"?
Posted on Reply
#3
Neo_Morpheus
Could this be some variation of the chip used in current xbox/ps5 consoles?

And since Ngreedia is getting away with murder (pricing wise) with the halo 4090, I wonder if AMD is going for the same “motif” given the Halo word in the name?

I hope it's not strictly a gaming device and is instead also used in more professional segments/devices.
ymdhis: Man I want this in a desktop socket.
Or at the very least, in a mini pc, from Minisforums, for example.
Posted on Reply
#4
R0H1T
In current consoles? Nope, they're zen3 (zen2?) IIRC.
Posted on Reply
#5
Wirko
ymdhis: Man I want this in a desktop socket.

Also earlier rumors said the memory was DDR5 256bit.
Only the Threadripper socket could be a candidate for that.
Posted on Reply
#6
Daven
ymdhis: Man I want this in a desktop socket.

Also earlier rumors said the memory was DDR5 256bit.
The rumors still say 256-bit 8000 LPDDR5X. Not sure why TPU now says 128-bit because of the socket size.

Edit: Oh, the article is saying 128-bit GDDR6, not LPDDR5X. Now that would be cool.
Posted on Reply
#7
Chrispy_
16-core Zen5 with GPU performance in the ballpark of a 6750XT? (laptop TDP limitations will likely offset any RDNA 3.5 advantages, since RDNA3.5 is just RDNA3 with an NPU bolt-on, and RDNA3 gave us very little IPC over RDNA2!)

That's still a very CPU-weighted config for gamers who honestly won't want to overpay for 8-10 cores they'll never use, and most likely the only configuration that will include the full-fat 40CU GPU component.

The sensible Ryzen7 or Ryzen5 variants that have 6-8 cores games need will likely come with crippled 32CU or 28CU GPUs in them which is acceptable, but not worth much - 6600S (28CU RDNA2 dGPU) laptops were occupying the sub-$1000 entry-level bargain bin 18 months ago. They're fine for casual gaming and esports but hardly what I'd call bleeding edge and already struggling in plenty of modern titles at 1080p.

Non-gamers likely aren't interested because no matter how good AMD's GPU compute performance is, they don't use CUDA which is a massive gatekeeper for the entire productivity industry, and 4060 laptops are cheap, even available in thin-and-lights that pull a mere 120W from the wall outlet. I don't see Strix Halo competing well with those, especially since the slides here indicate a potential 175W power draw so that's definitely not going to be a thin-and-light laptop.
Posted on Reply
#8
Daven
Chrispy_: 16-core Zen5 with GPU performance in the ballpark of a 6750XT? (laptop TDP limitations will likely offset any RDNA 3.5 advantages, since RDNA3.5 is just RDNA3 with an NPU bolt-on, and RDNA3 gave us very little IPC over RDNA2!)

That's still a very CPU-weighted config for gamers who honestly won't want to overpay for 8-10 cores they'll never use, and most likely the only configuration that will include the full-fat 40CU GPU component.

The sensible Ryzen7 or Ryzen5 variants that have 6-8 cores games need will likely come with crippled 32CU or 28CU GPUs in them which is acceptable, but not worth much - 6600S (28CU RDNA2 dGPU) laptops were occupying the sub-$1000 entry-level bargain bin 18 months ago. They're fine for casual gaming and esports but hardly what I'd call bleeding edge and already struggling in plenty of modern titles at 1080p.

Non-gamers likely aren't interested because no matter how good AMD's GPU compute performance is, they don't use CUDA which is a massive gatekeeper for the entire productivity industry, and 4060 laptops are cheap, even available in thin-and-lights that pull a mere 120W from the wall outlet. I don't see Strix Halo competing well with those, especially since the slides here indicate a potential 175W power draw so that's definitely not going to be a thin-and-light laptop.
You're picking and choosing the markets for the failure of this product by meandering around DIY desktop gamers and CUDA workstation users. Of course, these two user groups won't buy Strix Halo. But DIY desktop gamers represent < 1% of the desktop/laptop market. CUDA workstation users are a different market altogether and should never be mentioned here.

Let's see how Strix Halo performs and which products are built around it. I have my own ideas of how a fat CPU/GPU SoC can be used but the market is way more creative.
Posted on Reply
#9
TumbleGeorge
Yes, well, many myths and legends have already spread about this product. There must already be misled people who would drop thousands of dollars to own it because they read someone's comments. And what if the product turns out to be mediocre?
Posted on Reply
#10
R0H1T
Mediocre how?
TumbleGeorge: There must already be misled people who would drop thousands of dollars to own it because they read someone's comments.
As always wait for multiple reviews!
Posted on Reply
#11
mikesg
TumbleGeorge: Yes, well, many myths and legends have already spread about this product. There must already be misled people who would drop thousands of dollars to own it because they read someone's comments. And what if the product turns out to be mediocre?
Strix Point (RDNA 3.5) / 890M 16CU (in a GPD Duo) has scored in the region of an RTX 3050 in Time Spy.

With more than double the CUs and TDP headroom, it's easily a 4050-4060-5050 competitor.

Strix Point is more for thin laptop/mini PC. Strix Halo would suit desktop a lot more. Within one day you would go shopping for the biggest cooler.
Posted on Reply
#12
ToTTenTranz
btarunr: We have a theory that, besides an LPDDR5X interface for the CPU cores, the "Strix Halo" package has wiring for discrete GDDR6 memory. Even a relatively narrow 128-bit GDDR6 memory interface running at 20 Gbps would give the iGPU 320 GB/s of memory bandwidth, which is plenty for performance-segment graphics. This would mean that besides the LPDDR5X chips, there would be four GDDR6 chips on the PCB. The iGPU even has 32 MB of on-die Infinity Cache memory, which fits our theory of a 128-bit GDDR6 interface exclusively for the iGPU.
Strix Halo's memory controller has been shown to be 256-bit LPDDR5X since the first leak. Probably LPDDR5X-8000, so 256 GB/s unified.

There's no GDDR6 there, but there's 32MB Infinity Cache for the iGPU.
ymdhis: Man I want this in a desktop socket.
Why? It'll probably be cheaper and faster to get a 12/16-core Ryzen 9 with a discrete GPU. Especially if you wait for RDNA4.
mikesg: With more than double the CUs and TDP headroom, it's easily a 4050-4060-5050 competitor.
It's expected to have RTX 4070 Laptop performance (desktop RTX 4060 Ti chip) but without the 8GB VRAM limitation.
In fact, Strix Halo is probably only going to appear in expensive laptops with 32GB LPDDR5X or more, as there have been shipping manifests with Strix Halo test units carrying 128GB RAM.
Posted on Reply
#13
Neo_Morpheus
R0H1T: Mediocre how?
Simple, by being an AMD product.
/s

I simply don't understand the automatic hate that all AMD products get, accompanied by false statements.
Posted on Reply
#14
ADB1979
mikesg: Strix Point (RDNA 3.5) / 890M 16CU (in a GPD Duo) has scored in the region of an RTX 3050 in Time Spy.
Please drop a link with the leaked information on it, thanks.
Posted on Reply
#15
Sound_Card
As much as I am hyped for Strix Halo, it does make me wonder why not make an APU that is 60 CU and 12 cores, or 80 CU and 8 cores? Do we really need 16 cores/32 threads for gaming? They could easily market a 'gamer 3D APU' that is 8 cores, 60 CU, with 3D cache and Infinity Cache. The mini PC market would go absolutely bonkers.
Wirko: This is a GPU with a CPU hanging off the side. Could this concept be usable for building discrete GPUs too? "Reverse accelerated processing unit"?
Posted on Reply
#16
ADB1979
Sound_Card: As much as I am hyped for Strix Halo, it does make me wonder why not make an APU that is 60 CU and 12 cores, or 80 CU and 8 cores? Do we really need 16 cores/32 threads for gaming? They could easily market a 'gamer 3D APU' that is 8 cores, 60 CU, with 3D cache and Infinity Cache. The mini PC market would go absolutely bonkers.
The first answer is chip size, power consumption, and therefore also cooling. As noted in this article, the package is already massive, and the die that contains the graphics already dwarfs the chiplets that contain the CPU cores, so you get an idea of how much space you would save, which is nowhere near enough to add that much more in the way of GPU, which in turn would need more memory bandwidth.

If you put all of this together, you soon find out why AMD ended up with the design it did: one that isn't going to be insanely expensive, so it will actually end up with mass-market appeal, whilst doing a solid job as a product designed specifically to not have a discrete GPU alongside it, thus eliminating the sale of an nVidia GPU, and beating anything Intel has to offer.

This is looking like a good product to me, and much as I had assumed already, leaks are suggesting more and more that this is the first in a whole line of products! The 256-bit bus laptop/desktop CPUs/APUs are just over the horizon, and IMHO this is why Strix Halo is the product I am most interested in seeing this year: not least because of some very interesting use cases that look very promising, what OEMs will do with it, what mini desktop PCs will look like on the inside, what the public wants to see from version 2, and ultimately where AMD decides to take version 2, 3, etc.

I almost forgot to say that there will be variants with fewer than 16 cores and fewer than 40 CUs of GPU performance. There are already lots of people calling for a single-CCD version, ideally with 3D V-Cache if that is possible, and a fully enabled 40 CU GPU, because that would be fantastic for gaming, while others are looking for a fully enabled (16c/40CU) 256GB Strix Halo laptop because they simply need it.

Also remember that this is essentially a new market and AMD has some choices to make; no doubt they are even reading comments like this in forums to get an idea of what people want and expect. As much as people (myself included) often laugh at marketing, it is important to launch the right product in the right segment at the right price, to keep customers happy and buying by providing the products that people actually want, which are rarely the top models.
Posted on Reply
#17
Carillon
Sound_Card: As much as I am hyped for Strix Halo, it does make me wonder why not make an APU that is 60 CU and 12 cores, or 80 CU and 8 cores? Do we really need 16 cores/32 threads for gaming? They could easily market a 'gamer 3D APU' that is 8 cores, 60 CU, with 3D cache and Infinity Cache. The mini PC market would go absolutely bonkers.
If they put only one CCD, the package would be less mechanically robust: corners are weak spots, a rectangle has 4 corners, and a big I/O die adjacent to a single CCD would have 6.
Posted on Reply
#18
Chrispy_
Daven: You're picking and choosing the markets for the failure of this product by meandering around DIY desktop gamers and CUDA workstation users. Of course, these two user groups won't buy Strix Halo. But DIY desktop gamers represent < 1% of the desktop/laptop market. CUDA workstation users are a different market altogether and should never be mentioned here.

Let's see how Strix Halo performs and which products are built around it. I have my own ideas of how a fat CPU/GPU SoC can be used but the market is way more creative.
My experience and exposure are with laptop gamers and laptop creatives who work with Adobe PS/Premiere, DaVinci, and the many AI tools now popping up that need CUDA. I'm sure there are many more use cases than that, but every single one of those demographics will be better off with an Nvidia GPU from either a performance/Watt or API compatibility perspective. If you need multi-threaded CPU compute but not CUDA, then it's a demographic I'm not familiar with; not to say that it doesn't exist. The music industry is one, but it's more concerned with DPC latency than with multi-threaded workloads needing a Ryzen 9 instead of a Ryzen 5 or 7.

The number of things that use OpenCL to any success these days is dwindling by the day, and ROCm support is a noble attempt but its influence so far on the software market is somewhere between "absolute zero" and "too small to measure". It's why Nvidia is now the most valuable company on earth, bar none. I certainly don't like that fact, but it's the undeniable truth.
Posted on Reply
#19
alwayssts
TPU's speculative analysis of the 128-bit/128-bit split between DDR and GDDR is quite astute, and while I've never seen it mentioned before, it makes a ton of sense. I hope this turns out to be the case.

I kept wondering how this product makes any sense, given the bw is so low and the cache does not appear to have changed versus RDNA3.

FWIW, I really like the sporadic "TPU has a theory" closing paragraph that has been included in some news articles as of late.

If it were me, I too would include a last paragraph with an editorial (perhaps italicized and/or with an asterisk). I think this is very good writing that encompasses both available info and what we know it needs.

Thanks for the personal insight (along with the news).

Don't be afraid to keep writing the correctly-compartmentalized editorials! This is, ofc, (a very large part of) what makes TPU special.


TLDR: Keep up the good work, btarunr (in both regards to news and analysis), and don't be afraid to continue to show us that our News Guy actually has an innate understanding of the stuff they are reporting.

:lovetpu:


Edit: was using 7600 math (2048sp) in former calculation (~2.8ghz), not 2560sp. Wasn't completely awake yet when I wrote that (or even as I write this, for that matter).

Erased that JIC anyone caught it. :laugh:

Still, the GDDR/LPDDR theoretical split for bw makes sense wrt VERY efficient clocks at perfect yield (2560sp) or higher-clocked (but still on the voltage/yield curve) using a lesser-enabled part (such as 2048sp).

I really should have had another cup of coffee before writing anything; apologies for my blunder. :oops:
Posted on Reply
#20
R0H1T
alwayssts: TPU's speculative analysis of the 128-bit/128-bit split between DDR and GDDR is quite astute, and while I've never seen it mentioned before, it makes a ton of sense. I hope this turns out to be the case.
What do you mean by split? It has 256bit LPDDR5x support & "possibly" separate support for GDDR6 ~ you can't split memory interfaces like that IIRC.
Posted on Reply
#21
AnotherReader
R0H1T: What do you mean by split? It has 256bit LPDDR5x support & "possibly" separate support for GDDR6 ~ you can't split memory interfaces like that IIRC.
Moreover, LPDDR5X, along with HBM3, is the most power efficient DRAM type. Opting for GDDR6 would increase system power consumption without a commensurate performance increase.
Posted on Reply
#22
mrnagant
And it still has infinity cache. That is neat.

One thing I have been curious about with this path AMD has been going down: why have GPU AI accelerators and a dedicated NPU? Is the space the GPU accelerators take up mostly insignificant? What kind of capability overlap do they have? What makes them unique?

RDNA 3.5 is being used in packages like this and won't be offered on a dedicated discrete card where those AI accelerators make sense, as you'd likely not have an NPU. It seems like if your package has an NPU, you could have designed RDNA 3.5 to not have those AI accelerators at all. But AMD chose to leave them there for a reason. I wonder what that reasoning is.
Posted on Reply
#23
alwayssts
R0H1T: What do you mean by split? It has 256bit LPDDR5x support & "possibly" separate support for GDDR6 ~ you can't split memory interfaces like that IIRC.
I apologize. You're correct...I don't know what I was thinking. Again, I spoke before my brain was fully firing on this one.

Just forget I said anything.

I'm feeling pretty foolish at the moment (outside of commending the observation of possible GDDR6).

I usually think about things a lot before I post; I don't know what I was thinking with that one. I would just delete it, but I won't hide that everyone makes mistakes, myself included.
Posted on Reply
#24
R0H1T
That's just fine, it's all speculation after all ~ as for the OP I don't really expect GDDR6 support for the same chips going into laptops! Although it is possible.

AMD would probably want to minimize the die size, and having multiple memory interfaces supported would do the opposite; the only way GDDR6 support is plausible is if any of these chips go into a console or something!
Posted on Reply
#25
alwayssts
AnotherReader: Moreover, LPDDR5X, along with HBM3, is the most power efficient DRAM type. Opting for GDDR6 would increase system power consumption without a commensurate performance increase.
There we definitely disagree. Samsung GDDR6 can run at 1.1v (as opposed to 1.05v for LPDDR5x; not a huge difference)...and the bandwidth between the two could make a substantial performance difference.

Certainly the difference between playing a game at 1080p60 or not.

Also, if 4 (16Gb/2GB) chips...8GB. That's what, like ~8-12W? I would take that trade-off 100% of the time, personally.

273 GB/s with just LPDDR5X; that's not even enough bandwidth for a desktop 7600 (given the similar cache) alone, not to mention the 7600 was always borderline-acceptable as a contemporary GPU, less so moving forward.

By all accounts the mobile N33 was a failure (for not meeting even close to that standard); hence why we got the 7600XT (16GB) on desktop; why would they settle for trying to attract the same failed market?

I have to imagine this thing was created to keep pace with (at least) the PS5 (or in laptop GPU terms; at least a 4070 mobile), at least as an *option*.

They could run very low clocks w/o it and that's all fine and good wrt power or competing with Intel... but not when going up against a contemporary discrete GPU; most generally target at least that metric.

Add to this...Navi 32 was never released as a mobile part AFAIK. I wonder why that could be...IMO probably because it would be a more-efficient (if-expensive) option than this.

That's why this part makes very little sense to me without some kind of additional bandwidth option....why create something so large if not to do battle with something like a 4070 mobile (and win)?

Besides the obvious reasons, a 'sideport' (yes, I know it's not exactly the same thing) of GDDR6 would make sense for a host of reasons, some of which are in that article from 20 years ago.
Posted on Reply