Tuesday, July 9th 2024
AMD "Strix Halo" a Large Rectangular BGA Package the Size of an LGA1700 Processor
Apparently the AMD "Strix Halo" processor is real, and it's large. The chip is designed to square off against the likes of the Apple M3 Pro and M3 Max, letting ultraportable notebooks have powerful graphics performance. A chiplet-based processor, not unlike the desktop socketed "Raphael" and mobile BGA "Dragon Range," the "Strix Halo" processor consists of one or two CCDs containing CPU cores, wired to a large die that is technically the cIOD (client I/O die), but which contains an oversized iGPU and an NPU. The point behind "Strix Halo" is to eliminate the need for a performance-segment discrete GPU, and conserve the PCB footprint it would occupy.
According to leaks by Harukaze5719, a reliable source for AMD leaks, "Strix Halo" comes in a BGA package dubbed FP11, measuring 37.5 mm x 45 mm, which is significantly larger than the 25 mm x 40 mm FP8 BGA package that the regular "Strix Point," "Hawk Point," and "Phoenix" mobile processors are built on. It is even larger in area than the 40 mm x 40 mm FL1 BGA package of the "Dragon Range" and upcoming "Fire Range" gaming notebook processors. "Strix Halo" features one or two of the same 4 nm "Zen 5" CCDs featured on the "Granite Ridge" desktop and "Fire Range" mobile processors, but connected to a much larger I/O die, as we mentioned.

At this point, the foundry node of the "Strix Halo" I/O die is not known, but it's unlikely to be the same 6 nm node as the cIOD that AMD has been using on its other client processors based on "Zen 4" and "Zen 5." It wouldn't surprise us if AMD is using the same 4 nm node for this I/O die as it did for "Phoenix." The main reason an advanced node is warranted is the oversized iGPU, which features a whopping 20 workgroup processors (WGPs), or 40 compute units (CU), worth 2,560 stream processors, 80 AI accelerators, and 40 Ray accelerators. This iGPU is based on the latest RDNA 3.5 graphics architecture.
For perspective, the iGPU of the regular 4 nm "Strix Point" processor has 8 WGPs (16 CU, 1,024 stream processors). Then there's the NPU. AMD is expected to carry over the same 50 TOPS-capable XDNA 2 NPU it uses on the regular "Strix Point" to the I/O die of "Strix Halo," giving the processor Microsoft Copilot+ capabilities.
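To recap the shader math in the paragraphs above, here is a minimal sketch assuming the standard RDNA ratios of 2 CUs per WGP, with 64 stream processors, 2 AI accelerators, and 1 Ray accelerator per CU:

```python
# Minimal sketch of the RDNA CU math quoted above, assuming the usual
# ratios: 2 CUs per WGP; 64 stream processors, 2 AI accelerators, and
# 1 Ray accelerator per CU.
def rdna_igpu_counts(wgps: int) -> dict:
    cus = wgps * 2
    return {
        "CUs": cus,
        "stream_processors": cus * 64,
        "ai_accelerators": cus * 2,
        "ray_accelerators": cus,
    }

print("Strix Halo: ", rdna_igpu_counts(20))  # 40 CU, 2,560 SP, 80 AI, 40 RT
print("Strix Point:", rdna_igpu_counts(8))   # 16 CU, 1,024 SP, 32 AI, 16 RT
```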
The memory interface of "Strix Halo" has long been a mystery. Logic dictates that it's a terrible idea to have 16 "Zen 5" CPU cores and a 40-compute unit iGPU share even a regular dual-channel DDR5 memory interface at the highest possible speeds, as both the CPU and iGPU would be severely bandwidth-starved. Then there's also the NPU to consider, as AI inferencing is a memory-sensitive application.
We have a theory that, besides an LPDDR5X interface for the CPU cores, the "Strix Halo" package has wiring for discrete GDDR6 memory. Even a relatively narrow 128-bit GDDR6 memory interface running at 20 Gbps would give the iGPU 320 GB/s of memory bandwidth, which is plenty for performance-segment graphics. This would mean that besides the LPDDR5X chips, there would be four GDDR6 chips on the PCB. The iGPU even has 32 MB of on-die Infinity Cache memory, which seems to agree with our theory of a 128-bit GDDR6 interface exclusively for the iGPU.
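As a quick check of the arithmetic, peak DRAM bandwidth is simply bus width times per-pin data rate; a minimal sketch comparing the theorized 128-bit GDDR6 interface with a regular dual-channel (128-bit) DDR5-5600 setup:

```python
# Peak theoretical DRAM bandwidth: bus width (bits) x data rate (Gbps) / 8.
def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    return bus_width_bits * data_rate_gbps / 8

# The theorized 128-bit GDDR6 interface at 20 Gbps:
print(bandwidth_gbs(128, 20.0))  # 320.0 GB/s, matching the figure above

# A regular dual-channel (128-bit) DDR5-5600 interface, for contrast:
print(bandwidth_gbs(128, 5.6))   # 89.6 GB/s
```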
Sources:
Harukaze_5719 (Twitter), Olrak29 (Twitter), VideoCardz
40 Comments on AMD "Strix Halo" a Large Rectangular BGA Package the Size of an LGA1700 Processor
The NPU (Neural Processing Unit, "AI engine" in Intel speak) is a different thing, a separate unit that may or may not be included in the die design. AMD seems to be making a beeline for all of their "client" (end user) chips to have a separate NPU. They have started with the mobile line because the NPU can save a lot of battery power, being able to do the same job as the GPU or CPU at much lower power consumption. This is not very critical on desktops, and many desktops also have more powerful dedicated GPUs and more powerful CPUs that can handle the AI workload (that MicroShaft etc. envisage), so they can wait a while to get dedicated NPUs in future desktop CPU versions (Zen 6 I expect).
As for why I'd rather have this? I have specific needs where a dGPU would be a crutch, and building a new AM5 machine would be significantly more efficient in power draw, thermals, noise, physical space used (I find the current trend for 3 slots + 35cm cards to be absolutely disgusting), peripheral support, and it wouldn't be that much worse in cost either since selling my current AM4 machine would cover a large chunk of the cost.
4070 mobile (really more of a low-clocked 4060 Ti Super) more or less can/does, depending on which model you buy and how you use it.
This could (and should) be roughly equal to all of those without GDDR6, and better than all of them with it, as perf is perf.
This could actually be a chip that hits that sweet spot of better than most mobile 4070's and much, much cheaper than a 4080 mobile (which is actually a cut-down 4070 desktop)...literally in the center and good enough for general laptop (or even general [1080p60] PC) gaming...but it needs the BW...which the current LPDDR5X/cache simply cannot provide.
As I say, they can go at it with low clocks and high efficiency, and that's fine (as there is a market for that), but sub-optimal GPU perf is still sub-optimal GPU perf regardless.
I'm saying there is a market for what they COULD do, which IMO is the only reason you specifically make this chip. I'm sure a vanilla option will still look nice wrt power/perf.
The thesis of adding GDDR6 clicked everything into place for me, as I had not even considered that possible.
In reality though, it makes perfect sense (for those willing to use a higher power envelope, just as those who would buy a discrete 80-120 W NVIDIA laptop GPU would do).
I'm simply saying before, it looked like they were attempting to kill some small birds (other CPUs/SoCs) with a very big stone, because it lacked the bandwidth to push it into competing with a discrete GPU; maybe take some market share from <4070 laptops...but those aren't for (most, non e-sport) gaming anyway.
Now it would appear they can kill multiple birds with one stone...and some GDDR6. They could/should be able to compete with, if not exceed, the performance of 4070 laptops in a tangible way for less money.
People could actually have a decent 1080p60 SoC laptop.

I assume you are using 1.35 V with your metric. Try using 1.1 V. I don't know if any current products use Samsung @ 1.1 V? I think most use Hynix @ 1.35 V (and sometimes substitute in Samsung at 1.35 V).
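For a rough sense of what that voltage difference means, dynamic power scales roughly with the square of voltage at a given clock; a back-of-envelope sketch (first order only, ignoring static power and I/O termination):

```python
# Back-of-envelope estimate: dynamic power scales roughly with V^2 at a
# fixed frequency, so 1.1 V GDDR6 vs 1.35 V GDDR6 is a sizeable saving.
v_standard = 1.35    # V, typical GDDR6 (e.g., Hynix parts)
v_low = 1.10         # V, low-voltage GDDR6 (e.g., some Samsung parts)

relative = (v_low / v_standard) ** 2
print(f"~{relative:.2f}x dynamic power")          # ~0.66x
print(f"~{(1 - relative) * 100:.0f}% reduction")  # ~34% less, first order
```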
I'm not saying it's MORE efficient, I'm saying it's (potentially) not nearly as bad as you're implying, and the power/performance trade-off would be worth it for someone that was deciding between a productivity machine and a budget gaming laptop in a similar price range; especially versus something like a laptop with a 4070/4080 mobile inside of it (which would be much more expensive and/or likely use even more power).
I guess we'll just see how it goes?
It will be interesting to see how such setups without (or conceivably with) GDDR6 match up against competing solutions (both in productivity and gaming; iGPU/7600/4060/4070 laptops). I guess I'm just more optimistic this will be a low-cost, good-enough option for many different kinds of people/markets vs their direct competition regardless of TDP configuration...although it will be interesting to see the power required to achieve parity with 4060/4070 mobile. That is indeed possible; unified pool or not, there's always conceivably a crossbar/HUMA.
I messed up though thinking it would be 128+128, not 256+128-bit. I don't know why I subtracted from the 'known' 256-bit LPDDR5x controller to add the possible GDDR6 controller. Whoops. :laugh:
Again, who knows...Like you say: leaks and rumors...maybe it could be 128+128 after all. My (perhaps wrong) thinking was that 256-bit was known, but it was unknown that a 128-bit interface could be wired out to GDDR6.
Just making conversation and attempting conceivable projections and their use-cases. Never trying to proclaim infallibility vs what might actually transpire.
The thing I never understood about a 256-bit LPDDR5X controller is...wouldn't that require four sticks of RAM? That's pretty weird. Not impossible; just unconventional.
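On the "four sticks" question: LPDDR5X is soldered down rather than socketed, and the bus is built from narrow 16-bit channels grouped into multi-chip packages, so no DIMMs are involved. A hedged sketch of one plausible composition (the x64 package width is an assumption):

```python
# One plausible composition of a 256-bit LPDDR5X interface. LPDDR5X uses
# 16-bit channels; grouping them into x64 packages is an assumption here.
TOTAL_BUS_BITS = 256
CHANNEL_BITS = 16    # LPDDR5X channel width
PACKAGE_BITS = 64    # hypothetical x64 multi-channel package

print(TOTAL_BUS_BITS // CHANNEL_BITS, "channels")   # 16 channels
print(TOTAL_BUS_BITS // PACKAGE_BITS, "packages")   # 4 soldered packages

# Peak bandwidth at LPDDR5X-8000 (8 Gbps per pin), for comparison with
# the 320 GB/s GDDR6 figure from the article:
print(TOTAL_BUS_BITS * 8.0 / 8, "GB/s")             # 256.0 GB/s
```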
Strix Halo is a premium chip for premium Windows laptops that will compete with the premium MacBook Pro models. It's above all a competitor to the M3 Max and probably the M4 Max.
It's a great all-around, no-cut-corners big SoC for laptops: a capable GPU for gaming and GPU-accelerated tasks like video/image editors, a whopping 16-core Zen5 CPU (no Zen5c BTW) for demanding multithreaded tasks like simulation and product development tools, a powerful ~50 TOPS NPU to run generative AI models, and access to a truckload of RAM thanks to its 256-bit width. It's everything at once.
In fact, there are more than 16 Zen5 cores in the solution. There are an additional 4 Zen5LP cores inside the I/O+GPU chip that consume very little power and clock very slowly, but take over the OS tasks while the system is idling. It's AMD's answer to Qualcomm's superior power efficiency on low-demand loads, so that these premium Windows laptops get the same 12-16 h battery life on light usage.
So don't count on the full Strix Halo to appear on anything that isn't a premium laptop above $2500. As much as even I'd love to see ~$1000 gaming handhelds with a cut-down version of Strix Halo, the chip is going into laptops competing against the $4000 MBP M3 Max.
I do not know what difference this would make, whether some of those many traces could be used for video output instead of PCIe, or whether it would require a whole new design. On that note, the current EPYC (and thus Threadripper) socket is physically capable of 12 DDR5 memory channels, restricted with Threadrippers to 8 and 6 channels. FYI, there is also the SP6 socket to look at: it (for now at least) uses the same physical IOD as EPYC and Threadripper chips, but sits on a smaller physical chip substrate with a smaller socket, fewer memory channels and PCIe lanes, and is (for now) restricted to 4x 16-core Zen 4c chiplets. This to me is a much closer basis for a new HEDT platform with reduced costs.
SP6 also shows that AMD is in a phase of serious expansion in all markets, specifically here "lower end servers" (and hopefully low-end Threadrippers) that are currently limited to 64x Zen 4c cores, "only" 6 channels of DDR5, and 96 PCIe 5 lanes. Whatever socket Strix Halo uses will be yet another new socket in a short period of time, and IMHO the start of a whole new family of products all using 256-bit RAM. As Strix Halo is going to be the first of a new line of products, its success, its pros and cons, etc. will all be scrutinised and no doubt tested to the nth degree. People will find some interesting niches for this product and potential future avenues to aim towards, whether that means tweaking the socket for the 2nd generation (RAM changes will typically do that), or whether Strix Halo spawns its own split in product lines: one towards an affordable HEDT/Threadripper, and the other as a high-performance SoC that does not require a dedicated GPU. Time will tell, and IMHO Strix Halo is my most anticipated product this year (even if it's delayed yet again and is released next year), specifically because it is essentially a whole new class of product. Otherwise, Zen 6 is going to bring a minor revolution at the technical level and highlight technologies to come and the direction of travel at the mass-market desktop level.