
ARC "driver issues" turning out to actually be hardware deficiencies - Battlemage reveal

Contrary to expectations (though some of us suspected it), ARC's issues with being unable to pull off performance corresponding to the x70 die size, and its compatibility issues with some games, have to do with Alchemist's hardware, not software.

Battlemage revelations by Tom "TAP" Peterson at Computex:
The Xe-cores in Xe2 have been improved for higher performance, better utilisation, and greater compatibility with games. That last point is particularly important, going off Intel's previous form.
These changes take various forms, though I'm told it's not only improvements to the software stack, but changes to the silicon itself to make it gel more easily with modern games.

There's hardware support for commonly used commands. Fast Clear is now supported in the Xe2 hardware, rather than having to be emulated in software as it was on Alchemist.
Execute indirect support is also baked into the hardware, via the Command Front End. This command is used commonly in game engines, including Unreal Engine 5; it was previously emulated in software on Alchemist, which led to slowdowns.
The Xe2 architecture's Render Slice includes improvements to deliver 3x mesh shading performance, 3x vertex fetch throughput, and 2x throughput for sampling without filtering. Bandwidth requirements should be lower, and commands are more in line with what games often use.
Execute Indirect not being in hardware is why games like Hellblade 2, Remnant 2 and Nightingale all underperform on ARC: UE5 uses the command, and Alchemist has to emulate it in software. Battlemage makes Execute Indirect 12.5x faster!
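For context on what the command actually does: in D3D12 terms, ExecuteIndirect lets the GPU consume draw/dispatch arguments from a buffer the GPU itself filled, instead of the CPU issuing each draw. A toy Python sketch of why software emulation hurts — all cost numbers here are made up, only the scaling behaviour is the point:

```python
# Toy cost model of GPU-driven (indirect) vs CPU-emulated draw submission.
# All cost numbers are hypothetical; only the scaling matters.

def emulated_indirect_cost(num_draws, per_draw_cpu_cost=5, readback_cost=100):
    # Emulated path: the driver must inspect the argument buffer and
    # replay each draw itself, so cost grows linearly with the number
    # of draws the engine requested.
    return readback_cost + num_draws * per_draw_cpu_cost

def hardware_indirect_cost(num_draws, fixed_cost=8):
    # Hardware path: the command front end consumes the argument buffer
    # directly -- one submission, roughly constant cost on the CPU side.
    return fixed_cost

for n in (10, 1_000, 100_000):
    print(n, emulated_indirect_cost(n), hardware_indirect_cost(n))
```

With GPU-driven engines like UE5 issuing very large indirect draw counts, the linear term is exactly where an emulated implementation falls behind.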

It's also pointed out that the compatibility issues stem partly from using SIMD8 as the base width, and that moving to SIMD16 will improve compatibility with games without needing hand-tuning by the driver team.
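A rough way to picture the width change: every shader dispatch gets packed into fixed-width waves, and the wave width affects both how many waves the scheduler has to juggle and how cleanly workloads tuned for other vendors' widths map onto the hardware. A minimal, purely illustrative sketch:

```python
# Toy illustration of SIMD (wave) width: a dispatch of N shader threads
# is packed into fixed-width waves; the last wave runs with idle lanes.
from math import ceil

def waves_needed(threads, simd_width):
    # Number of waves the scheduler must track for this dispatch.
    return ceil(threads / simd_width)

def lane_utilisation(threads, simd_width):
    # Fraction of lanes doing useful work across all waves.
    waves = waves_needed(threads, simd_width)
    return threads / (waves * simd_width)

# A narrower wave needs twice as many waves to schedule the same work
# (more scheduling overhead), while workloads shaped for wider waves
# map less naturally onto it.
for width in (8, 16, 32):
    print(width, waves_needed(1000, width), round(lane_utilisation(1000, width), 3))
```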

Hardware has been bottlenecking ARC
NOT
Driver has been bottlenecking ARC

The driver team would have needed to hand-tune each game to avoid the weaknesses in the architecture, such as those requiring Execute Indirect. It'll no longer need a Day 1 driver for each and every game to fix compatibility issues with games needing SIMD16. This means not only will driver development be FASTER and more effective, but in many cases no intervention will be needed at all, which will be a boon especially for older titles.

Things such as Fast Clear are also funny, as they have been in AMD/Nvidia architectures for more than a decade now. Welcome to the modern world, ARC. Chips and Cheese talked about how Alchemist requires a high workload to take full advantage of the shaders and even the memory subsystem. 512GB/s on Alchemist is not really 512GB/s because of architectural design choices.
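For those unfamiliar with the technique: a fast clear doesn't write every pixel of the render target; it flags whole tiles as "cleared" in a small metadata structure, and the clear colour is resolved on later reads. A toy model of the bandwidth difference (the 8x8 tile size is made up, just for illustration):

```python
# Toy model of "fast clear": instead of writing every pixel of a render
# target, the hardware flags whole tiles as cleared in a small metadata
# table; reads from flagged tiles return the clear colour.

TILE = 8  # hypothetical 8x8 pixel tiles

def slow_clear_writes(width, height):
    # Emulated / brute-force path: touch every pixel.
    return width * height

def fast_clear_writes(width, height):
    # Hardware path: one metadata flag per tile.
    tiles_x = (width + TILE - 1) // TILE
    tiles_y = (height + TILE - 1) // TILE
    return tiles_x * tiles_y

print(slow_clear_writes(1920, 1080))  # 2,073,600 pixel writes
print(fast_clear_writes(1920, 1080))  # 32,400 metadata writes
```

Since clears happen many times per frame in modern renderers, emulating this in software is a steady bandwidth tax that the competition hasn't paid for over a decade.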

Bonus:
-Ray Tracing Unit width increases from 2 traversal pipelines to 3.
More capable RT unit means less performance loss when RT is on.
 
Good work uncovering this, and I’m guessing Intel wasn’t super forthright with their hardware limitations. It was easy to assume driver immaturity was the issue, but in reality, the drivers were propping up significant weaknesses in the hardware.

I’m curious about the design decisions behind round one. I wonder if it was just a matter of getting a much-delayed project out the door. Surely they knew this stuff but maybe it would have taken even longer to address it, and by then the window to sell it at all would be gone.
 
You have to learn to walk before you can run, and there ain't any shame in falling over when you're learning to walk. Can't wait to see what the next generation of cards offers!
 
I still don't see how they can make this work; their next generation cards need to be 200-250% faster to be competitive while maintaining similar pricing, and that doesn't seem feasible.

Alchemist is extremely underpowered for its size and power envelope. I always suspected that SIMD8 was an extremely bad architectural choice, and 16 still seems way too narrow for a GPU; AMD are at 64 and Nvidia at 128. The problem is that the more granular the architecture is, the more power and overhead you need to schedule instructions.
 
I think I speak for everyone when I ask if battlemage goes brr?
 
I think I speak for everyone when I ask if battlemage goes brr?

Yes, it'll go "brr", while RDNA 4 will go "BRR" and Blackwell will be going "BRRRR" :)
 
Yes, it'll go "brr", while RDNA 4 will go "BRR" and Blackwell will be going "BRRRR" :)
Compared to Alchemist, which variably ranged from "..." to "br", going "brr" is progress.
 
while RDNA 4 will go
Will it? Latest quote-unquote "progress" in AMD GPUs doesn't even attempt to sell a story of RDNA4 being at least not worse than predecessors. Being marginally faster in pure raster when your GPUs bend in everything else doesn't cut it anymore.

If a Battlemage $300 GPU outperforms (more than 25 percent advantage on average) RTX 4060 Ti then what's wrong with that? We've already seen that Alchemist GPUs, despite being raw, unready and immature, are decent for their money. Far from ideal but they do their thing more or less. With this metric ton of bugs being fixed, Battlemage sounds like a probably OK generation. We'll see.
 
Happy about these modifications! It’s already incredible what the driver team did. Recognizing what you can improve on the silicon level is good for any company.

I watched this a few days ago; but nothing really shocking. Improvement is natural progression and if they weren’t making silicon changes that would spell disaster.
 
Will it? Latest quote-unquote "progress" in AMD GPUs doesn't even attempt to sell a story of RDNA4 being at least not worse than predecessors. Being marginally faster in pure raster when your GPUs bend in everything else doesn't cut it anymore.

If a Battlemage $300 GPU outperforms (more than 25 percent advantage on average) RTX 4060 Ti then what's wrong with that? We've already seen that Alchemist GPUs, despite being raw, unready and immature, are decent for their money. Far from ideal but they do their thing more or less. With this metric ton of bugs being fixed, Battlemage sounds like a probably OK generation. We'll see.

I feel it's established enough to do BRR, it just won't do BRRRR lol

Weird that we can all understand each other though
 
I feel it's established enough to do BRR
I'll be fairly surprised if it does. Odds for Battlemage doing BRR are a wee bit higher and yet, they're almost non-existent. BRRRR isn't available for either party; a $3K, utter-deficit, 500 W chomp chomp chonker with a couple disabled GPUs (oh excuse me, GPUs with physical limitations or how do you say it in a politically correct way) for $1K and $800 respectively alongside it only sparks BS. I agree it's A+ to have a GPU like that but c'mon, almost nobody can afford an RTX 4090, let alone the probably even more expensive 5090. Lower tier SKUs are "can we get away with cutting it dow~ ah freak it, cut it down twice as hard, our competition is AMD anyway."
 
Will it? Latest quote-unquote "progress" in AMD GPUs doesn't even attempt to sell a story of RDNA4 being at least not worse than predecessors. Being marginally faster in pure raster when your GPUs bend in everything else doesn't cut it anymore.

If a Battlemage $300 GPU outperforms (more than 25 percent advantage on average) RTX 4060 Ti then what's wrong with that? We've already seen that Alchemist GPUs, despite being raw, unready and immature, are decent for their money. Far from ideal but they do their thing more or less. With this metric ton of bugs being fixed, Battlemage sounds like a probably OK generation. We'll see.
I second this. Nobody is expecting Battlemage to take the performance crown, but it doesn't need to; it just needs to squeeze the other players in the low-mid-range. In particular, if BM can significantly out-RT NVIDIA's offerings at the same price point, it would be a very tempting proposition.

That said, given Intel's dGPU marketshare is back at 0% after peaking at 4%, they're basically going to have to start from scratch with the marketing. The biggest challenge, I feel, will be getting BM to pick up marketshare and sustain it to the next generation.
 
The odd thing is - they had HiZ (hierarchical Z) but no fast Z-clear; ATi implemented that in the original Radeon, and Nvidia in GeForce 3 if not in the GeForce 256.
Intel is really not doing well - and to think they've been producing decelerators complementing some of their north bridges/CPUs all these years, oh wait...
 
That's hardly a feat considering where GPU prices are still headed, Blackwell will almost certainly be more expensive than the current gen & AMD will probably up their prices too ~ Intel will try to fill a bigger hole in the lower range!

but it doesn't need to; it just needs to squeeze the other players in the low-mid-range.
It's almost like the last crypto rage except it's "AI" now :nutkick:
 
If Intel is serious about GPUs, they will have to embark on a multi-year effort to hammer out all the design flaws on both the software and hardware fronts. I am not sure if they have the money to sustain this adventure.
 
Yes, it'll go "brr", while RDNA 4 will go "BRR" and Blackwell will be going "BRRRR" :)
Why do people keep imitating vibrators these days, I don't get it. Or their phone's constantly ringing while on silent mode or something.
 
This is great news... ARC being a first gen design was rough, but most first gen designs are pretty rough/ unoptimized.
 
Why do people keep imitating vibrators these days, I don't get it. Or their phone's constantly ringing while on silent mode or something.
The origin of this meme is the printer used by the US government to print money in excessive amounts.



However, your vibrator reference is quite accurate; modern economics is pretty much full of compulsive wanking.
 
I remember when Arc was first released there were some hints that they knew they wouldn't get the performance they wanted out of Alchemist due to design choices taken early on. This was why, in initial reviews, going from 4K down to 1440p to 1080p often showed little to no performance gain/loss, depending on which way you looked at the figures.

I may be mis-remembering this, but back just before they actually launched there were rumours they were going to can Alchemist completely and wait till Battlemage came out due to these known limitations. But the monetary situation at Intel wasn't going great, and they couldn't rely on just waiting while burning through billions again without seeing how the market would actually react.


However, the good we have seen from this is that the driver team have been doing the work, showing that even with compromises they have been successful in getting lots of performance out of the hardware despite its limitations. Now, if they can deliver all the performance that is being suggested at decent power consumption and a decent MSRP, I think AMD will be in serious trouble in the low/mid end, and Nvidia will have to take note, as Intel's RT performance was always good and XeSS doesn't seem to be a lesser competitor to DLSS on Intel hardware.
 
I still don't see how they can make this work; their next generation cards need to be 200-250% faster to be competitive while maintaining similar pricing, and that doesn't seem feasible.
Battlemage's top end is supposed to have 2x the Xe cores with significantly improved power efficiency. The increased perf/W is on an architectural level, and there's process on top of that. It's on a next-generation process but has a similar ~400mm² die to Alchemist.

Actually, Alchemist had many other flaws that severely limited marketshare:
-ReBar requirement: a cheap GPU that can't work well in older, cheap systems
-Idle power management not fine-grained enough to power down in all cases, meaning high idle power
-Other iGPU-mentality limitations (described well here: https://forums.anandtech.com/thread...evidently-not-cancelled.2526376/post-41227768)
-Compatibility issues

Imagine an A770 with exactly the same specs but with the above problems fixed. They could keep the price higher or keep the same price and sell more. If they can get 2x the performance with significantly improved power efficiency, even in late 2024 it will be decent, unlike Alchemist which was a disaster in the beginning.

Battlemage will also introduce a limited hardware FP64 engine for compatibility reasons, the same as introduced with Meteor Lake.
Good work uncovering this, and I’m guessing Intel wasn’t super forthright with their hardware limitations. It was easy to assume driver immaturity was the issue, but in reality, the drivers were propping up significant weaknesses in the hardware.

I’m curious about the design decisions behind round one. I wonder if it was just a matter of getting a much-delayed project out the door. Surely they knew this stuff but maybe it would have taken even longer to address it, and by then the window to sell it at all would be gone.
They had hardware issues many times before.

Remember the infamous X3000 series? The unified shader architecture with the hardware T&L engine? The driver took forever, coming nearly a year later, and then it was found the performance sucked. You were sometimes better off with a faster CPU and software rendering.

Reason? Drivers? Nope.

The X3000 had an anemic geometry unit. The successor GMA 4000 series doubled the geometry performance, eliminating the cases where software was faster. GMA HD made it even faster by introducing Hi-Z, which further improved geometry performance by lowering bandwidth requirements.
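For context, Hi-Z keeps a coarse per-tile depth value so whole tiles of occluded fragments can be rejected with a single read instead of per-pixel depth-buffer traffic. A toy sketch (hypothetical 4x4 tiles, illustrative numbers only; smaller depth = closer):

```python
# Toy model of hierarchical Z (Hi-Z): a coarse per-tile "farthest depth"
# value lets whole tiles of incoming fragments be rejected with one read
# instead of a per-pixel depth-buffer read, cutting bandwidth.

TILE = 4  # hypothetical 4x4 pixel tiles

def depth_reads(tile_far_depth, fragment_depths):
    """Count depth-buffer reads for one tile of incoming fragments.
    If even the closest incoming fragment is farther than the farthest
    depth already stored in the tile, the whole tile is occluded and is
    rejected after the single coarse Hi-Z read."""
    reads = 1  # the one coarse Hi-Z read
    if min(fragment_depths) >= tile_far_depth:
        return reads  # entire tile occluded: no per-pixel reads
    return reads + len(fragment_depths)  # fall back to per-pixel testing

# A fully occluded 4x4 tile costs 1 read instead of 16:
print(depth_reads(0.5, [0.9] * (TILE * TILE)))  # -> 1
print(depth_reads(0.5, [0.2] * (TILE * TILE)))  # -> 17
```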

Intel also had a problem with 23.976Hz rendering for video playback. It was fixed on Ivy Bridge. But Ivy Bridge had a problem where the GT Turbo would be stuck at 900MHz rather than 1.15GHz or higher. It was fixed on Haswell.

Real world experience can't be substituted. I knew that they would have problems with their first dGPU. Despite that, because of the long-standing base using their iGPU, they are the only viable 3rd competitor.
 
If Intel is serious about GPUs, they will have to embark on a multi-year effort to hammer out all the design flaws on both the software and hardware fronts. I am not sure if they have the money to sustain this adventure.
Intel has the money, but do they have the commitment? That would be the bigger question.

They have been very trigger-happy recently in areas that aren't making massive money or that they don't see as having a viable future.

SSDs -> Solidigm
NUC -> Asus
Optane -> Binned
1st-party Intel boards -> Binned

I suspect the only thing that is "buying" the GPU team time is the ability to reuse some of their development for dedicated AI accelerators and for upping the performance of their iGPUs, similar to AMD's APUs. Celestial will really be the make-or-break for them, to see if they are a realistic 3rd choice in the GPU market; if it doesn't take off and is relegated down to where Alchemist was/is currently, then I can't see them making consumer cards much longer beyond that. Those teams will be split off into AI and into iGPU only.
 
Battlemage's top end is supposed to have 2x the Xe cores with significantly improved power efficiency. The increased perf/W is on an architectural level, and there's process on top of that. It's on a next-generation process but has a similar ~400mm² die to Alchemist.

Actually, Alchemist had many other flaws that severely limited marketshare:
-ReBar requirement: a cheap GPU that can't work well in older, cheap systems
-Idle power management not fine-grained enough to power down in all cases, meaning high idle power
-Other iGPU-mentality limitations (described well here: https://forums.anandtech.com/thread...evidently-not-cancelled.2526376/post-41227768)
-Compatibility issues

Imagine an A770 with exactly the same specs but with the above problems fixed. They could keep the price higher or keep the same price and sell more. If they can get 2x the performance with significantly improved power efficiency, even in late 2024 it will be decent, unlike Alchemist which was a disaster in the beginning.

Battlemage will also introduce a limited hardware FP64 engine for compatibility reasons, the same as introduced with Meteor Lake.

They had hardware issues many times before.

Remember the infamous X3000 series? The unified shader architecture with the hardware T&L engine? The driver took forever, coming nearly a year later, and then it was found the performance sucked. You were sometimes better off with a faster CPU and software rendering.

Reason? Drivers? Nope.

The X3000 had an anemic geometry unit. The successor GMA 4000 series doubled the geometry performance, eliminating the cases where software was faster. GMA HD made it even faster by introducing Hi-Z, which further improved geometry performance by lowering bandwidth requirements.

Intel also had a problem with 23.976Hz rendering for video playback. It was fixed on Ivy Bridge. But Ivy Bridge had a problem where the GT Turbo would be stuck at 900MHz rather than 1.15GHz or higher. It was fixed on Haswell.

Real world experience can't be substituted. I knew that they would have problems with their first dGPU. Despite that, because of the long-standing base using their iGPU, they are the only viable 3rd competitor.
I’m curious to see how Snapdragon X is going to actually do with its soon-to-be-realized real-world experience. They are talking games in their prerelease slides, but call me skeptical. They won’t have a desktop class GPU, but they will be competing directly against Xe2 and RDNA3.5.
 
Intel has the money, but do they have the commitment? That would be the bigger question.

They have been very trigger-happy recently in areas that aren't making massive money or that they don't see as having a viable future.

SSDs -> Solidigm
NUC -> Asus
Optane -> Binned
1st-party Intel boards -> Binned

I suspect the only thing that is "buying" the GPU team time is the ability to reuse some of their development for dedicated AI accelerators and for upping the performance of their iGPUs, similar to AMD's APUs. Celestial will really be the make-or-break for them, to see if they are a realistic 3rd choice in the GPU market; if it doesn't take off and is relegated down to where Alchemist was/is currently, then I can't see them making consumer cards much longer beyond that. Those teams will be split off into AI and into iGPU only.
They may reorient it towards supporting AI/ML workloads, but I don't think they have the luxury of ignoring these kinds of processors anymore. ARM is threatening to make them irrelevant.
 