Thursday, April 16th 2020

Intel Gen12 Xe iGPU Could Match AMD's Vega-based iGPUs

Intel's first integrated graphics solution based on its ambitious new Xe graphics architecture could match AMD's "Vega" architecture-based iGPU solutions, such as the one found in its latest Ryzen 4000 series "Renoir" processors, according to leaked 3DMark FireStrike numbers put out by @_rogame. Benchmark results of a prototype laptop based on Intel's "Tiger Lake-U" processor surfaced on the 3DMark database. This processor embeds Intel's Gen12 Xe iGPU, which is purported to offer significant performance gains over current Gen11- and Gen9.5-based iGPUs.

The prototype 2-core/4-thread "Tiger Lake-U" processor with Gen12 graphics yields a 3DMark FireStrike score of 2,196 points, with a graphics score of 2,467 and a physics score of 6,488. These scores are comparable to those of 8 CU Radeon Vega iGPU solutions. "Renoir" tops out at 8 CUs, but shores up performance to 11 CU "Picasso" levels by other means: tapping into the 7 nm process to increase engine clocks, improving the boosting algorithm, and modernizing the display and multimedia engines. Beyond that, AMD's iGPU is largely based on the same three-year-old "Vega" architecture. Intel Gen12 Xe makes its debut with the "Tiger Lake" microarchitecture slated for 2021.
Source: _rogame (Twitter)

45 Comments on Intel Gen12 Xe iGPU Could Match AMD's Vega-based iGPUs

#26
Turmania
More players in every segment is a good thing for all of us. It would be great if Nvidia entered the CPU segment as well in the long run.
Posted on Reply
#27
zlobby
TurmaniaMore players in every segment is a good thing for all of us. It would be great if Nvidia entered the CPU segment as well in the long run.
Huawei and Microsoft are heavily stirring the status quo with their ARM push. Huawei made a decent Taishan ARM server CPU, while Microsoft is porting more and more to ARM.

I wonder how long x86 has left outside the gaming and workstation segments?
Posted on Reply
#28
wiak
NC37Given that AMD has simply rehashed the same design since APUs began, it makes sense that Intel can catch them. The core problem is the shared VRAM. AMD solves for it in consoles but does nothing for the PC market. All Intel needs to do is solve for that and they can beat any APU. Which they have shown a willingness to do in their IRIS platform.
Depends on which Vega you mean; the Vega cores in the latest Ryzen Mobile 4000 series are much faster than the previous ones.
Posted on Reply
#29
zlobby
wiakDepends on which Vega you mean; the Vega cores in the latest Ryzen Mobile 4000 series are much faster than the previous ones.
Faster by how much, and under what thermal envelope and workloads?
Posted on Reply
#30
Darmok N Jalad
zlobbyHuawei and Microsoft are heavily stirring the status quo with their ARM push. Huawei made a decent Taishan ARM server CPU, while Microsoft is porting more and more to ARM.

I wonder how long x86 has left outside the gaming and workstation segments?
If Apple makes the jump like the rumors keep suggesting, that will be a big move, especially if they surprise us with the performance. Their mobile SOCs are really powerful, but they are running in primarily single-intensive-task-at-a-time devices. Still, they are very performant for basic needs, and even things like image editing run with no lag. MS isn’t going to move the needle on ARM adoption, IMO. People like Windows for its compatibility, and MS has tried for years to make a break from legacy support, and all those products fail.
Posted on Reply
#31
Vayra86
Darmok N JaladIf Apple makes the jump like the rumors keep suggesting, that will be a big move, especially if they surprise us with the performance. Their mobile SOCs are really powerful, but they are running in primarily single-intensive-task-at-a-time devices. Still, they are very performant for basic needs, and even things like image editing run with no lag. MS isn’t going to move the needle on ARM adoption, IMO. People like Windows for its compatibility, and MS has tried for years to make a break from legacy support, and all those products fail.
Well spoken. I also think history by now has pointed out to us that one does not exclude the other.

That goes for gaming. It goes for x86 / ARM. The market has become so all-encompassing that there IS no one-size-fits-all. It also echoes in MS's Windows RT attempt, for example. People want Windows for specific reasons. Windows Phone... same fate. And even within Windows x86, the cross-compatibility just doesn't happen.
Posted on Reply
#32
zlobby
Darmok N JaladIf Apple makes the jump like the rumors keep suggesting, that will be a big move, especially if they surprise us with the performance. Their mobile SOCs are really powerful, but they are running in primarily single-intensive-task-at-a-time devices. Still, they are very performant for basic needs, and even things like image editing run with no lag. MS isn’t going to move the needle on ARM adoption, IMO. People like Windows for its compatibility, and MS has tried for years to make a break from legacy support, and all those products fail.
Yeah, I also forgot to mention Apple, but you didn't. Thanks for the reminder. :)

Microsoft is missing a huge chunk of the mobile and wearables pie. ARM is their re-entry trajectory for these markets, so I really don't think they have abandoned ARM.
Posted on Reply
#33
Valantar
zlobbyFaster by how much, and under what thermal envelope and workloads?
AMD claims around 59% increased perf/CU for Renoir over Picasso. I haven't seen any detailed reviews yet doing like-for-like comparisons, but leaks and preliminary data suggest it's not far off at least. But again, a significant part of this is due to faster RAM. The best case scenario for Picasso was DDR4-2400, and now pretty much the worst case scenario is DDR4-3200 with LPDDR4X-4266 being a shoo-in for anything thin and light. That'll be an immense boost for the 15W SKUs (and especially the ones configured to 25W).
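For rough context, here is a back-of-the-envelope sketch of the theoretical peak bandwidth behind those memory speeds, assuming a dual-channel 128-bit DDR4 setup and a 128-bit LPDDR4X configuration; real-world sustained bandwidth is lower.

```python
# Back-of-the-envelope peak bandwidth for the memory configurations mentioned
# above. Theoretical maximums only; sustained real-world bandwidth is lower,
# and on an APU the CPU cores share it with the iGPU.
def peak_bandwidth_gbs(transfer_rate_mts: float, bus_width_bits: int) -> float:
    """Peak bandwidth in GB/s = transfers per second * bytes per transfer."""
    return transfer_rate_mts * 1e6 * (bus_width_bits / 8) / 1e9

configs = {
    "DDR4-2400, dual channel (Picasso best case)":   (2400, 128),
    "DDR4-3200, dual channel (Renoir)":              (3200, 128),
    "LPDDR4X-4266, 128-bit (Renoir thin-and-light)": (4266, 128),
}

for name, (rate, width) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
# Prints roughly 38.4, 51.2 and 68.3 GB/s respectively.
```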
Posted on Reply
#34
medi01
zlobbyFaster by how much, and under what thermal envelope and workloads?
The IPC jumped 1.6 times, according to AMD.
Posted on Reply
#35
Valantar
TurmaniaMore players in every segment is a good thing for all of us. It would be great if Nvidia entered the CPU segment as well in the long run.
Nvidia tried their hand at ARM SoCs back in the early 2010s, and bowed out due to stiff competition and small margins. While they still make them for their automotive customers (... and Nintendo, though that design is ancient by now), they aren't likely to return to custom ARM chips for consumer or enterprise use any time soon - designing the chips is too expensive and difficult and competition against entrenched players with years of experience is likely too much to take on (though I could see them buying an ARM server vendor if that suited their long term goals). And of course they don't have (and are never getting) an X86 licence (why would Intel grant them one?), so that door is shut.
Darmok N JaladIf Apple makes the jump like the rumors keep suggesting, that will be a big move, especially if they surprise us with the performance. Their mobile SOCs are really powerful, but they are running in primarily single-intensive-task-at-a-time devices. Still, they are very performant for basic needs, and even things like image editing run with no lag. MS isn’t going to move the needle on ARM adoption, IMO. People like Windows for its compatibility, and MS has tried for years to make a break from legacy support, and all those products fail.
While I tend to mostly agree with you, Windows on ARM has promise simply due to the emulated compatibility layer (and the reportedly upcoming expansion of it to 64-bit). That would make thin-and-light ARM Windows laptops pretty great if the performance was good enough. Of course Qualcomm and the others are still miles behind Apple in this regard, so the ideal combo there would be an iPad or ARM MacBook running WoA :p
Posted on Reply
#36
Darmok N Jalad
zlobbyYeah, I also forgot to mention Apple, but you didn't. Thanks for the reminder. :)

Microsoft is missing a huge chunk of the mobile and wearables pie. ARM is their re-entry trajectory for these markets, so I really don't think they have abandoned ARM.
The thing is, they tried to have a presence in that market and completely fumbled away their progress. Nokia made some great phones, WP8 was decent, and WP8.1 was the pinnacle of MS's mobile efforts. W10M was an utter disaster. As was MS buying Nokia. As was Windows RT. I know, because I was one of those heavily invested in MS's mobile consumer push: I had purchased several Nokia WPs, both Surface RT and Surface 2 (even a Surface 3 non-pro), and I even tried the MS Band 2. The pre-MS Lumias were great. The Lumia 950 was literally a hot mess; mine got blazing hot doing absolutely nothing. Band 2 was a great idea, but the thing was so poorly made that it fell apart inside 3 months, and its warranty replacement did too.

I was all-in with MS, but I've had so many bad experiences with their hardware that I've vowed never to buy anything with their name on it that isn't a mouse or keyboard. I'll use their OS and Office, but that's it; they've shown no real commitment to anything else. I don't even trust Surface. If you look at that brand's track record, few devices have really been successful. Their App Store is a joke too. The few apps I've purchased or tried from there won't even install correctly or run after the fact. MS can't even master what other software companies have managed to do: install software on Windows!
Posted on Reply
#37
zlobby
Darmok N JaladThe thing is, they tried to have a presence in that market and completely fumbled away their progress. Nokia made some great phones, WP8 was decent, and WP8.1 was the pinnacle of MS's mobile efforts. W10M was an utter disaster. As was MS buying Nokia. As was Windows RT. I know, because I was one of those heavily invested in MS's mobile consumer push: I had purchased several Nokia WPs, both Surface RT and Surface 2 (even a Surface 3 non-pro), and I even tried the MS Band 2. The pre-MS Lumias were great. The Lumia 950 was literally a hot mess; mine got blazing hot doing absolutely nothing. Band 2 was a great idea, but the thing was so poorly made that it fell apart inside 3 months, and its warranty replacement did too.

I was all-in with MS, but I've had so many bad experiences with their hardware that I've vowed never to buy anything with their name on it that isn't a mouse or keyboard. I'll use their OS and Office, but that's it; they've shown no real commitment to anything else. I don't even trust Surface. If you look at that brand's track record, few devices have really been successful. Their App Store is a joke too. The few apps I've purchased or tried from there won't even install correctly or run after the fact. MS can't even master what other software companies have managed to do: install software on Windows!
All you say is true. I still remember my Lumia 1020, its godly camera and buggy microphones...:rolleyes:

However, Microsoft failing a few times doesn't mean they will also fail the next time they give it a try. Let's face it, the future is all mobile and MS has zero presence in mobile, so 2+2=4? It's only a matter of time until we see their next attempt at it.
Posted on Reply
#38
Flanker
zlobbyAll you say is true. I still remember my Lumia 1020, its godly camera and buggy microphones...:rolleyes:

However, Microsoft failing a few times doesn't mean they will also fail the next time they give it a try. Let's face it, the future is all mobile and MS has zero presence in mobile, so 2+2=4? It's only a matter of time until we see their next attempt at it.
I still have my Lumia 950 and still love the phone. It's just that the apps available are :banghead:

Their Android apps, like Office and the Edge browser, are pretty good IMO.
Posted on Reply
#39
mtcn77
NC37All Intel needs to do is solve for that and they can beat any APU. Which they have shown a willingness to do in their IRIS platform.
Easier said than done when you need mainstream settings just to meet the challenge with your esram special.
Vya DomusThere is nothing to "beat", you simply need faster memory and at this point this means DDR5. AMD had the closest thing to a solution with the HBC thing but that never made its way to APUs.
Total disagreement. Logic dictates AMD ought to have a full house with the advent of compute graphics, however Nvidia still holds together with grace. Numbers aren't the issue, it is internal bandwidth and Nvidia knows it best to adopt mobile rasterization for this purpose.
One could say, yeah it is just a bunch of 3D stages separated nicely into performance numbers, however that would overlook the runtime of data in flight. It is the architecture that makes bandwidth possible. AMD has its shot, not by the dint of its memory, but heterogeneous memory address space. If they can keep addressing to a minimum with cpu serving the scalar graphics pipeline.
Posted on Reply
#40
Vya Domus
mtcn77Total disagreement. Logic dictates AMD ought to have a full house with the advent of compute graphics, however Nvidia still holds together with grace. Numbers aren't the issue, it is internal bandwidth and Nvidia knows it best to adopt mobile rasterization for this purpose.
One could say, yeah it is just a bunch of 3D stages separated nicely into performance numbers, however that would overlook the runtime of data in flight. It is the architecture that makes bandwidth possible. AMD has its shot, not by the dint of its memory, but heterogeneous memory address space. If they can keep addressing to a minimum with cpu serving the scalar graphics pipeline.
I read your comment multiple times and I honestly couldn't understand one iota of what you wrote. Bandwidth is bandwidth, it's an architecture agnostic characteristic.
Posted on Reply
#41
mtcn77
Vya DomusI read your comment multiple times and I honestly couldn't understand one iota of what you wrote. Bandwidth is bandwidth, it's an architecture agnostic characteristic.
Fury X has bandwidth, too. The difference of external bandwidth is it is not agnostic and only available as a benchmark. Not internal bandwidth, though. It is truly agnostic.
Posted on Reply
#42
Vya Domus
mtcn77Fury X has bandwidth, too. The difference of external bandwidth is it is not agnostic and only available as a benchmark. Not internal bandwidth, though. It is truly agnostic.
Again, I have no idea what you are trying to say, there is no such thing as internal or external bandwidth for VRAM, it's just bandwidth, that's it.

A GPU is engineered to function with any amount of memory bandwidth available but to operate optimally with a specific minimal level. There is no point in putting faster GPUs in APUs when the memory isn't getting faster and there is no going around that, it's a hard limit.
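To put rough numbers on that hard limit, here is an illustrative sketch comparing compute-to-bandwidth ratios for an APU iGPU and a discrete Vega card; the specs are approximate published figures used for illustration, not measurements.

```python
# Illustrative compute-vs-bandwidth comparison between an APU iGPU and a
# discrete Vega card, using approximate published specs (not measurements).
def tflops(shaders: int, clock_ghz: float) -> float:
    # 2 FLOPs per shader per clock (fused multiply-add)
    return shaders * 2 * clock_ghz / 1000

# Renoir "Vega 8": 8 CUs x 64 shaders at ~1.75 GHz, sharing ~51.2 GB/s of
# dual-channel DDR4-3200 with the CPU cores.
igpu_tflops, igpu_bw = tflops(512, 1.75), 51.2

# Radeon RX Vega 56: 3584 shaders at ~1.47 GHz boost, 410 GB/s of HBM2.
dgpu_tflops, dgpu_bw = tflops(3584, 1.47), 410.0

print(f"iGPU: {igpu_tflops:.2f} TFLOPS, {igpu_bw / igpu_tflops:.0f} GB/s per TFLOP (shared)")
print(f"dGPU: {dgpu_tflops:.2f} TFLOPS, {dgpu_bw / dgpu_tflops:.0f} GB/s per TFLOP (dedicated)")
# The iGPU already has less bandwidth per unit of compute than the big card,
# and it has to share that bandwidth with the CPU; adding more shaders
# without faster memory would mostly leave them waiting on data.
```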

This is probably the last time I respond, as I have no clue what exactly you are arguing against; what you're writing is just borderline incoherent to me.
Posted on Reply
#43
mtcn77
You cannot change the gpu regime using bandwidth as a springboard. The gpu access patterns are the same. It takes 4 cycles to do a full read on anisotropic filtering primer (don't expect articulate nomenclature please). You cannot apply supersampling to leverage the memory bandwidth better, in supposition that since the gpu is big-die and the memory is cutting edge hbm that you will leverage it to the full extent, doing more work in the same number of cycles. You will not. The rendering depends on rasterization hardware, it is not throughput, well it says it is, but truly it is latency dependent. AF takes multiple bilinear attempts, supersampling takes multiple AF attempts. It is always conformant, never divergent. It is just that in the past, pixel shaders did the work, now the compute does the same. Is it faster? Well, only if you are programming it. It is just a proxy for hardware buffers (forgot the name).
Vya DomusA GPU is engineered to function with any amount of memory bandwidth but to operate optimally with a specific minimal level. There is no point in putting faster GPUs in APUs when the memory isn't getting faster and there is no going around that, it's a hard limit.
Well, if you have a better gpu, you have a more up to date gpu compiler. You can only utilise performance that the compiler can optimize the rasterization pipeline for. If you want all things equal, you have to look at the pipeline. As again, AMD has hardware support to parallelize the pipeline, but it is what it is.

The external bandwidth has data the gpu is unaware of. That is the difference. At full speed, it takes 250MHz to read memory end to end. Every cycle a single module of the GDDR5 system bursts just 4 bytes. It is not online memory like the registers are. Those are crazy.

I guess the correct term is,
it may take up to 768 threads to completely hide latency.
^that was how quick the registers are. 250MHz vs 768Hz.

Plus, GDDR5 is aligned. You get seriously worse performance when you introduce timings. Signals need to phase in.
Posted on Reply
#44
Valantar
Vya DomusI read your comment multiple times and I honestly couldn't understand one iota of what you wrote. Bandwidth is bandwidth, it's an architecture agnostic characteristic.
Sadly that is par for the course with that user's posts - they tend to be a word salad of the highest order. I think it's a language barrier thing, but it's further compounded by an outright refusal on their part to even attempt to clarify what they are trying to say. (It might be that they are using a translation service? That would definitely complicate any explanation, though they seem to fundamentally refuse to accept even the slightest suggestion that they have been unclear about anything whatsoever, and seem to assume that any failure of comprehension is entirely due to the reader's lack of knowledge rather than their writing. It's rather fascinating.) Sometimes parts of it make sense, but I've never seen a post of theirs longer than a few sentences make sense as a whole, and typically not even that.

Sadly, that doesn't stop me from trying. You might call this tilting at windmills, but one day - one day! - I want to have a coherent and comprehensible discussion with them.
mtcn77You cannot change the gpu regime using bandwidth as a springboard. The gpu access patterns are the same. It takes 4 cycles to do a full read on anisotropic filtering primer (don't expect articulate nomenclature please). You cannot apply supersampling to leverage the memory bandwidth better, in supposition that since the gpu is big-die and the memory is cutting edge hbm that you will leverage it to the full extent, doing more work in the same number of cycles. You will not. The rendering depends on rasterization hardware, it is not throughput, well it says it is, but truly it is latency dependent. AF takes multiple bilinear attempts, supersampling takes multiple AF attempts. It is always conformant, never divergent. It is just that in the past, pixel shaders did the work, now the compute does the same. Is it faster? Well, only if you are programming it. It is just a proxy for hardware buffers (forgot the name).

Well, if you have a better gpu, you have a more up to date gpu compiler. You can only utilise performance that the compiler can optimize the rasterization pipeline for. If you want all things equal, you have to look at the pipeline. As again, AMD has hardware support to parallelize the pipeline, but it is what it is.

The external bandwidth has data the gpu is unaware of. That is the difference. At full speed, it takes 250MHz to read memory end to end. Every cycle a single module of the GDDR5 system bursts just 4 bytes. It is not online memory like the registers are. Those are crazy.

I guess the correct term is,
^that was how quick the registers are. 250MHz vs 768Hz.

Plus, GDDR5 is aligned. You get seriously worse performance when you introduce timings. Signals need to phase in.
You misunderstand the issues being raised against you. You are claiming that increasing external memory bandwidth wouldn't help iGPUs because they are limited by internal restrictions. While parts of what you say are true, the whole is not. While there are of course bandwidth limitations to the internal interconnects and data paths of any piece of hardware, these interconnects have massive bandwidth compared to any external memory interface, and these internal pathways are thus rarely a bottleneck. For an iGPU this is especially true as the external memory bandwidth is comparatively tiny. Compounding this is the fact that architecturally the iGPUs are the same as their larger dGPU siblings, meaning they have the same internal characteristics. If what you say was true, then a Vega 64 at the same clocks as a Vega 8 iGPU would perform the same as they would both be limited by internal bandwidth. They obviously don't, and thus aren't.

Beyond this, your post is full of confused terminology and factual errors.

Simple ones first: how is supersampling (a form of anti-aliasing) related to anisotropic filtering (texture filtering)? And how does the computational cost of performing an operation like that become "bandwidth"? What you are describing is various aspects of the processing power of the GPU. Processing of course has its metrics, but bandwidth is not one of them, as bandwidth is a term for data transfer speed and not processing speed (unless used wrongly or metaphorically). Of course this could be relevant through the simple fact that no shader can compute anything without having data to process, which is dependent on external memory. You can't do anisotropic filtering on a texture that isn't available in time. But other than that, what you are saying here doesn't relate much to bandwidth.

Second: the statement "it takes 250MHz to read memory end to end" is a meaningless statement. Hz is a measure of cycles per second. Any amount of memory can be read end to end at any rate of cycles/second if given sufficient time. Do you mean to read a specific amount of memory end to end within a specific time frame over a specific bus width? You need to specify all of these data points for that statement to make sense. Also, the point of memory bandwidth is to be able to deliver lots of data rapidly, but not to read the entire memory end to end - most data in VRAM is unused at any given time. The point of increased memory bandwidth is thus not to be able to deliver the full amount of memory faster, but to be able to keep delivering the necessary amount of data to output a frame at either a higher detail level/resolution at the same rate, or at the same detail level/resolution at a higher rate.
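One rough way to picture that framing is as a per-frame data budget; a toy calculation with round, hypothetical numbers:

```python
# Memory bandwidth viewed as a per-frame data budget: at a given frame rate
# the GPU can only pull so much data from memory per frame. Round, toy
# numbers; caches and data reuse mean real frames need far less raw traffic.
def mb_per_frame(bandwidth_gbs: float, fps: float) -> float:
    return bandwidth_gbs * 1000 / fps  # GB/s -> MB available per frame

for bw in (38.4, 51.2, 68.3):      # the DDR4/LPDDR4X figures from earlier
    for fps in (30, 60):
        print(f"{bw:5.1f} GB/s @ {fps} fps -> {mb_per_frame(bw, fps):4.0f} MB/frame")
# More bandwidth buys either more data per frame (higher resolution/detail at
# the same frame rate) or the same data delivered more often (higher fps).
```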

Also, how does 768 threads become 768Hz? A thread is not a cycle. 768 threads means 768 parallel (or sequential, though that would be rare for a GPU) threads at the given speed, for however long the threads are running. The statement you quoted seems to be saying that at a given speed (I assume this is provided on your source) 768 threads would be needed to overcome the latency of the system as compared to the X number of threads (again, I assume provided in your source) the current system actually has. (Btw, where is that quote from? No source provided, and not enough context to know what you're quoting.) The quote certainly doesn't seem to say what you mean it to say.
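For what a figure like "768 threads to hide latency" usually refers to, here is a toy version of the standard latency-hiding arithmetic; the latency and work figures below are made-up examples, not specs of any particular GPU.

```python
# Toy version of the usual GPU latency-hiding arithmetic: while one batch of
# threads waits on a memory access, enough other resident threads must have
# ALU work ready so the execution units never sit idle.
# The numbers below are made-up examples, not specs of any particular GPU.
def threads_to_hide_latency(mem_latency_cycles: int,
                            alu_cycles_between_loads: int,
                            threads_per_batch: int) -> int:
    # Each batch computes for `alu_cycles_between_loads` cycles, then stalls
    # for `mem_latency_cycles`; we need enough batches resident so that some
    # batch always has work to issue while the others wait.
    batches_needed = 1 + mem_latency_cycles // alu_cycles_between_loads
    return batches_needed * threads_per_batch

# e.g. a 350-cycle memory latency, 30 cycles of ALU work per load, and
# 64 threads issued together (one wavefront):
print(threads_to_hide_latency(350, 30, 64))  # -> 768 threads
```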
Posted on Reply