To be fair, all three of those things did change things; as for them changing "everything", I don't know who made such absurd claims lol. But if you think HBM2 isn't great, I'd hate to disappoint you: what do you think both companies are using for their professional-tier graphics cards, exactly? Mantle isn't any worse than any other GPU tech that requires developer support, be it DLSS/RTX/CF/SLI/PhysX or any other proprietary hardware-plus-developer 3D voodoo FX magic. As for 7nm, it's changed plenty; just look at what it did for Ryzen chips. Go on, tell me it hasn't changed anything, or are you still clinging to a quad-core Intel chip!? Let's not pretend 7nm hasn't made a difference; obviously it has and will continue to, and TSMC has plenty of time for 7nm++++++++++++++, since Intel has paved the way for that naming convention.
I'd say the 5600M and the new Radeon Pro VII are both intriguing parts, and Renoir as well. AMD just needs to shuffle together some of the things it already has or has worked on. That would include Radeon Pro/Vega HBCC, particularly the card where they utilized an M.2 slot. I think AMD is in a position to do a lot of intriguing things on the GPU side, similar to how Ryzen was able to shake things up on the CPU side. I'm not saying it'll happen immediately, but I have a feeling they're going to hit back hard on the GPU side one of these days. Chances are it'll be during a period when the CPU side of the business begins to wane again; it would stand to reason that such a transition period is when they'd make a concerted effort to double or triple down on R&D for their GPU portfolio, to leverage it while they come up with another CPU architecture design win.
On the gaming side, I'd like to think that, in contrast to the new Radeon Pro VII's double-precision FP64 focus, they'll go in the opposite direction with half-precision FP16, which seems like it would tie in with variable rate shading more naturally. Double-precision FP64 seems better suited to less stringent, non-"real time" rendering requirements and flexibility, while half precision is more the opposite, enabling finer granularity. A mixture of FP16, FP32 and FP64 is likely in order at some stage for gaming to leverage them all with variable rate shading to the best extent.
Probably something like 50% going to FP32, 37.5% to half-precision FP16 and 12.5% to double-precision FP64 is what I'd expect for gaming cards in the future, while compute workloads would reverse that, trading half precision for double precision; that ratio might be closer to 6.25%/43.75% for FP16/FP64, with FP32 remaining fairly neutral. Or we could see quarter precision and quad precision take more of a split of the resource allocation while keeping FP32 the majority: in that scenario it would be more like 6.25% FP8, 6.25% FP16, 50% FP32, 18.75% FP64, 18.75% FP128, or you could swap the FP64/FP128 and FP8/FP16 shares between gaming- and compute-oriented consumer cards. I'm mostly speculating on that, but I think more granularity is certainly beneficial, especially with variable rate shading. The floating point precision aspect applies to AMD and Nvidia, and to Intel as well "if" they ultimately become competitive in discrete graphics.
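To make the FP16-versus-FP32 trade-off above a bit more concrete, here's a minimal numpy sketch of how much precision half precision actually gives up; the value used is arbitrary and this is purely illustrative, not tied to any particular GPU.

```python
import numpy as np

# Purely illustrative: how much precision is given up going from FP32 to FP16.
# FP16 has a 10-bit mantissa (~3 decimal digits) and a max value of 65504,
# which is plenty for a lot of shading math but not for wide-range compute.
value = np.float32(3.14159265)
as_fp16 = np.float16(value)

print(f"FP32 value: {value:.7f}")                   # 3.1415927
print(f"FP16 value: {np.float32(as_fp16):.7f}")     # ~3.1406250, precision lost
print(f"FP16 max:   {np.finfo(np.float16).max}")    # 65504.0
print(f"FP32 max:   {np.finfo(np.float32).max:e}")  # ~3.4e+38
```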
On the APU side, I could see AMD teaming an APU with an x16 discrete add-in card that matches its specs for both CPU cores and GPU CUs, increasing the overall combined system resources for both tasks. Maybe it's too late for that with its latest APU, or perhaps not, but I do see it as a real possibility in the future, and I think it would have big appeal for people who just want an affordable, balanced system with a handy upgrade path. Sure, the GPUs might not scale perfectly when teamed in a CF-style arrangement in all instances, but the additional CPU cores would likely still be beneficial in cases where that doesn't apply, so it could still be an overall net gain. Basically, even if it only ticks one of the two boxes, it's still a net gain in either scenario, which is a cool thing to think about. AMD is best positioned to offer it to consumers right now, because Intel hasn't proven itself nearly as well in that area at this point. Then again, perhaps they deserve more credit than we give them, considering how integrated GPUs have slowly been eroding discrete graphics over the years; that's true of every company that has made integrated graphics in one form or another, from Nvidia back in the LGA775 days to Intel today, as well as AMD.
An add-on APU AIC over PCIe would be a terrible idea unless it included modifications to the Windows scheduler that strictly segregated the two chips with no related processes ever crossing between the two. Without that you would have absolutely
horrible memory latency issues and other NUMA-related performance issues, just exacerbated by being connected over (for this use) slow PCIe. Remember how 1st and 2nd generation Threadripper struggled to scale due to NUMA issues? It would be that, just multiplied by several orders of magnitude due to the PCIe link latency. It could work as a compute coprocessor or something similar (running its own discrete workloads), but it would be useless for combining with the existing CPU/APU. Scaling would be horrendous.
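As a rough sketch of the kind of segregation being described, something like the following could confine a process to the cores of one chip so its memory allocations stay local; affinity pinning from user space is only a crude stand-in for actual scheduler changes, the core ranges are hypothetical, and a real setup would query the machine's NUMA topology rather than hard-coding it (psutil is used here purely for illustration).

```python
import psutil

# Sketch only: keep a process's threads on one "node" so its memory stays
# local to that chip instead of being reached over the PCIe link.
# Hypothetical layout: cores 0-7 on the socketed APU, 8-15 on the add-in card.
APU_CORES = list(range(0, 8))
AIC_CORES = list(range(8, 16))

def pin_to_node(pid: int, cores: list) -> None:
    """Restrict the given process to a fixed set of logical cores."""
    psutil.Process(pid).cpu_affinity(cores)

# Example: pin the current process to the socketed APU's cores so the
# scheduler never migrates its threads across to the add-in card.
pin_to_node(psutil.Process().pid, APU_CORES)
print(psutil.Process().cpu_affinity())
```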
As for FP32/16/8, most if not all modern GPU architectures (Vega and onwards from AMD) support Rapid Packed Math or similar techniques for "packing" multiple smaller instructions (INT8 or FP16) into FP32 execution units for ideal (100%) performance scaling, i.e. 2:1 FP16 to FP32 or 4:1 INT8 to FP32. No additional hardware is needed for this beyond the changes to shader cores that have already existed for several years, so any modern GPU with X TFLOPS of FP32 should be able to compute 2X TFLOPS FP16 or 4X TOPS INT8. FP64 does need additional hardware, as it is (at least for now, in consumer GPUs) not possible to combine multiple FP32 units into one FP64 unit or anything like that (it might be possible if they built them that way), but FP64, as you say, has little utility in consumer applications, so that isn't happening. CDNA is likely to aim for everything from INT8 to FP64, as the full range is useful for HPC, ML and other datacenter uses.
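For the ratios described above, the back-of-the-envelope arithmetic looks like this; the example TFLOPS figure is just an assumption picked for illustration, not a measured spec.

```python
# Back-of-the-envelope arithmetic for the packed-math ratios described above:
# ideal 2:1 FP16 and 4:1 INT8 scaling relative to FP32 throughput.
def packed_rates(fp32_tflops: float) -> dict:
    return {
        "FP32 TFLOPS": fp32_tflops,
        "FP16 TFLOPS": fp32_tflops * 2,   # two FP16 ops packed per FP32 lane
        "INT8 TOPS":   fp32_tflops * 4,   # four INT8 ops packed per FP32 lane
    }

# Hypothetical example: a card with ~9 TFLOPS of FP32 throughput.
for name, rate in packed_rates(9.0).items():
    print(f"{name}: {rate:g}")
```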
It will be very interesting to see if game engine developers start to utilize FP16 more in the coming years, now that GPUs generally support it well and frameworks for its utilization have been in place for a while. It could be very useful to speed up rendering of less important parts of the screen, perhaps especially if combined with foveated rendering for HMDs with eye tracking.
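As a toy illustration of how that combination might look, here's a sketch that picks a coarser shading rate (and lower precision) for screen tiles far from the tracked gaze point; the function, thresholds and rate labels are all made up for the example, not taken from any real engine or VRS API.

```python
import math

# Toy sketch of foveated shading-rate selection: tiles near the tracked gaze
# point get full-rate (and full-precision) shading, while the periphery gets
# coarser rates where FP16 error would be hard to notice. Thresholds arbitrary.
def shading_rate(tile_center, gaze_point) -> str:
    dx = tile_center[0] - gaze_point[0]
    dy = tile_center[1] - gaze_point[1]
    dist = math.hypot(dx, dy)          # distance in normalized screen units
    if dist < 0.10:
        return "1x1 (full rate, FP32)"
    elif dist < 0.30:
        return "2x2 (quarter rate, FP16)"
    else:
        return "4x4 (1/16 rate, FP16)"

print(shading_rate((0.12, 0.50), (0.10, 0.50)))   # near the fovea -> 1x1
print(shading_rate((0.90, 0.10), (0.10, 0.50)))   # periphery      -> 4x4
```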
I can't believe I live in a world where fanbois argue about pre-release specs that come out of the advertising department ... and, on top of that, argue that their brand's fake specs are all real and the other guys' are all fake. Save your arguing for when the cards are tested. My bet is we're just going to see more of the same ...
Mantle was gonna change everything ... it didn't
HBM2 was gonna change everything ... it didn't
7nm was gonna change everything ... it didn't
What we do know is that the GPU market stopped being competitive with the 7xx versus 2xx series, where nVidia walked away with the top two tiers (all cards overclocked). AMD lost another tier against the 970 and another against the 1060. The next generation didn't go entirely well for either side in some respects: AMD had to make huge price cuts; nVidia didn't, because they didn't have to. The one bright shining light was the 5600 XT; pretty much nothing else out of AMD got me excited. If they can scale that up into the upper tiers, things may finally get interesting.
Well ...
Mantle paved the way for Vulkan and DX12, the current dominant graphics API and the clear runner-up. Without AMD's push for closer-to-the-hardware APIs we might not have seen these arrive as quickly. Has it revolutionized performance? No. But it leaves a lot of room for growth that DX11 and OpenGL were running out of due to overhead issues. While there are typically negligible performance differences between the different APIs in games that support several (and the older ones often perform better), this is mainly down to a few factors: more familiarity with programming for the older API, needing to program for the lowest common denominator (i.e. no opportunity to specifically utilize the advantages of the newer APIs), and so on.
HBM(2) represents a true generational leap in power efficiency per bandwidth, and is still far superior to any GDDR or DDR technology. The issue is that adoption has been slow and the only major markets have been high-margin enterprise products, leading to prices stagnating at very high levels. Though to be fair, given the high price of GDDR6 this is less of an issue than it was two years ago. Still, the cost of entry is higher due to the need for an interposer (or something EMIB-like) and more exotic packaging technology, which means GPUs using HBM have typically been expensive. It has also gotten a worse reputation than it deserves due to the otherwise unimpressive performance of the GPUs it's been paired with. Nonetheless, GPUs like the recently announced Radeon Pro 5600M show just how large an impact it can have on power efficiency while delivering excellent performance. I'm still hoping for HBM2(e?) on "big Navi".
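To put "power efficiency per bandwidth" in rough numbers, here's a quick calculation; the pJ/bit figures are ballpark assumptions chosen for the sake of the arithmetic, not datasheet values, and the bandwidth is just an example.

```python
# Rough illustration of memory interface power as a function of bandwidth.
# The energy-per-bit figures below are ballpark guesses for illustration only.
def memory_power_watts(bandwidth_gbps: float, pj_per_bit: float) -> float:
    # GB/s -> bits/s, multiplied by energy per bit (pJ -> J) gives watts.
    return bandwidth_gbps * 8e9 * pj_per_bit * 1e-12

BANDWIDTH = 448.0   # GB/s, an example figure for a mid-to-high-end GPU
print(f"HBM2-ish  (~4 pJ/bit): {memory_power_watts(BANDWIDTH, 4.0):.1f} W")
print(f"GDDR6-ish (~7 pJ/bit): {memory_power_watts(BANDWIDTH, 7.0):.1f} W")
```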
7nm (and Zen 2, of course) took AMD from "good performance, great value for money, particularly with multithreaded applications" to "clear overall performance winner,
clear efficiency winner, minor ST disadvantage" in the CPU space. In combination with RDNA (which is not to be discounted in terms of efficiency when compared to 7nm GCN in the Radeon VII), it brought AMD to overall perf/W parity with Nvidia even in frequency-pushed SKUs like the 5700 XT, something we hadn't seen since the Kepler/early GCN era. We've also seen that lower-clocked versions of 7nm RDNA (the original-BIOS 5600 XT and the Radeon Pro 5600M) are able to notably surpass anything Nvidia has to offer in terms of efficiency. Of course there is a significant node advantage in play here, but 7nm has nonetheless helped AMD reach a point in the competitive landscape that it hasn't seen on either the CPU or GPU side for many, many years. With AMD promising 50% improved perf/W for RDNA2 (even if that is a peak figure and the average is, say, 30%), we're looking at some very interesting AMD GPUs coming up.
It's absolutely true that AMD has a history of over-promising and under-delivering, particularly in the years leading up to the RDNA launch, but it looks like that has changed. The upcoming year is going to be exciting from both GPU makers, consoles are looking exciting, and even the CPU space is showing signs of actually being interesting again (though mostly in mobile).