Tuesday, September 22nd 2020
AMD Radeon "Navy Flounder" Features 40CU, 192-bit GDDR6 Memory
AMD uses offbeat codenames such as "Great Horned Owl," "Sienna Cichlid," and "Navy Flounder" to identify sources of leaks internally. One such upcoming product, codenamed "Navy Flounder," is shaping up to be a possible successor to the RX 5500 XT, the company's segment-leading 1080p product. According to ROCm compute code fished out by stblr on Reddit, this GPU is configured with 40 compute units, a big step up from the 22 on the RX 5500 XT, and gets a 192-bit wide GDDR6 memory interface, wider than that card's 128-bit bus.
Assuming the RDNA2 compute unit on next-gen Radeon RX graphics processors has the same number of stream processors per CU, we're looking at 2,560 stream processors for the "Navy Flounder," compared to 5,120 on the 80-CU "Sienna Cichlid." The 192-bit wide memory interface gives AMD's product managers a high degree of segmentation flexibility for graphics cards under the $250 mark.
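For reference, a minimal sketch of that stream-processor math, assuming RDNA 2 keeps RDNA 1's 64 stream processors per compute unit (the same unconfirmed assumption made above):

```python
# Back-of-the-envelope stream-processor math. Assumes RDNA 2 keeps
# RDNA 1's 64 stream processors per compute unit (not yet confirmed).
SP_PER_CU = 64

for name, cus in {"Navy Flounder": 40, "Sienna Cichlid": 80}.items():
    print(f"{name}: {cus} CU x {SP_PER_CU} SP/CU = {cus * SP_PER_CU} stream processors")
```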
Sources:
VideoCardz, stblr (Reddit)
135 Comments on AMD Radeon "Navy Flounder" Features 40CU, 192-bit GDDR6 Memory
Anyway, although I sincerely appreciate the fact that you are trying to educate me on how to choose my sources, there's really no need for it. I suggest you apply the methodology that you preach and share with us your solid sources and information about the architecture of the coming Navi 21. And that would have the added quality of being on topic.
As for my sources: I don't have any, as I haven't been making any claims about the architecture of these GPUs. I've speculated loosely on the basis of existing chips and known data about the nodes, and I have engaged with the speculations of others by attempting to compare them with what we know about current products and generalizable knowledge about chip production and design, but I have made zero claims about how Navi 21 will be. Especially architecturally, as there is no way of knowing that without a trustworthy source. So I don't quite see how your objection applies. If I had been making any claims, I obviously would need to source them (which, for example, I did in rebutting the RTX 3080 launch being a paper launch).
As for pointing out ad hominem arguments: If you're alluding to @Assimilator's semi-rant against MLID, that post presents an entirely valid criticism of a specific sub-genre of tech youtubers. It does of course engage with their motivations - in this case, preferring clicks, ad views and profit over trustworthy sourcing - but that is a relevant argument when looking into the trustworthiness of a source. One would have a very, very hard time judging the quality of a source if one wasn't allowed to account for their motivations in such a judgement, and there's nothing unfair or personal about that. For example, there's a long-running debate about open access vs. paywalled publication of research in academia, a situation in which the arguments presented by publishers and paywalled journals obviously need to be contrasted with their motivations for profit and continued operation, as such motivations can reasonably cause them to be biased. Just like the statements of most career politicians obviously need to be judged against their desire to get reelected.
Now can we please move past this silly bickering?
Ad hominem is attacking the person instead of their arguments. It's rude, and it's a simple way of trying to win an argument without being right. If you attacked me instead of my ideas, that would indeed be both rude and a sophism, but you're not doing that; you're just a tad patronizing, but hey, I've seen worse on fora.
I haven't alluded to anything; I've put in the quote that you asked for, and the opening of that quote is a typical ad hominem.
Now, I've already told you that you have a weird way of not putting enough effort into understanding the other person's ideas, while putting a lot of effort into arguing with them.
Probably the fastest way to end the bickering would be to use the ignore button, but that would be a pity because from time to time you guys do say interesting things. But at the same time most discussions end up feeling like a waste of my time so, yeah, maybe that would be the better solution.
Now back to the topic of this discussion: there are some rumors about AMD having overhauled the memory architecture of Big Navi. The 2 guys talking about that are RGT and MLID. These are just rumors, although RGT said the rumors came with photos of the real cards and showed them. As always, they might be true, they might not.
If you have a source that says otherwise, or an argument as to why that is not true, please share. If you have nothing to contribute to the conversation other than personal attacks and free advice about how to check sources, there's really no need for it, and you already did that anyway.
So, let's get on to the topic, please.
I meant the official prices by nVidia.
I watch them all (tubers, websites, etc.), and extract small amounts of trend data personally, then use salt.
Flounders, eh? Where's that salt.
As for being a waste of your time, get over yourself and stop wasting our time with your pompous attitude.
I guess it boils down to performance, cost, and efficiency, and how they all interrelate. It would be viewed a bit like the GTX 970 though, doing it in that manner, but the difference is they could have another cache tier in NVMe boosting the performance parity drop-off, and variable rate shading is the other major factor; had that been a thing with the GTX 970, some of those performance points might not have been as pronounced or as big an issue. I only see it happening if it makes sense from a relative cost-to-performance perspective; otherwise it seems far more likely they use one or the other.
Something to speculate on is whether, if they did do Infinity Cache, they might scale it alongside the memory bus width: something like 192-bit / 96 MB Infinity Cache / 6 GB VRAM, 256-bit / 128 MB Infinity Cache / 12 GB VRAM, or 320-bit / 160 MB Infinity Cache / 18 GB VRAM. It's anyone's guess what AMD's game plan is for RDNA2, but we'll know soon enough.
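Purely to illustrate how that speculative scaling would look, a small sketch; the cache/VRAM pairings are the post's guesses, and the 16 Gbps/pin GDDR6 data rate is an assumption rather than a known spec:

```python
# Raw GDDR6 bandwidth for the hypothetical bus widths floated above.
# The cache/VRAM pairings are pure speculation from the post, and the
# 16 Gbps/pin data rate is an assumption, not a known spec.
GBPS_PER_PIN = 16  # assumed GDDR6 data rate, Gbit/s per pin

configs = [
    # (bus width in bits, Infinity Cache in MB, VRAM in GB) -- all speculative
    (192, 96, 6),
    (256, 128, 12),
    (320, 160, 18),
]

for bus_bits, cache_mb, vram_gb in configs:
    bandwidth = bus_bits * GBPS_PER_PIN / 8  # GB/s
    print(f"{bus_bits}-bit: {bandwidth:.0f} GB/s raw GDDR6, "
          f"{cache_mb} MB cache, {vram_gb} GB VRAM")
```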
I'm pretty dubious about the chance of any dual concurrent VRAM config though. That would be a complete and utter mess on the driver side: how do you decide which data ought to live where? It also doesn't quite compute in terms of the configuration: if you have even a single stack of HBM2(e), adding a 192-bit GDDR6 bus to that doesn't do all that much. A single stack of HBM2e goes up to at least 12GB (and up to 24GB at the most), and does 460GB/s of bandwidth if it's the top-end 3.6Gbps/pin type. Does adding another layer of GDDR6 below that actually help anything? I guess you could increase cumulative bandwidth to ~800GB/s, but that also means dealing with a complicated two-tier memory system, which would inevitably carry significant complications with it. Even accounting for the cost of a larger interposer, I would think adding a second HBM2e stack would be no more expensive, and would perform better, than doing an HBM2e+GDDR6 setup.
If it's actually that the fully enabled SKU gets 2x HBM2e and the cut-down one gets 192-bit GDDR6, on the other hand? That I could believe. That way they could bake both into a single die rather than having to make the HBM SKU a separate piece of silicon like the (undoubtedly very expensive) Apple-only Navi 12 last time around. It would still be expensive and waste silicon, but given the relatively limited amount of wafers available from TSMC, it's likely better to churn out lots of one adaptable chip than to tape out two similar ones.
For the other thing, I'll try to give you a proper answer via PM later on.
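To make the bandwidth comparison in the post above concrete, here's a quick sketch; the 3.6 Gbps/pin HBM2e and 14 Gbps/pin GDDR6 rates are assumptions, not confirmed specs for any particular SKU:

```python
# Rough bandwidth arithmetic behind the hybrid-memory skepticism above.
# Pin rates are assumptions: 3.6 Gbps/pin for top-end HBM2e, 14 Gbps/pin GDDR6.
def peak_bandwidth_gbs(bus_width_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s for a given bus width and per-pin rate."""
    return bus_width_bits * gbps_per_pin / 8

one_hbm2e_stack = peak_bandwidth_gbs(1024, 3.6)  # ~461 GB/s
gddr6_192bit    = peak_bandwidth_gbs(192, 14.0)  # ~336 GB/s

print(f"1x HBM2e stack:   {one_hbm2e_stack:.0f} GB/s")
print(f"192-bit GDDR6:    {gddr6_192bit:.0f} GB/s")
print(f"Hybrid total:     {one_hbm2e_stack + gddr6_192bit:.0f} GB/s")  # ~797 GB/s
print(f"2x HBM2e stacks:  {2 * one_hbm2e_stack:.0f} GB/s")             # ~922 GB/s
```

Under those assumed rates, two HBM2e stacks already out-deliver the hybrid HBM2e+GDDR6 total, which is the point being made above.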
The additional thing that makes this a "not gonna happen" is the amount of die area that's going to be needed for ray-tracing hardware. Considering how large Turing and Ampere dies are, wasting space on inactive MCs would be an exceedingly poor, and therefore unlikely, decision on AMD's part.
As for the hybrid GDDR/HBM on a single card... that's pie-in-the-sky BS, always has been.
As for the side memory thing: I don't believe they will use both configs at the same time; rather, it is believed to have two memory controllers, one for HBM and one for GDDR6.
RedGamingTech reported, and is adamant, that HBM is not for the gaming cards, and that the gaming cards will have some sort of side memory.
I'm curious to see if there is any truth behind that report. If side memory uses less energy than GDDR6 and is cheaper than HBM, then it's a win, I suppose. I hope that is the case, honestly, because of how useful it would be on APUs where bandwidth is limited.
It could also be a step before multi-chip gaming GPUs, where that side memory basically acts as an L4 cache to feed the chips, so beginning to move in that direction perhaps makes the transition easier.
I don't see why that being much lower is an issue, efficiencies can improve.
Ah and invest more $$$ into driver quality control...yeah
Nvidia has a higher transistor count on a larger node, for starters, and we've seen what Intel has done with 14nm+++++++++++++. The idea that 7nm isn't maturing and hasn't matured is just asinine; it has definitely improved from a year ago, and it's entirely feasible that AMD can squeeze more transistors into the design, at least as many as Nvidia's previous designs or more. We can only wait and see what happens. Let's also not forget that AMD used to be a fabrication company and spun off GlobalFoundries; the same can't be said of Nvidia. They could certainly be working closely with TSMC on improvements to the node itself for their designs, and we saw some signs that they did in fact work alongside TSMC for Ryzen to incorporate some node tweaks and get more out of the chip designs on the manufacturing side.
It's just one of those things where everyone is going to have to wait and see what AMD came up with for RDNA2: will it underwhelm, overwhelm, or be about what you can expect from AMD, all things taken into consideration? Nvidia is transitioning to a smaller node, so the ball is more in their court in that sense; however, AMD's transistor count is lower, so it's definitely not that simple. If AMD incorporated something clever and cost effective, they could certainly make big leaps in performance and efficiency, and we know that AMD's compression already trails Nvidia's, so they have room to improve there as well. Worth noting is that AMD is transitioning toward RTRT hardware, but we really don't know at all to what extent they plan to invest in it on this initial push. I think if they match a non-SUPER RTX 2080 on the RTRT side, they are honestly doing fine; RTRT isn't going to take off overnight, and the RDNA3 design can be more aggressive. Things will have changed a lot by then, hopefully it'll be 5nm by that point, and perhaps HBM costs will have improved.
- The recent leaks specifically mention clock speeds. Whether IPC has changed is thus irrelevant. 2.5GHz is 2.5GHz unless AMD has redefined what clock speed means (which they haven't). That's a 30-50% increase in clock speed over the fastest RDNA 1 SKU. If the rumors are accurate about this and about the power requirements - 150W! - and assuming IPC or perf/Tflop is the same, that's a more than 100% increase in perf/W before any IPC increases (a rough version of that arithmetic is sketched after this list).
- An increase like that in both absolute performance and performance per watt without moving to a new node is unprecedented. We have not seen a change like that on the same node for at least the past decade of silicon manufacturing, and likely not the decade before that either. Both silicon production and chip design are enormously complex and highly mature processes, making revolutionary jumps like this extremely unlikely. Can it happen? Sure! Is it likely to? Not at all.
- The relationship between clock speed and transistor density is far too complex to be used in an argument the way you are using it. Besides, I never made any comparison to Intel or Nvidia, only to AMD's own previous generation, which is made on the same node (though it has since been improved) and is based on an earlier version of the same architecture. We don't know the tweaks made to the node, nor how much RDNA 2 has changed from RDNA 1, but assuming the combination is capable of doubling perf/W is wildly optimistic.
- Your example from Intel actually speaks against you: they spent literally four years improving their 14nm node, and what did they get from it? No real improvement in perf/W (outside of low power applications at least), but higher boost clocks and higher maximum power draws. They went from 4c/8t at 4.2GHz boost/4GHz base at 91W (with max OC somewhere around 4.5-4.7GHz) to 10c/20t at 3.7GHz base/various boost speeds up to 5.3GHz, at 125W to sustain the base clock or ~250W for boost clocks (max OC around 5.3-5.4GHz). For a more apples-to-apples comparison, their current fastest 4c/8t chip is the 65W i3-10320 at 3.7GHz base/4.6GHz 1c/4.4GHz all-core. That's a lower TDP, but it still needs 90W for its boost clocks, and the base clock is lower. IPC has not budged. So Intel, historically one of the best silicon manufacturing companies in the world, spent four years improving their node and got a massive boost in maximum power draw and thus maximum clocks, but essentially zero perf/W improvement. Yet you're expecting AMD to magically push TSMC into massively improving their node in a single year?
- There's no doubt AMD is putting much more R&D effort into Radeon now than 3-5 years ago - they have much more cash on hand and a much stronger CPU portfolio, so that stands to reason. That means things ought to be improving, absolutely, but it does not warrant this level of optimism.
- I never said 7nm wasn't maturing. Stop putting words in my mouth.
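Here is the rough arithmetic referenced in the first point above, sketched out. The baseline (the RX 5700 XT's roughly 1.9GHz boost clock and 225W board power) is an approximation, the 2.5GHz/150W figures are the rumor, and performance per clock is assumed unchanged; comparing against the lower game clock instead pushes the result past 2x.

```python
# Rough perf/W arithmetic behind the rumor discussed in the first point above.
# Baseline numbers (RX 5700 XT: ~1.9 GHz boost, 225 W board power) and the
# rumored 2.5 GHz / 150 W figures are assumptions; perf per clock is held equal.
base_clock_ghz, base_power_w = 1.9, 225    # RDNA 1 flagship, approximate
rumor_clock_ghz, rumor_power_w = 2.5, 150  # rumored RDNA 2 SKU

perf_gain = rumor_clock_ghz / base_clock_ghz   # ~1.32x performance from clocks alone
power_ratio = base_power_w / rumor_power_w     # 1.5x lower power
perf_per_watt_gain = perf_gain * power_ratio   # ~1.97x, i.e. roughly a doubling

print(f"Clock/perf gain:  {perf_gain:.2f}x")
print(f"Perf/W gain:      {perf_per_watt_gain:.2f}x")
```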
You're arguing as if I'm being extremely pessimistic here or even saying I don't expect RDNA 2 to improve whatsoever, which is a very fundamental misreading of what I've been saying. I would be very, very happy if any of this turned out to be true, but this looks far too good to be true. It's wildly unrealistic. And yes, it would be an unprecedented jump in efficiency - bigger even than Kepler to Maxwell (which also happened on the same node). If AMD could pull that off? That would be amazing. But I'm not pinning my hopes on that.