I got my hands on the expected Brazilian pricing and launch dates: the ones launching 3 February are pre-orders, and the ones on 12 and 15 Feb are standard orders. I can't vouch for the authenticity of this list with 100% certainty, so take it with a grain of salt, but I think this math is mostly mathin'.
This has been confirmed to be fake; it was just the supposed USD values converted to BRL. Notice how some actually end up below the US MSRP.
Curious though, what do you think happened? Did they get wind of what Nvidia was going to do at the last minute and bail to go back to the drawing board? Or maybe there was never any plan to announce the 9070 at all and people just assumed?
Ian Cutress wrote up a bit on that; AMD did a Q&A with some journalists trying to explain it:
Timing is Everything
morethanmoore.substack.com
TL;DR: they said the product was not yet finished, they wouldn't have had enough time to showcase it during the overall presentation, and Nvidia's announcement did play a part in the decision not to showcase RDNA4 (they want to undercut it).
Honestly I think they made a big bet on MCM for the gaming division hoping for a Zen moment, and it did not pan out... Sounds like they did have a high-end RDNA4 chip planned but canceled it.
The reality is the low-end stuff from Nvidia is getting worse and worse, and AMD isn't offering alternatives that get people excited. If RDNA4 is a bust, let's hope UDNA is the answer.
If they can get UDNA right, it would simplify a lot of things, since they'd be able to do exactly what they did with Zen: have chiplets that provide great value in the enterprise (which brings in the big bucks) and that can also be used in the consumer market, all off the same fabrication line. This is also exactly what Nvidia has been doing for quite a long time (albeit not with chiplets).
Their GPU division currently has both CDNA and RDNA, which have to compete not only for engineering time but also for fab allocation. Given that CDNA brings in more money than RDNA, it makes sense to focus on it.
Yeah, MCM or not, even if they had a bigger chip they should also have had RT performance lying on the shelf to go with it. Which they might not have after all; RDNA3 was supposed to perform better regardless of MCM. I think there were mostly promises, hopes and dreams flying around, but people simply did not (manage to...) deliver. And this seems to be a recurring thing, not exclusive to Raja.
MCM is more about fab efficiency; the architecture design is a somewhat separate question. See how Zen ships in both MCM and monolithic products, and how the RDNA design exists in MCM form as well as in monolithic form in iGPUs.
It's actually great in iGPUs; I guess they're just lacking the resources to scale it up, because it makes more sense to put the effort into CDNA as the "big product" instead.
And some people don't get why I spoke so harshly against the misleading MFG performance data in the keynote. This is why. People are very easy to manipulate, and that is exactly what Nvidia is doing.
Tbf most users here won't fall for that, and everyone will wait for the proper reviews anyway, so that's like preaching to the choir.
That's the real upgrade path here: 2nd-hand last gen, as all the n00bs upgrade to the latest and greatest that didn't gain them anything.
I got my 3090s used for about 1/2 and then 1/4 of their launch prices here after the mining craze; can't beat that value.
You mean NV is comparing apples to apples in this one? That would be nice (and since there is no fine print, I guess it might be so). Then the AI TOPS would indeed be massively improved: about +70% when adjusting for the +25% power increase of the 5070 vs the 4070 (988 / (466 × 1.25) ≈ 1.70).
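Quick sanity check of that math in Python, using the 988 and 466 AI TOPS figures and the +25% power delta quoted above:

```python
# Power-adjusted AI TOPS gain, 5070 vs 4070, using the figures quoted above.
tops_5070 = 988
tops_4070 = 466
power_ratio = 1.25  # 5070 reportedly draws ~25% more power than the 4070

raw_gain = tops_5070 / tops_4070                       # ~2.12x
adjusted_gain = tops_5070 / (tops_4070 * power_ratio)  # ~1.70x

print(f"raw gain:            {raw_gain:.2f}x")
print(f"power-adjusted gain: {adjusted_gain:.2f}x (~+{(adjusted_gain - 1) * 100:.0f}%)")
```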
According to "nvidia-ada-gpu-architecture.pdf", the 4090 is:
So is it 1321 INT8 sparse or 1321 INT4 dense? Anyway, what matters more is that it's an apples-to-apples comparison.
Funnily enough, yes. The numbers are for INT4 dense; sparsity is an Nvidia-exclusive thing that's not that easy to use (you have to rearrange your tensors to make use of it).
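For anyone wondering what "rearrange your tensors" means here: the hardware's 2:4 structured sparsity wants at most 2 non-zero weights in every contiguous group of 4, so you have to prune (and usually fine-tune) the model into that layout first. A rough NumPy sketch of just the pruning step, purely to show the pattern constraint; the real workflow goes through Nvidia's pruning/fine-tuning tooling:

```python
import numpy as np

def prune_2_of_4(w: np.ndarray) -> np.ndarray:
    """Zero out the 2 smallest-magnitude weights in every group of 4 along the
    last axis -- the 2:4 pattern that sparse tensor cores expect."""
    rows, cols = w.shape
    assert cols % 4 == 0, "width must be a multiple of 4 for the 2:4 pattern"
    groups = w.reshape(rows, cols // 4, 4)
    # indices of the two smallest |w| inside each group of 4
    drop = np.argsort(np.abs(groups), axis=-1)[..., :2]
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=-1)
    return (groups * mask).reshape(rows, cols)

w = np.random.randn(4, 8).astype(np.float32)
print(prune_2_of_4(w))  # every group of 4 columns now has exactly 2 zeros
```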
They're running models at half the precision on 50-series cards and comparing them to FP8 on 40-series because, presumably, at the same precision they're not faster at all. Pretty much everything they've shown is a smokescreen; this might just be the most disingenuous marketing material they've ever released. There's not a single performance claim where they haven't screwed with the comparison in some way.
For those of you who don't know, lower-precision quantized models are worse, often unusable for some applications, so even the "14638746728463287 gazillion AI TOPS" meme is a lie.
I had explained it to someone else, but I'll write it up again:
Flux is often memory-bound, just like LLMs. The gains you see there are mostly from the extra ~80% of memory bandwidth the 5090 has. Even running it in FP8 (which my 3090 doesn't even have hardware support for) leads to a really minor perf difference, and using FP8 vs FP16 on a 4090 barely nets a perf gain either, something around 5~10% in both scenarios. The same likely goes for this FP4 vs FP8 comparison.
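To put rough numbers on the memory-bound argument: when a workload is limited by how fast data streams out of VRAM, the best-case speedup from a new card at the same precision is roughly the bandwidth ratio. A tiny sketch using the commonly quoted spec-sheet figures (treat them as assumptions):

```python
# If a workload is memory-bandwidth-bound, a new card at the SAME precision
# can only be about (new bandwidth / old bandwidth) faster.
# Spec-sheet numbers below are assumptions for the sketch:
bw_4090_gbs = 1008   # RTX 4090, GDDR6X
bw_5090_gbs = 1792   # RTX 5090, GDDR7

speedup = bw_5090_gbs / bw_4090_gbs
print(f"bandwidth-only speedup: {speedup:.2f}x")  # ~1.78x
# So a ~1.8x gain in a memory-bound diffusion/LLM workload is explained by
# bandwidth alone, before any FP4-vs-FP8 precision tricks enter the picture.
```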
You are also forgetting that there are different types of quantization. Your Q4/gguf/ggml stuff is about compressing the weights for storage/memory, but you still do the math in fp16, which gives noticeably lower performance. Properly quantizing a model with some extra precision-aware fine-tuning gives way better quality than just shoving the original weights into a smaller data type.
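Here's a minimal sketch of the difference, with made-up names and a simplified per-row int8 scheme purely for illustration: weight-only quantization in the Q4/gguf style stores the weights small but dequantizes them and does the matmul in fp16, so you save memory without getting any of the low-precision TOPS. That's a different beast from a model that was fine-tuned to actually run its math in FP8/FP4.

```python
import numpy as np

def quantize_int8_per_row(w: np.ndarray):
    """Weight-only quantization: int8 storage + a per-row fp16 scale.
    (Illustrative scheme, not gguf's exact block format.)"""
    scale = np.abs(w).max(axis=1, keepdims=True) / 127.0
    q = np.round(w / scale).astype(np.int8)
    return q, scale.astype(np.float16)

def matmul_weight_only(x: np.ndarray, q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # Dequantize back to fp16 and multiply in fp16: the memory footprint
    # shrinks 4x vs fp32, but the arithmetic is still fp16, so none of the
    # advertised INT8/FP4 TOPS are actually being used.
    w = q.astype(np.float16) * scale
    return x.astype(np.float16) @ w.T

x = np.random.randn(1, 4096).astype(np.float16)
w = np.random.randn(4096, 4096).astype(np.float32)   # (out_features, in_features)
q, scale = quantize_int8_per_row(w)
y = matmul_weight_only(x, q, scale)
print(y.shape, q.nbytes / w.nbytes)  # (1, 4096) 0.25 -> 4x smaller weights
```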
Just take a look for yourself at the results from their model vs the BF16 one:
BF16 on the left and FP4 on the right
Our new collaboration with NVIDIA marks a significant leap forward in making our FLUX models more universally accessible and efficient. Through reduced memory requirements, faster performance…
blackforestlabs.ai
Clearly not as good as the BF16 original, but way better than your usual Q8 quants.
Yeah sure, they're not lying, they're just showing you a ginormous bar chart where the thing is 2x faster and then a minuscule line of text below telling you that actually it's not.
Unlike games, for inference you often aim for the smallest supported data type, for both the VRAM savings and the extra throughput. When tensor cores came out, everyone switched to FP16. When Ada/Hopper came out, everyone started doing FP8. The trend keeps going in that direction.