Monday, January 6th 2025
NVIDIA 2025 International CES Keynote: Liveblog
NVIDIA kicks off the 2025 International CES with a bang. The company is expected to debut its new GeForce "Blackwell" RTX 5000 generation of gaming graphics cards, and to launch new technologies such as neural rendering and DLSS 4. It is also expected to highlight a new piece of silicon for Windows on Arm laptops, showcase the next step in its Drive PX FSD hardware, and probably even talk about its next-generation "Blackwell Ultra" AI GPU, and, if we're lucky, namedrop "Rubin." Join us as we liveblog CEO Jensen Huang's keynote address.
02:22 UTC: The show is finally underway!
02:35 UTC: CTA president Gary Shapiro kicks off the show and introduces Jensen Huang.
02:46 UTC: "Tokens are the building blocks of AI"
02:46 UTC: "Do you like my jacket?"02:47 UTC: NVIDIA recounts progress all the way till NV1 and UDA.02:48 UTC: "CUDA was difficult to explain, it took 6 years to get the industry to like it"02:50 UTC: "AI is coming home to GeForce". NVIDIA teases neural material and neural rendering. Rendered on "Blackwell"02:55 UTC: Every single pixel is ray traced, thanks to AI rendering.02:55 UTC: Here it is, the GeForce RTX 5090.03:20 UTC: At least someone is pushing the limits for GPUs.03:22 UTC: Incredible board design.03:22 UTC: RTX 5070 matches RTX 4090 at $550.03:24 UTC: Here's the lineup, available from January.03:24 UTC: RTX 5070 Laptop starts at $1299.03:24 UTC: "The future of computer graphics is neural rendering"03:25 UTC: Laptops powered by RTX Blackwell: staring prices:03:26 UTC: AI has come back to power GeForce.03:28 UTC: Supposedly the Grace Blackwell NVLink72.03:28 UTC: 1.4 ExaFLOPS.03:32 UTC: NVIDIA very sneakily teased a Windows AI PC chip.
03:35 UTC: NVIDIA is teaching generative AI basic physics. NVIDIA Cosmos, a world foundation model.
03:41 UTC: NVIDIA Cosmos is trained on 20 million hours of video.
03:43 UTC: Cosmos is open-licensed on GitHub.
03:52 UTC: NVIDIA onboards Toyota for full self-driving in its next-generation EVs.
03:53 UTC: NVIDIA unveils the Thor Blackwell robotics processor.
03:53 UTC: Thor offers 20x the processing capability of Orin.
03:54 UTC: CUDA is now a functionally safe computer, thanks to its automotive certifications.
04:01 UTC: NVIDIA brought a dozen humanoid robots to the stage.
04:07 UTC: Project DIGITS is a shrunk-down AI supercomputer.
04:08 UTC: The NVIDIA GB110 "Grace Blackwell" chip powers DIGITS.
02:46 UTC: "Do you like my jacket?"02:47 UTC: NVIDIA recounts progress all the way till NV1 and UDA.02:48 UTC: "CUDA was difficult to explain, it took 6 years to get the industry to like it"02:50 UTC: "AI is coming home to GeForce". NVIDIA teases neural material and neural rendering. Rendered on "Blackwell"02:55 UTC: Every single pixel is ray traced, thanks to AI rendering.02:55 UTC: Here it is, the GeForce RTX 5090.03:20 UTC: At least someone is pushing the limits for GPUs.03:22 UTC: Incredible board design.03:22 UTC: RTX 5070 matches RTX 4090 at $550.03:24 UTC: Here's the lineup, available from January.03:24 UTC: RTX 5070 Laptop starts at $1299.03:24 UTC: "The future of computer graphics is neural rendering"03:25 UTC: Laptops powered by RTX Blackwell: staring prices:03:26 UTC: AI has come back to power GeForce.03:28 UTC: Supposedly the Grace Blackwell NVLink72.03:28 UTC: 1.4 ExaFLOPS.03:32 UTC: NVIDIA very sneakily teased a Windows AI PC chip.
03:35 UTC: NVIDIA is teaching generative AI basic physics. NVIDIA Cosmos, a world foundation model.03:41 UTC: NVIDIA Cosmos is trained on 20 million hours of video.
03:43 UTC: Cosmos is open-licensed on GitHub.
03:52 UTC: NVIDIA onboards Toyota for its next generation EV for full-self driving.
03:53 UTC: NVIDIA unveils Thor Blackwell robotics processor.03:53 UTC: Thor is 20x the processing capability of Orin.
03:54 UTC: CUDA is now a functional safe computer thanks to its automobile certifications.04:01 UTC: NVIDIA brought a dozen humanoid robots to the stage.
04:07 UTC: Project DIGITS, is a shrunk down AI supercomputer.04:08 UTC: NVIDIA GB110 "Grace-Blackwell" chip powers DIGITS.
446 Comments on NVIDIA 2025 International CES Keynote: Liveblog
If they give it 12 GB, they will cannibalize the 5070. This entire Blackwell stack is positioned such that it doesn't look terrible if you still have Ada, while still giving Ada owners an incentive to upgrade. They can, after all, sell their cards and buy a replacement at near cost neutrality.
This is precisely what is happening with the 4090s on the market right now. Nvidia's executing a perfect strategy here because AMD isn't even playing.
Look here. Perfect price parity with the 5090 and its 5,000+ extra shaders.
We will have the 4090 taking the slot between the 5080 and 5090 for the foreseeable future. Nvidia doesn't need anything in between.
The above is just under half the number of sellers on this site, now...
Here's another search, just for the lulz. There are almost no sellers of a 7900 XTX (literally 3 in the Netherlands!!). AMD ensured its own stagnation; these owners will sooner or later jump ship. Fantastic plan, going midrange!
That 30% improvement is likely what we can really expect in the overwhelming majority of games. The 5080 has 15% more compute (cores × clocks) and sucks down more power despite being on a newer, more efficient node, so the other 15% likely comes from the 4080 being sandbagged by power limits.
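For what it's worth, a rough sanity check of that 15% compute figure, using the commonly listed shader counts and boost clocks (treat the exact spec numbers here as assumptions, not measurements):

```python
# Back-of-the-envelope "compute" ratio (shaders x boost clock) behind the
# ~15% figure above. Spec numbers are the commonly listed ones.
specs = {
    "RTX 4080": {"shaders": 9728,  "boost_ghz": 2.51},
    "RTX 5080": {"shaders": 10752, "boost_ghz": 2.62},
}

def raw_compute(card: dict) -> float:
    """Crude throughput proxy: shader count times boost clock."""
    return card["shaders"] * card["boost_ghz"]

ratio = raw_compute(specs["RTX 5080"]) / raw_compute(specs["RTX 4080"])
print(f"5080 vs 4080 compute ratio: {ratio:.2f}x")  # ~1.15x
```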
The leftmost bars indeed don't say DLSS, but they do say RT.
Going by this chart, raster performance might be at a complete standstill and only RT-on performance is improved; it doesn't say a thing about raster perf.
Technically, the MSRP was $599, but that was when NVIDIA started the Founders Edition BS; the price was reduced to $499 ($549 FE) a year later, when the 1080 Ti released.
They're running models at half the precision on 50-series cards and comparing them to FP8 on the 40-series because, presumably, at the same precision they're not faster at all. Pretty much everything they've shown is a smokescreen. This might just be the most disingenuous marketing material they've ever released; there's not a single performance claim they haven't screwed with in some way.
For those of you who don't know: lower-precision quantized models are worse, often unusable for some applications, so even the "14638746728463287 gazillion AI TOPS" meme is a lie.
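To illustrate why headline TOPS balloon with these comparisons, here is a toy sketch: the quoted figure usually assumes the narrowest supported data type (and sometimes structured sparsity on top). All the numbers below are made up for illustration, not NVIDIA's figures.

```python
# Toy illustration of how "AI TOPS" marketing numbers scale with data type.
base_fp16_tops = 100  # hypothetical dense FP16 tensor throughput

tops = {
    "FP16": base_fp16_tops,
    "FP8":  base_fp16_tops * 2,   # half the bits -> roughly 2x the ops/s
    "FP4":  base_fp16_tops * 4,
}
tops_fp4_sparse = tops["FP4"] * 2  # 2:1 structured sparsity doubles it again

for fmt, t in tops.items():
    print(f"{fmt:>4}: {t} TOPS")
print(f"FP4 + sparsity: {tops_fp4_sparse} TOPS (same silicon, 8x the headline)")
```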
morethanmoore.substack.com/p/where-was-rdna4-at-amds-keynote
TL;DR: they said the product was not yet finished, that they wouldn't have had enough time to showcase it during their overall presentation, and that NVIDIA's announcement did play a part in the decision not to showcase RDNA 4 (they want to undercut it). If they can get UDNA right, it would simplify a lot of things, since they'd be able to do exactly what they did with Zen: chiplets that provide great value in the enterprise (which brings in the big bucks) and can also be used in the consumer market, all out of the same fabrication line. This is also exactly what NVIDIA has been doing for quite a long time (albeit not with chiplets).
Their GPU division currently has both CDNA and RDNA, which compete not only for engineering time but also for fab allocation. Given that CDNA brings in more money than RDNA, it makes sense to focus on it. MCM is more about fab efficiency; the architecture design is a somewhat separate question. See how Zen has both MCM and monolithic products, and how RDNA exists both in MCM designs and in monolithic designs in iGPUs.
It's actually great in iGPUs; I guess they're just lacking the resources to scale it up, because it makes more sense to put the effort into CDNA as the "big product" instead. To be fair, most users here won't fall for that, and everyone will wait for the proper reviews anyway, so that's like preaching to the choir. I got my 3090s used for about 1/2 and then 1/4 of their launch prices here after the mining craze; can't beat such value :p Funnily enough, yes. The numbers are for INT4 dense; sparsity is an NVIDIA-exclusive thing that's not that easy to use (you have to rearrange your tensors to make use of it). I explained it to someone else before, but I'll write it up again:
Flux is often memory-bound, just like LLMs. The gains you see there come mostly from the extra ~80% of memory bandwidth the 5090 has. Even running it in FP8 (which my 3090 doesn't even support) gives a really minor perf difference, and FP8 vs FP16 on a 4090 barely nets a gain, something around 5~10% in both scenarios. The same likely goes for this FP4 vs FP8 comparison.
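A back-of-the-envelope way to see the memory-bandwidth argument: when a model is memory-bound, every step has to stream the full weight set from VRAM, so the step rate is capped at roughly bandwidth divided by model size. The bandwidth figures and the 12B FP8 model size below are assumptions for illustration.

```python
# Rough memory-bound ceiling: weight passes per second ~= bandwidth / model size.
def passes_per_second(bandwidth_gbs: float, params_b: float, bytes_per_param: float) -> float:
    model_gb = params_b * bytes_per_param  # weights-only footprint in GB
    return bandwidth_gbs / model_gb

cards = {"RTX 4090 (~1008 GB/s)": 1008, "RTX 5090 (~1792 GB/s)": 1792}
for card, bw in cards.items():
    # ~12B-parameter model in FP8 (1 byte per weight)
    print(f"{card}: ~{passes_per_second(bw, 12, 1.0):.0f} weight passes/s ceiling")
```

The ratio between the two ceilings is about 1.8x, which is where most of the quoted gain would come from under this assumption, independent of FP4 vs FP8.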
You are also forgetting that there are different types of quantization. Your Q4/GGUF/GGML stuff is about compressing the weights for storage/memory, but you still do the maths in FP16, which leads to noticeably lower performance. Properly quantizing a model with some extra precision-aware fine-tuning gives far better quality than just shoving the original weights into a smaller data type.
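A minimal sketch of what that "store in 4 bits, compute in FP16" split looks like; the group size and symmetric scheme here are simplifying assumptions, not any particular GGUF format:

```python
import numpy as np

# Weight-only quantization sketch: weights are stored as 4-bit integers plus a
# per-group scale, but the matmul still happens in FP16 after dequantizing.
GROUP = 32

def quantize_int4(w: np.ndarray):
    g = w.reshape(-1, GROUP)
    scale = np.abs(g).max(axis=1, keepdims=True) / 7.0  # int4 range: -8..7
    q = np.clip(np.round(g / scale), -8, 7).astype(np.int8)
    return q, scale.astype(np.float16)

def dequantize(q: np.ndarray, scale: np.ndarray, shape) -> np.ndarray:
    return (q.astype(np.float16) * scale).reshape(shape)

rng = np.random.default_rng(0)
w = rng.standard_normal((256, 256)).astype(np.float16)
x = rng.standard_normal((1, 256)).astype(np.float16)

q, s = quantize_int4(w)
w_hat = dequantize(q, s, w.shape)  # back to FP16 just for the math
print("max weight error:", float(np.abs(w - w_hat).max()))
print("matmul drift:", float(np.abs(x @ w - x @ w_hat).max()))
```

Precision-aware fine-tuning (QAT-style) would instead train the model to tolerate the rounding above, rather than just absorbing the error after the fact.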
Just take a look for yourself at the results from their model vs the BF16 one:
BF16 on the left and FP4 on the right
blackforestlabs.ai/flux-nvidia-blackwell/
Clearly not as good as the BF16 original, but way better than your usual Q8 quants. Unlike games, for inference you often aim for the smallest supported data type, for both VRAM savings and extra throughput. When tensor cores came out, everyone switched to FP16. When Ada/Hopper came out, everyone started doing FP8. The trend is still going in the same direction.
Many LLMs are run in Q4, Q6, and Q8 in production by many different providers. It does matter, because that's how people are going to run them given the hardware support, period.
They could have had separate charts showcasing VRAM usage as well, making a point about being able to run these things on lesser GPUs with less memory, but they're so hell-bent on lying and being as disingenuous as possible that they don't even know when to use this to their advantage.
But a 70B Q4 model is still way better than a 30B Q8 one, and a bigger model at bigger data types is useless if you can't get it to run to begin with, or if performance is not good enough.
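For a rough sense of the memory side of that trade-off, a quick weights-only estimate; the bits-per-weight values are approximations and the KV cache is deliberately ignored:

```python
# Weights-only VRAM footprint for the 70B-Q4 vs 30B-Q8 comparison above.
def weight_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * bits_per_weight / 8

print(f"70B @ ~4.5 bpw: ~{weight_gb(70, 4.5):.0f} GB")  # ~39 GB
print(f"30B @ ~8.5 bpw: ~{weight_gb(30, 8.5):.0f} GB")  # ~32 GB
print(f"70B @ 16 bpw:   ~{weight_gb(70, 16):.0f} GB")   # ~140 GB, won't fit on one card
```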
For smaller models the loss from quantization is not that significant. I get your point about "fairness", but I bet performance would be pretty close to what was shown at FP8 because, as I said before, this is mostly a memory bandwidth issue. Doing it at FP4 is a showcase of a newly supported data type that we didn't have before (with only a minor perf uplift in this specific case).
You'd be better off going back to complaining about the FG comparisons. Given that most people here are hoping a 5080 matches or surpasses a 4090, I don't think anyone here bought into the 5070 = 4090 idea.
Look, the thing is, there was another company at CES that compared their 120 W CPU against the competition's 17 W chip, with no fine print, btw. No one is talking about that being misleading, but we have 50 different threads 20 pages long complaining about NVIDIA. Makes you wonder.
The CPU in question was Strix Point (390Ai). But you know, it's AMD, so it's not trying to mislead us :D