
NVIDIA 2025 International CES Keynote: Liveblog

SRS

New Member
Joined
Oct 12, 2024
Messages
8 (0.08/day)
Nvidia isn't calling the 5090 a Titan card, even though the pricing and tech gap between it and the 5080 is much larger than Titan cards were, from my recollection. I could be wrong but I do not remember Titan being 50% more powerful than the next step down. I vaguely recall Titan also being more appealing because of its prosumer use cases.

Why would Nvidia do this? It has most likely decided that it will make more money by having reviewers (perhaps unwittingly) pressure buyers into the 5090 purchase (because it's not an extravagant Titan... it's simply the best of the regular consumer cards and is therefore the standard for most every review's benchmarks). It seems to be mainly a matter of optics.

It also exposes the problem of Nvidia's monopoly over higher-end/enthusiast consumer GPUs. Nvidia would never be able to get away with such a large gap if something even approaching adequate competition were in place. (I really loathe the "minute difference = new product tier" strategy, but the fact is that consumers have shown they are willing to tolerate having so many products with tiny differences. 50% as a gap is excessive, however, even for me.)

I find it rather amusing to read so many "Oh... so inexpensive!" comments from the first two pages or so of this thread. $2500–$3000 is inexpensive? We all know how the game works by now, right? Only a tiny number of people will get the Founders card (or whatever they're calling it), and there will be Reddit pages with people hoping, tracking, and bragging about their Best Buy escapades. Everyone else will deal with scalpers, shortages, and third-party cards with minuscule overclocks and a higher price. We've also lived through at least two cycles of mining-driven shortages, compounded by AMD's higher-end cards that seemed more designed for mining than for gaming. (Now we don't even have those to be disappointed with.)

Instead of hoping for things to be different this time... what I'd like to see is competition. Not a giant void where even a duopolist could only half-heartedly compete with cards that have too-small dies at too-high clocks with too-small coolers, like the vaunted Radeon VII.

Most everything I've written above is debatable to some degree. However, the fact that the higher-end consumer GPU space (both for gaming and home AI, such as text-to-image) is occupied by a single corporate entity is not. That's not capitalism.
 
Joined
May 10, 2023
Messages
498 (0.81/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
Nvidia isn't calling the 5090 a Titan card, even though the pricing and tech gap between it and the 5080 is much larger than Titan cards were, from my recollection. I could be wrong but I do not remember Titan being 50% more powerful than the next step down. I vaguely recall Titan also being more appealing because of its prosumer use cases.
Titans never had much of a gap, if any, to begin with. I posted this in another thread:
I made a graph some time ago comparing the different generations and each card's percentage of the top die, similar to the one that has been floating around here, but mine compares against an assumed "top product" for each generation instead of the actual die Nvidia uses:

I decided to give it a go and update it today with the known values of the Blackwell gen:

Conclusions are up to each of you.
Not only is the gap between the top card and the second one bigger, it also seems like we have a new trend where consumers don't even get the full die (or close to it) anymore.
$2500–$3000 is inexpensive?
For games it's pretty expensive. But for productivity? It's still quite cheap.
I had heard a rumor that about 80% of 4090s are being used in AI farms. I don't have anything to back that up, but it would mean those cards aren't even being used for games anymore, and Nvidia is well aware of that.
Most everything I've written above is debatable to some degree. However, the fact that the higher-end consumer GPU space (both for gaming and home AI, such as text-to-image) is occupied by a single corporate entity is not. That's not capitalism.
It is awful for sure. But at the same time, I don't feel like Nvidia has been bribing companies not to use its competitors. They have just been doing a good job, and the others haven't managed to stack up to it, which is bad, but I'm not sure we could fine Nvidia for that.
IIRC, France did try to investigate Nvidia on monopolistic practices and didn't come up with anything.
 
Joined
Sep 19, 2014
Messages
114 (0.03/day)
Can be very good for 5070 12GB

  • RTX Neural Texture Compression
    uses AI to compress thousands of textures in less than a minute. Their neural representations are stored or accessed in real time or loaded directly into memory without further modification.
    The neurally compressed textures save up to 7x more VRAM or system memory than traditional block compressed textures at the same visual quality.
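For scale, a quick back-of-the-envelope check on the "up to 7x" figure (my own numbers, not from NVIDIA's material; BC7 stores 8 bits per texel):

```python
# Rough scale check on the "up to 7x" claim (my numbers, not NVIDIA's):
# BC7 is 8 bits per texel, so a 4096x4096 texture is ~16 MiB before mipmaps.
bc7_mib = 4096 * 4096 * 1 / 2**20
print(bc7_mib, bc7_mib / 7)   # 16.0 MiB -> ~2.3 MiB per texture if the 7x figure holds
```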
 
Joined
Jan 14, 2019
Messages
13,453 (6.13/day)
Location
Midlands, UK
Processor Various Intel and AMD CPUs
Motherboard Micro-ATX and mini-ITX
Cooling Yes
Memory Overclocking is overrated
Video Card(s) Various Nvidia and AMD GPUs
Storage A lot
Display(s) Monitors and TVs
Case The smaller the better
Audio Device(s) Speakers and headphones
Power Supply 300 to 750 W, bronze to gold
Mouse Wireless
Keyboard Mechanic
VR HMD Not yet
Software Linux gaming master race
Can be very good for 5070 12GB

  • RTX Neural Texture Compression
    uses AI to compress thousands of textures in less than a minute. Their neural representations are stored or accessed in real time or loaded directly into memory without further modification.
    The neurally compressed textures save up to 7x more VRAM or system memory than traditional block compressed textures at the same visual quality.
That's marketing. Marketing always sounds good. Let's see how it works in action.
 
Joined
Jun 26, 2023
Messages
31 (0.05/day)
I agree that 12GB is a problem. I haven't seen the usage in Indiana Jones with path tracing at FHD base resolution to check it, and also the 5070 will support neural texture compression, enabling similar-quality textures with a smaller memory footprint (or higher-quality textures within a given memory budget). But what you said, even if we suppose it isn't true today (FHD base...), certainly will be in the near future.
And tomorrow is here, see my Indiana Jones example, and I don't think the neural texture compression -- it was like 0.4-0.5 GB less VRAM demand, according to a vid -- is going to help with that, because the VRAM utilization in that IJ scenario rises to ~15GB. Of course there aren't many examples yet, but in the future more games are going to have similar VRAM requirements (16GB VRAM for 1440p with even the lowest path-tracing setting). Still, it's better than nothing, and it's free for all gens(?). Of course, we need to wait for independent reviews to see whether there's any catch with that.
The transformer-based upscaling/DLSS is also free for all gens, which is also very nice (again, waiting for independent reviews to see whether there's any catch with that too).

For all the AI/LLM self-hosting enthusiasts: the GeForce 5090's 512-bit bus width theoretically means NV could release a clamshell RTX 6000 Blackwell workstation card with 64GB of VRAM (in a few months), like they usually do (384-bit: GeForce 4090 with 24GB VRAM and "RTX 6000 Ada" with 48GB VRAM).
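To make that capacity math explicit, a quick sketch (the vram_gb helper is just my own illustration; it assumes one GDDR chip per 32-bit channel, doubled when the board runs clamshell):

```python
# GDDR capacity from bus width: one chip per 32-bit channel, times GB per chip,
# times 2 if the board runs clamshell (chips on both sides of the PCB).
def vram_gb(bus_width_bits: int, gb_per_chip: int, clamshell: bool = False) -> int:
    chips = bus_width_bits // 32
    return chips * gb_per_chip * (2 if clamshell else 1)

print(vram_gb(384, 2))                  # 24 -> GeForce 4090
print(vram_gb(384, 2, clamshell=True))  # 48 -> RTX 6000 Ada
print(vram_gb(512, 2))                  # 32 -> GeForce 5090
print(vram_gb(512, 2, clamshell=True))  # 64 -> the speculated RTX 6000 Blackwell
```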
 
Joined
Oct 27, 2020
Messages
818 (0.53/day)
And tomorrow is here, see my Indiana Jones example, and I don't think the neural texture compression -- it was like 0.4-0.5 GB less VRAM demand, according to a vid -- is going to help with that, because the VRAM utilization in that IJ scenario rises to ~15GB. Of course there aren't many examples yet, but in the future more games are going to have similar VRAM requirements (16GB VRAM for 1440p with even the lowest path-tracing setting). Still, it's better than nothing, and it's free for all gens(?). Of course, we need to wait for independent reviews to see whether there's any catch with that.
The transformer-based upscaling/DLSS is also free for all gens, which is also very nice (again, waiting for independent reviews to see whether there's any catch with that too).

For all the AI/LLM self-hosting enthusiasts: the GeForce 5090's 512-bit bus width theoretically means NV could release a clamshell RTX 6000 Blackwell workstation card with 64GB of VRAM (in a few months), like they usually do (384-bit: GeForce 4090 with 24GB VRAM and "RTX 6000 Ada" with 48GB VRAM).
It's not only 0.4-0.5GB, but we will see the memory requirement difference in the near future (I hope) with actual implementations.
 
Joined
May 10, 2023
Messages
498 (0.81/day)
Location
Brazil
Processor 5950x
Motherboard B550 ProArt
Cooling Fuma 2
Memory 4x32GB 3200MHz Corsair LPX
Video Card(s) 2x RTX 3090
Display(s) LG 42" C2 4k OLED
Power Supply XPG Core Reactor 850W
Software I use Arch btw
For all AI LLM selfhosters enthusiasts, the GeForce 5090' 512-bit bus width theoretically means NV could release a clamshell RTX 6000 Blackwell 64GB VRAM workstation card (in a few months), like they usually do (384-bit: GeForce 4090 24 GB VRAM and "RTX 6000 Ada" 48GB VRAM).
Don't forget the 3GB modules that are coming really soon. That would mean a 48GB model without clamshell, and a 96GB with.
 
Joined
Jun 26, 2023
Messages
31 (0.05/day)
Depends what they mean. Right now it's all marketing: ".. than traditional block compressed textures" -- what percentage do these kinds of textures make up vs. the other ones, etc.? Indeed, let's wait for independent reviews. But whatever it is, I don't believe total game VRAM consumption will be reduced by 7x; obviously nobody expects that. The vid I'm talking about showed ~0.5GB, which is actually more believable, because that's the believable part of the ".. same visual quality" claim. It's all marketing so far (7x of a fraction is still not much).
Don't forget the 3GB modules that are coming really soon. That would mean a 48GB model without clamshell, and a 96GB with.
3GB modules? True. But using a big, expensive 512-bit chip for 16 chips * 3GB per chip vs. the already available RTX 6000 Ada with the same 48GB VRAM would be kinda..? Maybe the bandwidth increase from 960 GB/s to 1792 GB/s plus the other Blackwell features are what customers (would) demand, but it's still 48GB, so I'm not sure. I'm also not sure NV is going to release a 16 chips * 3GB per chip * 2 [clamshell] = 96GB version. I mean, it would be very nice, but I think they will stretch this one out to at least the next generation. And even then, in ~2.5 years, I'm not sure, because by then they can claim that 3GB VRAM modules are widely available in quantity and do 384-bit ones (btw, 384 bit / 32 bit per chip = 12 chips, for anyone who doesn't know): 12 chips * 3GB per chip * 2 [clamshell] = 72GB. So 96GB VRAM maybe in the generation after next, in ~5 years?
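Putting the chip-count arithmetic from that paragraph in one place (same assumption as the sketch earlier in the thread: one chip per 32-bit channel; just my own illustration, not anything official):

```python
# Chips = bus width / 32 bits per chip; capacity = chips * GB per chip (* 2 for clamshell).
chips_512 = 512 // 32   # 16 chips
chips_384 = 384 // 32   # 12 chips

print(chips_512 * 3)        # 48 GB -> 512-bit with 3GB modules, single-sided
print(chips_512 * 3 * 2)    # 96 GB -> 512-bit clamshell (speculative)
print(chips_384 * 3 * 2)    # 72 GB -> 384-bit clamshell, as above
```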
Though I have to say the 32GB VRAM of the 5090 came as a slight positive surprise, because I was thinking there was no way they would want a competitor to their expensive 32GB workstation card and that they'd release a 28-30GB VRAM card instead (of course, people who need the workstation features won't switch to the 5090, but others who need 32GB of VRAM on one card and don't care about workstation features now don't need to buy the expensive workstation card either).
 
Joined
Jul 4, 2023
Messages
52 (0.09/day)
Location
You wish
Next Nvidia architecture will be called Gaslight and the AI will design it itself. What could possibly go wrong.

It's barely 2025 and I'm already sick of hearing that marketing babble, and it will only get worse since everyone seems to find AI hallucinations totally fine and part of empirical reality. Can someone please reboot the universe?

Every GPU ad I've seen from CES so far sounds like a marketing manager on Ritalin overdose
 
Joined
Apr 24, 2020
Messages
2,780 (1.61/day)
Every GPU ad I've seen from CES so far sounds like a marketing manager on Ritalin overdose

GPU Makers know that AI isn't important, but it also doesn't "cost" very much to add FP16 or FP4 (or whatever) matrix-multiplication units to your GPUs.

Seriously, 4-bit numbers? Okay NVidia. I see what you're doing.
 
Joined
Jun 19, 2024
Messages
235 (1.12/day)
System Name XPS, Lenovo and HP Laptops, HP Xeon Mobile Workstation, HP Servers, Dell Desktops
Processor Everything from Turion to 13900kf
Motherboard MSI - they own the OEM market
Cooling Air on laptops, lots of air on servers, AIO on desktops
Memory I think one of the laptops is 2GB, to 64GB on gamer, to 128GB on ZFS Filer
Video Card(s) A pile up to my knee, with a RTX 4090 teetering on top
Storage Rust in the closet, solid state everywhere else
Display(s) Laptop crap, LG UltraGear of various vintages
Case OEM and a 42U rack
Audio Device(s) Headphones
Power Supply Whole home UPS w/Generac Standby Generator
Software ZFS, UniFi Network Application, Entra, AWS IoT Core, Splunk
Benchmark Scores 1.21 GigaBungholioMarks
GPU Makers know that AI isn't important, but it also doesn't "cost" very much to add FP16 or FP4 (or whatever) matrix-multiplication units to your GPUs.

Seriously, 4-bit numbers? Okay NVidia. I see what you're doing.

What is Nvidia doing, besides optimizing instruction sets for inference?
Apparently you didn't know that AMD is doing the same.

  • The first product in the AMD Instinct MI350 Series, the AMD Instinct MI350X accelerator, is based on the AMD CDNA 4 architecture and is expected to be available in 2025. It will use the same industry standard Universal Baseboard server design as other MI300 Series accelerators and will be built using advanced 3nm process technology, support the FP4 and FP6 AI datatypes and have up to 288 GB of HBM3E memory.
Maybe a little googling before posting will be helpful.
 
Joined
Apr 24, 2020
Messages
2,780 (1.61/day)
Maybe a little googling before posting will be helpful.

Or maybe I might be talking about something different.

64-bit multiplications are difficult because multiplier area scales as O(n^2) (assuming a Dadda multiplier architecture). This means a 64-bit multiplier requires 4x as many adders as a 32-bit multiplier, 16x as many as a 16-bit multiplier, 64x as many as an 8-bit multiplier, and 256x as many as a 4-bit multiplier.

So having 16 x 4-bit multipliers still takes only about 6% of the area (!!!!) of a single 64-bit multiplier because of this O(n^2) scaling.

Case in point: the extreme 1-bit multiplier is also known as an AND gate (0*0 == 0 AND 0, 1*0 == 1 AND 0, 1*1 == 1 AND 1). When we go all the way down to the smallest number of bits, multiplication gets ridiculously easy to calculate. I.e., it's very cheap to add 16x 4-bit multipliers, 8x 8-bit multipliers, or 4x 16-bit multipliers to a circuit, especially if that circuit already has the big honking 64-bit or 32-bit multiplier off to the side. And all GPUs have 32-bit multipliers because 32 bits is the standard for video games.
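A quick numerical sketch of that scaling, using a rough area model that just counts one-bit adder cells as n^2 (it ignores partial-product encoding, wiring, and everything else, so only the ratio matters):

```python
# Rough area model: an n-bit array/Dadda multiplier needs on the order of n^2
# one-bit adder cells. Absolute numbers are meaningless; the ratio is the point.
def relative_area(bits: int) -> int:
    return bits * bits

one_64bit = relative_area(64)          # 4096 cells
sixteen_4bit = 16 * relative_area(4)   # 16 * 16 = 256 cells

print(sixteen_4bit / one_64bit)        # 0.0625 -> ~6% of the 64-bit multiplier's area
```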

All of this AI stuff is just gold to NVidia and AMD. They barely have to do any work (from a computer design perspective); they just need to (lol) deliver incredibly easy 4-bit designs and then sell them for more money.

------------

If these "AI Companies" can sell 2-bit or even 1-bit multiplication, I guarantee you that they will do so. Its not about efficacy, its about how little work they need to do yet still able to sell something for more money.
 
Joined
Jun 19, 2024
Messages
235 (1.12/day)
System Name XPS, Lenovo and HP Laptops, HP Xeon Mobile Workstation, HP Servers, Dell Desktops
Processor Everything from Turion to 13900kf
Motherboard MSI - they own the OEM market
Cooling Air on laptops, lots of air on servers, AIO on desktops
Memory I think one of the laptops is 2GB, to 64GB on gamer, to 128GB on ZFS Filer
Video Card(s) A pile up to my knee, with a RTX 4090 teetering on top
Storage Rust in the closet, solid state everywhere else
Display(s) Laptop crap, LG UltraGear of various vintages
Case OEM and a 42U rack
Audio Device(s) Headphones
Power Supply Whole home UPS w/Generac Standby Generator
Software ZFS, UniFi Network Application, Entra, AWS IoT Core, Splunk
Benchmark Scores 1.21 GigaBungholioMarks
Joined
Apr 24, 2020
Messages
2,780 (1.61/day)
It’s FP4. Floating point. A wee bit more complicated than you make it out to be. Perhaps you should look it up. Here’s a starting point https://papers.nips.cc/paper/2020/file/13b919438259814cd5be8cb45877d577-Paper.pdf

Oh right. Good point. It's 1-bit sign, 3-bit exponent, 0-bit mantissa. So the "multipliers" are actually 3-bit adders.

Thanks for making me look it up. It's even worse than I expected. ("Multiplication" of exponents simply becomes addition: 2^4 * 2^3 == 2^7.)
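To show how trivial that is, here's a toy model of the format as described (1 sign bit, 3 exponent bits, no mantissa; bias, overflow, and zero handling are ignored, and the helper names are mine):

```python
# Toy model of a 4-bit float: 1 sign bit, 3 exponent bits, no mantissa.
# Bias, overflow and zero/NaN handling deliberately ignored.
def fp4_decode(x: int) -> float:
    sign = -1.0 if (x >> 3) & 1 else 1.0
    return sign * 2.0 ** (x & 0b111)

def fp4_mul(a: int, b: int) -> float:
    sign = ((a >> 3) ^ (b >> 3)) & 1     # sign is a single XOR gate
    exp = (a & 0b111) + (b & 0b111)      # the "multiplier" is just a 3-bit adder
    return (-1.0 if sign else 1.0) * 2.0 ** exp

a, b = 0b0100, 0b0011                    # +2^4 and +2^3
assert fp4_mul(a, b) == fp4_decode(a) * fp4_decode(b) == 2.0 ** 7
```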
 
Joined
Jun 19, 2024
Messages
235 (1.12/day)
System Name XPS, Lenovo and HP Laptops, HP Xeon Mobile Workstation, HP Servers, Dell Desktops
Processor Everything from Turion to 13900kf
Motherboard MSI - they own the OEM market
Cooling Air on laptops, lots of air on servers, AIO on desktops
Memory I think one of the laptops is 2GB, to 64GB on gamer, to 128GB on ZFS Filer
Video Card(s) A pile up to my knee, with a RTX 4090 teetering on top
Storage Rust in the closet, solid state everywhere else
Display(s) Laptop crap, LG UltraGear of various vintages
Case OEM and a 42U rack
Audio Device(s) Headphones
Power Supply Whole home UPS w/Generac Standby Generator
Software ZFS, UniFi Network Application, Entra, AWS IoT Core, Splunk
Benchmark Scores 1.21 GigaBungholioMarks
Oh right. Good point. It's 1-bit sign, 3-bit exponent, 0-bit mantissa. So the "multipliers" are actually 3-bit adders.

Thanks for making me look it up. It's even worse than I expected. ("Multiplication" of exponents simply becomes addition: 2^4 * 2^3 == 2^7.)
If that's what you got out of the IBM research paper, then you're too blinded by hate to understand. Upset that your gaming GPUs are expensive because they found other uses for them, I presume.
 
Joined
Apr 24, 2020
Messages
2,780 (1.61/day)
If that's what you got out of the IBM research paper, then you're too blinded by hate to understand. Upset that your gaming GPUs are expensive because they found other uses for them, I presume.

Dude, it's elementary to design a 1-bit sign / 3-bit exponent / 0-bit mantissa multiplier. Seriously, look at the damn format and think about how the circuit would be made. This is stuff a 2nd-year computer engineer can do.

If you aren't into computer engineering, then I promise you, it's not very complex. The hard stuff is the larger multiplier circuits (aka Wallace trees and whatnot), which are closer to 4th-year or even master's-degree level in my experience.

Think like an NVidia or AMD engineer. Think about how to wire adders together so that they make a multiplication circuit. Some of these things are absurdly easy to do.
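If anyone wants the flavor of it, here's a rough software sketch of the shift-and-add idea (my own illustration, not any vendor's actual design):

```python
# Software sketch of a small unsigned array multiplier: each set bit of b
# contributes a shifted copy of a (a partial product), and a row of adders
# sums them. In hardware, more bits means quadratically more adder cells.
def mul_via_adders(a: int, b: int, bits: int = 4) -> int:
    acc = 0
    for i in range(bits):
        if (b >> i) & 1:       # partial product: a shifted left by i
            acc += a << i      # one adder row per partial product
    return acc

assert all(mul_via_adders(a, b) == a * b for a in range(16) for b in range(16))
```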
 
Joined
Jul 4, 2023
Messages
52 (0.09/day)
Location
You wish
Dude, it's elementary to design a 1-bit sign / 3-bit exponent / 0-bit mantissa multiplier. Seriously, look at the damn format and think about how the circuit would be made. This is stuff a 2nd-year computer engineer can do.

If you aren't into computer engineering, then I promise you, it's not very complex. The hard stuff is the larger multiplier circuits (aka Wallace trees and whatnot), which are closer to 4th-year or even master's-degree level in my experience.

Think like an NVidia or AMD engineer. Think about how to wire adders together so that they make a multiplication circuit. Some of these things are absurdly easy to do.
But it's FP4! I'm gonna repeat the catchword over and over again and then I'll get it.
If that's what you got out of the IBM research paper, then you're too blinded by hate to understand. Upset that your gaming GPUs are expensive because they found other uses for them, I presume.
And are you some kind of GPU Jedi, or how did you remotely get from FP4 to this? Back to the ocean! Now!
 