
Is NVIDIA comparing FP4 vs FP8 in the consumer site's Specs table without a disclaimer/clearly stating it?

Does failing to disclose that it's FP4 vs FP8 mislead customers?


Total voters: 4
Joined
Jun 26, 2023
Messages
91 (0.14/day)
Processor 7800X3D @ Curve Optimizer: All Core: -25
Motherboard TUF Gaming B650-Plus
Memory 2xKSM48E40BD8KM-32HM ECC RAM (ECC enabled in BIOS)
Video Card(s) 4070 @ 110W
Display(s) SAMSUNG S95B 55" QD-OLED TV
Power Supply RM850x
[Attached image: AI_TOPS.png]

I was wondering how NVIDIA managed to roughly double the "AI TOPS" figure (see the Specs comparison table) for the RTX 50 series versus the RTX 40 series, despite basically (re)using the same process node and essentially the same chips: same power efficiency, same chip sizes, same transistor counts, same everything, basically (the 5070 has a smaller chip, but it's still the same node).

According to @W1zzard, NVIDIA is comparing FP4 vs FP8:
FP4 is just 4 bits, vs 8 bits on FP8, so half the data = twice the performance, no surprises here. TOPS = "operations", doesn't specify how many bits
I had already assumed this might be the case, but they don't clarify it on their site. These are consumer GPUs, and the page with the Specs comparison table comes up as the first search result when a customer/consumer searches for e.g. "5070" (not that I would buy a 12GB VRAM GPU again). A consumer doesn't see, and cannot know, that it's FP4 vs FP8.
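For what it's worth, W1zzard's point reduces to back-of-envelope arithmetic: for a fixed number of operand bits the hardware can chew through per clock, halving the operand width doubles the operation count. A minimal sketch in Python (every figure below is a made-up placeholder, not a real NVIDIA spec):

```python
# Why halving operand width doubles marketed TOPS on the same silicon.
# All numbers here are hypothetical placeholders, not real GPU specs.
BIT_BUDGET_PER_CLOCK = 1024   # operand bits one tensor unit processes per clock (hypothetical)
NUM_UNITS = 512               # tensor units on the die (hypothetical)
CLOCK_GHZ = 2.5               # boost clock (hypothetical)

def marketed_tops(bits_per_operand: int) -> float:
    """Tera-operations per second for a fixed per-clock bit budget."""
    ops_per_clock = BIT_BUDGET_PER_CLOCK / bits_per_operand
    return ops_per_clock * NUM_UNITS * CLOCK_GHZ * 1e9 / 1e12

print(f"FP8: {marketed_tops(8):7.0f} TOPS")   # baseline
print(f"FP4: {marketed_tops(4):7.0f} TOPS")   # exactly double, same chip, same clock
```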

NVIDIA made this 4-bit vs 8-bit comparison in their Blackwell presentation a year ago, but there they did include a disclaimer (in fact, they placed it quite visibly, directly on the graph) that it was FP8 vs FP4.

Does this mislead customers/consumers, who are mainly non-experts? What do you think?
 
Joined
Dec 25, 2020
Messages
7,942 (5.15/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) NVIDIA RTX A2000 (5090 shipping to me soon™)
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Audio Device(s) Sony MDR-V7 connected through Apple USB-C
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
No, it is completely irrelevant for gamers, and anyone who cares about FP4 vs. FP8 is specifically interested in this GPU for its FP4 support (AI inferencing people), so I don't consider it to be misleading. It's an important architectural improvement in Blackwell, one that Ada cannot support. RDNA 4 can, though.

Being able to process more, smaller numbers at once is what raises the teraops-per-second rating. As I understand it, if you use higher-precision math than an operation requires, you simply lower performance, because the extra data is not useful. Being able to process lower-precision numbers also greatly decreases error rates and improves energy efficiency.
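That effect is easy to reproduce even on a CPU, where SIMD registers work the same way: halve the element size and twice as many values fit per register and cache line. A rough NumPy sketch (float32 vs. float64 stands in for FP4 vs. FP8, which NumPy doesn't support; exact timings will vary by machine):

```python
import time
import numpy as np

N = 10_000_000
a64 = np.random.rand(N)               # float64: 8 bytes per element
a32 = a64.astype(np.float32)          # float32: 4 bytes, twice the values per register

def bench(x: np.ndarray, reps: int = 20) -> float:
    t0 = time.perf_counter()
    for _ in range(reps):
        x.sum()                       # bandwidth-bound reduction
    return (time.perf_counter() - t0) / reps

print(f"float64 sum: {bench(a64) * 1e3:.1f} ms")
print(f"float32 sum: {bench(a32) * 1e3:.1f} ms   # roughly half the time on most machines")
```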
 
Joined
Oct 6, 2021
Messages
1,815 (1.44/day)
System Name Raspberry Pi 7 Quantum @ Overclocked.

Do you really think the person who blatantly lies to the world with statements like "5070 has the performance of a 4090" is concerned about something technical and imperceptible like that? Friend, I have bad news for you. :pimp:
 
Joined
Mar 21, 2016
Messages
2,635 (0.80/day)
No, it is completely irrelevant for gamers, and anyone who cares about FP4 vs. FP8 is specifically interested in this GPU for its FP4 support (AI inferencing people), so I don't consider it to be misleading. It's an important architectural improvement in Blackwell, one that Ada cannot support. RDNA 4 can, though.

Being able to process more, smaller numbers at once is what raises the teraops-per-second rating. As I understand it, if you use higher-precision math than an operation requires, you simply lower performance, because the extra data is not useful. Being able to process lower-precision numbers also greatly decreases error rates and improves energy efficiency.

That's an overly simplistic view of the subject matter.

FP4 (4-bit floating point) and FP8 (8-bit floating point) are both low-precision formats designed to optimize performance and efficiency in AI and computational tasks. However, they serve slightly different purposes, and their advantages depend on the specific use case.

FP4 Advantages:

  • Memory Efficiency: FP4 uses half the memory of FP8, which can be a significant advantage where memory bandwidth is the bottleneck.
  • Speed: With fewer bits to process, FP4 can theoretically achieve faster computation, especially on hardware optimized for such low-precision formats.
  • Quantized Processing: FP4 is particularly effective for tasks like real-time quantized subpixel-level post-processing, where precision requirements are lower and speed is critical.

FP8 Advantages:

  • Higher Precision: FP8 provides better numerical accuracy than FP4, making it more suitable for tasks that need a balance between speed and precision, such as mixed-precision training of AI models.
  • Broader Range: FP8 can handle a wider range of values, which matters in applications where FP4's limited range might lead to underflows or overflows.

In real-time applications like subpixel-level post-processing, FP4 can excel thanks to its speed and efficiency, provided the precision trade-offs are acceptable. For tasks requiring more stability and accuracy, however, FP8 is the better choice.
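To make the range/precision trade-off concrete, here's a small Python sketch that enumerates the FP4 value grid. It assumes the E2M1 layout from the public OCP microscaling spec (which is what Blackwell's FP4 is generally understood to follow); treat it as an illustration, not a statement of NVIDIA's exact encoding:

```python
# Enumerate every positive value representable in FP4, assuming the E2M1
# layout (1 sign, 2 exponent, 1 mantissa bit) from the OCP microscaling spec.
def e2m1_positive_values():
    vals = set()
    for exp in range(4):          # 2 exponent bits, bias 1
        for man in range(2):      # 1 mantissa bit
            if exp == 0:          # subnormals: 0.0 and 0.5
                vals.add(man * 0.5)
            else:
                vals.add((1 + man * 0.5) * 2.0 ** (exp - 1))
    return sorted(vals)

print(e2m1_positive_values())
# -> [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]  -- eight values, max 6.0
# FP8 E4M3, by contrast, reaches +/-448 with a far denser grid, which is
# exactly the "higher precision / broader range" advantage listed above.
```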
 
Joined
Jan 27, 2024
Messages
17 (0.04/day)
System Name BigCat
Processor i9-10900X
Motherboard Asus X-299
Memory 160GB
Video Card(s) RTX 3060 12GB, RTX 4070
Storage 10TB SSD/NVME
Display(s) Dual Acer B326HK 4K 32"
Software Windows 10, Fedora Linux
No, it is completely irrelevant for gamers, and anyone who cares about FP4 vs. FP8 is specifically interested in this GPU for its FP4 support (AI inferencing people), so I don't consider it to be misleading. It's an important architectural improvement in Blackwell, one that Ada cannot support. RDNA 4 can, though.

Being able to process more, smaller numbers at once is what raises the teraops-per-second rating. As I understand it, if you use higher-precision math than an operation requires, you simply lower performance, because the extra data is not useful. Being able to process lower-precision numbers also greatly decreases error rates and improves energy efficiency.
As I understand it, FP4 is useful for AI software, where it roughly doubles the number of calculations per second compared to FP8, but only if the software is written to use it. Comparing FP4 to FP8 performance in general is misleading, and there is a much smaller increase from the RTX 4000 to the RTX 5000 series if you compare FP8 to FP8.
Maybe this also makes a difference in games, if the DLSS software takes advantage of the FP4 hardware.
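One way to make that comparison honest is to normalize the marketed figure back to FP8 before comparing generations. A toy sketch with placeholder numbers (substitute the real values from NVIDIA's spec table):

```python
# Placeholder figures -- swap in the real numbers from NVIDIA's spec table.
rtx40_ai_tops = 1000   # quoted at FP8 (hypothetical)
rtx50_ai_tops = 2200   # quoted at FP4 (hypothetical)

# FP4 does twice the operations per clock of FP8 on the same units,
# so halve the FP4 figure for an apples-to-apples FP8 comparison.
rtx50_fp8_equiv = rtx50_ai_tops / 2

print(f"Headline uplift:   {rtx50_ai_tops / rtx40_ai_tops - 1:+.0%}")    # +120%
print(f"FP8-to-FP8 uplift: {rtx50_fp8_equiv / rtx40_ai_tops - 1:+.0%}")  # +10%
```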
 
Joined
Mar 21, 2016
Messages
2,635 (0.80/day)
FP4 hardware will absolutely be taken advantage of by savvy developers in games, provided they've got time to optimize a bit. What we're likely to see going forward is tighter integration of FP4 techniques into the game engines themselves, applied behind the scenes and easier to implement.

I don't know if people have played around with leveraging FP4 in ReShade yet, with the hardware being so new, but I'm sure it's going to happen and have an overall positive impact. As an example, I routinely use two configurations of FXAA together, one more aggressive in strength and one lower, which offset each other perfectly and combine together nicely. With FP4 I could theoretically run both of those at the same cost as one FP8 pass, in essence interpolating between them with sub-pixel quantization of minor qualitative differences, but much higher effective throughput. Where that becomes more noticeable is when you apply it across most post-process techniques with greater overhead and additional layers, which cumulatively can add up to a lot.

To summarize: you can get more layers of post-processing at around the same cost, with minimal perceptible quantization differences, which improves things a lot. You effectively end up with more interpolation between frames, and better tonal quality of effects that translates nicely on screen, for roughly the same overhead or maybe even slightly less.
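The arithmetic behind the "two passes for the price of one" idea is just a bit-budget argument. A toy sketch (real post-processing also pays fixed per-pass costs, so this is the best case for the FP4 win, not a performance guarantee):

```python
# Toy bit-budget model: cost of a post-process pass ~ pixels * bits per sample.
# Real passes also pay fixed costs (texture fetches, kernel launches), so this
# is an upper bound on the win, not a measured result.
PIXELS_4K = 3840 * 2160

def pass_bits(bits_per_sample: int, passes: int) -> int:
    return PIXELS_4K * bits_per_sample * passes

two_fp4 = pass_bits(4, passes=2)   # two stacked FXAA-style passes at FP4
one_fp8 = pass_bits(8, passes=1)   # one pass at FP8
print(two_fp4 == one_fp8)          # True: the same data budget either way
```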

Think of DRAM refresh intervals and the same principle applies: you'll fit more post-processing between refresh intervals with FP4 than with FP8, because latency matters.

As for the misleading part, "AI TOPS" is a bit implied, I would say, so you could view that either way.
 
Joined
Dec 25, 2020
Messages
7,942 (5.15/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) NVIDIA RTX A2000 (5090 shipping to me soon™)
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Audio Device(s) Sony MDR-V7 connected through Apple USB-C
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
That's an overly simplistic view of the subject matter.

FP4 (4-bit floating point) and FP8 (8-bit floating point) are both low-precision formats designed to optimize performance and efficiency in AI and computational tasks. However, they serve slightly different purposes, and their advantages depend on the specific use case.

FP4 Advantages:

  • Memory Efficiency: FP4 uses half the memory of FP8, which can be a significant advantage where memory bandwidth is the bottleneck.
  • Speed: With fewer bits to process, FP4 can theoretically achieve faster computation, especially on hardware optimized for such low-precision formats.
  • Quantized Processing: FP4 is particularly effective for tasks like real-time quantized subpixel-level post-processing, where precision requirements are lower and speed is critical.

FP8 Advantages:

  • Higher Precision: FP8 provides better numerical accuracy than FP4, making it more suitable for tasks that need a balance between speed and precision, such as mixed-precision training of AI models.
  • Broader Range: FP8 can handle a wider range of values, which matters in applications where FP4's limited range might lead to underflows or overflows.

In real-time applications like subpixel-level post-processing, FP4 can excel thanks to its speed and efficiency, provided the precision trade-offs are acceptable. For tasks requiring more stability and accuracy, however, FP8 is the better choice.

Overly simplistic, yes. But I often use ELI5 language to make for smoother conversation (especially when laymen are mentioned in the OP), and it's not exactly wrong. It's not as if this architecture doesn't support FP8; it even supports a middle-ground FP6 if you need more precision than 4 bits but less than 8! Ergo: it's an improvement. "Misleading" by introducing improvements... how would that even work? Ultimately, these are marketing numbers. If it does 3325 AI TOPS with any method within its capabilities, it's not exactly a lie.

Everyone knows that, for the exact same workload running the same codepath on both, the 5090 is only about 35% faster than the 4090, not twice as fast. The point of marketing a new product is showing that it knows a few extra tricks over the old dog, IMO. So, not a misleading quote in my opinion. Misleading was the "5070 = 4090" claim with the MFG crutch.
 
Joined
Jan 10, 2011
Messages
1,530 (0.30/day)
Location
[Formerly] Khartoum, Sudan.
System Name 192.168.1.1~192.168.1.100
Processor AMD Ryzen5 5600G.
Motherboard Gigabyte B550m DS3H.
Cooling AMD Wraith Stealth.
Memory 16GB Crucial DDR4.
Video Card(s) Gigabyte GTX 1080 OC (Underclocked, underpowered).
Storage Samsung 980 NVME 500GB && Assortment of SSDs.
Display(s) ViewSonic VA2406-MH 75Hz
Case Bitfenix Nova Midi
Audio Device(s) On-Board.
Power Supply SeaSonic CORE GM-650.
Mouse Logitech G300s
Keyboard Kingston HyperX Alloy FPS.
VR HMD A pair of OP spectacles.
Software Ubuntu 24.04 LTS.
Benchmark Scores Me no know English. What bench mean? Bench like one sit on?
Does this mislead customers/consumers, who are mainly non-experts, what do you think?
I would have said yes, if I weren't aware of the simple fact that a non-expert consumer wouldn't know what the hell an "AI TOPS" is. What I would call misleading is putting it there in the first place, in the sense that it just throws numbers around which, even if true, the target audience has no way of knowing whether they benefit from. "Bigger Numbers!"

Those who do care for this metric probably have other spec sheets they look at.
 
Joined
Mar 21, 2016
Messages
2,635 (0.80/day)
The difference is in the relative quality, but that can be subjective and depends on all the factors that go into it, not simply whether FP4 was utilized. I know it supports FP8, and that doesn't take away from those capabilities; this isn't 32-bit PhysX being axed that we're talking about. It's not exactly a lie, but it's not a clear indication that the performance comparison is FP4 vs FP8 as opposed to FP8 vs FP8. They clearly chose to mask that by labeling it "AI TOPS", but that's kind of an industry term today; it just implies peak AI performance and says little about the how, the why, or the quantitative differences, if I'm not mistaken.
 
Joined
Dec 25, 2020
Messages
7,942 (5.15/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) NVIDIA RTX A2000 (5090 shipping to me soon™)
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Audio Device(s) Sony MDR-V7 connected through Apple USB-C
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
The difference is in the relative quality, but that can be subjective and depends on all the factors that go into it, not simply whether FP4 was utilized. I know it supports FP8, and that doesn't take away from those capabilities; this isn't 32-bit PhysX being axed that we're talking about. It's not exactly a lie, but it's not a clear indication that the performance comparison is FP4 vs FP8 as opposed to FP8 vs FP8. They clearly chose to mask that by labeling it "AI TOPS", but that's kind of an industry term today; it just implies peak AI performance and says little about the how, the why, or the quantitative differences, if I'm not mistaken.

From my point of view, the marketed number is simply the highest throughput achievable with the lowest precision available on the hardware. It would be equally "misleading" to compare an FP8-capable GPU with one that only does FP16... the one that only does 16-bit math is going to lose at every single turn if we're talking about the absolute TOPS number achievable. That's why I can't really fault it.

If only the highest-precision number mattered, then every GPU would need to focus on FP64... and you know what card matches the 5090 at FP64? The GTX Titan from 12 years ago, the difference being that the Titan executes FP64 at a 1:3 rate and the 5090 at 1:64. The Titan Black, with the full GK110 core, is sizably faster.
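That checks out on paper. A quick sanity check from the published shader counts and ballpark boost clocks (the clocks here are approximate, so treat the outputs as rough figures):

```python
def fp64_tflops(shaders: int, clock_ghz: float, fp64_ratio: float) -> float:
    """FP32 TFLOPS = 2 ops (FMA) * shaders * clock; FP64 is a fixed fraction of it."""
    return 2 * shaders * clock_ghz * fp64_ratio / 1000

# Shader counts are the published ones; clock values are ballpark boost clocks.
print(f"GTX Titan (GK110, 1:3):  {fp64_tflops(2688, 0.88, 1/3):.2f} TFLOPS FP64")
print(f"RTX 5090 (GB202, 1:64): {fp64_tflops(21760, 2.4, 1/64):.2f} TFLOPS FP64")
# Both land around ~1.6 TFLOPS -- twelve years apart.
```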

What happened is that instead of increasing precision, GPUs have decreased it over time.
 
Joined
Mar 21, 2016
Messages
2,635 (0.80/day)
It's really a precision pyramid with different layers of scaling. Towards the top you have low precision, but low latency due to low overhead. It's a bit like path tracing versus a full-screen render: what's better in one pass, versus the time to render a single pass?
 
Joined
Dec 25, 2020
Messages
7,942 (5.15/day)
Location
São Paulo, Brazil
System Name "Icy Resurrection"
Processor 13th Gen Intel Core i9-13900KS
Motherboard ASUS ROG Maximus Z790 Apex Encore
Cooling Noctua NH-D15S upgraded with 2x NF-F12 iPPC-3000 fans and Honeywell PTM7950 TIM
Memory 32 GB G.SKILL Trident Z5 RGB F5-6800J3445G16GX2-TZ5RK @ 7600 MT/s 36-44-44-52-96 1.4V
Video Card(s) NVIDIA RTX A2000 (5090 shipping to me soon™)
Storage 500 GB WD Black SN750 SE NVMe SSD + 4 TB WD Red Plus WD40EFPX HDD
Display(s) 55-inch LG G3 OLED
Case Pichau Mancer CV500 White Edition
Audio Device(s) Sony MDR-V7 connected through Apple USB-C
Power Supply EVGA 1300 G2 1.3kW 80+ Gold
Mouse Microsoft Classic IntelliMouse (2017)
Keyboard IBM Model M type 1391405
Software Windows 10 Pro 22H2
Benchmark Scores I pulled a Qiqi~
It's really a precision pyramid with different layers of scaling. Towards the top you have low precision, but low latency due to low overhead. It's a bit like path tracing versus a full-screen render: what's better in one pass, versus the time to render a single pass?

Kinda, I suppose you can put it that way. FP64 fell out of favor for some reason; it had traditionally been kept enabled only on professional GPUs, but this started to happen much faster during the Maxwell/Hawaii era.

The "GTX Titan X" (GM200) doesn't have the enhanced FP64 of its Kepler predecessors (Nvidia physically removed the FP64 units from this silicon), and the FP64 cores were already disabled on GTX 780 (and Ti). Same thing for AMD, FirePro W9100 does it at 1:2 rate compared to FP32, R9 290X only at 1:8, by Fiji AMD had already largely removed it from their architectures as well (both Instinct and Fury X have the same 1:16 ratio).

It never made it back with the duo of Pascal Titan X/Xp cards, only to briefly return with the Titan V (which was basically a harvested datacenter card sold on the "cheap", and I use those quotes very loosely; it was still $3,000, but nicer on the wallet than an $8,800 Quadro GV100).

It turns out lowering the precision was much better suited to the massively parallel workloads GPUs commonly run, which is why the cards went the other way. Starting with Turing, they went for better FP16, then FP8, and now we're at FP4. AI inference in particular likes low-precision calculations, so nowadays FP64 capability is kept at a minimum performance level for compatibility reasons. Standard precision for most workloads is still FP32; FP64 is only used in very specific niches these days.
 
Joined
Mar 21, 2016
Messages
2,635 (0.80/day)
Variable rate shading is a term you hear much less of these days, but essentially that's what more granular sub-pixel quantization control is, and it's the direction things are going with AI, post-processing, and upscaling. It's pretty much a flocking-herd, strength-in-numbers approach versus a Mike Tyson heavyweight boss encounter. You have to keep in mind that Maxwell's big feature was stronger compression, and basically quantization in general. I wouldn't say it's a new approach either; it's more an adaptation of things like the early AA techniques, which were quantized. Even ATi, back when it was still named that, was doing mixed precision.

Everything old is new again, you might say. We borrow inspiration from the past: Yuzo doing crazy programmable musical stuff in assembly language played no small part in the growth and evolution of modern-day DAWs. We still have early music software like Cubase, but it's the people pushing the envelope of creativity and innovation who are driving the future.
 