You can't handle the truth when you censor a debate when you can't win.
Meanwhile, NVIDIA PR throws in RT cores' TFLOPS into marketing.
Expect AMD PR to weaponize RT cores TFLOPS when "Big Navi" arrives.
Why debate about FP32 general-purpose shader compute (not generalize like SSE) when future game titles have significant RT workloads?
Current shaders accelerate Z-buffer accelerated structures while RT cores accelerate BVH accelerated structures.
Lol, censoring the debate? It's not my fault you're not able to keep a civil tone in a discussion or keep yourself from personal attacks. That's your own responsibility, not mine. You need to calm down and stop projecting your own missteps onto me.
And again, as addressed in my previous post: Nvidia adopting a bad marketing practice does not in any way make it a good marketing practice. You apparently need to be spoon-fed, so let's go through this point by point.
-TFLOPS in GPU performance metrics is generally accepted to mean FP32 TFLOPS, as that is the "baseline" industry-standard operation (single-precision compute) as opposed to higher or lower precisions (FP64, FP16, INT8, INT4, etc.).
-In GPUs these operations are performed by shader cores, which are fundamentally FP32 compute cores (though sometimes with various degrees of FP64 support either through dedicated hardware or the ability to combine two FP32 cores), which can also perform lower precision workloads either natively at the same speed or faster by combining several operations in one core.
-FP32 compute is a very broad category of general compute operations. Some of these operations can be done by various forms of specialized hardware, or can be done in lower precisions at higher speed (through methods like rapid packed math) without sacrificing the quality of the end result.
-Due to FP32 being a broad category a lot of FP32 operations can also be performed more efficiently by making specialized hardware for a subset of operations. This hardware, by virtue of being specialized for a specific subcategory of operations, is not capable of performing general FP32 compute operations.
-As the operations done on the specialized hardware can also be done on FP32 hardware, you can give an approximation of the equivalent FP32 performance necessary to match the performance of the specialized hardware. I.e. you can say things like "to match the performance of our RT cores you would need X number of FP32 FLOPS". These calculations are then dependent on - among other things - how efficient your implementation of said operation through general FP32 compute is. Two different solutions will very likely perform differently, and will thus result in different numbers for the same hardware.
-This is roughly equivalent to how fixed-function video encode/decode blocks can do this specialized subset of work faster and more efficiently than the same work performed on a CPU or GPU. That doesn't mean you can run your OS or games off a video encode/decode block, as this block is only capable of a small set of operations.
-These comparisons can't be expanded to other tasks, as the specialized hardware is not capable of general FP32 compute. FP32 hardware can do RT; RT hardware can't do FP32. I.e. you cannot say that "our RT cores are capable of X FP32 FLOPS" - because that statement is fundamentally untrue - your RT hardware is capable of zero FP32 FLOPS. That your F1 car (specialized hardware) can do some of the things your Civic (general hardware) can do - driving on a flat surface - and is "X times better" at that (i.e. faster around a track) does not mean that this can be transferred to the other things the general hardware can do - your F1 car has nowhere to put your groceries and would get stuck on the first speed bump you encountered, so it is fundamentally incapable of grocery shopping. It would also be fundamentally incapable of driving your friends around, or letting you listen to the radio while commuting. Just because specialized hardware can be compared to general hardware in the task the specialized hardware can do does not mean this comparison can be expanded into the other tasks that general hardware can do - because the specialized hardware is fundamentally incapable of doing these things.
-So, to sum up: AMD made a claim in marketing that, while technically true, needs to be understood in a very specific way to be true, and is very easy to misunderstand and thus misrepresent the capabilities of the hardware in question. The Xbox Series X is capable of 12.1 TFLOPS of FP32 compute. When performing combined rasterization and RT graphics workloads, it is capable of performing an amount of RT compute that would require 13 TFLOPS of FP32 compute to achieve if said workload were run on pure FP32 hardware (which it isn't, it's run on RT hardware). It is not, and will never be, capable of 25 TFLOPS of FP32 compute. Nvidia copying this does not in any way make it less problematic - I would say it makes it a lot more problematic, as there's no way of knowing if the two companies' ways of performing RT workloads on FP32 cores are equally performant, and unless they are, any comparisons are entirely invalid. Especially problematic is the fact that conversions like this make worse performance look better: if your RT-through-FP32 implementation is worse than the competition's, you can claim that your RT hardware is equivalent to more FP32 hardware than theirs is. This tells us nothing of actual performance, only performance relative to something unknown and unknowable.
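To put rough numbers on that last point, here's a quick back-of-the-envelope sketch in Python. Everything in it is made up for illustration: the function name `equivalent_fp32_tflops`, the RT throughput, and both per-test FLOP costs are my own assumptions, not figures from AMD or Nvidia. All it shows is that the same RT hardware gets a bigger "equivalent TFLOPS" number when the shader fallback it's measured against is less efficient.

```python
# Illustrative only: every number here is hypothetical, not vendor data.

def equivalent_fp32_tflops(rt_tests_per_second: float, shader_flops_per_test: float) -> float:
    """FP32 TFLOPS a shader-only GPU would need to match the RT hardware's rate."""
    return rt_tests_per_second * shader_flops_per_test / 1e12

# The SAME RT hardware in both cases (assumed rate of intersection tests per second).
rt_rate = 1.0e10

# Two assumed shader-based implementations of the same intersection test.
efficient_cost = 400.0     # FP32 FLOPs per test, well-optimized fallback
inefficient_cost = 800.0   # FP32 FLOPs per test, poorly optimized fallback

efficient_claim = equivalent_fp32_tflops(rt_rate, efficient_cost)
inefficient_claim = equivalent_fp32_tflops(rt_rate, inefficient_cost)

print(f"vs. efficient fallback:   {efficient_claim:.1f} 'equivalent' TFLOPS")
print(f"vs. inefficient fallback: {inefficient_claim:.1f} 'equivalent' TFLOPS")
```

Identical RT hardware, double the marketing number, purely because the baseline it's compared against got worse.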
This just boils down to a very clear demonstration of how utterly useless FP32 FLOPS are as a metric of GPU performance. Not only is the translation from FP32 compute (TFLOPS) into gaming performance not 1:1 but dependent on drivers, hardware utilization, and architectural features, but this now adds another stack of abstraction layers, meaning that any numbers made in this way are completely and utterly incomparable. Comparing FLOPS from pure shader hardware across AMD and Nvidia was already comparing apples and oranges, but now it's more like comparing apples and ... hedgehogs. Or something.
Btw, I would sincerely like to see you point out what of the above (or my previous posts on this) makes me an AMD fanboy. The ball's in your court on that one.