
Editorial: Closer to the Metal: Shader Intrinsic Functions

Raevenlord

News Editor
Joined
Aug 12, 2016
Messages
3,755 (1.25/day)
Location
Portugal
System Name The Ryzening
Processor AMD Ryzen 9 5900X
Motherboard MSI X570 MAG TOMAHAWK
Cooling Lian Li Galahad 360mm AIO
Memory 32 GB G.Skill Trident Z F4-3733 (4x 8 GB)
Video Card(s) Gigabyte RTX 3070 Ti
Storage Boot: Transcend MTE220S 2TB, Kintson A2000 1TB, Seagate Firewolf Pro 14 TB
Display(s) Acer Nitro VG270UP (1440p 144 Hz IPS)
Case Lian Li O11DX Dynamic White
Audio Device(s) iFi Audio Zen DAC
Power Supply Seasonic Focus+ 750 W
Mouse Cooler Master Masterkeys Lite L
Keyboard Cooler Master Masterkeys Lite L
Software Windows 10 x64
Shader intrinsic functions are a partial solution for granting developers more control over existing computational resources and how they are leveraged. This capability (much touted by AMD as a performance-enhancing feature on its GCN-based products) essentially exposes features and capabilities that exist on the hardware developers are programming for, but that they generally wouldn't be able to access. This can happen either because those features are abstracted away by a high-level API (Application Programming Interface, such as DX11), or because the API simply has no way to reach them. To understand why high-level APIs such as DX11 don't usually expose a piece of hardware's full feature list, or its full processing capabilities, we must first look at the basic architecture of a given computer system.
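To give a flavor of what "exposing a hardware capability" can look like in practice, here is a minimal, purely illustrative C++ sketch (not taken from AMD's material): it probes for the GL_AMD_shader_ballot vendor extension by compiling a small GLSL compute kernel that uses addInvocationsAMD, a wavefront-wide sum performed directly in hardware. The kernel contents and buffer layout are invented for the example; on drivers that don't expose the extension, compilation simply fails and a portable fallback would be used.

Code:
// Illustrative sketch: probing for an AMD shader-intrinsic extension.
// Assumes GLFW + glad for context creation; error handling omitted.
#include <glad/glad.h>
#include <GLFW/glfw3.h>
#include <cstdio>

// GPU-side program: addInvocationsAMD() sums a value across the whole
// wavefront in hardware -- no shared memory, no explicit barrier.
static const char* kComputeSrc = R"(
#version 450
#extension GL_ARB_shader_ballot : require
#extension GL_AMD_shader_ballot : require
layout(local_size_x = 64) in;
layout(std430, binding = 0) buffer Data { float v[]; };
void main() {
    float waveSum = addInvocationsAMD(v[gl_GlobalInvocationID.x]);
    v[gl_GlobalInvocationID.x] = waveSum; // every lane sees the wave total
}
)";

int main() {
    glfwInit();
    glfwWindowHint(GLFW_CONTEXT_VERSION_MAJOR, 4);
    glfwWindowHint(GLFW_CONTEXT_VERSION_MINOR, 5);
    glfwWindowHint(GLFW_VISIBLE, GLFW_FALSE);      // no window needed
    GLFWwindow* win = glfwCreateWindow(64, 64, "", nullptr, nullptr);
    glfwMakeContextCurrent(win);
    gladLoadGLLoader((GLADloadproc)glfwGetProcAddress);

    GLuint cs = glCreateShader(GL_COMPUTE_SHADER);
    glShaderSource(cs, 1, &kComputeSrc, nullptr);
    glCompileShader(cs);             // the driver compiles GLSL to GPU code
    GLint ok = 0;
    glGetShaderiv(cs, GL_COMPILE_STATUS, &ok);
    std::printf(ok ? "intrinsic path available\n"
                   : "extension not exposed; use the portable fallback\n");
    glfwTerminate();
    return 0;
}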





As you can see, there are usually multiple layers a given task must pass through before it is processed at the hardware level. You might be wondering why we even need so many layers in the first place, and why this wasn't enabled before. There are many technical reasons, but one of the strongest is simply the breadth of different hardware available for your buying and assembling pleasure. Unlike the console ecosystem, where hardware is fixed and, as a result, predictable in its performance metrics and command execution, the PC ecosystem is fractured into countless hardware combinations. You may have an AMD, CMT-enabled (Clustered Multi-Threading) FX-8350, an SMT-enabled (Simultaneous Multi-Threading) i7-6700K or anything in between, paired with a GCN RX 480 or a Pascal GTX 1070… And all that hardware has its own particularities in how it processes the same task, and in the type of commands you need to issue to get a given result. So DX11, DX12 and Vulkan serve as what we call an abstraction layer.



Abstraction layers essentially simplify the programmer's work: they "hide" and automate a given command's underlying processes, particular implementation and hardware-specific code paths, so that the programmer only has to worry about which commands to use - and voilà. The high-level API converts a given command (let's imagine, for simplicity's sake, "draw frame") into its equivalent, non-abstracted hardware code, and runs it with good-enough optimization on most hardware to deliver those awesome (insert your favorite game here) frames. To elaborate a little: imagine you have a command called "Stack". On a high-level API like DX11, this command will be interpreted and the values for its inner workings will be filled in automatically, based on general hardware compatibility: how many levels to stack, when to stack them, and when to stop the operation. But since these values aren't optimized, your hardware ends up using something of a brute-force approach. With a low-level API, developers can instead set the exact values for the "Stack" command's inner workings, optimized for your hardware, so it never goes out of budget and none of those sexy stream processors are left idle.
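Since "Stack" is only an analogy, here is an equally hypothetical C++ sketch of the difference (no real API exposes anything like this; all names and numbers are invented): the high-level path fills in conservative, works-everywhere values, while the low-level path lets the developer pass values tuned to the exact GPU.

Code:
// Toy model of the hypothetical "Stack" command discussed above.
#include <cstdio>

struct StackParams {
    int  levels;      // how many levels to stack
    int  batchSize;   // items processed per pass
    bool earlyStop;   // stop as soon as the budget is met
};

// High-level path: safe defaults that run everywhere -- the
// "brute force" approach described in the text.
void stackHighLevel() {
    StackParams generic{8, 32, false};
    std::printf("generic: levels=%d batch=%d\n",
                generic.levels, generic.batchSize);
}

// Low-level path: the developer knows the target GPU (say, 64-wide
// wavefronts) and supplies values that keep every unit busy.
void stackLowLevel(const StackParams& tuned) {
    std::printf("tuned:   levels=%d batch=%d\n",
                tuned.levels, tuned.batchSize);
}

int main() {
    stackHighLevel();                        // what a DX11-style API does for you
    stackLowLevel(StackParams{4, 64, true}); // what a low-level API permits
    return 0;
}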

The problem with the former, high-level approach is, of course, that generalizations and simplifications aren't as efficient as running an optimized, hardware-specific code path, and may sometimes even deny access to hardware features for lack of support in the high-level API. The thing with DX12's and Vulkan's low-level capabilities is that, in specific scenarios, developers can mostly ignore the abstraction layers (some compiler checks are still run to make sure the code is within expected parameters). This lets them write code that takes advantage of hardware-specific features, sometimes accelerating workloads by up to 2x compared to the high-level approach. This is the basic principle of low-level APIs, and something that is enabled, at least partially, by shader intrinsic functions.


Going back to the different layers in a system: imagine, for argument's sake, that it takes 5 ms for a task to compute and pass through each of the layers until it is executed by the hardware - in the example image above, that would mean 5 x 5 ms = 25 ms. Now imagine you can effectively skip those figurative hoops, going straight from the app's hardware processing requirements to the hardware. With only the application and the hardware layers left, the 25 ms computation drops to a mere 2 x 5 ms = 10 ms, which frees up computation time for other tasks. This is what shader intrinsic functions really are: pieces of code that, when recognized by the low-level API, are allowed to move directly towards the hardware, bypassing other, time-consuming layers.
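In code form, the back-of-the-envelope above looks like this (all numbers invented, as in the text; the five layer names are an assumption, since the image isn't reproduced here):

Code:
// Back-of-the-envelope from the paragraph above; numbers are invented.
#include <cstdio>

int main() {
    const double msPerLayer = 5.0;
    const int layeredPath = 5; // e.g. app -> API -> driver -> kernel -> hardware
    const int directPath  = 2; // app -> hardware, intrinsic recognized

    std::printf("layered path: %.0f ms\n", layeredPath * msPerLayer); // 25 ms
    std::printf("direct path:  %.0f ms\n", directPath  * msPerLayer); // 10 ms
    return 0;
}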

The problem with this approach should seem obvious to you by now: while abstraction layers do add overhead to any given computing task, they do so while simplifying, sometimes by orders of magnitude, the coding process. Closer-to-the-metal programming has in its greatest strength what also amounts to its greatest flaw: directly leveraging hardware resources requires specific, time-consuming programming for functions that were largely automatic before. This not only means more developer resources, but also a system that is more prone to errors: debugging five lines of code is very different from debugging fifty. One must also keep in mind that closer-to-the-metal programming, because it targets only a subset of existing hardware, ends up leaving behind users of older, unsupported hardware.



AMD's specific push for shader intrinsic functions in low-level graphics APIs such as Vulkan and DX12 stems from AMD's grasp on the console market (with its CPUs and GPUs powering all three current-generation game consoles), as well as its previous work on Mantle, which went on to become embedded in today's Vulkan library and arguably gave Microsoft the push it needed to include low-level access in DX12. This means programmers are already leveraging optimized, feature-specific code paths in their console game implementations, which in turn leads AMD to want to give them access to those same features on the PC hardware that supports them, reaping the benefits of hardware-specific optimizations for its GCN architecture. That said, this doesn't mean NVIDIA doesn't have its own shader intrinsic functions that developers can take advantage of: through its GameWorks initiative, NVIDIA allows programmers to add extensions not natively supported by DX's HLSL (High Level Shading Language), while also allowing shader intrinsic functions to be leveraged as part of its CUDA ecosystem. An important distinction between the two companies' approaches is that while NVIDIA requires developers to use its specific GameWorks libraries (which are proprietary, and not accessible on AMD's cards), AMD's approach is more open, being accessible in open standards such as GPUOpen and Vulkan's libraries.

Shader intrinsics are just one part of what a low-level API needs to be, and aren't particularly game-changing in and of themselves. They will also never be at their best on PC hardware, simply because of how the ecosystem is fractured across the sheer number of possible, up-to-date or not-so-up-to-date systems. The best part of PC gaming is also, in this case and at this point in time, its greatest obstacle to extracting perfect performance from any given system. But shader intrinsics are indeed a step towards giving developers more control over the features they implement and how those features are run, and they stand side by side with other technologies which will, in time, steer us towards ever more performant systems.



 
Joined
Nov 18, 2010
Messages
7,492 (1.47/day)
Location
Rīga, Latvia
System Name HELLSTAR
Processor AMD RYZEN 9 5950X
Motherboard ASUS Strix X570-E
Cooling 2x 360 + 280 rads. 3x Gentle Typhoons, 3x Phanteks T30, 2x TT T140 . EK-Quantum Momentum Monoblock.
Memory 4x8GB G.SKILL Trident Z RGB F4-4133C19D-16GTZR 14-16-12-30-44
Video Card(s) Sapphire Pulse RX 7900XTX. Water block. Crossflashed.
Storage Optane 900P[Fedora] + WD BLACK SN850X 4TB + 750 EVO 500GB + 1TB 980PRO+SN560 1TB(W11)
Display(s) Philips PHL BDM3270 + Acer XV242Y
Case Lian Li O11 Dynamic EVO
Audio Device(s) SMSL RAW-MDA1 DAC
Power Supply Fractal Design Newton R3 1000W
Mouse Razer Basilisk
Keyboard Razer BlackWidow V3 - Yellow Switch
Software FEDORA 41
Where can I give this man a beer?
 

the54thvoid

Super Intoxicated Moderator
Staff member
Joined
Dec 14, 2009
Messages
12,992 (2.39/day)
Location
Glasgow - home of formal profanity
Processor Ryzen 7800X3D
Motherboard MSI MAG Mortar B650 (wifi)
Cooling be quiet! Dark Rock Pro 4
Memory 32GB Kingston Fury
Video Card(s) Gainward RTX4070ti
Storage Seagate FireCuda 530 M.2 1TB / Samsumg 960 Pro M.2 512Gb
Display(s) LG 32" 165Hz 1440p GSYNC
Case Asus Prime AP201
Audio Device(s) On Board
Power Supply be quiet! Pure POwer M12 850w Gold (ATX3.0)
Software W10
Joined
Nov 4, 2005
Messages
11,960 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
Well, shave my legs and call me Harry, that was a well-written piece.

It's worth pointing out the difference between the open architecture AMD is offering and the walled garden of Nvidia. One has to wonder how long that garden will stay closed as the two companies move towards a common architecture and DX12 becomes an increasingly thin shim between the OS and the hardware, revealing code paths to developers and many others.
 
Joined
Feb 20, 2007
Messages
372 (0.06/day)
Location
Where the beer is good
System Name Karl Arsch v. u. z. Abgewischt
Processor i5 3770K @5GHz delided
Motherboard ASRock Z77 Professional
Cooling Arctic Liquid Freezer 240
Memory 4x 4GB 1866 MHz DDR3
Video Card(s) GTX 970
Storage Samsung 830 - 512GB; 2x 2TB WD Blue
Display(s) Samsung T240 1920x1200
Case Bitfenix Shinobie XL
Audio Device(s) onboard
Power Supply Cougar G600
Mouse Logitech G500
Keyboard CMStorm Ultimate QuickFire (CherryMX Brown)
Software Win7 Pro 64bit
Nice!

If you guys continue with these kinds of articles, it would be good, imo, to create a separate section for them with a direct link somewhere in the top bar between Home-Reviews-Forum. It would be a shame for all the effort that goes into such a piece to just get forgotten between news articles.
 

Raevenlord

News Editor
Well, shave my legs and call me Harry, that was a well-written piece.

It's worth pointing out the difference between the open architecture AMD is offering and the walled garden of Nvidia. One has to wonder how long that garden will stay closed as the two companies move towards a common architecture and DX12 becomes an increasingly thin shim between the OS and the hardware, revealing code paths to developers and many others.

Thanks, man :peace:
That is indeed a relevant distinction. I'll try and sprinkle it on the piece :toast:
 
Joined
Aug 20, 2007
Messages
21,406 (3.41/day)
System Name Pioneer
Processor Ryzen R9 9950X
Motherboard GIGABYTE Aorus Elite X670 AX
Cooling Noctua NH-D15 + A whole lotta Sunon and Corsair Maglev blower fans...
Memory 64GB (4x 16GB) G.Skill Flare X5 @ DDR5-6000 CL30
Video Card(s) XFX RX 7900 XTX Speedster Merc 310
Storage Intel 905p Optane 960GB boot, +2x Crucial P5 Plus 2TB PCIe 4.0 NVMe SSDs
Display(s) 55" LG 55" B9 OLED 4K Display
Case Thermaltake Core X31
Audio Device(s) TOSLINK->Schiit Modi MB->Asgard 2 DAC Amp->AKG Pro K712 Headphones or HDMI->B9 OLED
Power Supply FSP Hydro Ti Pro 850W
Mouse Logitech G305 Lightspeed Wireless
Keyboard WASD Code v3 with Cherry Green keyswitches + PBT DS keycaps
Software Gentoo Linux x64 / Windows 11 Enterprise IoT 2024
I've tried to detail this to users of TPU in the past, but never had the gumption to do a full writeup. Props.
 
Joined
Sep 25, 2012
Messages
2,074 (0.47/day)
Location
Jacksonhole Florida
System Name DEVIL'S ABYSS
Processor i7-4790K@4.6 GHz
Motherboard Asus Z97-Deluxe
Cooling Corsair H110 (2 x 140mm)(3 x 140mm case fans)
Memory 16GB Adata XPG V2 2400MHz
Video Card(s) EVGA 780 Ti Classified
Storage Intel 750 Series 400GB (AIC), Plextor M6e 256GB (M.2), 13 TB storage
Display(s) Crossover 27QW (27"@ 2560x1440)
Case Corsair Obsidian 750D Airflow
Audio Device(s) Realtek ALC1150
Power Supply Cooler Master V1000
Mouse Ttsports Talon Blu
Keyboard Logitech G510
Software Windows 10 Pro x64 version 1803
Benchmark Scores Passmark CPU score = 13080
This is great stuff. Thank you!
 
Joined
Aug 30, 2015
Messages
166 (0.05/day)
Location
Copenhagen, Denmark
System Name Royal Fortune (Main)/Adventure Galley (NAS)/Little Ranger (HTPC)
Processor Intel i5 4460/AMD C-70/Intel Pentium G3258 Anniversary Ed.
Motherboard Gigabyte ga-z97x-gaming 5/Asrock C-70M1/Asrock Z97 Anniversary
Cooling Phanteks PH-TC12DX/Stock/Raijintek Triton Core
Memory 8GB Team Group Dark 1600 CL9/8GB Team Group Elite 1600 CL9/8GB Avexir Core 1600
Video Card(s) VTX3D R9 280X 3GB/APU/Palit GTX 750 TI StormX Duo
Storage 120GB Team Group Ultra L5 SSD + 1TB WD Black/4 X 2TB WD Blue/120 GB Kingston V300
Display(s) Dell 2310/AOC e2070Swn 19.5"/TV
Case In Win 707/Bitfenix Prodigy M/Dimastech Easy V3
Audio Device(s) N/A
Power Supply EVGA Supernova GS 650W/be quiet! System Power 7 350W/Xigmatek Maverick 400W
Mouse Logitech G303 Daedalus Apex/Razer Abyssus/-
Keyboard Corsair K70 Red/Steelseries Apex Raw/Logitech K400
Software Win10/FreeNAS 9.3/KodiBuntu
Thank you, this was like a drink of cold water in the desert! I really hope that we will see more of these articles in the future...informative, factual and well written.
 
Joined
Sep 15, 2007
Messages
3,946 (0.63/day)
Location
Police/Nanny State of America
Processor OCed 5800X3D
Motherboard Asucks C6H
Cooling Air
Memory 32GB
Video Card(s) OCed 6800XT
Storage NVMees
Display(s) 32" Dull curved 1440
Case Freebie glass idk
Audio Device(s) Sennheiser
Power Supply Don't even remember
Well, shave my legs and call me Harry, that was a well-written piece.

It's worth pointing out the difference between the open architecture AMD is offering and the walled garden of Nvidia. One has to wonder how long that garden will stay closed as the two companies move towards a common architecture and DX12 becomes an increasingly thin shim between the OS and the hardware, revealing code paths to developers and many others.

I think nvidia saw the writing on the wall with Vulkan. Why else would they be so quick to get onboard?
 
Joined
Nov 21, 2010
Messages
2,350 (0.46/day)
Location
Right where I want to be
System Name Miami
Processor Ryzen 3800X
Motherboard Asus Crosshair VII Formula
Cooling Ek Velocity/ 2x 280mm Radiators/ Alphacool fullcover
Memory F4-3600C16Q-32GTZNC
Video Card(s) XFX 6900 XT Speedster 0
Storage 1TB WD M.2 SSD/ 2TB WD SN750/ 4TB WD Black HDD
Display(s) DELL AW3420DW / HP ZR24w
Case Lian Li O11 Dynamic XL
Audio Device(s) EVGA Nu Audio
Power Supply Seasonic Prime Gold 1000W+750W
Mouse Corsair Scimitar/Glorious Model O-
Keyboard Corsair K95 Platinum
Software Windows 10 Pro
I think nvidia saw the writing on the wall with Vulkan. Why else would they be so quick to get onboard?

Nope; something is being offered at no detriment, and common sense says take advantage of it, which is what they are doing. It also lets them plug whatever proprietary "useful if it went mainstream, useless for the vast majority otherwise" tech as a bullet point on their list of features.
 
Joined
Sep 15, 2007
Nope; something is being offered at no detriment, and common sense says take advantage of it, which is what they are doing. It also lets them plug whatever proprietary "useful if it went mainstream, useless for the vast majority otherwise" tech as a bullet point on their list of features.

But that would mean using something that AMD wants to use! *hiss*

Normally, they would ignore it and say it's useless and no one will use it. After a few years, they would be dragged kicking and screaming into compliance. I think they know it will be the API, given its universal design and the big names behind it.
 
Joined
Sep 15, 2011
Messages
6,685 (1.39/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
Closer-to-the-metal programming has in its greatest strength what also amounts to its greatest flaw: directly leveraging hardware resources requires specific, time-consuming programming for functions that were largely automatic before. This not only means more developer resources, but also a system that is more prone to errors: debugging five lines of code is very different from debugging fifty. One must also keep in mind that closer-to-the-metal programming, because it targets only a subset of existing hardware, ends up leaving behind users of older, unsupported hardware.

Actually, I wanted to ask you about this while reading the article, but you summarized it nicely. So basically, the closer you get to programming the metal, the more you segregate different GPU architectures and techniques. To be honest, as much as I love the idea of programming the GPU more directly by skipping layers, I'm not sure it's actually such a good idea. Not only does it involve spending more time on drivers for each GPU, but game developers also need to spend A LOT more time developing different programming techniques for different GPUs, increasing development time and cost.
Just my 50 cents...
 
Joined
Jul 16, 2014
Messages
8,195 (2.18/day)
Location
SE Michigan
System Name Dumbass
Processor AMD Ryzen 7800X3D
Motherboard ASUS TUF gaming B650
Cooling Artic Liquid Freezer 2 - 420mm
Memory G.Skill Sniper 32gb DDR5 6000
Video Card(s) GreenTeam 4070 ti super 16gb
Storage Samsung EVO 500gb & 1Tb, 2tb HDD, 500gb WD Black
Display(s) 1x Nixeus NX_EDG27, 2x Dell S2440L (16:9)
Case Phanteks Enthoo Primo w/8 140mm SP Fans
Audio Device(s) onboard (realtek?) - SPKRS:Logitech Z623 200w 2.1
Power Supply Corsair HX1000i
Mouse Steeseries Esports Wireless
Keyboard Corsair K100
Software windows 10 H
Benchmark Scores https://i.imgur.com/aoz3vWY.jpg?2
Nvidia fought this, and they lost. Open libraries are going to make it easier to adapt and to create even better, more efficient libraries. Nvidia will lag behind because of their desire for proprietary software; they just don't want to share the spotlight, which could lead to their downfall unless they jump in with both feet.

I don't see why we can't benefit and eat the cake too. Nvidia, IMO, is holding us back now.
 
Joined
Jun 10, 2014
Messages
2,978 (0.78/day)
Processor AMD Ryzen 9 5900X ||| Intel Core i7-3930K
Motherboard ASUS ProArt B550-CREATOR ||| Asus P9X79 WS
Cooling Noctua NH-U14S ||| Be Quiet Pure Rock
Memory Crucial 2 x 16 GB 3200 MHz ||| Corsair 8 x 8 GB 1333 MHz
Video Card(s) MSI GTX 1060 3GB ||| MSI GTX 680 4GB
Storage Samsung 970 PRO 512 GB + 1 TB ||| Intel 545s 512 GB + 256 GB
Display(s) Asus ROG Swift PG278QR 27" ||| Eizo EV2416W 24"
Case Fractal Design Define 7 XL x 2
Audio Device(s) Cambridge Audio DacMagic Plus
Power Supply Seasonic Focus PX-850 x 2
Mouse Razer Abyssus
Keyboard CM Storm QuickFire XT
Software Ubuntu
While it's nice to see some editorial content instead of simply recirculating others' content, it would be nice if the author demonstrated a good understanding of the subject.

Shader intrinsic functions are a partial solution for granting developers more control over existing computational resources and how they are leveraged. This capability (much touted by AMD as a performance-enhancing feature on its GCN-based products) essentially exposes features and capabilities that exist on the hardware developers are programming for, but that they generally wouldn't be able to access.
This is just AMD's PR department inventing a new name for an old feature, like most of the other stuff in their "GPUOpen" initiative. Writing hardware-specific shader programs in assembly has been possible for a long time; in fact, I remember this from before HLSL and GLSL were even a thing, like 15 years ago. Using GPU shader features in assembly code has been possible with each new hardware generation.

This can happen either because those features are abstracted away by a high-level API (Application Programming Interface, such as DX11), or because the API simply has no way to reach them. To understand why high-level APIs such as DX11 don't usually expose a piece of hardware's full feature list, or its full processing capabilities, we must first look at the basic architecture of a given computer system.
You are mixing up API calls and GPU shader programs. While APIs usually bundle API calls with a shader language, they constitute separate parts of rendering; one executes on the CPU and one on the GPU. Of course, both may provide some degree of abstraction and vendor-specific features.
- APIs such as Direct3D, OpenGL, Vulkan, and vendor-specific ones like Mantle and Cg all provide a set of API calls which serve as an interface between the game and the driver.
- Shader programs are pieces of code executing on the GPU cores. They are usually written in a high-level language like HLSL or GLSL, converted to an IR for distribution, and compiled to machine code by the driver. Shader intrinsic functions are all about writing hardware-specific shader programs directly in assembly, potentially creating more optimal code. This has nothing to do with potential abstractions in the API calls on the CPU side (see the sketch just below).
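To make that CPU/GPU split concrete, a minimal C++/OpenGL fragment (the shader contents are illustrative; context creation and error handling are omitted):

Code:
// The GLSL string is the shader program (GPU side); every gl* call is
// an API call (CPU side) talking to the driver. Fragment only: assumes
// an existing GL context.
#include <glad/glad.h>

// GPU side: a trivial vertex shader, later translated by the driver
// into the GPU's native instruction set.
static const char* kVertSrc = R"(
#version 450
layout(location = 0) in vec3 pos;
void main() { gl_Position = vec4(pos, 1.0); }
)";

GLuint compileVertexShader() {
    // CPU side: API calls that hand the source to the driver, which
    // compiles it to machine code for whatever GPU is installed.
    GLuint vs = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(vs, 1, &kVertSrc, nullptr);
    glCompileShader(vs);
    return vs;
}

void drawOnce(GLuint program, GLuint vao) {
    // CPU side again: these calls queue work for the GPU; they never
    // execute on the GPU themselves.
    glUseProgram(program);
    glBindVertexArray(vao);
    glDrawArrays(GL_TRIANGLES, 0, 3);
}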


As you can see, there are usually multiple layers a given task must pass through before it is processed at the hardware level. You might be wondering why we even need so many layers in the first place, and why this wasn't enabled before. There are many technical reasons, but one of the strongest is simply the breadth of different hardware available for your buying and assembling pleasure.
This illustration does not match rendering at all; it is 100% wrong.

Rendering is done by using a number of API calls to build and control what we call a pipeline. Traditionally we had what is called a "fixed pipeline", which only allowed the programmer to enable and slightly adjust hardware-implemented features. Back then there were no shader programs; everything was done through a huge number of API calls, and yet it was not very flexible.

Shader programs allow the programmer to implement parts of the rendering pipeline themselves. The pipeline is still controlled by API calls, but stages of the pipeline can be customized to a large extent, allowing the developer to implement vertex manipulation, lighting effects, fog, transparency, texture blending, and post-processing effects like blur themselves.


The term "shader program" is actually quite confusing, but in modern rendering it refers to pieces of code executing on the GPU, which can do geometry, compute and more. Initially it was primarily used for creating shading effects, so the name has stuck. (Actually, the term is also used for non-GPU shading code used in various 3D modelling programs dating all the way back to the late 80s.)

Getting back to your claims: the GPU shader code executes directly inside the GPU. The "OS and applications", "kernel", etc. have nothing to do with this. The only abstraction involved is the transition from a high-level shading language to assembly.
As a little side note, a customized shader might of course require adjustments to the API calls used.

Despite all the actors describing Direct3D 12 and Vulkan as "low-level APIs", it's important to understand what is meant by "low-level features". These APIs grant greater control over the internals of how the driver manages queues, allocations, etc., but they do not grant greater control on the GPU side, over things like pipeline flow, GPU threads and internal scheduling. So in terms of GPU features exposed to shader programs, the new APIs currently bring nothing new. I'm hoping the next iteration of APIs will do this and move more flexibility into the shaders.

Abstraction layers essentially simplify the programmer's work: they "hide" and automate a given command's underlying processes, particular implementation and hardware-specific code paths, so that the programmer only has to worry about which commands to use...
We all know there can at times be great benefits to optimizing the pipeline and/or shaders for specific hardware, but usually it's a matter of resources. Most game developers don't even prioritize writing a decent pipeline in the first place, so doing these tweaks should not be the primary concern.

Talking of abstractions, most games have a much larger source of overhead: the engine itself. Take a much-hyped game like AotS, which uses something like 100,000 API calls to render a pretty basic scene. Any graphics programmer knows that well-known techniques like instancing and batching could improve that performance by a factor of 10; a hypothetical before/after sketch follows below. Rendering with a huge number of API calls is certainly the most inefficient way to utilize a GPU, and customizing the pipeline or the shaders while such major "defects" remain is basically "putting lipstick on a pig".
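A hypothetical before/after of the instancing point, as a C++/OpenGL sketch (uniform and buffer plumbing omitted; counts invented):

Code:
#include <glad/glad.h>

// Before: one draw call (plus state changes) per object -- the
// "100,000 API calls" pattern criticized above.
void drawNaive(GLuint vao, int objectCount) {
    glBindVertexArray(vao);
    for (int i = 0; i < objectCount; ++i) {
        // per-object uniforms would be uploaded here ...
        glDrawArrays(GL_TRIANGLES, 0, 36); // 36 vertices = one cube
    }
}

// After: a single instanced call; per-object data (transforms, colors)
// lives in a buffer the shader indexes via gl_InstanceID.
void drawInstanced(GLuint vao, int objectCount) {
    glBindVertexArray(vao);
    glDrawArraysInstanced(GL_TRIANGLES, 0, 36, objectCount);
}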

Going back to the different layers in a system: imagine, for argument's sake, that it takes 5 ms for a task to compute and pass through each of the layers until it is executed by the hardware - in the example image above, that would mean 5 x 5 ms = 25 ms. Now imagine you can effectively skip those figurative hoops, going straight from the app's hardware processing requirements to the hardware. With only the application and the hardware layers left, the 25 ms computation drops to a mere 2 x 5 ms = 10 ms, which frees up computation time for other tasks. This is what shader intrinsic functions really are: pieces of code that, when recognized by the low-level API, are allowed to move directly towards the hardware, bypassing other, time-consuming layers.
As mentioned, that description has nothing to do with how rendering works; tasks do not propagate through the levels as you described. Regardless of which API a game uses, the API is the interface to the driver, which in turn sends native commands to the GPU. Each command does not trickle through the levels while the program waits for the result: ever since their conception, both Direct3D and OpenGL have been designed as async APIs*. The game builds what we call a queue (which builds up the pipeline) and dispatches it to the driver; the game then continues building the queue for the next frame while the driver is feeding the GPU. A toy model of that flow follows below.
*) Not to be confused with the unrelated feature "async compute".
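That flow can be caricatured in a few lines of C++ (a toy model tied to no real API; all names invented): the game thread records a command queue for the next frame while the driver consumes the previous one.

Code:
#include <string>
#include <vector>

// Toy model of async submission: recording is cheap CPU work, and
// nothing here ever waits on the GPU per command.
using CommandQueue = std::vector<std::string>;

CommandQueue recordFrame(int frame) {
    CommandQueue q;
    q.push_back("bind pipeline");
    q.push_back("draw scene, frame " + std::to_string(frame));
    return q;                         // game thread: build the queue
}

void submit(CommandQueue q) {
    // In reality the driver validates this, translates it to native GPU
    // commands and feeds the hardware -- while the game is already
    // recording the next frame.
    (void)q;
}

int main() {
    for (int frame = 0; frame < 3; ++frame)
        submit(recordFrame(frame));   // dispatch frame N, move on to N+1
    return 0;
}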

So your calculation of 5 × 5 ms = 25 ms has no relation to reality. And even if a game spent 25 ms on compute, the whole frame would probably take more than 100 ms, resulting in less than 10 FPS, so this overhead clearly does not exist as you described.

AMD's specific push for shader intrinsic functions in low-level graphics APIs such as Vulkan and DX12 stems from AMD's grasp on the console market (with its CPUs and GPUs powering all three current-generation game consoles), as well as its previous work on Mantle, which went on to become embedded in today's Vulkan library and arguably gave Microsoft the push it needed to include low-level access in DX12.
That's quite a few mistakes in a single sentence.
1) Hardware-specific shaders and API features have existed for many years; that's not new.
2) Direct3D 12 had been in the works since 2010/2011; Mantle originated from early Direct3D 12 work, not the other way around.
3) Vulkan is built on SPIR-V. It took some inspiration from Mantle in terms of the front end, but the underlying architecture is derived from SPIR.

An important distinction between the two companies' approaches is that while NVIDIA requires developers to use its specific GameWorks libraries (which are proprietary, and not accessible on AMD's cards), AMD's approach is more open, being accessible in open standards such as GPUOpen and Vulkan's libraries.
That's not true at all. Both vendors have open and proprietary parts. GameWorks is mostly open, while some of it requires an NDA. Almost all of it runs on AMD GPUs; claiming otherwise is untrue. They also provide the most extensive collection of examples and best practices for modern graphics development.
And GPUOpen is no "open standard" at all; it's mostly a collection of renamed tools and libraries which have existed for years. And do I need to remind you which vendor was the last to provide Vulkan support? And which still, to this date, fails to provide stable OpenGL support?
No vendor is even close to perfect, but there is no doubt that no one has done more to promote open standards than Nvidia. So please show some professionalism, and stop painting the picture as one being the champion of openness while the other is the evil proprietary one.
 
Joined
Aug 22, 2016
Messages
292 (0.10/day)
Will it allow for that same better cinematic FPS of 30?

That's what I've always wanted on my PC..........the "better" lower FPS.
 
Joined
Apr 21, 2010
Messages
576 (0.11/day)
System Name Home PC
Processor Ryzen 5900X
Motherboard Asus Prime X370 Pro
Cooling Thermaltake Contac Silent 12
Memory 2x8gb F4-3200C16-8GVKB - 2x16gb F4-3200C16-16GVK
Video Card(s) XFX RX480 GTR
Storage Samsung SSD Evo 120GB -WD SN580 1TB - Toshiba 2TB HDWT720 - 1TB GIGABYTE GP-GSTFS31100TNTD
Display(s) Cooler Master GA271 and AoC 931wx (19in, 1680x1050)
Case Green Magnum Evo
Power Supply Green 650UK Plus
Mouse Green GM602-RGB ( copy of Aula F810 )
Keyboard Old 12 years FOCUS FK-8100
That's not true at all. Both vendors have open and proprietary parts. GameWorks is mostly open, while some of it requires an NDA. Almost all of it runs on AMD GPUs; claiming otherwise is untrue. They also provide the most extensive collection of examples and best practices for modern graphics development.
And GPUOpen is no "open standard" at all; it's mostly a collection of renamed tools and libraries which have existed for years.

Care to explain what an open standard is? GPUOpen is based on the MIT License while GameWorks is not; a developer can't even change the source code if he/she wants to optimize it to run better on rival hardware. And remind you, GameWorks runs shitty on AMD cards compared to Nvidia cards! If you believe "open" means looking at the source without being able to change it, then don't bother to reply to me. I don't believe that.

do I need to remind you which vendor was the last to provide Vulkan support? And which still, to this date, fails to provide stable OpenGL support?
Also, Nvidia has issues with Vulkan and DX12 support. Look at AotS - it took a while for the GTX 980 Ti to beat the Fury X there, and the same goes for Doom. And where is the day-one DX12 driver for BF1?
I really hope AMD drops OpenGL support. OpenGL belongs to the past and needs to die.
 
Joined
Aug 20, 2007
Also, Nvidia has issues with Vulkan and DX12 support.

They really only have an issue with async compute.

If you read the article above, you'd realize you can't really have an issue with something that's "low level", only with implementations within it.

I agree about open standards, though - except for the part about dropping OpenGL support. I like my old games to run, yo. That, and almost ALL Linux ports rely on it. It's hardly obsolete at this point, just being phased out. You're lining the accused up for the firing squad before you've even read them their rights.
 
Joined
Nov 18, 2010
Here we go again.

 

Raevenlord

News Editor
While it's nice to see some editorial content instead of simply recirculating others' content, it would be nice if the author demonstrated a good understanding of the subject.

First, I would like to thank you for the length and minutiae of your post. This clearly means something to you, and your contribution and the time you dedicated to it deserve praise. Most of your criticism is constructive, and will allow readers to see in deeper detail what I tried to convey in the piece.

That said, you should keep in mind that this isn't supposed to be either a deep dive or a white paper. It is simply an attempt to explain in somewhat more detail what exactly is meant by these shader intrinsic functions. So you should look at this piece as an abstraction layer unto itself, not as a be-all-end-all exploration. Some inaccuracies are inevitable.


This is just AMD's PR department inventing a new name for an old feature, like most of the other stuff in their "GPUOpen" initiative. Writing hardware-specific shader programs in assembly has been possible for a long time; in fact, I remember this from before HLSL and GLSL were even a thing, like 15 years ago. Using GPU shader features in assembly code has been possible with each new hardware generation.

You are right, of course, and I did mention HLSL, though I'm not familiar with previous implementations of the subject. And while this is, obviously, a PR spin, I think AMD deserves to make it, given that the feature is now much more relevant than it was before, simply because of the current architectural proximity between consoles and PCs. You just have to look at the Xbox 360's architecture and compare it to the Xbox One's or PS4's to see that today, GPUs in consoles are much closer to their PC counterparts than ever before. That is why I agree that this subject has more relevance now, and why I accept that AMD spins it that way.


You are mixing up API calls and GPU shader programs. While APIs usually bundle API calls with a shader language, they constitute separate parts of rendering; one executes on the CPU and one on the GPU. Of course, both may provide some degree of abstraction and vendor-specific features.
- APIs such as Direct3D, OpenGL, Vulkan, and vendor-specific ones like Mantle and Cg all provide a set of API calls which serve as an interface between the game and the driver.
- Shader programs are pieces of code executing on the GPU cores. They are usually written in a high-level language like HLSL or GLSL, converted to an IR for distribution, and compiled to machine code by the driver. Shader intrinsic functions are all about writing hardware-specific shader programs directly in assembly, potentially creating more optimal code. This has nothing to do with potential abstractions in the API calls on the CPU side.

Like I said above, thank you for this. If anyone wants to, they can read this and better understand what are the underlying systems.


This illustration does not match rendering at all; it is 100% wrong.

So your calculation of 5 × 5 ms = 25 ms has no relation to reality. And even if a game spent 25 ms on compute, the whole frame would probably take more than 100 ms, resulting in less than 10 FPS, so this overhead clearly does not exist as you described.

It isn't 100% wrong, since it isn't meant to depict rendering. It is simply there so that readers can understand what is meant by layers, and how a "given computer system" operates. I never claimed it to be graphics-related; it just serves to show that there are usually underlying processes between the OS and the hardware executing code. I am fully aware that 25 ms is impossibly huge; in VR, for example, a single frame must be rendered in around 11.1 ms to achieve the 90 fps threshold. Like I said, "imagine, for argument's sake". It's an abstraction. Thank you again for the rest of your write-up, as it again goes into more detail than I wanted to in this piece, but is still very much relevant to the subject at hand.

No vendor is even close to perfect, but there is no doubt that no one has done more to promote open standards than Nvidia. So please show some professionalism, and stop painting the picture as one being the champion of openness while the other is the evil proprietary one.

I won't even dignify that with an answer. Just re-read what I wrote and you'll see how that was completely blown out of proportion and uncalled for.
 
Joined
Nov 4, 2005
Nvidia has done everything to keep their stuff locked down.
That's not true at all. Both vendors have open and proprietary parts. GameWorks is mostly open, while some of it requires an NDA. Almost all of it runs on AMD GPUs; claiming otherwise is untrue. They also provide the most extensive collection of examples and best practices for modern graphics development.
And GPUOpen is no "open standard" at all; it's mostly a collection of renamed tools and libraries which have existed for years. And do I need to remind you which vendor was the last to provide Vulkan support? And which still, to this date, fails to provide stable OpenGL support?
No vendor is even close to perfect, but there is no doubt that no one has done more to promote open standards than Nvidia. So please show some professionalism, and stop painting the picture as one being the champion of openness while the other is the evil proprietary one.

http://gpuopen.com/compute-product/amd-openvx/

https://github.com/GPUOpen-Effects/TressFX/releases/tag/v3.1.1

Holy shit, source code!!!! Free to modify.


VS


http://docs.nvidia.com/gameworks/content/artisttools/hairworks/HairWorks_sdkSamples.html

path = "NvHairWorksDx11.win64.D.dll"

yayyy, they give us .dll's..... cause having the dll is the same as source code, right? Like how MS releases their source code with every OS, and every program doesn't include .dll's? Right?


Cause when you download and agree to

""NVIDIA GameWorks SDK" means the set of instructions for computers, in executable form only and in any media (which may include diskette, CD-ROM, downloadable internet, hardware, or firmware) comprising NVIDIA's proprietary Software Development Kit and related media and printed materials, including reference guides, documentation, and other manuals, installation routines and support files, libraries, sample art files and assets, tools, support utilities and any subsequent updates or adaptations provided by NVIDIA, whether with this installation or as separately downloaded (unless containing their own separate license terms and conditions)."

"
In addition, you may not and shall not permit others to:

I. modify, reproduce, de-compile, reverse engineer or translate the NVIDIA GameWorks SDK; or
II. distribute or transfer the NVIDIA GameWorks SDK other than as part of the NVIDIA GameWorks Application."

"
3. Redistribution; NVIDIA GameWorks Applications. Any redistribution of the NVIDIA GameWorks SDK (in accordance with Section 2 above) or portions thereof must be subject to an end user license agreement including language that

a) prohibits the end user from modifying, reproducing, de-compiling, reverse engineering or translating the NVIDIA GameWorks SDK;
b) prohibits the end user from distributing or transferring the NVIDIA GameWorks SDK other than as part of the NVIDIA GameWorks Application;"
 

qubit

Overclocked quantum bit
Joined
Dec 6, 2007
Messages
17,865 (2.89/day)
Location
Quantum Well UK
System Name Quantumville™
Processor Intel Core i7-2700K @ 4GHz
Motherboard Asus P8Z68-V PRO/GEN3
Cooling Noctua NH-D14
Memory 16GB (2 x 8GB Corsair Vengeance Black DDR3 PC3-12800 C9 1600MHz)
Video Card(s) MSI RTX 2080 SUPER Gaming X Trio
Storage Samsung 850 Pro 256GB | WD Black 4TB | WD Blue 6TB
Display(s) ASUS ROG Strix XG27UQR (4K, 144Hz, G-SYNC compatible) | Asus MG28UQ (4K, 60Hz, FreeSync compatible)
Case Cooler Master HAF 922
Audio Device(s) Creative Sound Blaster X-Fi Fatal1ty PCIe
Power Supply Corsair AX1600i
Mouse Microsoft Intellimouse Pro - Black Shadow
Keyboard Yes
Software Windows 10 Pro 64-bit
Well, I'm surprised that there's a difference in feature set between the hardware and the API, as I thought the two were developed together.

So, for a GTX 580, for example: it supports DX11.0, so I'd expect the GF110 GPU in it to support exactly that and not have any features "left over" or, conversely, to not fully support all of DX11.0's features.

I get that the hardware could support some unofficial and undocumented features not in the API allowing for "trick shot" special effects, but these shouldn't be significant.

Great article raevenlord. :)
 
Joined
Nov 18, 2010
all of DX11.0's features.

It's just a mess with naming... DirectX has always been like that. Remember DX8, 8a, 8b, 8.1, DX9 a/b/c, etc... And now everyone is confused... well... just as always...
 
