NVIDIA "Pascal" GP100 Silicon Detailed

So, the consumer GeForce GP100 has

- No HBM2 (GDDR5X instead)
- No NVLink (no functionality on x86, no point otherwise)
- A lower enabled SM count (they will never launch with the maximum GP100 can offer; they still need a top-end 'Ti' version for a later release, which means we will see at least two SMs disabled, or more, unless they get really good yields - highly unlikely given how immature 14/16nm still is)
- Probably similar or slightly bumped clocks

Also, on card positioning: the first GP100 version will have a much bigger gap to the later, fully enabled version, because the difference will be at least 2 SMs. Nvidia will have a big performance jump up its sleeve - quite different from how they handled Kepler and the 780 Ti.
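To put numbers on the "at least two SMs disabled" scenario, here's a quick back-of-the-envelope sketch (a speculative illustration, assuming GP100's published layout of 60 SMs with 64 FP32 cores each; Tesla P100 itself ships with 56 enabled):

```cpp
#include <cstdio>

int main() {
    // GP100 as published: 60 SMs, 64 FP32 CUDA cores per SM.
    // Tesla P100 is itself a salvage part with 56 of 60 SMs enabled.
    const int total_sms    = 60;
    const int cores_per_sm = 64;

    // Hypothetical consumer configurations - pure speculation:
    const int enabled[] = {60, 56, 54, 52};
    for (int sms : enabled) {
        printf("%2d SMs enabled -> %4d CUDA cores (%d fused off)\n",
               sms, sms * cores_per_sm, total_sms - sms);
    }
    return 0;
}
```

Each pair of disabled SMs costs 128 CUDA cores - exactly the kind of headroom a later 'Ti' release would reclaim.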
 
With all the bandwidth in a PCIe 3.0 x16 slot, why isn't it possible to use that in place of NVLink? From the testing I have seen, modern cards come nowhere near saturating it.
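For context, the per-direction numbers are easy to work out. A minimal sketch of the math, assuming PCIe 3.0's 8 GT/s per lane with 128b/130b encoding against NVLink's quoted 20 GB/s per link and four links per GPU:

```cpp
#include <cstdio>

int main() {
    // PCIe 3.0: 8 GT/s per lane, 128b/130b encoding.
    const double gtps     = 8.0;
    const double encoding = 128.0 / 130.0;
    const int    lanes    = 16;
    const double pcie = gtps * encoding * lanes / 8.0; // GB/s, one direction

    // NVLink (Pascal): 20 GB/s per direction per link, 4 links on P100.
    const double nvlink = 20.0 * 4;

    printf("PCIe 3.0 x16: %5.2f GB/s per direction\n", pcie);   // ~15.75
    printf("NVLink x4   : %5.2f GB/s per direction\n", nvlink); // 80.00
    printf("Ratio       : %.1fx\n", nvlink / pcie);             // ~5.1x
    return 0;
}
```

So for graphics traffic an x16 slot is indeed plenty; the roughly 5x gap only matters for HPC-style GPU-to-GPU transfers.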
 
With the consumer-grade chips coming in a few months' time, I'll wait for TPU to get their hands on some samples before I consider upgrading.
 

bug
Well, yeah, there's no x86 processor with NVLink support (and I don't believe there ever will be one). But a GPU-to-GPU link should be possible with an x86 processor, so dual-GPU cards could use it between the GPUs and use PCIe to talk to the CPU (see the GTC 2016 NVLink diagram).

Physically, where would you send the NVLink data if the card is only connected to a PCIe slot?
 
This doesn't sound like much/any improvement on the async shaders front. It also only represents an increase of 16.7% in CUDA core count compared to Titan X. I think I'm disappointed.

Typical AMD fanboy being an idiot when talking about a card that doesn't have a single use for async shaders to start with. Go back to your mom's basement. Tesla cards don't need it for the work they do, which is NOT gaming.

So, the consumer GeForce GP100 has

- No HBM2 (GDDR5X instead)
- No NVLink (no functionality on x86, no point otherwise)
- A lower enabled SM count (they will never launch with the maximum GP100 can offer; they still need a top-end 'Ti' version for a later release, which means we will see at least two SMs disabled, or more, unless they get really good yields - highly unlikely given how immature 14/16nm still is)
- Probably similar or slightly bumped clocks

Also, on card positioning: the first GP100 version will have a much bigger gap to the later, fully enabled version, because the difference will be at least 2 SMs. Nvidia will have a big performance jump up its sleeve - quite different from how they handled Kepler and the 780 Ti.

NVLink isn't needed on the consumer end at this time. As for HBM, I don't know what the high-end cards will have, but mid-range will be GDDR5. Speculating and making claims based on made-up crap only makes a person look like an idiot. If the high-end x80 cards are based on the GP100/P100 chip, they should have HBM2, since that is the memory controller on them.
 
This doesn't sound like much/any improvement on the async shaders front.
Do you have access to detailed information about the internal GPU scheduling?
Stop spinning the myth that Nvidia doesn't support async compute; it's a planned feature of CUDA 8, scheduled for June.

It also only represents an increase of 16.7% in CUDA core count compared to Titan X. I think I'm disappointed.
P100 increased the FP32 performance by 73% over Titan X, 88% over 980 Ti, using only 17% more CUDA cores. That's a pretty impressive increase in IPC.

Game performance doesn't scale linearly with FP32, but we should be able to get 50-60% higher gaming performance out of such a chip.
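Those percentages fall straight out of cores × 2 FLOPs (FMA) × clock. A quick sanity check, using P100's 3584 cores at its 1480 MHz boost clock against the commonly quoted base-clock figures for Titan X and 980 Ti:

```cpp
#include <cstdio>

// Peak FP32 throughput: cores * 2 FLOPs per clock (FMA) * clock.
double tflops(int cores, double mhz) {
    return cores * 2.0 * mhz / 1e6;
}

int main() {
    double p100   = tflops(3584, 1480.0); // Tesla P100, boost
    double titanx = tflops(3072, 1000.0); // Titan X (Maxwell), base
    double ti980  = tflops(2816, 1000.0); // GTX 980 Ti, base

    printf("P100      : %4.1f TFLOPS\n", p100);                   // ~10.6
    printf("vs Titan X: +%.0f%%\n", (p100 / titanx - 1) * 100.0); // ~+73%
    printf("vs 980 Ti : +%.0f%%\n", (p100 / ti980  - 1) * 100.0); // ~+88%
    return 0;
}
```

On these assumed clocks, the bulk of the uplift would come from frequency rather than the 17% extra cores.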

Who cares....GP100 cards are still nearly a year away.

Wake me up a week before launch.
We won't see any GP100-based graphics cards anytime soon; GP102 will be the fastest one in a graphics card this year.

With all the bandwidth in a PCIe 3.0 x16 slot, why isn't it possible to use that in place of NVLink? From the testing I have seen, modern cards come nowhere near saturating it.
NVLink is designed for compute workloads; no graphics workload needs it.
 
Well, thank you for the information. Now we will still have to wait and see what ends up being released and how fast it will truly be. (Or not be.)
 

FordGT90Concept
Typical AMD fanboy being an idiot when talking about a card that doesn't have a single use for async shaders to start with. Go back to your mom's basement. Tesla cards don't need it for the work they do, which is NOT gaming.
You're right, async shaders are useless in Tesla because graphics and compute don't intermingle. That said, FirePro cards still have ACEs to handle async compute workloads for compatibility's sake--NVIDIA should too.

Do you have access to detailed information about the internal GPU scheduling?
Stop spinning the myth that Nvidia doesn't support async compute; it's a planned feature of CUDA 8, scheduled for June.
Unless there are major changes in the GigaThread Engine, there's nothing on the diagram that suggests it is fixed.

P100 increased the FP32 performance by 73% over Titan X, 88% over 980 Ti, using only 17% more CUDA cores. That's a pretty impressive increase in IPC.
You're right, it is - bearing in mind that Tesla is designed specifically for FP32 and FP64 performance. We'll have to wait and see if that translates to graphics cards.
 

the54thvoid
We'll all wait and see what comes of Pascal. For my part, I think I'll be keeping my Maxwell till 2017; Vega and the consumer GP100 equivalent will be my next options.
As for the whole async shambles, it's not going to be Nvidia's downfall. If people think that's a reality, they need to ease off the red powder they're snorting.
The most likely outcome is quite simply Pascal using brute power to deliver the experience.
I don't mind which way it goes; I just hope it goes one way, to force the other side to go cheaper.
 
You're right, async shaders are useless in Tesla because graphics and compute don't intermingle. That said, FirePro cards still have ACEs to handle async compute workloads for compatibility's sake--NVIDIA should too.
FirePros have ACEs because a FirePro is literally a Radeon bundled with an OpenCL driver. The only real differences aren't at the GPU level (aside from possibly binning) - they are memory capacity and PCB/power-input layout. Nvidia began the bifurcation of its GPU line with GK210, a revision of GK110 aimed solely at compute workloads, and a GPU that never saw the light of day as a consumer-grade card. GP100 looks very much like the same design ethos.
You're right, it is - bearing in mind that Tesla is designed specifically for FP32 and FP64 performance. We'll have to wait and see if that translates to graphics cards.
A big selling point for GP100 is its mixed-compute ability, which includes FP16 for deep-learning neural networks.
Unless there are major changes in the GigaThread Engine, there's nothing on the diagram that suggests it is fixed.
Difficult to make any definitive comment based on a single HPC-oriented part, but my understanding is that Nvidia is leaving most of the architectural reworking until Volta (which apparently will have compatibility with Pascal hardware interfaces for HPC). Nvidia will rely on more refined preemption (this Pascal overview is one of the better ones around), better perf/watt due to the lower ALU-to-SM ratio, and a fast ramp of first-gen 16nm product, at the expense of a time-consuming architecture revision. Whether the brute-force increase in GPU power, the perf/$, and the perf/watt offset any deficiencies in async compute compared to AMD will have to wait until we see the products, but I'm guessing that since AMD themselves are hyping perf/watt rather than all-out performance, we shouldn't expect any major leaps from AMD either. Just as apropos is whether AMD has incorporated conservative rasterization into Polaris.
As for the whole async shambles, it's not going to be Nvidia's downfall.
True enough. The difference in performance is marginal for the most part, with people reduced to quoting percentages because the actual frames-per-second differences sound too trivial. It's great that AMD's products get an uplift from the DX12 code path using async compute, but I can't help feeling that the general dog's breakfast that is DX12 and its driver implementations at present reduces the numbers to academic interest - more so given that the poster boy for async compute - AotS - seems to be a game people are staying away from in droves.
The most likely outcome is quite simply Pascal using brute power to deliver the experience.
Agreed. Nvidia seems to have targeted time to market and decided on a modest increase in GPU horsepower across their three most successful segments (if SweClockers is to be believed), while AMD seems content to mostly replace their more expensive-to-produce and less energy-efficient GPUs - something long overdue in the discrete mobile market.
I don't mind which way it goes; I just hope it goes one way, to force the other side to go cheaper.
Hopefully... but the cynic in me thinks that Nvidia and AMD might just continue their unspoken partnership in dovetailing price/product.
 

FordGT90Concept
Difficult to make any definitive comment based on a single HPC-oriented part, but my understanding is that Nvidia is leaving most of the architectural reworking until Volta (which apparently will have compatibility with Pascal hardware interfaces for HPC). Nvidia will rely on more refined preemption (this Pascal overview is one of the better ones around), better perf/watt due to the lower ALU-to-SM ratio, and a fast ramp of first-gen 16nm product, at the expense of a time-consuming architecture revision. Whether the brute-force increase in GPU power, the perf/$, and the perf/watt offset any deficiencies in async compute compared to AMD will have to wait until we see the products, but I'm guessing that since AMD themselves are hyping perf/watt rather than all-out performance, we shouldn't expect any major leaps from AMD either. Just as apropos is whether AMD has incorporated conservative rasterization into Polaris.
That's what I was afraid of... hence my disappointment. :( Pascal appears to be an incremental update, not a major one; Polaris too, for that matter. If Polaris doesn't have conservative rasterization, add that to my disappointment list. :laugh:

I think both AMD and NVIDIA are banking on higher IPC + higher clocks. If NVIDIA can manage 1.3 GHz clocks at TSMC where AMD can only manage 800-1000 MHz at GloFo, AMD is going to lose the performance competition; this may be why they're touting performance/watt, not performance. :(
 
Unless they use it in dual cards instead of the PLX bridge, or in some laptop setups... most probably the block will be omitted in consumer cards.
Could be an interesting option to retain a single NVLink for dual-GPU cards - a bit overkill on the bandwidth front, but it would make the overall bill of materials fractionally lower, not having to pay Avago for the lane-extender chips, I suppose.
It was officially announced NVLink only works with POWER CPUs at this time. So no, it's not for home use.
Really? Wasn't that pretty much dispelled with the announcement of the dual-Xeon E5 DGX-1 and Quanta's x86 support announcement during their DGX-1 demonstration? SuperMicro has also announced x86 NVLink compatibility in semi-leaked form, and Nvidia announced NVLink compatibility for the ARM64 architecture during the same GTC presentation.

So, the consumer GeForce GP100 has
- No HBM2 (GDDR5X instead)
- No NVLink (no functionality on x86, no point otherwise)
- A lower enabled SM count (they will never launch with the maximum GP100 can offer; they still need a top-end 'Ti' version for a later release, which means we will see at least two SMs disabled, or more, unless they get really good yields - highly unlikely given how immature 14/16nm still is)
- Probably similar or slightly bumped clocks
That seems like a lot of supposition being presented as fact with no actual proof. Historically, GeForce parts have had higher clocks than HPC parts, and Nvidia has amortized production costs by leading with salvage parts - so these are very possible. But I've just provided evidence of x86 NVLink support, and I haven't seen any indication that GP100 won't be paired with HBM2. Where did you see this supposed fact?
 
That seems like a lot of supposition being presented as fact with no actual proof. Historically, GeForce parts have had higher clocks than HPC parts, and Nvidia has amortized production costs by leading with salvage parts - so these are very possible. But I've just provided evidence of x86 NVLink support, and I haven't seen any indication that GP100 won't be paired with HBM2. Where did you see this supposed fact?

I know - it was just something I pulled out as my assumption of what it may look like. Facts, not at all :) About HBM2, we will have to see; apart from the memory controller being fixed on the current part, I'm having trouble justifying it as 'required' when GDDR5X is available. Nvidia will want to cut corners if they can, and they have ample time to alter the design, though I'm not sure it's economically smart to do so.
 
Could be an interesting option to retain a single NVLink for dual-GPU cards - a bit overkill on the bandwidth front, but it would make the overall bill of materials fractionally lower, not having to pay Avago for the lane-extender chips, I suppose.

Really? Wasn't that pretty much dispelled with the announcement of the dual-Xeon E5 DGX-1 and Quanta's x86 support announcement during their DGX-1 demonstration? SuperMicro has also announced x86 NVLink compatibility in semi-leaked form, and Nvidia announced NVLink compatibility for the ARM64 architecture during the same GTC presentation.

That seems like a lot of supposition being presented as fact with no actual proof. Historically, GeForce parts have had higher clocks than HPC parts, and Nvidia has amortized production costs by leading with salvage parts - so these are very possible. But I've just provided evidence of x86 NVLink support, and I haven't seen any indication that GP100 won't be paired with HBM2. Where did you see this supposed fact?

1. QuantaPlex T21W-3U:
This x86 server employs high-bandwidth and energy-efficient NVLink interconnects to enable extremely fast communication between eight of the latest NVIDIA GPU modules (SXM2).

x86 still uses a PCIe switch between GPU and CPU. I think only IBM, with the newest POWER CPUs, will be able to do NVLink GPU-to-CPU.
 
x86 still uses a PCIe switch between GPU and CPU. I think only IBM, with the newest POWER CPUs, will be able to do NVLink GPU-to-CPU.
The original contention, put forward by bug, was that NVLink could not be used as a bridge between GPUs in a dual-GPU card, as suggested by Ferrum Master. That is incorrect, as is your assumption that we were talking about GPU<->CPU traffic.


FWIW, the advantages of GPU point-to-point bandwidth using NVLink have been doing the rounds for the best part of a week, since the Quanta and SuperMicro info dropped.
 
Could be an interesting option to retain a single NVLink for dual-GPU cards - a bit overkill on the bandwidth front

If latency is low enough, it should be possible to address the neighboring GPU's RAM pool without a performance tax... That's the best thing that could happen. So I suppose overkill on the bandwidth front is really not possible ;).
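For what it's worth, addressing a neighboring GPU's memory pool already has an API path today. A minimal host-side CUDA sketch of peer-to-peer access (error handling elided; whether a dual-GPU card would back these calls with NVLink rather than PCIe is speculation):

```cpp
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Check whether each GPU can map the other's memory.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    if (!can01 || !can10) {
        printf("Peer access not supported between GPU 0 and GPU 1\n");
        return 1;
    }

    // Map each GPU's address space into the other's context.
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);

    // Allocate on each GPU and copy directly between them; the
    // interconnect (PCIe today, NVLink on P100-class systems)
    // carries the traffic without bouncing through host RAM.
    const size_t bytes = 64u << 20; // 64 MiB
    void *buf0 = nullptr, *buf1 = nullptr;
    cudaSetDevice(0); cudaMalloc(&buf0, bytes);
    cudaSetDevice(1); cudaMalloc(&buf1, bytes);
    cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);

    cudaSetDevice(0); cudaFree(buf0);
    cudaSetDevice(1); cudaFree(buf1);
    return 0;
}
```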
 

bug
The original contention, put forward by bug, was that NVLink could not be used as a bridge between GPUs in a dual-GPU card, as suggested by Ferrum Master. That is incorrect, as is your assumption that we were talking about GPU<->CPU traffic.


FWIW, the advantages of GPU point-to-point bandwidth using NVLink have been doing the rounds for the best part of a week, since the Quanta and SuperMicro info dropped.

But what's the physical medium NVLink uses on an x86 platform? I haven't seen any 'SLI bridge'-style connector, and the card is only connected to a PCIe bus.

Edit: You seem too eager to take marketing slides as actual real-world gains.
 
But what's the physical medium NVLink uses on an x86 platform? I haven't seen any 'SLI bridge'-style connector, and the card is only connected to a PCIe bus.

Edit: You seem too eager to take marketing slides as actual real-world gains.

SERVERS! If the server cluster uses x86, they will use a demux bridge to NVLink! It's really as simple as that... They have no conventional connectors there; it's all custom-made. They could make their own boards, as shown in the slides.
 

Aquinus
At this point it's not known if each port has a throughput of 80 GB/s (per-direction), or all four ports put together.
This article seems to indicate that a single NVLink connection is rated for 20 GB/s, not 80 GB/s. The total theoretical aggregate bandwidth appears to be 80 GB/s, but it looks highly variable depending on how much data is being sent at any given time. That is, full bandwidth isn't realized unless each message is packed as full as it can get; the smaller each "packet" (I'm thinking network/PCIe-like mechanics here) is, the worse the bandwidth gets. So don't plan on sending a ton of small messages over NVLink; it seems to like large ones. The situation would have to be pretty special to realize the full 80 GB/s, and even then, if the following article is to be trusted, no more than about 70 GB/s will probably be realized, due to the overhead introduced with smaller data and the assumption that load can be spread equally across the NVLink connections. I would rather see PCIe 4.0.
http://www.hardware.fr/news/14587/gtc-tesla-p100-debits-pcie-nvlink-mesures.html
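To illustrate why message size matters, here's a toy throughput model with a fixed per-packet overhead. The 256-byte payload and 16-byte header are assumed values for illustration only, not measured NVLink parameters; they merely reproduce the "plateaus below the 80 GB/s peak" behaviour the article describes:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double peak_gbps = 80.0;  // theoretical aggregate, 4 links
    const double payload   = 256.0; // data bytes per packet (assumed)
    const double overhead  = 16.0;  // header/CRC bytes per packet (assumed)

    const double sizes[] = {64, 256, 4096, 1 << 20};
    for (double msg : sizes) {
        double packets    = std::ceil(msg / payload); // last packet is padded
        double wire_bytes = packets * (payload + overhead);
        double effective  = peak_gbps * msg / wire_bytes;
        printf("%8.0f B message -> %4.1f GB/s effective\n", msg, effective);
    }
    return 0;
}
```

In this model a 64-byte message gets under 19 GB/s effective, while large transfers plateau around 75 GB/s; real links add further losses, hence the ~70 GB/s measured.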
 

bug
SERVERS! If the server cluster uses x86, they will use a demux bridge to NVLink! It's really as simple as that... They have no conventional connectors there; it's all custom-made. They could make their own boards, as shown in the slides.
My point exactly. x86 won't be able to use NVLink. Instead, custom designs will.
 
Unless there are major changes in the GigaThread Engine, there's nothing on the diagram that suggests it is fixed.
You still need to prove that it's broken. Async shaders are a feature of Maxwell, and a planned feature of CUDA 8. Nvidia even applied for patents on the technology several years back.

People constantly need to be reminded that async shaders are an optional feature of Direct3D 12, and that Nvidia has so far prioritized implementing the capability for CUDA, as there really are still no good games on the market that utilize it anyway.
 

FordGT90Concept
Async shaders have been implemented since D3D11 (not "optional"). AMD implemented them as they should be in GCN; NVIDIA half-assed it, making the entire GPC have to be scheduled for compute or graphics, not both (not Microsoft's intended design). There are plenty of resources on the internet that show it is broken, going all the way back to Kepler (which should have had it, since it was a D3D11 card). Game developers likely haven't been using it because it causes a framerate drop on NVIDIA cards. Async shading is getting the most traction on consoles (especially PlayStation 4) because they're trying to squeeze every drop of performance they can out of the console.

"...Nvidia so far has prioritized implementing it for CUDA..." and that's the problem! Async shaders are a feature of the graphics pipeline, not CUDA. CUDA 8 makes no mention of them, nor should it.

Edit: Here's the article: http://ext3h.makegames.de/DX12_Compute.html
MakeGames.de said:
Compute and 3D engine can not be active at the same time as they utilize a single function unit.
The Hyper-Q interface used for CUDA is in fact supporting concurrent execution, but it's not compatible with the DX12 API.
If it was used, there would be a hardware limit of 32 asynchronous compute queues in addition to the 3D engine.
MakeGames.de said:
  • The workload on a single queue should always be sufficient to fully utilize the GPU.

    There is no parallelism between the 3D and the compute engine so you should not try to split workload between regular draw calls and compute commands arbitrarily. Make sure to always properly batch both draw calls and compute commands.

    Pay close attention not to stall the GPU with solitary compute jobs limited by texture sample rate, memory latency or anything alike. Other queues can't become active as long as such a command is running.
  • Compute commands should not be scheduled on the 3D queue.

    Doing so will hurt the performance measurably. The 3D engine does not only enforce sequential execution, but the reconfiguration of the SMM units will impair performance even further.

    Consider the use of a draw call with a proxy geometry instead when batching and offloading is not an option for you. This will still save you a few microseconds as opposed to interleaving a compute command.
  • Make 3D and compute sections long enough.

    Switching between compute and 3D queues results in a full flush of all pipelines. The GPU should have spent enough time in one mode to justify the penalty for switching.

    Beware that there is no active preemption, a long running shader in either engine will stall the transition.
If developers start implementing async shaders in their games, they'll always be nerfed on Maxwell and older cards. Backwards-compatibility support will be poor because NVIDIA didn't properly implement the feature going into D3D11.
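For reference, here is what "async" looks like from the API side: a minimal D3D12 sketch creating a graphics queue plus a separate compute queue. It assumes a `device` created elsewhere; whether work on the two queues actually overlaps is up to the hardware and driver, which is the whole debate here:

```cpp
#include <d3d12.h>
#include <wrl/client.h>
using Microsoft::WRL::ComPtr;

// Assumes 'device' was already created via D3D12CreateDevice.
void CreateQueues(ID3D12Device* device,
                  ComPtr<ID3D12CommandQueue>& graphicsQueue,
                  ComPtr<ID3D12CommandQueue>& computeQueue) {
    D3D12_COMMAND_QUEUE_DESC desc = {};

    // The "3D" queue: accepts graphics, compute, and copy work.
    desc.Type = D3D12_COMMAND_LIST_TYPE_DIRECT;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&graphicsQueue));

    // A dedicated compute queue. The API permits its work to overlap
    // the 3D queue; on hardware that can't keep graphics and compute
    // in flight together, the driver is free to serialize them.
    desc.Type = D3D12_COMMAND_LIST_TYPE_COMPUTE;
    device->CreateCommandQueue(&desc, IID_PPV_ARGS(&computeQueue));
}
```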
 