Multi-Chip-Module accelerators are nothing new, really. Though there are earlier implementations, when it comes to recognizable hardware most of us have already heard of, these solutions harken back to Intel's Kentsfield and Yorkfield quad-core processors (built on the 65 nm process for the LGA 775 package). The central challenge with this approach is having a powerful, performant-enough interconnect that allows the different cores in each module to really "talk" to each other and work in tandem. More recently, AMD has demonstrated the advantages of a true MCM (Multi-Chip-Module) approach with its Ryzen CPUs. These result from a modular CPU architecture paired with a powerful interconnect (Infinity Fabric), which has allowed AMD to keep die size to a minimum (for its 8-core building block, at least) while profitably scaling up to 16 cores (two dies) with Threadripper, and 32 cores (four dies) with EPYC.
AMD has already hinted that its still-distant Navi architecture (we're still waiting for Vega, after all) will bring a true MCM design to GPUs. Vega already supports AMD's Infinity Fabric interconnect, paving the way not only for future APU designs from the company, but also for MCM GPUs leveraging the same technology. NVIDIA, too, seems to be making strides towards an MCM-enabled future, looking to abandon the monolithic die design approach it has relied on for a long time now.
NVIDIA believes a modular approach is the best currently feasible answer to a stagnating Moore's Law. CPU and GPU performance and complexity have leaned heavily on increasing transistor counts and density, whose development and, more importantly, production deployment are slowing down (the curve that seemed exponential is actually sigmoidal). In fact, the biggest die size achievable with today's technology is currently estimated at ~800 mm². The point is driven home when we consider that the company's Tesla V100 comes in at a staggering 815 mm², already straining that technical die-size limit. This fact, coupled with the industry's ever-increasing appetite for performance, leads us to believe that the GV100 will be one of NVIDIA's last monolithic GPU designs (there is still a chance that 7 nm manufacturing will buy the company a little more time to develop a true MCM solution, but odds are NVIDIA's next product will already manifest such a design).
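To get a feel for why smaller dies matter so much here, below is a minimal back-of-the-envelope sketch in Python. It assumes a simple Poisson defect model with a made-up (but plausible) defect density, and a hypothetical four-module split of a GV100-sized die; real fab defect densities are proprietary, and NVIDIA's actual module count is our assumption, not a figure from the paper.

```python
import math

# Illustrative only: simple Poisson defect model. The defect density below is
# a made-up but plausible value; real fab defect densities are proprietary.
D0 = 0.001  # defects per mm^2 (i.e. 0.1 defects per cm^2)

def die_yield(area_mm2: float) -> float:
    """Expected fraction of defect-free dies under a Poisson defect model."""
    return math.exp(-area_mm2 * D0)

def silicon_per_good_unit(area_mm2: float, dies_per_unit: int = 1) -> float:
    """Wafer area consumed per shippable unit, discarding defective dies."""
    return dies_per_unit * area_mm2 / die_yield(area_mm2)

monolithic = 815.0    # GV100-class die size
gpm = monolithic / 4  # four hypothetical GPU modules of equal total area

print(f"monolithic yield: {die_yield(monolithic):.1%}")   # ~44%
print(f"per-module yield: {die_yield(gpm):.1%}")          # ~82%
print(f"mm^2 of wafer per good monolithic GPU: {silicon_per_good_unit(monolithic):.0f}")
print(f"mm^2 of wafer per 4-module MCM GPU:    {silicon_per_good_unit(gpm, 4):.0f}")
```

Under these assumed numbers, a defect scraps an entire 815 mm² monolithic die, while in the MCM case it only scraps one small module, so the wafer area burned per shippable GPU drops dramatically. That, in a nutshell, is the yield argument.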
In a paper published by the company, NVIDIA itself says the way ahead lies in integrating multiple GPU processing modules in a single package, allowing the GPU world to achieve what Ryzen and its Threadripper and EPYC older brothers are already achieving: scaling performance with smaller dies and, therefore, higher yields. Specifically, the researchers "(...) propose partitioning GPUs into easily manufacturable basic GPU Modules (GPMs), and integrating them on package using high bandwidth and power efficient signaling technologies." In its white paper, NVIDIA says that "the optimized MCM-GPU design is 45.5% faster than the largest implementable monolithic GPU, and performs within 10% of a hypothetical (and unbuildable) monolithic GPU (...)", and that its "optimized MCM-GPU is 26.8% faster than an equally equipped Multi-GPU system with the same total number of SMs and DRAM bandwidth."
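To put those quoted percentages on a single scale, the small snippet below normalizes everything to the largest buildable monolithic GPU. This is pure arithmetic on the figures quoted above; the "within 10%" statement is read here as a lower bound, so the hypothetical-monolithic value is an estimate of ours, not a number from the paper.

```python
# Relative performance implied by NVIDIA's quoted figures,
# normalized to the largest implementable monolithic GPU = 1.00.
monolithic_buildable = 1.00
mcm_gpu = monolithic_buildable * 1.455    # "45.5% faster" than buildable monolithic
multi_gpu = mcm_gpu / 1.268               # MCM is "26.8% faster" than this system
hypothetical_monolithic = mcm_gpu / 0.90  # MCM "within 10%" of it (estimate)

for name, perf in [
    ("largest buildable monolithic", monolithic_buildable),
    ("multi-GPU (same SMs/bandwidth)", multi_gpu),
    ("optimized MCM-GPU", mcm_gpu),
    ("hypothetical monolithic", hypothetical_monolithic),
]:
    print(f"{name:32s} {perf:.2f}x")
```

Read this way, the proposed MCM-GPU lands well ahead of both the biggest die you could actually manufacture (1.00x) and a conventional multi-GPU setup (~1.15x), while giving up relatively little against a monolithic chip that cannot be built at all (~1.62x).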
These developments showcase engineering's ingenuity and drive to improve, and look extremely promising for the companies involved: abandoning the monolithic design philosophy and scaling with a variable number of smaller dies should allow for greater yields and improved performance scaling, keeping both the high-performance market's needs sated and the tech companies' bottom lines a little better off than they (mostly) already are. Go ahead and follow the source NVIDIA link for the white paper; it's a very interesting read.
Sources: NVIDIA MCM Paper, Radar.O'Reilly.com