Tuesday, April 5th 2016

NVIDIA Unveils the Tesla P100 HPC Board based on "Pascal" Architecture

NVIDIA unveiled the Tesla P100, the first product based on the company's "Pascal" GPU architecture. At its core is a swanky new multi-chip module, similar in its essential layout to the AMD "Fiji." A 15 billion-transistor GPU die sits on top of a silicon interposer, through which a 4096-bit wide HBM2 memory interface wires it to four 3D HBM2 stacks; the interposer in turn sits on a fiberglass substrate that is mounted to the PCB over a ball-grid array. With the GPU die, interposer, and memory dies put together, the package has a cumulative count of 150 billion transistors. The GPU die is built on the 16 nm FinFET process, and is 600 mm² in area.
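
As a rough back-of-the-envelope check on the package figures above, the Python sketch below derives the interface width per memory stack, the number of transistors that sit outside the GPU die, and the average transistor density of the die. It assumes the 4096-bit interface is split evenly across the four stacks, which the article does not state explicitly; all variable names are illustrative.

```python
# Figures quoted in the article
GPU_TRANSISTORS = 15e9        # GPU die alone
PACKAGE_TRANSISTORS = 150e9   # GPU die + interposer + memory dies
BUS_WIDTH_BITS = 4096         # total HBM2 interface width
HBM2_STACKS = 4
DIE_AREA_MM2 = 600

# Assumption: the interface is divided evenly between the stacks
bits_per_stack = BUS_WIDTH_BITS // HBM2_STACKS

# Everything on the package that is not the GPU die
non_gpu_transistors = PACKAGE_TRANSISTORS - GPU_TRANSISTORS

# Fraction of the headline count contributed by the GPU die
gpu_share = GPU_TRANSISTORS / PACKAGE_TRANSISTORS

# Average GPU density, in millions of transistors per mm²
density_m_per_mm2 = GPU_TRANSISTORS / DIE_AREA_MM2 / 1e6

print(f"Interface width per stack: {bits_per_stack} bits")
print(f"Transistors outside the GPU die: {non_gpu_transistors / 1e9:.0f} billion")
print(f"GPU share of package total: {gpu_share:.0%}")
print(f"GPU density: {density_m_per_mm2:.0f} million transistors per mm²")
```

On these numbers the GPU die accounts for only a tenth of the headline transistor count, with the rest attributable to the other dies on the package.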

The P100 sits on top of a space-efficient PCB that looks less like a video card, and more like a compact module that can be tucked away into ultra-high density supercomputing cluster boxes, such as the new NVIDIA DGX-1. The P100 offers a double-precision (FP64) compute performance of 5.3 TFLOP/s, FP32 performance of 10.6 TFLOP/s, and FP16 performance of a whopping 21.2 TFLOP/s. The chip has a register file totaling 14.2 MB, and an L2 cache of 4 MB. In addition to PCI-Express, each P100 chip will be equipped with NVLink, an in-house developed high-bandwidth interconnect by NVIDIA, with bandwidths as high as 80 GB/s per direction, or 160 GB/s in both directions combined. This allows extremely high-bandwidth paths between GPUs, so they can share memory and work more like a single GPU. The P100 is already in volume production, with its target customers having already bought up supply until OEM channel availability begins some time in Q1-2017.
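
The throughput and bandwidth figures above follow a simple doubling pattern, which the short Python sketch below makes explicit. It only uses the numbers quoted in the article; the dictionary and function names are illustrative, and the results are peak ratios rather than measured performance.

```python
# Peak throughput figures quoted in the article, in TFLOP/s
PEAK_TFLOPS = {"FP64": 5.3, "FP32": 10.6, "FP16": 21.2}

# NVLink bandwidth quoted in the article, in GB/s per direction
NVLINK_GB_PER_S_PER_DIRECTION = 80


def speedup(narrow: str, wide: str) -> float:
    """Ratio of peak throughput of the narrower precision to the wider one."""
    return PEAK_TFLOPS[narrow] / PEAK_TFLOPS[wide]


fp32_vs_fp64 = speedup("FP32", "FP64")
fp16_vs_fp32 = speedup("FP16", "FP32")

# Both directions added together
nvlink_bidirectional = 2 * NVLINK_GB_PER_S_PER_DIRECTION

print(f"FP32 / FP64: {fp32_vs_fp64:.1f}x")
print(f"FP16 / FP32: {fp16_vs_fp32:.1f}x")
print(f"NVLink, both directions: {nvlink_bidirectional} GB/s")
```

Each halving of precision doubles the quoted peak rate, and the 160 GB/s NVLink figure is simply the per-direction bandwidth counted twice.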

34 Comments on NVIDIA Unveils the Tesla P100 HPC Board based on "Pascal" Architecture

#1
the54thvoid
Super Intoxicated Moderator
But can it play Ashes of the Singularity?
Posted on Reply
#2
R0H1T
the54thvoid said: But can it play Ashes of the Singularity?
Sure it can, it doesn't have Async compute though i.e. AMD style o_O
Posted on Reply
#3
xorbe
I was rolling my eyes at his over-statement of 150B transistors (vs 7) and the odds of working being 0% ... c'mon, now you're including memory die, that's not how gpu yield is computed.
Posted on Reply
#4
iO
So Q2-2017 for a Titan and Q3 for a less stupidly priced card..?
Posted on Reply
#5
R0H1T
iO said: So Q2-2017 for a Titan and Q3 for a less stupidly priced card..?
Q3 or Q4 depending on how well the titan sells, they'll milk it for as long as they can, & Vega as well.
Posted on Reply
#6
Steevo
"Five Miracles"


I threw up a little. Why do they have to be so.........
Posted on Reply
#7
qubit
Overclocked quantum bit
the54thvoid said: But can it play Ashes of the Singularity?
Indeed it will, but not Crysis, unfortunately. :p

I'm looking forward to purchasing this top card at the affordable price that NVIDIA are known to price at and bundled with a free AAA game. :laugh:
Posted on Reply
#8
FordGT90Concept
"I go fast!1!11!1!"
So...paper launch...note all of the pictures are CAD renders. If they don't have a picture of an actual product to show, why bother?
xorbe said: I was rolling my eyes at his over-statement of 150B transistors (vs 7) and the odds of working being 0% ... c'mon, now you're including memory die, that's not how gpu yield is computed.
Fiji has 8.9 billion. Pascal and Polaris having ~15 billion is likely. You can thank 28nm -> 14/16nm for that.

The 150 billion figure is either a typo or they're counting all of the transistors in the HBM too.
Posted on Reply
#9
W1zzard
FordGT90Concept said: The 150 billion figure is either a typo or they're counting all of the transistors in the HBM too.
They do, the GPU alone is 15B
Posted on Reply
#10
HumanSmoke
FordGT90Concept said: If they don't have a picture of an actual product to show, why bother?
Because the products will launch before the next GTC ?
After all, AMD did exactly the same thing with Vega at the Capsizing Capsaicin presentation, and Intel formally unveiled Omni-Path well over a year before any interconnect hardware was sighted.
Posted on Reply
#12
HumanSmoke
silentbogo said: There are some more specs for P100 published on NVidia website
devblogs.nvidia.com/parallelforall/inside-pascal/

Pretty sure, that in a few weeks, once some GTC videos make their way to Youtube, we'll get more details on what's going on + some marketing crap as usual )).
Hopefully Nvidia rework the layout for gaming consumers. The vast bulk of FP64 units would be better served being culled in favour of a higher ALU count.
Given the conservative clocks usually associated with professional GPGPU parts, a near-1500 MHz boost clock for a 600 mm² die augurs well for the 16nmFF+ process.
Posted on Reply
#13
medi01
the54thvoid said: But can it play Ashes of the Singularity?
Erm, have they actually demoed an actual card?
Not even a wooden one?
Posted on Reply
#14
dwade
the54thvoid said: But can it play Ashes of the Singularity?
Not like that game is any good to begin with.

Total War: Warhammer is where it's at. DX 12 and ASYNC. And it looks like an actual game instead of a benchmark like AOS.
Posted on Reply
#15
Fluffmeister
A big chip and already very healthy clock speeds, bodes very well for the mid-tier consumer part coming our way.

Anyway the HPC market is gonna lap this badboy up.
Posted on Reply
#16
TheGuruStud
We're totally serious, we're making these chips like hotcakes. We knows chips. We make the best chips. Everyone says so. People always ask us, "Nvidia, where can I get your chips?" We tell them it's in volume production, but you can't get it for 11 months LOL


Nvidia sure is good at terrible lying.
Posted on Reply
#17
HumanSmoke
TheGuruStud said: We tell them it's in volume production, but you can't get it for 11 months LOL
Wasn't the takeaway that Nvidia was prioritizing for the DGX-1 and fulfilling orders for supers (such as Cobalt) for HPC contracts?
Nvidia says it's producing Tesla P100s in volume today. The company says it's devoting its entire production of Tesla P100 cards (and presumably the GP100 GPU) to its DGX-1 high-density HPC node systems and HPC servers from IBM, Dell, and Cray.
[Source]
TheGuruStud said: Nvidia sure is good at terrible lying.
You'll be providing proof that these contracts have been shelved or deferred? Thought not. Nice to see you remain committed to such a low standard of knee-jerk reactionary trolling though.

I'm guessing that production takes into account die size/yield, HBM2 availability, and assembly yield - which are three areas that Nvidia would have virtually no control over - just as AMD had to grin and bear it with the slow Fiji ramp.
Posted on Reply
#18
TheGuruStud
HumanSmoke said: Wasn't the takeaway that Nvidia was prioritizing for the DGX-1 and fulfilling orders for supers (such as Cobalt) for HPC contracts?

[Source]

You'll be providing proof that these contracts have been shelved or deferred? Thought not. Nice to see you remain committed to such a low standard of knee-jerk reactionary trolling though.

I'm guessing that production takes into account die size/yield, HBM2 availability, and assembly yield - which are three areas that Nvidia would have virtually no control over - just as AMD had to grin and bear it with the slow Fiji ramp.
Bingo. They have no control means they have no volume production. How many chips can possibly be going to HPC? Not that many, but when you DO NOT have any volume, then of course they will take all the orders.
Posted on Reply
#19
HumanSmoke
TheGuruStud said: Bingo. They have no control means they have no volume production. How many chips can possibly be going to HPC? Not that many, but when you DO NOT have any volume, then of course they will take all the orders.
The only person here stating that Nvidia is in volume production status here is you. Nvidia just stated that production had begun and chip packages were prioritized for pre-existing contracts and what looks to be a money-spinning deep learning system.
You are the one making hyperbolic claims on Nvidia's behalf. Gunning for Roy Taylor's job?
Posted on Reply
#20
TheGuruStud
HumanSmoke said: The only person here stating that Nvidia is in volume production status here is you. Nvidia just stated that production had begun and chip packages were prioritized for pre-existing contracts and what looks to be a money-spinning deep learning system.
You are the one making hyperbolic claims on Nvidia's behalf. Gunning for Roy Taylor's job?
"The P100 is already in volume production..."

I wouldn't expect any high end gaming cards until late fall with this kind of ridiculous spin going on for PR.
Posted on Reply
#21
HumanSmoke
TheGuruStud said: "The P100 is already in volume production..."
I wouldn't expect any high end gaming cards until late fall with this kind of ridiculous spin going on for PR.
The announcement has nothing to do with gaming cards. The actual quote was that the company were producing the P100 in volume - as opposed to risk/QA production. bta's reinterpretation of the statement introduces some ambiguity. As for professional cards, volume tends to mean something much less than consumer cards - not rocket science.


Again...only you seem to be inferring that Nvidia is claiming volume production suitable for gaming cards. The company made it quite clear that wider availability of GP100 wouldn't happen until Q1 2017.
Posted on Reply
#22
btarunr
Editor & Senior Moderator
FordGT90Concept said: So...paper launch...note all of the pictures are CAD renders. If they don't have a picture of an actual product to show, why bother?


Fiji has 8.9 billion. Pascal and Polaris having ~15 billion is likely. You can thank 28nm -> 14/16nm for that.

The 150 billion figure is either a typo or they're counting all of the transistors in the HBM too.
As I've mentioned clearly in the post, the GPU die has 15 billion transistors, and NV arrived at 150 billion by adding up every transistor on the package (GPU die + wafer/interposer + 4 x 4-die HBM2 memory stacks). A DRAM die is an ocean of transistors. NVIDIA has 16 of them. If AMD did the same math for "Fiji," it probably would have arrived at a similar "several dozen billion" transistor count. They stuck to an honest 8.9 billion, counting only the GPU die.
Posted on Reply
#23
silentbogo
NVidia started to upload GTC 2016 videos on youtube.
Let's see what Jen-Hsun Huang has to say:

Posted on Reply
#24
medi01
Also note the "5 miracles" in Tesla P100. Although, I might be confusing which product would have the miracles (or even be a miracle). Anyhow, most miraculous NV launch (and with some volume production too!) so far. And it's not that expensive too, only 129'000$.

No clue if it is faster than Tesla S.
TheGuruStud said: "The P100 is already in volume production..."
"Is white a color?"
"Yes"
"Is black a color?"
"Yes"
"Told ya, I've sold you a color TV"

(A joke from the times when there actually were black & white TVs)
Posted on Reply
#25
TheoneandonlyMrK
btarunr said: NVIDIA unveiled the Tesla P100, the first product based on the company's "Pascal" GPU architecture. At its core is a swanky new multi-chip module, similar in its essential layout to the AMD "Fiji." A 15 billion-transistor GPU die sits on top of a silicon interposer, through which a 4096-bit wide HBM2 memory interface wires it to four 3D HBM2 stacks; the interposer in turn sits on a fiberglass substrate that is mounted to the PCB over a ball-grid array. With the GPU die, interposer, and memory dies put together, the package has a cumulative count of 150 billion transistors. The GPU die is built on the 16 nm FinFET process, and is 600 mm² in area.

The P100 sits on top of a space-efficient PCB that looks less like a video card, and more like a compact module that can be tucked away into ultra-high density supercomputing cluster boxes, such as the new NVIDIA DGX-1. The P100 offers a double-precision (FP64) compute performance of 5.3 TFLOP/s, FP32 performance of 10.6 TFLOP/s, and FP16 performance of a whopping 21.2 TFLOP/s. The chip has a register file totaling 14.2 MB, and an L2 cache of 4 MB. In addition to PCI-Express, each P100 chip will be equipped with NVLink, an in-house developed high-bandwidth interconnect by NVIDIA, with bandwidths as high as 80 GB/s per direction, or 160 GB/s in both directions combined. This allows extremely high-bandwidth paths between GPUs, so they can share memory and work more like a single GPU. The P100 is already in volume production, with its target customers having already bought up supply until OEM channel availability begins some time in Q1-2017.

So what's its TDP then, bta? I've seen a 300 watt TDP quoted elsewhere, which seems to be OK, if a bit higher than I personally expected. 8 of these in one (DGX-1) box must use quite an impressive cooling solution.
Posted on Reply