Monday, October 2nd 2023

NVIDIA Chief Scientist Reaffirms Huang's Law

In a talk, now available online, NVIDIA Chief Scientist Bill Dally describes a tectonic shift in how computer performance gets delivered in a post-Moore's law era. Each new processor requires ingenuity and effort inventing and validating fresh ingredients, he said in a recent keynote address at Hot Chips, an annual gathering of chip and systems engineers. That's radically different from a generation ago, when engineers essentially relied on the physics of ever smaller, faster chips.

The team of more than 300 that Dally leads at NVIDIA Research helped deliver a whopping 1,000x improvement in single GPU performance on AI inference over the past decade (see chart below). It's an astounding increase that IEEE Spectrum was the first to dub "Huang's Law" after NVIDIA founder and CEO Jensen Huang. The label was later popularized by a column in the Wall Street Journal.
The advance was a response to the equally phenomenal rise of large language models used for generative AI that are growing by an order of magnitude every year. "That's been setting the pace for us in the hardware industry because we feel we have to provide for this demand," Dally said.

In his talk, Dally detailed the elements that drove the 1,000x gain. The largest of all, a sixteen-fold gain, came from finding simpler ways to represent the numbers computers use to make their calculations.


The New Math
The latest NVIDIA Hopper architecture with its Transformer Engine uses a dynamic mix of eight- and 16-bit floating point and integer math. It's tailored to the needs of today's generative AI models. Dally detailed both the performance gains and the energy savings the new math delivers.
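As a rough illustration of what a narrower number format buys, the sketch below (plain NumPy, not NVIDIA's Transformer Engine) rounds FP32 weights onto a simulated FP8 E4M3 grid using a per-tensor scale. The function name and the scale-to-448 choice are illustrative assumptions; real frameworks handle scaling, overflow and subnormals far more carefully.

import numpy as np

def quantize_fp8_e4m3(x):
    # Illustrative only: round values onto a simulated FP8 E4M3 grid
    # (4 exponent bits, 3 mantissa bits, max magnitude 448); values below
    # the normal range are simply flushed to zero to keep the sketch short.
    scale = 448.0 / np.max(np.abs(x))            # per-tensor scale: largest weight lands on the FP8 max
    m, e = np.frexp(x * scale)                   # x*scale = m * 2**e with 0.5 <= |m| < 1
    m = np.round(m * 16.0) / 16.0                # keep 3 stored mantissa bits (plus the implicit bit)
    y = np.ldexp(m, e)
    y = np.where(np.abs(y) < 2.0 ** -6, 0.0, y)  # below E4M3's normal range -> zero
    return (y / scale).astype(x.dtype)

rng = np.random.default_rng(0)
w = rng.normal(size=4096).astype(np.float32)     # stand-in for a layer's FP32 weights
w8 = quantize_fp8_e4m3(w)
print("mean relative error:", np.mean(np.abs(w8 - w) / (np.abs(w) + 1e-9)))

Halving operand width roughly halves memory traffic, and multiplier energy falls faster than linearly with bit width, which is why the number-format work is the single largest factor in Dally's breakdown.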

Separately, his team helped achieve a 12.5x leap by crafting advanced instructions that tell the GPU how to organize its work. These complex commands help execute more work with less energy. As a result, computers can be "as efficient as dedicated accelerators, but retain all the programmability of GPUs," he said.
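The intuition behind those complex instructions is amortization: the fixed cost of fetching, decoding and scheduling an instruction is paid once, however much math the instruction then performs. A toy back-of-envelope in Python makes the point; the picojoule figures are illustrative assumptions, not NVIDIA's measured numbers.

# Toy amortization model: assumed costs, not measured NVIDIA figures.
overhead_pj = 20.0   # per-instruction fetch/decode/schedule overhead (assumed)
mac_pj = 1.0         # energy of one 16-bit multiply-accumulate (assumed)

for macs_per_instr in (1, 8, 128):   # plain FMA vs. wider matrix-style instructions
    math_energy = macs_per_instr * mac_pj
    share = math_energy / (math_energy + overhead_pj)
    print(f"{macs_per_instr:3d} MACs per instruction -> {share:.0%} of energy goes to the math")

The more work each instruction carries, the closer the GPU gets to spending its energy on arithmetic rather than bookkeeping, which is how it can approach the efficiency of a fixed-function accelerator while staying programmable.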

In addition, the NVIDIA Ampere architecture added structural sparsity, an innovative way to simplify the weights in AI models without compromising the model's accuracy. The technique brought another 2x performance increase and promises future advances, too, he said. Dally described how NVLink interconnects between GPUs in a system and NVIDIA networking among systems compound the 1,000x gains in single GPU performance.
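The structural sparsity mentioned above follows a fixed 2:4 pattern: in every group of four weights, at most two are non-zero, so the hardware can skip the zeros predictably. Below is a minimal NumPy sketch of the pruning step; it is illustrative only, and a real workflow would fine-tune the model afterwards to recover accuracy.

import numpy as np

def prune_2_of_4(weights):
    # In each group of 4 consecutive weights, zero out the 2 smallest-magnitude entries.
    w = weights.reshape(-1, 4).copy()
    drop = np.argsort(np.abs(w), axis=1)[:, :2]   # indices of the two smallest per group
    np.put_along_axis(w, drop, 0.0, axis=1)
    return w.reshape(weights.shape)

rng = np.random.default_rng(1)
layer = rng.normal(size=(64, 64)).astype(np.float32)   # toy weight matrix, width divisible by 4
sparse_layer = prune_2_of_4(layer)
print("zero fraction:", float(np.mean(sparse_layer == 0)))   # 0.5

Sparse Tensor Cores can then skip the zeroed positions, which is where the roughly 2x throughput figure comes from.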

No Free Lunch
Though NVIDIA migrated GPUs from 28 nm to 5 nm semiconductor nodes over the decade, that technology only accounted for 2.5x of the total gains, Dally noted.
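Multiplied together, the individual contributions Dally cites account for the headline figure; a quick sanity check in Python, using only the numbers quoted above:

# Dally's stated contributions to the ~1,000x single-GPU AI inference gain
number_formats = 16          # simpler number representations ("The New Math")
complex_instructions = 12.5  # instructions that amortize overhead
structured_sparsity = 2      # 2:4 sparsity introduced with Ampere
process_node = 2.5           # 28 nm -> 5 nm silicon
print(number_formats * complex_instructions * structured_sparsity * process_node)   # 1000.0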

That's a huge change from computer design a generation ago under Moore's law, the observation that the number of transistors on a chip, and with it performance, would double roughly every two years as chips became ever smaller and faster.

Those gains were described in part by Dennard scaling, essentially a physics formula defined in a 1974 paper co-authored by IBM scientist Robert Dennard. Unfortunately, the physics of shrinking hit natural limits such as the amount of heat the ever smaller and faster devices could tolerate.

An Upbeat Outlook
Dally expressed confidence that Huang's law will continue despite diminishing gains from Moore's law.

For example, he outlined several opportunities for future advances in further simplifying how numbers are represented, creating more sparsity in AI models and designing better memory and communications circuits.

Because each new chip and system generation demands new innovations, "it's a fun time to be a computer engineer," he said.

Dally believes the new dynamic in computer design is giving NVIDIA's engineers the three opportunities they desire most: to be part of a winning team, to work with smart people and to work on designs that have impact.
Source: NVIDIA Blog

41 Comments on NVIDIA Chief Scientist Reaffirms Huang's Law

#1
Solaris17
Super Dainty Moderator
Imagine saying that publicly to get a pay raise. Yikes.

What's the law? I didn't see the quote.
Posted on Reply
#3
wNotyarD
Solaris17: Imagine saying that publicly to get a pay raise. Yikes.

what’s the law? I didn’t see the quote.
The more you buy, the more you save.
Posted on Reply
#4
SJZL 2.0
The most important part is that they basically optimized GPU math to be faster in Hopper.

As for replacing Moore's law, I'd personally look forward to them researching the use of graphene instead of silicon. We still need faster metal eventually.
Posted on Reply
#5
TheoneandonlyMrK
Man says Moore's law is dead.

Man says it's my law now but it's a secret.

I say f#@£ o££ you T#@7.

Your law is shite, you egotistical Muppet.
Posted on Reply
#6
Vayra86
wNotyarD: The more you buy, the more leather jackets we save.
This is it, redacted for accuracy.
Posted on Reply
#7
mechtech
Hmmmm
“. Each new processor requires ingenuity and effort inventing and validating fresh ingredients, he said in a recent keynote address at Hot Chips, an annual gathering of chip and systems engineers. That's radically different from a generation ago, when engineers essentially relied on the physics of ever smaller, faster chips.”

I look at the graph and the 1000x performance.
Hmmmm
I think a better graph would be performance per transistor normalized to a specific frequency to highlight actual “ingenuity/architectural “ improvements.
Posted on Reply
#8
TheoneandonlyMrK
mechtech: Hmmmm
“. Each new processor requires ingenuity and effort inventing and validating fresh ingredients, he said in a recent keynote address at Hot Chips, an annual gathering of chip and systems engineers. That's radically different from a generation ago, when engineers essentially relied on the physics of ever smaller, faster chips.”

I look at the graph and the 1000x performance.
Hmmmm
I think a better graph would be performance per transistor normalized to a specific frequency to highlight actual “ingenuity/architectural “ improvements.
He started at nothing; AI started at nothing on GPUs.

The gains are early-stage, IMHO, so of course they look exponential.

You talk about new ingredients. Such as?

Making an ASIC, any ASIC, is a balance of area and features versus cost. They hype up Tesla cores, but in reality most of these things are an evolution of something pre-existing.

All that's made now in silicon is built on the shoulders of giants, who already did the ingenuity bit.

Again IMHO
Posted on Reply
#9
mechtech
TheoneandonlyMrK: He started at nothing, AI started at nothing on GPU.

The gains are initial IMHO so OF course exponential.

You talk new ingredients, such as.

Making an ASIC any ASIC is a balance of area and features, verses cost, they hype up Tesla cores but in reality most of these things are a evolution of something pre existing.

All that's made now in silicon is built on the shoulders of giants, who already did the ingenuity bit.

Again IMHO
Agreed. Just pointing out that their 1000x doesn't seem to break out the gains from each of those variables independently.
Posted on Reply
#10
Dr. Dro
Nvidia hatred circlejerk aside, Huang's law has proven true thus far. And by this I do not mean exclusively when talking about GPU compute or AI performance, the same is valid in the gaming realm as well. Most games today are entirely CPU-limited when paired with the NVIDIA AD102 (in its cutdown form present in the RTX 4090), even when you are talking about processors such as the i9-13900KS and the Ryzen 9 7950X3D.
mechtech: Agreed. Just pointing out their 1000x doesn't seem to highlight all gains from the said variables (independently)
1000x is an easily marketable number which is conveniently related to the trendy AI thing, but GPUs have gotten much more than 10 times faster in the past 10 years. CPUs didn't.
Posted on Reply
#11
trsttte
SJZL 2.0: About replacing Moore's law. I'd personally look forward to them researching using graphene instead of silicon. We still need faster metal eventually
Photonics might be it.
Dr. Dro: Nvidia hatred circlejerk aside, Huang's law has proven true thus far
No it hasn't, mainly because Huang's law doesn't mean shit. He's trying to correlate things that make no sense. If you build ever bigger, more power-hungry and more expensive processors, of course you're going to grow faster than the CPU market, which has kept its costs more or less constant, probably even slightly below what inflation would warrant. Not to mention the focus area of development: inference wasn't a priority in the past; now we're building processors dedicated to it, with fixed-function hardware to accelerate specific workloads, and surprise surprise, the performance increase is exponential. SHOCKER!!!

A lot of the so-called AI that's becoming popular today uses variations of the transformer model Google open-sourced in 2017. NVIDIA has now made a chip optimized for that model and is claiming an insane performance increase. Very cool; let's wait and measure this when someone decides to use a different model and see how their new "law" gets completely crushed. Or even simpler, let's give this fixed-function hardware a couple of generations and see how its performance improves.

Huang's so-called law was a stupid marketing ploy to try to justify the rapid increase in prices, and hopefully it continues to be ridiculed for the bullshit that it is, no matter how many times they try to bring it up and claim it's a thing.
Posted on Reply
#12
Denver
Dr. Dro: Nvidia hatred circlejerk aside, Huang's law has proven true thus far. And by this I do not mean exclusively when talking about GPU compute or AI performance, the same is valid in the gaming realm as well. Most games today are entirely CPU-limited when paired with the NVIDIA AD102 (in its cutdown form present in the RTX 4090), even when you are talking about processors such as the i9-13900KS and the Ryzen 9 7950X3D.



1000x is an easily marketable number which is conveniently related to the trendy AI thing, but GPUs have gotten much more than 10 times faster in the past 10 years. CPUs didn't.
In 10 years you went from the i7-4770K to the i9-13900K; in MT performance you have almost 10x more raw power, depending on the application. On the AMD side, the gains would be even greater, because we would be comparing the FX 8xxx against the Ryzen 9 7950X.

In GPUs the gain was at a similar level, but it is much easier to notice that advance in the form of fps in games; most people will not notice or take advantage of the raw power we have today in CPUs.

And... Can anyone explain what Huang's law is? "The more you buy, the more you save"? "It just works"? :p
Posted on Reply
#14
Wirko
wNotyarD: The more you buy, the more you save.
Needs correction. "We will have more to sell to you. Start saving now."
Posted on Reply
#15
Denver
claes: Here's some fodder for all of you
en.m.wikipedia.org/wiki/Huang's_law
There has been criticism. Journalist Joel Hruska writing in ExtremeTech in 2020 said "there is no such thing as Huang's Law", calling it an "illusion" that rests on the gains made possible by Moore's Law; and that it is too soon to determine a law exists.[9] The research nonprofit Epoch has found that, between 2006 and 2021, GPU price performance (in terms of FLOPS/$) has tended to double approximately every 2.5 years, much slower than predicted by Huang's law.


Meh, it's a comparison of oranges vs. apples. But in short, it seems like just a fragile, bland and generic theory propagated by some speculative journalist based on a speech Huang made a few years ago. Well, we will put this theory to the test in the coming years as lithography struggles to advance. My bet is GPUs with a TDP of 1,000 W+ next year.
Posted on Reply
#16
SJZL 2.0
trsttte: Photonics might be it.
Photonics is much harder to apply to consumer computers. Conversions from light to electricity and back will be unavoidable in a world where everything else in the computer runs on electricity, and those conversions cost performance. Also, light has different physics than electrons, so R&D will have a very hard time making photonics behave identically to electronics; again, compromises will have to be made, and the results may fall short of expectations.

With graphene electronics, we are using the same type of energy in a material better suited to handling electricity. We may see lower power consumption and higher clocks.
Posted on Reply
#17
Wye
They are riding the hype wave and patting themselves on the back over how "genius" they are.
Well, the "geniuses" will get fired when the bubble pops.
Posted on Reply
#18
photonboy
SJZL 2.0: Photonics are much harder to apply to consumer computers. Data conversions from light to energy and back will happen due to being in a world where all computers use electricity will mean performance compensations. Also, light has different physics than electrons so R&D will have a very hard time having photonics to function identical to electronics and again compensations will have to be made in a way that we may let down such expectations.

In graphene electronics, we are using the same type energy in a material more fit for handling electricity. We may see less watt consumption and higher clocks.
But...
One of the main uses for PHOTONICS in certain electronic areas will simply be to talk between MODULES rather than running copper traces long distances. There's some discussion of using one or more such links in a desktop PC (i.e. talking directly to the I/O of the CPU from the graphics card's VRAM bus rather than through the traces on the motherboard). So for this usage very little changes in terms of CPU/GPU design. There are MANY uses of photonics in electronics. Some make a lot of sense. Some are pointless. I would expect the traditional silicon TRANSISTOR approach to be around for decades to come, albeit with modifications such as carbon as you suggest, as well as a 3D approach to design, but the basic on/off switch isn't going anywhere. "Photonic" transistors IMO don't make sense as a replacement for a CPU or GPU but probably make sense for simplified processing with few switches, such as ROUTERS.
Posted on Reply
#19
Minus Infinity
So as long as you are doing AI work it's amazing. What about the improvements in fp64 for those doing real scientific calculations and not working on AI fakery? Huang's as fake as his frames.
Posted on Reply
#20
AnotherReader
Dr. Dro: Nvidia hatred circlejerk aside, Huang's law has proven true thus far. And by this I do not mean exclusively when talking about GPU compute or AI performance, the same is valid in the gaming realm as well. Most games today are entirely CPU-limited when paired with the NVIDIA AD102 (in its cutdown form present in the RTX 4090), even when you are talking about processors such as the i9-13900KS and the Ryzen 9 7950X3D.



1000x is an easily marketable number which is conveniently related to the trendy AI thing, but GPUs have gotten much more than 10 times faster in the past 10 years. CPUs didn't.
For gaming, GPUs have gotten faster by about 10 times in the last 10 years. Ten years ago, the fastest GPUs were the 780 Ti and the 290X. The performance improvement from the 780 Ti to the 4090 at 4K is about 10 times. The table below uses TPU's reviews at 4K for the GTX 1060, 1080 Ti, RTX 3080, and RTX 4090 respectively.

GTX 780 Ti to GTX 970 | GTX 970 to GTX 1080 Ti | GTX 1080 Ti to RTX 3080 | RTX 3080 to RTX 4090
85/83                 | 1/0.36                 | 100/53                  | 190/99


Multiplying all the speedups gives 10.3, which isn't too far off the multi-threaded performance increase for CPUs in that time. Anandtech's CPU bench can be used to compare the 4770K and the 7950X. There are common applications where the 7950X is as much as 9 times faster than the 4770K, and these applications don't leverage any instructions unique to the newer processor, such as AVX-512. I haven't used the 13900K because their database doesn't have numbers for any Intel CPUs faster than the 12900K.
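As a quick check of that arithmetic (Python, using the ratios from the table above):

# relative-performance ratios from the four TPU reviews cited above
speedups = [85 / 83, 1 / 0.36, 100 / 53, 190 / 99]
total = 1.0
for s in speedups:
    total *= s
print(round(total, 1))   # 10.3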


Rather than blaming CPU designers, you should be asking game engine developers why their engines are unable to utilize these CPUs efficiently.

I'm saddened that Bill Dally is misrepresenting TSMC's contribution to these gains. The 28 nm to 5 nm transition isn't worth only a 2.5 times increase in GPU resources. From the Titan X to AD102, clock speeds have increased by nearly 2.5 times and the GPU has 6 times more FP32 flops per clock. That is a 15-fold increase in compute solely related to the process. We shouldn't ignore the work done by Nvidia's engineers, but if we take his claim at face value, then a 28 nm 4090 would be only 2.5 times slower than the actual 4090, which is patently ridiculous.
Posted on Reply
#21
R0H1T
Dr. Dro: but GPUs have gotten much more than 10 times faster in the past 10 years. CPUs didn't.
Nope, CPUs have gone even higher. The top desktop chip 10 years back was the 5960X(?) and this year (next?) we'll have a Threadripper with probably up to 96 cores. And it's definitely more than 12x faster than Intel's best HEDT chips back then; even if you take the top server chips, they now top out at 128c/256t for AMD. In fact you could argue that CPUs have progressed far more, in part of course due to the stagnation with *dozer and Intel deciding to milk quad cores for at least half a decade!

The top Ivy Bridge Xeon chips topped out at 12 cores, so again vastly lower.
Posted on Reply
#22
stimpy88
The naked narcissism inside nGreedia must be absolutely awful to have to navigate through.
Posted on Reply
#23
Redwoodz
Should be called the RTX Gamers' Law, since you're the ones who paid for it.
Posted on Reply
#24
Prima.Vera
Huang's Law ??


The video with the clown's law:
Posted on Reply
#25
Dr. Dro
R0H1T: Nope CPU's have gone even higher, the top chip on desktops 10 years back was the 5960x(?) & this year(next?) we'll have a TR with probably up to 96 cores. And it's definitely more than 12x faster than Intel's best HEDT chips back then, even if you take the top server chips they now top out at 128c/256t for AMD. In fact you could argue that CPU's have progressed far more, in part of course due to the stagnation with *dozer & Intel deciding to milk quad cores for at least half a decade!

The top Ivy bridge Xeon chips topped out at 12 cores, so again vastly lower.
Threadripper Pro is not a desktop processor; Threadripper as a consumer-grade CPU died with the 3990X.

But even if you account for the market niche and multi-die CPUs (which really are multiple CPUs in one package), I don't think IPC has gone up a full 10x from Haswell to Raptor Cove (2013-2023). Operating frequencies increased greatly in the interim as well.

Core counts went from 18 (Haswell-EP) to basically around 128, so not a full 10x increase. IPC must have gone up around 6 times, plus an extra GHz on average, but I guess that's about it.

It might have, if you compare Piledriver to Zen 4, but AMD CPUs were hopeless garbage until Ryzen came out. Could be worth looking at sometime with some real data, but we all remember how 1st-gen Core i7 CPUs made sport of FX.

Still, GPUs have easily outpaced this growth. GK110 to AD102 is one hell of a leap.
stimpy88: The naked narcissism inside nGreedia must be absolutely awful to have to navigate through.
Ah, yes, I'm sure "nGreedia" engineers are just jumping at the opportunity to work at better companies, such as AMD, perhaps? :kookoo:
AnotherReader: For gaming, GPUs have gotten faster by about 10 times in the last 10 years. Ten years ago, the fastest GPUs were the 780 Ti and the 290X. The performance improvement from the 780 Ti to the 4090 at 4K is about 10 times. The table below uses TPU's reviews at 4K for the GTX 1060, 1080 Ti, RTX 3080, and RTX 4090 respectively.

GTX 780 Ti to GTX 970 | GTX 970 to GTX 1080 Ti | GTX 1080 Ti to RTX 3080 | RTX 3080 to RTX 4090
85/83                 | 1/0.36                 | 100/53                  | 190/99


Multiplying all the speedups gives 10.3 which isn't too far off the multi-threaded performance increase for CPUs in that time. Anandtech's CPU bench can be used to compare the 4770k and the 7950X. There are common applications where the 7950X is as much as 9 times faster than the 4770K and these applications don't leverage any instructions unique to the newer processor such as AVX-512. I haven't used the 13900K because their database doesn't have numbers for any Intel CPUs faster than the 12900K.


Rather than blaming CPU designers, you should be asking game engine developers why their engines are unable to utilize these CPUs efficiently.

I'm saddened that Bill Dally is misrepresenting TSMC's contribution to these gains. The 28nm to 5 nm transition isn't worth only a 2.5 times increase in GPU resources. From the Titan X to AD102, clock speeds have increased by nearly 2.5 times and the GPU has 6 times more FP32 flops per clock. That is a 15 fold increase in compute solely related to the process. We shouldn't ignore the work done by Nvidia's engineers, but if we take his claim at face value, then a 28 nm 4090 would be only 2.5 times slower than the actual 4090 which is patently ridiculous.
You're also using shipping products (and at a relatively low weight class) to normalize for performance. The comparison of progress should IMHO be made between fully enabled, fully endowed processors configured for their fullest performance, perhaps normalized for frequency, to accurately measure improvements at an architectural level. We don't even have such a product available to the public for Ada Lovelace yet.
Posted on Reply