Tuesday, October 22nd 2019
Future AMD GPU Architecture to Implement BFloat16 Hardware
A future AMD graphics architecture could implement BFloat16 floating point capability on the silicon. Updates to AMD's ROCm libraries on GitHub dropped a big hint that the company is implementing the compute standard, which has significant advantages over the FP16 implemented by current-gen AMD GPUs. BFloat16 offers a significantly wider range than FP16, which caps out at just 6.55 x 10^4, forcing some AI researchers to fall back to the relatively inefficient FP32 math hardware. BFloat16 uses three fewer significand bits than FP16 (8 bits versus 11 bits), but offers 8 exponent bits, while FP16 offers only 5. BFloat16 is also more resilient to overflow and underflow in conversions to FP32 than FP16 is, since BFloat16 is essentially a truncated FP32. The addition of BFloat16 is more of a "future-proofing" measure by AMD: atomic operations in modern 3D game rendering are unlikely to benefit from BFloat16 over FP16, but it will pay huge dividends to the AI machine-learning community.
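Because BFloat16 is essentially the top 16 bits of an FP32, conversion between the two is just a bit shift. A minimal Python sketch (standard library only; the helper names are my own) illustrating why large values survive BFloat16 but overflow FP16:

```python
import struct

def f32_to_bf16(x):
    """Truncate an FP32 to BFloat16 by keeping only its top 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16

def bf16_to_f32(b):
    """Widen a BFloat16 back to FP32 by zero-filling the low 16 bits."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# 3.0e38 fits in BFloat16's FP32-sized range...
big = bf16_to_f32(f32_to_bf16(3.0e38))
print(big)  # roughly 2.99e38: small truncation error, no overflow

# ...but overflows FP16, whose range caps out near 6.55e4.
try:
    struct.pack("<e", 1.0e5)  # "e" is IEEE half precision
except OverflowError:
    print("1e5 overflows FP16")
```

The round-trip loses some mantissa bits (hence ~2.99e38 instead of 3.0e38), but the value stays finite, which is exactly the overflow resilience the article describes.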
Sources:
ROCm (Github), dylan522p (Reddit), Dr Nick Higham
12 Comments on Future AMD GPU Architecture to Implement BFloat16 Hardware
www.anandtech.com/show/14179/intel-manual-updates-bfloat16-for-cooper-lake-xeon-scalable-only
The real question is: when will it be implemented?
en.wikipedia.org/wiki/Bfloat16_floating-point_format
tl;dr
Same number of exponent bits as float32 but fewer mantissa bits. This means the same range but reduced resolution/precision.
Benefits are easier conversion from bfloat16 to float32 thanks to the same exponent length, lower area cost to implement in hardware compared to float32 (and supposedly compared to float16), etc.
All this is mainly useful for machine learning at this point, because that workload does not have high precision requirements.
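Both points in the tl;dr can be seen directly in the bit patterns. A short sketch (standard library only; helper names mine) showing that bfloat16 keeps float32's 8-bit exponent field intact while dropping mantissa bits:

```python
import struct

def bf16_roundtrip(x):
    """float32 -> bfloat16 (truncate low 16 bits) -> float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", (bits >> 16) << 16))[0]

# Same exponent field: conversion is just a 16-bit shift, so the
# 8-bit exponent of the float32 survives unchanged in the bfloat16.
bits = struct.unpack(">I", struct.pack(">f", 6.5))[0]
assert (bits >> 23) & 0xFF == ((bits >> 16) >> 7) & 0xFF

# Reduced resolution: only 7 explicit mantissa bits remain, so a
# step of 2**-8 relative to 1.0 is truncated away...
print(bf16_roundtrip(1.0 + 2**-8))  # 1.0
# ...while a step of 2**-7 is still representable.
print(bf16_roundtrip(1.0 + 2**-7))  # 1.0078125
```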
Edit:
And reading it again, I'm mostly just rephrasing the article :)
The exponent sets the range: FP16 tops out around 6.55 x 10^4, while BFloat16 reaches roughly 3.4 x 10^38.
Float16 gives you more precision (11 bit significand) in your value but less range (5 bit exponent) than bfloat16 (8 bit significand and 8 bit exponent). Said differently: float16 is better for values close to zero than bfloat16 is.
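For values of ordinary magnitude, float16's longer significand does win on precision. A quick check (standard library only; helper names mine) comparing the round-trip error of both 16-bit formats for 1/3:

```python
import struct

def to_bf16(x):
    """Round-trip through bfloat16 by truncating a float32's low 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", (bits >> 16) << 16))[0]

def to_f16(x):
    """Round-trip through IEEE half precision (struct's "e" format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

x = 1 / 3
err_f16 = abs(to_f16(x) - x)    # small: 10 explicit mantissa bits
err_bf16 = abs(to_bf16(x) - x)  # larger: only 7 explicit mantissa bits
print(err_f16 < err_bf16)  # True: float16 is more precise here
```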