Tuesday, October 22nd 2019
Future AMD GPU Architecture to Implement BFloat16 Hardware
A future AMD graphics architecture could implement BFloat16 floating point capability on the silicon. Updates to AMD's ROCm libraries on GitHub dropped a big hint that the company is implementing the compute standard, which has significant advantages over the FP16 implemented by current-gen AMD GPUs. BFloat16 offers a significantly wider range than FP16, which caps out at just 6.55 x 10^4, forcing some AI researchers to fall back to the relatively inefficient FP32 math hardware. BFloat16 uses three fewer significand bits than FP16 (8 bits versus 11 bits), but offers 8 exponent bits, while FP16 offers only 5. BFloat16 is also more resilient to overflow and underflow in conversions to FP32 than FP16 is, since BFloat16 is essentially a truncated FP32. The addition of BFloat16 is more of a "future-proofing" measure by AMD: atomic operations in modern 3D game rendering are unlikely to benefit from BFloat16 over FP16, but it will pay huge dividends to the AI machine-learning community.
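Because BFloat16 is essentially the top 16 bits of an FP32, conversion between the two is just a bit shift. A minimal Python sketch (standard library only; the helper names are my own) illustrating why large values survive BFloat16 but overflow FP16:

```python
import struct

def f32_to_bf16(x):
    """Truncate an FP32 to BFloat16 by keeping only its top 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return bits >> 16

def bf16_to_f32(b):
    """Widen a BFloat16 back to FP32 by zero-filling the low 16 bits."""
    return struct.unpack(">f", struct.pack(">I", b << 16))[0]

# 3.0e38 fits in BFloat16's FP32-sized range...
big = bf16_to_f32(f32_to_bf16(3.0e38))
print(big)  # roughly 2.99e38: small truncation error, no overflow

# ...but overflows FP16, whose range caps out near 6.55e4.
try:
    struct.pack("<e", 1.0e5)  # "e" is IEEE half precision
except OverflowError:
    print("1e5 overflows FP16")
```

The round-trip loses some mantissa bits (hence ~2.99e38 instead of 3.0e38), but the value stays finite, which is exactly the overflow resilience the article describes.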
Sources:
ROCm (Github), dylan522p (Reddit), Dr Nick Higham
12 Comments on Future AMD GPU Architecture to Implement BFloat16 Hardware
www.anandtech.com/show/14179/intel-manual-updates-bfloat16-for-cooper-lake-xeon-scalable-only
The real question is: when will it be implemented?
en.wikipedia.org/wiki/Bfloat16_floating-point_format
tl;dr
Same number of exponent bits as float32 but fewer mantissa bits. This means the same range but reduced resolution/precision.
Benefits are easier conversion from bfloat16 to float32 thanks to the same exponent length, lower area cost to implement in hardware compared to float32 (and supposedly compared to float16), etc.
All this is mainly useful for machine learning at this point, because that workload does not have high precision requirements.
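Both points in the tl;dr can be seen directly in the bit patterns. A short sketch (standard library only; helper names mine) showing that bfloat16 keeps float32's 8-bit exponent field intact while dropping mantissa bits:

```python
import struct

def bf16_roundtrip(x):
    """float32 -> bfloat16 (truncate low 16 bits) -> float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", (bits >> 16) << 16))[0]

# Same exponent field: conversion is just a 16-bit shift, so the
# 8-bit exponent of the float32 survives unchanged in the bfloat16.
bits = struct.unpack(">I", struct.pack(">f", 6.5))[0]
assert (bits >> 23) & 0xFF == ((bits >> 16) >> 7) & 0xFF

# Reduced resolution: only 7 explicit mantissa bits remain, so a
# step of 2**-8 relative to 1.0 is truncated away...
print(bf16_roundtrip(1.0 + 2**-8))  # 1.0
# ...while a step of 2**-7 is still representable.
print(bf16_roundtrip(1.0 + 2**-7))  # 1.0078125
```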
Edit:
And reading it again, I'm mostly just rephrasing the article :)
The exponent sets the range: FP16 tops out around 6.55 x 10^4, while BFloat16 reaches roughly 3.4 x 10^38.
Float16 gives you more precision (11 bit significand) in your value but less range (5 bit exponent) than bfloat16 (8 bit significand and 8 bit exponent). Said differently: float16 is better for values close to zero than bfloat16 is.
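For values of ordinary magnitude, float16's longer significand does win on precision. A quick check (standard library only; helper names mine) comparing the round-trip error of both 16-bit formats for 1/3:

```python
import struct

def to_bf16(x):
    """Round-trip through bfloat16 by truncating a float32's low 16 bits."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return struct.unpack(">f", struct.pack(">I", (bits >> 16) << 16))[0]

def to_f16(x):
    """Round-trip through IEEE half precision (struct's "e" format)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

x = 1 / 3
err_f16 = abs(to_f16(x) - x)    # small: 10 explicit mantissa bits
err_bf16 = abs(to_bf16(x) - x)  # larger: only 7 explicit mantissa bits
print(err_f16 < err_bf16)  # True: float16 is more precise here
```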