Future AMD GPU Architecture to Implement BFloat16 Hardware

btarunr · Oct 22, 2019

A future AMD graphics architecture could implement BFloat16 floating-point capability in silicon. Updates to AMD's ROCm libraries on GitHub dropped a big hint that the company is implementing the number format, which has significant advantages over the FP16 format implemented by current-gen AMD GPUs. BFloat16 offers a much wider range than FP16, which caps out at just 6.55 x 10^4 (65,504), forcing certain AI researchers to fall back to the relatively inefficient FP32 math hardware. BFloat16 uses three fewer significand bits than FP16 (8 bits versus 11 bits) but offers 8 exponent bits, while FP16 only offers 5. Because BFloat16 is essentially a truncated FP32 with the same 8-bit exponent, converting FP32 values to BFloat16 is far less prone to overflow and underflow than converting them to FP16. The addition of BFloat16 is more of a "future-proofing" measure by AMD. Arithmetic operations in modern 3D game rendering are unlikely to benefit from BFloat16 in comparison to FP16. BFloat16, however, will pay huge dividends to the AI and machine-learning community.
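To make the range claims concrete, here is a minimal Python sketch using only the standard library. It assumes bfloat16 can be modeled by keeping the top 16 bits of an IEEE 754 float32 (the "truncated FP32" view from the article; real hardware usually rounds rather than truncates), and it relies on the `struct` module's `'e'` half-precision format for FP16. The helper names are illustrative, not part of ROCm or any AMD API.

```python
import struct
from typing import Optional


def to_fp16(x: float) -> Optional[float]:
    """Round x to IEEE 754 half precision (FP16); return None if it overflows."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:  # |x| too large for FP16's 5-bit exponent
        return None


def to_bf16_truncated(x: float) -> float:
    """Model bfloat16 as FP32 with its low 16 bits (16 mantissa bits) zeroed."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


# FP16's largest finite value, a value above it, and a value below
# FP16's smallest subnormal (about 6e-8).
for value in (65504.0, 1e5, 1e-8):
    print(f"{value:>8g}  fp16={to_fp16(value)}  bf16={to_bf16_truncated(value)}")
```

With these helpers, 1e5 overflows FP16 (the helper returns None) and 1e-8 underflows to 0.0, while the truncated-FP32 model keeps both as finite, nonzero but coarser values (1e5 becomes 99840.0).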


londiste · Oct 22, 2019

Earlier this year Intel also said something about adding BFloat16 (in Cooper Lake, the 14 nm Xeon due in 2020):

Intel Architecture Manual Updates: bfloat16 for Cooper Lake Xeon Scalable Only? (www.anandtech.com)

Hyderz · Oct 22, 2019

i read the title future amd gpu architecture to implement bloatware ...

Mysteoa · Oct 22, 2019

Hyderz said:
i read the title future amd gpu architecture to implement bloatware ...

I liked this.

ChosenName · Oct 22, 2019

Makes sense if range is more important than resolution.
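The range-versus-resolution trade-off shows up when storing an ordinary small value such as 0.1 in each format. This sketch uses a simple model: FP16 via `struct`'s `'e'` format, and bfloat16 approximated by truncating a float32 bit pattern to its top 16 bits (hardware typically rounds to nearest instead, which would give 0.10009765625 here). The helper names are illustrative.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to IEEE 754 half precision (10 stored fraction bits)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_bf16_truncated(x: float) -> float:
    """Keep only the top 16 bits of x's float32 pattern (7 stored fraction bits)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


x = 0.1
fp16_x = to_fp16(x)
bf16_x = to_bf16_truncated(x)
for name, stored in (("fp16", fp16_x), ("bf16", bf16_x)):
    rel_err = abs(stored - x) / x
    print(f"{name}: stored {stored!r}, relative error {rel_err:.2e}")
```

FP16 stores 0.1 as 0.0999755859375, while the truncated bfloat16 model stores 0.099609375, an error roughly sixteen times larger: the price paid for bfloat16's wider range.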

Ahhzz · Oct 22, 2019

Guys. Topic. Find it or take a break.

Hardware Geek · Oct 22, 2019

I don't think Nvidia has announced support for bfloat16 yet, but considering both Arm and Intel have announced they will be including it, it's just a matter of time before Nvidia announces support as well.
The real question is when will it be implemented?

Casecutter · Oct 23, 2019

Nice to know the only antagonists... hold a place of honor :mad:

Prima.Vera · Oct 23, 2019

Is there a plain English translation and explanation of this article please?

MazeFrame · Oct 23, 2019

Hardware Geek said:
I don't think Nvidia has announced support for bfloat16 yet, but considering both Arm and Intel have announced they will be including it, it's just a matter of time before Nvidia announces support as well.
The real question is when will it be implemented?

Nvidia is big on GPU compute with CUDA and has made its entry into AI hardware with Tensor cores. So maybe they feel safe with what they have?

londiste · Oct 23, 2019

Prima.Vera said:
Is there a plain English translation and explanation of this article please?

The topic does not lend itself well to easy explanations.

bfloat16 floating-point format - Wikipedia (en.wikipedia.org)

tl;dr
Same number of exponent bits as float32 but fewer mantissa bits. This means the same range but reduced resolution/precision.
Benefits include easier conversion between bfloat16 and float32 due to the same exponent length, and a lower area cost of implementing it in hardware compared to float32 (and supposedly compared to float16), etc.
All this is mainly useful for machine learning at this point, basically because that does not have high precision requirements.

Edit:
And reading it again, I am mostly just rephrasing the article.
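The "easier conversion" point follows from the shared 8-bit exponent: widening bfloat16 to float32 is just a 16-bit shift, and narrowing only has to round away the low 16 mantissa bits. Below is a minimal Python sketch of both directions. The round-to-nearest-even trick (add 0x7FFF plus the lowest kept bit, then shift) is a common software approach, assumed here for illustration; actual hardware may differ, for example in how it treats NaNs and subnormals, and the function names are hypothetical.

```python
import struct


def fp32_bits(x: float) -> int:
    """Return the raw IEEE 754 binary32 bit pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bf16_to_fp32(b: int) -> float:
    """Exact widening: the 16 bfloat16 bits become the top half of a float32."""
    return struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))[0]


def fp32_to_bf16(x: float) -> int:
    """Narrow to bfloat16 bits, rounding the dropped 16 bits to nearest even."""
    bits = fp32_bits(x)
    if (bits & 0x7F800000) == 0x7F800000 and (bits & 0x007FFFFF):
        return (bits >> 16) | 0x0040  # NaN: force a quiet NaN so it stays NaN
    lowest_kept_bit = (bits >> 16) & 1
    # Adding 0x7FFF rounds up only when the dropped part exceeds half an ulp;
    # the extra lowest_kept_bit breaks exact ties toward an even result.
    return ((bits + 0x7FFF + lowest_kept_bit) >> 16) & 0xFFFF


# 1 + 2**-8 lies exactly halfway between two bfloat16 values (1.0 and 1.0078125).
print(hex(fp32_to_bf16(1 + 2**-8)), bf16_to_fp32(fp32_to_bf16(1 + 2**-8)))
```

Because every bfloat16 value is also a float32 value, converting to float32 and back never changes the bits; only the float32 to bfloat16 direction loses information.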

JohnWal · Oct 23, 2019

londiste said:
The topic does not lend itself well to easy explanations.

AMD are adding GPU hardware support for a numerical datatype that trades precision for increased performance. A number of common machine learning algorithms do not actually require high-precision math to be effective, so using bfloat16 increases performance by essentially allowing twice the numerical calculation throughput of float32. Intel has already announced bfloat16 support in the upcoming Cooper Lake generation of processors, and Google uses it in their Tensor Processing Units. At some point bfloat16 will appear in AMD CPUs as well. A nice article is available at: https://www.nextplatform.com/2019/07/15/intel-prepares-to-graft-googles-bfloat16-onto-processors/
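One easy-to-check part of the "twice the throughput" argument is storage: a bfloat16 value takes 2 bytes instead of float32's 4, so the same memory bandwidth or register width carries twice as many values. Whether the ALUs also do twice the math per clock depends on the hardware. This sketch packs a few hypothetical weights both ways, again modeling bfloat16 as the top 16 bits of a float32; the function names and sample values are illustrative.

```python
import struct
from typing import List, Sequence


def pack_fp32(values: Sequence[float]) -> bytes:
    """Little-endian float32 array: 4 bytes per value."""
    return struct.pack(f"<{len(values)}f", *values)


def pack_bf16(values: Sequence[float]) -> bytes:
    """Little-endian bfloat16 array: top 16 bits of each float32, 2 bytes each."""
    top_halves = [struct.unpack("<I", struct.pack("<f", v))[0] >> 16 for v in values]
    return struct.pack(f"<{len(values)}H", *top_halves)


def unpack_bf16(data: bytes) -> List[float]:
    """Widen each 16-bit value back to a float by shifting it into the top half."""
    halves = struct.unpack(f"<{len(data) // 2}H", data)
    return [struct.unpack("<f", struct.pack("<I", h << 16))[0] for h in halves]


weights = [0.5, -1.25, 3.0, 1024.0]  # hypothetical values, all exact in bfloat16
print(len(pack_fp32(weights)), "bytes as float32")
print(len(pack_bf16(weights)), "bytes as bfloat16")
```

The sample weights were chosen to be exactly representable, so they survive the round trip unchanged; arbitrary values would come back rounded to about 2 to 3 significant decimal digits.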

FordGT90Concept · Oct 23, 2019

Prima.Vera said:
Is there a plain English translation and explanation of this article please?

This:

btarunr said:
BFloat16 uses three fewer significand bits than FP16 (8 bits versus 11 bits), offering 8 exponent bits, while FP16 only offers 5 bits.

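The significand is like the digits of the number, such as the 1.23456 in 1.23456 x 10^5.
The exponent is the power, such as the 5 in 10^5 (in binary floating point the base is 2 instead of 10).

Float16 gives you more precision (11-bit significand) but less range (5-bit exponent) than bfloat16 (8-bit significand and 8-bit exponent). Said differently: float16 resolves values near 1 more finely, while bfloat16 handles very large values and values very close to zero better.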
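The bit counts in the quoted line are enough to derive each format's limits. The sketch below applies the standard IEEE 754-style formulas: exponent bias 2^(e-1) - 1, largest finite value (2 - 2^-m) x 2^bias, smallest normal value 2^(1 - bias), and gap 2^-m just above 1.0, where m counts the stored fraction bits (one fewer than the significand bits quoted, because normal numbers have an implicit leading 1). The `FloatFormat` class is a hypothetical helper, not a library type.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatFormat:
    name: str
    exponent_bits: int
    fraction_bits: int  # stored bits; the leading 1 of normal numbers is implicit

    @property
    def significand_bits(self) -> int:
        return self.fraction_bits + 1

    @property
    def bias(self) -> int:
        return 2 ** (self.exponent_bits - 1) - 1

    @property
    def max_finite(self) -> float:
        return (2.0 - 2.0 ** -self.fraction_bits) * 2.0 ** self.bias

    @property
    def min_normal(self) -> float:
        return 2.0 ** (1 - self.bias)

    @property
    def epsilon(self) -> float:
        """Gap between 1.0 and the next representable value."""
        return 2.0 ** -self.fraction_bits


FP16 = FloatFormat("fp16", exponent_bits=5, fraction_bits=10)
BF16 = FloatFormat("bfloat16", exponent_bits=8, fraction_bits=7)
FP32 = FloatFormat("fp32", exponent_bits=8, fraction_bits=23)

for fmt in (FP16, BF16, FP32):
    print(
        f"{fmt.name:9} significand={fmt.significand_bits:2} exponent={fmt.exponent_bits} "
        f"max={fmt.max_finite:.4g} min_normal={fmt.min_normal:.4g} eps={fmt.epsilon:.4g}"
    )
```

The table makes the trade-off explicit: bfloat16 shares float32's exponent limits, while its gap above 1.0 (2^-7) is eight times coarser than FP16's (2^-10).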
