
AMD "Vega" High Bandwidth Cache Controller Improves Minimum and Average FPS

Joined
Feb 3, 2017
Messages
3,863 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
The actual amount of data moved over the PCIe bus is negligible. I suggest reading the PCIe scaling article W1zz did: most graphics cards only use about x4 lanes of 2.0 worth of actual bandwidth, and more than that only gives a few percent (not frames per second) more performance, so 60 FPS ±3% doesn't really mean much.
well, don't focus on the used part. pcie 3.0 x16 has 15.75 GB/s of bandwidth. the rx 480 with its good old gddr5 has 200+ GB/s; high-end gpus have more.

in the context of using ram in addition to vram, why would memory management be a more limiting factor than a very narrow pipe?
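For reference, the gap between the two numbers quoted above works out like this. A quick back-of-envelope sketch in Python; the RX 480 figures assume the stock 8 Gbps GDDR5 on a 256-bit bus:

```python
# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding.
pcie3_x16_gbs = 8 * (128 / 130) * 16 / 8   # GT/s * encoding * 16 lanes / 8 bits per byte

# RX 480: 8 Gbps GDDR5 chips on a 256-bit memory bus.
rx480_vram_gbs = 8 * 256 / 8

print(f"PCIe 3.0 x16: {pcie3_x16_gbs:.2f} GB/s")   # ~15.75 GB/s
print(f"RX 480 VRAM:  {rx480_vram_gbs:.0f} GB/s")  # 256 GB/s
print(f"VRAM is ~{rx480_vram_gbs / pcie3_x16_gbs:.0f}x wider")
```

So the "pipe" to system RAM really is an order of magnitude narrower than local VRAM, which is the whole point of the question.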
 
Joined
Feb 9, 2009
Messages
1,618 (0.28/day)
Bla bla bla. Do I get to buy something that performs like the 1070 but costs less than 460/950? Otherwise it's just fvcking bla bla bla.
you enjoy stutters during heavy usage? this affects ALL gpus, ALL sizes, ALL performance levels
 
Joined
Sep 15, 2011
Messages
6,825 (1.40/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
VRAM hasn't been the bottleneck for a long time now. Even crappy video cards come with 4GB as standard, which is more than enough for 1080p gaming. GPU horsepower is the decisive factor in 99.9% of cases.
 
Joined
Feb 3, 2017
Messages
3,863 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
depends on the game. 4gb has not been enough in many cases for the highest (usually texture) settings, especially at higher resolutions like 1440p or uhd.
this is even more relevant now that gpus have the horsepower to run games at these resolutions.
 
Joined
Nov 4, 2005
Messages
12,054 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
well, don't focus on the used part. pcie 3.0 x16 has 15.75 GB/s of bandwidth. the rx 480 with its good old gddr5 has 200+ GB/s; high-end gpus have more.

in the context of using ram in addition to vram, why would memory management be a more limiting factor than a very narrow pipe?


VRAM bandwidth is consumed every time the GPU performs anti-aliasing and anisotropic filtering, so if I need 70% of the bandwidth for post-render effects, only 30% is actually available for frame rendering, and part of that is used to store the finished frames.

I have a tuner card in a PCIe x1 slot that sends 24 FPS of 1080i plus audio to my GPU directly, where it gets upscaled in hardware; the actual HDMI bandwidth is much higher, though. So why can't I run my GPU at PCIe x1?

It's all about where the bandwidth is used and when, and PCIe is overkill for graphics cards as they are today.
 
Joined
Mar 6, 2012
Messages
570 (0.12/day)
Processor i5 4670K - @ 4.8GHZ core
Motherboard MSI Z87 G43
Cooling Thermalright Ultra-120 *(Modded to fit on this motherboard)
Memory 16GB 2400MHZ
Video Card(s) HD7970 GHZ edition Sapphire
Storage Samsung 120GB 850 EVO & 4X 2TB HDD (Seagate)
Display(s) 42" Panasonice LED TV @120Hz
Case Corsair 200R
Audio Device(s) Xfi Xtreme Music with Hyper X Core
Power Supply Cooler Master 700 Watts
Bla bla bla. Do I get to buy something that performs like the 1070 but costs less than 460/950? Otherwise it's just fvcking bla bla bla.

Kid, why do you expect only AMD to give you everything at low cost? Why don't you ask Nvidia about it? Oh wait, they just knocked 40 dollars off the GTX 1070. Happy?
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
28,015 (3.71/day)
Processor Ryzen 7 5700X
Memory 48 GB
Video Card(s) RTX 4080
Storage 2x HDD RAID 1, 3x M.2 NVMe
Display(s) 30" 2560x1600 + 19" 1280x1024
Software Windows 10 64-bit
Just something slightly related: I went to a GDC session yesterday about DX12 optimization, and one recommendation was to use the copy queue for all GPU<->CPU memory transfers. These happen in the background, completely independent of GPU activity, at full PCIe speed. The key is to anticipate a few frames early that you will need the data, so it's already in GPU memory when it is needed and no stuttering occurs.
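The prefetch-ahead pattern described there can be sketched as a toy producer/consumer model. None of this is real D3D12 API; the names and the two-frame lookahead are illustrative, just to show the scheduling idea of a copy queue running independently of rendering:

```python
import queue
import threading

resident = {}               # stands in for GPU memory
copy_queue = queue.Queue()  # stands in for the DX12 copy queue

def copy_worker():
    # Runs independently of "rendering", like the hardware copy engine.
    while True:
        asset = copy_queue.get()
        if asset is None:
            break
        resident[asset] = f"data:{asset}"  # pretend DMA transfer over PCIe
        copy_queue.task_done()

threading.Thread(target=copy_worker, daemon=True).start()

PREFETCH_AHEAD = 2  # "anticipate a few frames early"

def frame(n, needed_by_frame):
    # Schedule uploads now for a frame a couple of frames in the future...
    for asset in needed_by_frame.get(n + PREFETCH_AHEAD, []):
        copy_queue.put(asset)
    # ...so by the time we render frame n, its assets are already resident.
    return all(a in resident for a in needed_by_frame.get(n, []))

schedule = {2: ["rock_texture"]}
frame(0, schedule)          # queues rock_texture while frame 0 renders
copy_queue.join()           # in reality the copy overlaps frames 0-1
assert frame(2, schedule)   # no stall: asset already in "GPU memory"
```

The point is simply that the transfer cost is paid during earlier frames instead of stalling the frame that needs the data.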
 
Joined
Apr 12, 2013
Messages
7,598 (1.77/day)
I don't understand how lowering the VRAM amount would help with FPS. Can anyone explain this?
This probably works the same way as the (pseudo) SLC cache on TLC drives: it speeds up the frequent VRAM ops, lifting the min & avg FPS.
 
Joined
Apr 12, 2013
Messages
7,598 (1.77/day)
The reason they lowered the VRAM availability is that they wanted to place Vega into the worst possible situation, the situation where HBC really shows its strength: when game VRAM usage goes beyond what you actually have on board.

The only thing I wonder about is whether HBC can do the data management on its own or whether it has to be specifically coded for. Because if it can be used out of the box with anything, it'll be awesome. But if you have to specifically code for it, then that's a problem in itself.
Well, if there's anything like HBC in Scorpio or the PS5 (whenever it's released), then there's a good chance this approach will become popular, even if some years down the line.
 
Joined
Mar 23, 2005
Messages
4,100 (0.57/day)
Location
Ancient Greece, Acropolis (Time Lord)
System Name RiseZEN Gaming PC
Processor AMD Ryzen 7 5800X @ Auto
Motherboard Asus ROG Strix X570-E Gaming ATX Motherboard
Cooling Corsair H115i Elite Capellix AIO, 280mm Radiator, Dual RGB 140mm ML Series PWM Fans
Memory G.Skill TridentZ 64GB (4 x 16GB) DDR4 3200
Video Card(s) ASUS DUAL RX 6700 XT DUAL-RX6700XT-12G
Storage Corsair Force MP500 480GB M.2 & MP510 480GB M.2 - 2 x WD_BLACK 1TB SN850X NVMe 1TB
Display(s) ASUS ROG Strix 34” XG349C 144Hz 1440p + Asus ROG 27" MG278Q 144Hz WQHD 1440p
Case Corsair Obsidian Series 450D Gaming Case
Audio Device(s) SteelSeries 5Hv2 w/ Sound Blaster Z SE
Power Supply Corsair RM750x Power Supply
Mouse Razer Death-Adder + Viper 8K HZ Ambidextrous Gaming Mouse - Ergonomic Left Hand Edition
Keyboard Logitech G910 Orion Spectrum RGB Gaming Keyboard
Software Windows 11 Pro - 64-Bit Edition
Benchmark Scores I'm the Doctor, Doctor Who. The Definition of Gaming is PC Gaming...
I wonder where I can get my degree in keyboard engineering?

From the posts in this thread at the Nvidia school of fanboys!!

First we had people butthurt about all the AMD news, since if you have to buy AMD you are obviously a piss-poor peon who shouldn't have a computer, and now we have a lot of posts about a new technology from AMD and lots of hate tossed its way.
What some fail to understand is simple: AMD is an innovator. If they were not, they would have gone out of business.
They don't copy, they innovate and design, taking chances, because they have no choice but to do such a thing. Patience paid off with Ryzen, and I can see similar success with Vega.

FYI for all those criticizing HBM: GDDR5, or whatever you call it, is outdated. HBM is the way of the future IMO, and it gets better with each new version.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,263 (4.42/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
So, in essence, AMD has expanded the cache hierarchy. We have L1 and L2 on the GPU itself, L3 is basically VRAM (I'm not aware of a dedicated L3 being used on GPUs, unlike on CPUs, or is it?), and now they've added an L4, which is system RAM. All of this is usually controlled by algorithm/prediction-based prefetchers.

I mean, if this is fully automatic without any need for special game code, it's gonna be nice, and it's going to dramatically expand the usability of the graphics card over time as it ages and new demanding games come out needing more memory. Sure, it won't be as fast as having that much VRAM available at all times, but it won't be nearly as bad as running out of VRAM entirely. I know Win8/Win10 already does this to a small extent, but not nearly to the extent Vega will be doing it.

I mean, with Vega, my 32GB of system RAM will finally find a very good use. For games, not even 16GB is really needed, meaning the other 16GB is idling most of the time. But Vega will be able to use it. I like the idea very much.
Think Windows SuperFetch. It keeps assets in memory and removes them as the space is needed. Should the asset be required again, access to it will be much faster than having to pull it from a slower memory pool. It's really smart that they're doing this and it's kind of silly it hasn't been done yet.

Just something slightly related: I went to a GDC session yday talking about DX12 optimization, and one recommendation was to use the copy queue for all GPU<->CPU memory transfers. These happen in the background, completely independent of GPU activity, at full PCIe speeds. The key is to anticipate a few frames early that you will need the data, so it's in GPU memory when it is needed, so no stuttering occurs.
So HBCC is intended to stand in when developers fail to do that, or anticipate incorrectly.
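The SuperFetch-style behavior described above (keep assets resident, evict only when the space is needed) is essentially an LRU page cache. A toy sketch, purely illustrative; the real HBCC works on hardware memory pages, not named strings:

```python
from collections import OrderedDict

class PageCache:
    """VRAM modeled as a cache over system RAM with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page id -> data, ordered by last use
        self.misses = 0

    def fetch(self, page):
        if page in self.pages:                  # hit: already resident
            self.pages.move_to_end(page)
        else:                                   # miss: pull over the bus
            self.misses += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[page] = f"data:{page}"
        return self.pages[page]

hbc = PageCache(capacity=2)
hbc.fetch("texture_A")   # miss: fetched from system RAM
hbc.fetch("texture_A")   # hit: reused next frame, no bus traffic
hbc.fetch("texture_B")   # miss
hbc.fetch("texture_C")   # miss: evicts texture_A (least recently used)
print(hbc.misses)        # 3
```

The win is the second fetch of texture_A: an asset reused across frames never touches the slow pool again until it gets evicted.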
 
Joined
Oct 2, 2004
Messages
13,791 (1.86/day)
I don't think it works that way; I think it can actually fetch data directly from the RAM pool (or even an SSD pool). It will likely organize data so that frequently used data is in VRAM, less frequently used data in RAM, and even less frequently used data on SSD. But that doesn't mean it has to swap the data through VRAM to use it.

I mean, they use a similar system on professional cards, you know, the ones that come with NAND attached? Surely they already know how things work, and they are confident enough to unleash this tech onto the consumer market...
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,263 (4.42/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.


Judging from that picture, HBCC is a memory manager that sits below the L2 and has access to the HBM (presumably HBM2 stacks), system RAM, NAND, and even the network (clearly aimed at enterprise customers). It moves pages of memory closer to, and into, the L2 as it anticipates they will be needed, and it removes pages from the L2 once they expire.

Would have to watch the presentation to be sure.
 
Joined
Apr 12, 2013
Messages
7,598 (1.77/day)
I don't think it works that way; I think it can actually fetch data directly from the RAM pool (or even an SSD pool). It will likely organize data so that frequently used data is in VRAM, less frequently used data in RAM, and even less frequently used data on SSD. But that doesn't mean it has to swap the data through VRAM to use it.

I mean, they use a similar system on professional cards, you know, the ones that come with NAND attached? Surely they already know how things work, and they are confident enough to unleash this tech onto the consumer market...
In which case the HBCC should/could be faster, in theory, than the rest of the HBM. Like I said previously, just as there's an SLC cache in TLC drives; otherwise it makes little sense to partition the VRAM in a fashion that increases complexity, possibly negating the HBM advantage over traditional GDDR5(X) or anything else.
 
Joined
Feb 3, 2017
Messages
3,863 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
In which case the HBCC should/could be faster, in theory, than the rest of the HBM. Like I said previously, just as there's an SLC cache in TLC drives; otherwise it makes little sense to partition the VRAM in a fashion that increases complexity, possibly negating the HBM advantage over traditional GDDR5(X) or anything else.
hbcc is the controller, hbc is hbm.
what they are doing doesn't really increase complexity, it just builds on and expands the existing memory organization for more flexibility.
 
Joined
Apr 12, 2013
Messages
7,598 (1.77/day)
hbcc is the controller, hbc is hbm.
what they are doing doesn't really increase complexity, it just builds on and expands the existing memory organization for more flexibility.
I know that, but what's the point of a cache if it isn't faster than the VRAM (HBM), given that the game (engine) and the OS already do a bit of caching in software themselves? There are two theories in this very thread: it could be something like virtual memory/pagefile (in which case it has to be faster than normal VRAM operations), or it could be prefetch/superfetch, in which case I'm concerned about the overall benefit.
 
Joined
Feb 3, 2017
Messages
3,863 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
it is not meant to cache vram; vram itself is meant to be a cache for memory/storage at the next level, wherever that may be.

actually, from what has been said, the controller seems to be meant for caching vram as well, but that is done with the l2 cache.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,263 (4.42/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
L2 is hugely faster than HBM. The likely reason there is no L3 cache is HBM's performance. Modern CPUs have an L3, some even an L4, because DDR3 was so slow compared to the L2. The L4 went away (at least for now) with the transition to DDR4.

My understanding is that what is unique to HBCC is this: previous generations of GPUs would only track where the data they need is. They would constantly overwrite that data with new data, and all the GPU knows about what is in that memory is what is in use and what is not. HBCC maintains not only usage but context. Imagine an asset sitting in system memory, like a texture. One frame uses that texture, so the GPU pulls it from RAM and sticks it in the HBC, then takes a tile of it from the HBC and moves it to L2, where the GPU continues to pull what is necessary from that tile to do actual work on it in the L1 caches. In the next frame the same texture is used; instead of having to go to system RAM again to fetch it (because the developer was an idiot and didn't precache it), the HBCC sees that asset already sitting in the HBC and starts using it instead of waiting to get it from system RAM. That saves a few milliseconds of render time.

I think it will help hugely with tessellation, for example.
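The "saves a few milliseconds" claim above can be put into rough numbers. All figures here are hypothetical, chosen only to show the shape of the win when an asset is fetched once and then reused from HBC across frames:

```python
# Hypothetical costs: a 64 MB texture over a ~16 GB/s PCIe link vs. from HBM.
pcie_fetch_ms = 4.0    # refetch over the bus each frame
hbc_fetch_ms = 0.25    # reuse from HBC after the first fetch

frames = 60  # one second of frames at 60 FPS

naive = frames * pcie_fetch_ms                        # refetch every frame
cached = pcie_fetch_ms + (frames - 1) * hbc_fetch_ms  # fetch once, then reuse

print(f"refetch every frame: {naive} ms")   # 240.0 ms
print(f"fetch once + reuse:  {cached} ms")  # 18.75 ms
```

Even with made-up constants, the shape is clear: the bus cost is paid once instead of sixty times, which is exactly the stutter the HBC residency is meant to remove.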
 
Joined
Feb 9, 2009
Messages
1,618 (0.28/day)
VRAM hasn't been the bottleneck for a long time now. Even crappy video cards come with 4GB as standard, which is more than enough for 1080p gaming. GPU horsepower is the decisive factor in 99.9% of cases.
depends on the game. 4gb has not been enough in many cases for the highest (usually texture) settings, especially at higher resolutions like 1440p or uhd.
this is even more relevant now that gpus have the horsepower to run games at these resolutions.
it's not about resolution or horsepower; a dev can choose to cram in high-density textures that require 6gb minimum, show a minor blur at 4gb, & stutter with more blur at 3gb https://www.computerbase.de/2016-09/grafikkarten-speicher-vram-test/ with screenshots to see it for yourself

actually i'm surprised & disappointed that so many devs choose to fill up an entire 4gb. i prefer things to be streamed with maximum detail on nearby objects, aka megatextures (well, even some of the streaming games seem to do a poor job if there is loss or stutter at 4gb)

a few days ago i had afterburner open in call of duty 4... only 300-something mb used! windows at idle was like 100mb
 
Joined
Feb 3, 2017
Messages
3,863 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
today, practically all games are streaming textures.
 