
AMD "Vega" High Bandwidth Cache Controller Improves Minimum and Average FPS

Joined
Feb 3, 2017
Messages
3,863 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
The actual amount of data moved over the PCIe bus is negligible. I suggest reading the PCIe scaling article W1zz did: most graphics cards only use about x4 lanes of 2.0 worth of actual bandwidth, and more than that only gives a few percent (not frames per second) more performance, so 60 FPS ±3% doesn't really mean much.
well, don't focus on the used part. pcie 3.0 x16 has 15.75 GB/s of bandwidth. the rx 480 with its good old gddr5 has 200+ GB/s; high-end gpus have more.

in the context of using ram in addition to vram, why would memory management be a more limiting factor than a very narrow pipe?
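For reference, the gap between the two numbers quoted above works out like this. A quick back-of-envelope sketch in Python; the RX 480 figures assume the stock 8 Gbps GDDR5 on a 256-bit bus:

```python
# PCIe 3.0: 8 GT/s per lane with 128b/130b encoding.
pcie3_x16_gbs = 8 * (128 / 130) * 16 / 8   # GT/s * encoding * 16 lanes / 8 bits per byte

# RX 480: 8 Gbps GDDR5 chips on a 256-bit memory bus.
rx480_vram_gbs = 8 * 256 / 8

print(f"PCIe 3.0 x16: {pcie3_x16_gbs:.2f} GB/s")   # ~15.75 GB/s
print(f"RX 480 VRAM:  {rx480_vram_gbs:.0f} GB/s")  # 256 GB/s
print(f"VRAM is ~{rx480_vram_gbs / pcie3_x16_gbs:.0f}x wider")
```

So the "pipe" to system RAM really is an order of magnitude narrower than local VRAM, which is the whole point of the question.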
 
Joined
Feb 9, 2009
Messages
1,618 (0.28/day)
Bla bla bla. Do I get to buy something that performs like the 1070 but costs less than 460/950? Otherwise it's just fvcking bla bla bla.
you enjoy stutters during heavy usage? this affects ALL gpus, ALL sizes, ALL performance levels
 
Joined
Sep 15, 2011
Messages
6,825 (1.40/day)
Processor Intel® Core™ i7-13700K
Motherboard Gigabyte Z790 Aorus Elite AX
Cooling Noctua NH-D15
Memory 32GB(2x16) DDR5@6600MHz G-Skill Trident Z5
Video Card(s) ZOTAC GAMING GeForce RTX 3080 AMP Holo
Storage 2TB SK Platinum P41 SSD + 4TB SanDisk Ultra SSD + 500GB Samsung 840 EVO SSD
Display(s) Acer Predator X34 3440x1440@100Hz G-Sync
Case NZXT PHANTOM410-BK
Audio Device(s) Creative X-Fi Titanium PCIe
Power Supply Corsair 850W
Mouse Logitech Hero G502 SE
Software Windows 11 Pro - 64bit
Benchmark Scores 30FPS in NFS:Rivals
VRAM hasn't been the bottleneck for a long time now. Even crappy video cards come with 4GB as standard, which is more than enough for 1080p gaming. GPU horsepower is the decisive factor in 99.9% of cases.
 
Joined
Feb 3, 2017
Messages
3,863 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
depends on the game. 4gb has not been enough in many cases for the highest (usually texture) settings, especially at higher resolutions like 1440p or uhd.
this is even more relevant now that gpus have the horsepower to run games at these resolutions.
 
Joined
Nov 4, 2005
Messages
12,054 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
well, don't focus on the used part. pcie 3.0 x16 has 15.75 GB/s of bandwidth. the rx 480 with its good old gddr5 has 200+ GB/s; high-end gpus have more.

in the context of using ram in addition to vram, why would memory management be a more limiting factor than a very narrow pipe?


VRAM bandwidth is consumed every time the GPU performs anti-aliasing and anisotropic filtering, so if I need 70% of the bandwidth for post-render effects, only 30% is actually available for frame rendering, and part of that is used to store the finished frames.

I have a tuner card in a PCIe x1 slot that sends 24 FPS of 1080i plus audio to my GPU directly, where it gets upscaled in hardware; the actual HDMI bandwidth is much higher, though. So why can't I run my GPU at PCIe x1?

It's all about where the bandwidth is used and when, and PCIe is overkill for graphics cards as they are today.
 
Joined
Mar 6, 2012
Messages
570 (0.12/day)
Processor i5 4670K - @ 4.8GHZ core
Motherboard MSI Z87 G43
Cooling Thermalright Ultra-120 *(Modded to fit on this motherboard)
Memory 16GB 2400MHZ
Video Card(s) HD7970 GHZ edition Sapphire
Storage Samsung 120GB 850 EVO & 4X 2TB HDD (Seagate)
Display(s) 42" Panasonice LED TV @120Hz
Case Corsair 200R
Audio Device(s) Xfi Xtreme Music with Hyper X Core
Power Supply Cooler Master 700 Watts
Bla bla bla. Do I get to buy something that performs like the 1070 but costs less than 460/950? Otherwise it's just fvcking bla bla bla.

Kid, why do you expect only AMD to give you everything at low cost? Why don't you ask Nvidia about it? Oh wait, they just knocked 40 dollars off the GTX 1070. Happy?
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
28,015 (3.71/day)
Processor Ryzen 7 5700X
Memory 48 GB
Video Card(s) RTX 4080
Storage 2x HDD RAID 1, 3x M.2 NVMe
Display(s) 30" 2560x1600 + 19" 1280x1024
Software Windows 10 64-bit
Just something slightly related: I went to a GDC session yesterday about DX12 optimization, and one recommendation was to use the copy queue for all GPU<->CPU memory transfers. These happen in the background, completely independent of GPU activity, at full PCIe speed. The key is to anticipate a few frames early that you will need the data, so it's already in GPU memory when it is needed and no stuttering occurs.
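The prefetch-ahead pattern described there can be sketched as a toy producer/consumer model. None of this is real D3D12 API; the names and the two-frame lookahead are illustrative, just to show the scheduling idea of a copy queue running independently of rendering:

```python
import queue
import threading

resident = {}               # stands in for GPU memory
copy_queue = queue.Queue()  # stands in for the DX12 copy queue

def copy_worker():
    # Runs independently of "rendering", like the hardware copy engine.
    while True:
        asset = copy_queue.get()
        if asset is None:
            break
        resident[asset] = f"data:{asset}"  # pretend DMA transfer over PCIe
        copy_queue.task_done()

threading.Thread(target=copy_worker, daemon=True).start()

PREFETCH_AHEAD = 2  # "anticipate a few frames early"

def frame(n, needed_by_frame):
    # Schedule uploads now for a frame a couple of frames in the future...
    for asset in needed_by_frame.get(n + PREFETCH_AHEAD, []):
        copy_queue.put(asset)
    # ...so by the time we render frame n, its assets are already resident.
    return all(a in resident for a in needed_by_frame.get(n, []))

schedule = {2: ["rock_texture"]}
frame(0, schedule)          # queues rock_texture while frame 0 renders
copy_queue.join()           # in reality the copy overlaps frames 0-1
assert frame(2, schedule)   # no stall: asset already in "GPU memory"
```

The point is simply that the transfer cost is paid during earlier frames instead of stalling the frame that needs the data.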
 
Joined
Apr 12, 2013
Messages
7,598 (1.77/day)
I don't understand how lowering the VRAM amount would help with FPS. Can anyone explain this?
This probably works the same way as the (pseudo) SLC cache on TLC drives: it speeds up the frequent VRAM ops, lifting the min & avg FPS.
 
Joined
Apr 12, 2013
Messages
7,598 (1.77/day)
The reason they lowered the VRAM availability is that they wanted to place Vega into the worst possible situation, the situation where HBC really shows its strength: when game VRAM usage goes beyond what you actually have on board.

The only thing I wonder about is whether HBC can do the data management on its own or whether it has to be specifically coded for. Because if it can be used out of the box with anything, it'll be awesome. But if you have to specifically code for it, then that's a problem in itself.
Well, if there's anything like HBC in Scorpio or the PS5 (whenever it's released), then there's a good chance this approach will become popular, even if some years down the line.
 
Joined
Mar 23, 2005
Messages
4,100 (0.57/day)
Location
Ancient Greece, Acropolis (Time Lord)
System Name RiseZEN Gaming PC
Processor AMD Ryzen 7 5800X @ Auto
Motherboard Asus ROG Strix X570-E Gaming ATX Motherboard
Cooling Corsair H115i Elite Capellix AIO, 280mm Radiator, Dual RGB 140mm ML Series PWM Fans
Memory G.Skill TridentZ 64GB (4 x 16GB) DDR4 3200
Video Card(s) ASUS DUAL RX 6700 XT DUAL-RX6700XT-12G
Storage Corsair Force MP500 480GB M.2 & MP510 480GB M.2 - 2 x WD_BLACK 1TB SN850X NVMe 1TB
Display(s) ASUS ROG Strix 34” XG349C 144Hz 1440p + Asus ROG 27" MG278Q 144Hz WQHD 1440p
Case Corsair Obsidian Series 450D Gaming Case
Audio Device(s) SteelSeries 5Hv2 w/ Sound Blaster Z SE
Power Supply Corsair RM750x Power Supply
Mouse Razer Death-Adder + Viper 8K HZ Ambidextrous Gaming Mouse - Ergonomic Left Hand Edition
Keyboard Logitech G910 Orion Spectrum RGB Gaming Keyboard
Software Windows 11 Pro - 64-Bit Edition
Benchmark Scores I'm the Doctor, Doctor Who. The Definition of Gaming is PC Gaming...
I wonder where I can get my degree in keyboard engineering?

From the posts in this thread at the Nvidia school of fanboys!!

First we had people butthurt about all the AMD news, since if you have to buy AMD you are obviously a piss-poor peon who shouldn't have a computer, and now we have a lot of posts about a new technology from AMD and lots of hate tossed its way.
What some fail to understand is simple: AMD is an innovator. If they were not, they would have gone out of business.
They don't copy, they innovate and design, taking chances, because they have no choice but to do such a thing. Patience paid off with Ryzen, and I can see similar success with Vega.

FYI for all those criticizing HBM: GDDR5, or whatever you call it, is outdated. HBM is the way of the future IMO, and it gets better with each new version.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,263 (4.42/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
So, in essence, AMD has expanded the cache hierarchy. We have L1 and L2 on the GPU itself, L3 is basically VRAM (I'm not aware of a dedicated L3 being used on GPUs, unlike on CPUs, or is it?), and now they've added an L4, which is system RAM. All of this is usually controlled by algorithm/prediction-based prefetchers.

I mean, if this is fully automatic without any need for special game code, it's gonna be nice, and it's going to dramatically expand the usability of the graphics card over time as it ages and new demanding games come out needing more memory. Sure, it won't be as fast as having that much VRAM available at all times, but it won't be nearly as bad as running out of VRAM entirely. I know Win8/Win10 already does this to a small extent, but not nearly to the extent Vega will be doing it.

I mean, with Vega, my 32GB of system RAM will finally find a very good use. For games, not even 16GB is really needed, meaning the other 16GB is idling most of the time. But Vega will be able to use it. I like the idea very much.
Think Windows SuperFetch. It keeps assets in memory and removes them as the space is needed. Should the asset be required again, access to it will be much faster than having to pull it from a slower memory pool. It's really smart that they're doing this and it's kind of silly it hasn't been done yet.

Just something slightly related: I went to a GDC session yday talking about DX12 optimization, and one recommendation was to use the copy queue for all GPU<->CPU memory transfers. These happen in the background, completely independent of GPU activity, at full PCIe speeds. The key is to anticipate a few frames early that you will need the data, so it's in GPU memory when it is needed, so no stuttering occurs.
So HBCC is intended to stand in when developers fail to do that, or anticipate incorrectly.
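The SuperFetch-style behavior described above (keep assets resident, evict only when the space is needed) is essentially an LRU page cache. A toy sketch, purely illustrative; the real HBCC works on hardware memory pages, not named strings:

```python
from collections import OrderedDict

class PageCache:
    """VRAM modeled as a cache over system RAM with LRU eviction."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # page id -> data, ordered by last use
        self.misses = 0

    def fetch(self, page):
        if page in self.pages:                  # hit: already resident
            self.pages.move_to_end(page)
        else:                                   # miss: pull over the bus
            self.misses += 1
            if len(self.pages) >= self.capacity:
                self.pages.popitem(last=False)  # evict least recently used
            self.pages[page] = f"data:{page}"
        return self.pages[page]

hbc = PageCache(capacity=2)
hbc.fetch("texture_A")   # miss: fetched from system RAM
hbc.fetch("texture_A")   # hit: reused next frame, no bus traffic
hbc.fetch("texture_B")   # miss
hbc.fetch("texture_C")   # miss: evicts texture_A (least recently used)
print(hbc.misses)        # 3
```

The win is the second fetch of texture_A: an asset reused across frames never touches the slow pool again until it gets evicted.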
 
Joined
Oct 2, 2004
Messages
13,791 (1.86/day)
I don't think it works that way; I think it can actually fetch data directly from the RAM pool (or even an SSD pool). It will likely organize data so that frequently used data is in VRAM, less frequently used data in RAM, and even less frequently used data on SSD. But that doesn't mean it has to swap the data through VRAM to use it.

I mean, they use a similar system on professional cards, you know, the ones that come with NAND attached? Surely they already know how things work, and they are confident enough to unleash this tech onto the consumer market...
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,263 (4.42/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.


Judging from that picture, HBCC is a memory manager that sits below the L2 and has access to the HBM (presumably HBM2 stacks), system RAM, NAND, and even the network (clearly aimed at enterprise customers). It moves pages of memory closer to, and into, the L2 as it anticipates they will be needed, and it removes pages from the L2 once they expire.

Would have to watch the presentation to be sure.
 
Joined
Apr 12, 2013
Messages
7,598 (1.77/day)
I don't think it works that way; I think it can actually fetch data directly from the RAM pool (or even an SSD pool). It will likely organize data so that frequently used data is in VRAM, less frequently used data in RAM, and even less frequently used data on SSD. But that doesn't mean it has to swap the data through VRAM to use it.

I mean, they use a similar system on professional cards, you know, the ones that come with NAND attached? Surely they already know how things work, and they are confident enough to unleash this tech onto the consumer market...
In which case the HBCC should/could be faster, in theory, than the rest of the HBM. Like I said previously, just as there's an SLC cache in TLC drives; otherwise it makes little sense to partition the VRAM in a fashion that increases complexity, possibly negating the HBM advantage over traditional GDDR5(X) or anything else.
 
Joined
Feb 3, 2017
Messages
3,863 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
In which case the HBCC should/could be faster, in theory, than the rest of the HBM. Like I said previously, just as there's an SLC cache in TLC drives; otherwise it makes little sense to partition the VRAM in a fashion that increases complexity, possibly negating the HBM advantage over traditional GDDR5(X) or anything else.
hbcc is the controller, hbc is hbm.
what they are doing doesn't really increase complexity, it just builds on and expands the existing memory organization for more flexibility.
 
Joined
Apr 12, 2013
Messages
7,598 (1.77/day)
hbcc is the controller, hbc is hbm.
what they are doing doesn't really increase complexity, it just builds on and expands the existing memory organization for more flexibility.
I know that, but what's the point of a cache if it isn't faster than the VRAM (HBM), given that the game (engine) and the OS already do a bit of caching in software themselves? There are two theories in this very thread: it could be something like virtual memory/pagefile (in which case it has to be faster than normal VRAM operations), or it could be prefetch/superfetch, in which case I'm concerned about the overall benefit.
 
Joined
Feb 3, 2017
Messages
3,863 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
it is not meant to cache vram; vram itself is meant to be a cache for memory/storage at the next level, wherever that may be.

actually, from what has been said, the controller seems to be meant for caching vram as well, but that is done with the l2 cache.
 

FordGT90Concept

"I go fast!1!11!1!"
Joined
Oct 13, 2008
Messages
26,263 (4.42/day)
Location
IA, USA
System Name BY-2021
Processor AMD Ryzen 7 5800X (65w eco profile)
Motherboard MSI B550 Gaming Plus
Cooling Scythe Mugen (rev 5)
Memory 2 x Kingston HyperX DDR4-3200 32 GiB
Video Card(s) AMD Radeon RX 7900 XT
Storage Samsung 980 Pro, Seagate Exos X20 TB 7200 RPM
Display(s) Nixeus NX-EDG274K (3840x2160@144 DP) + Samsung SyncMaster 906BW (1440x900@60 HDMI-DVI)
Case Coolermaster HAF 932 w/ USB 3.0 5.25" bay + USB 3.2 (A+C) 3.5" bay
Audio Device(s) Realtek ALC1150, Micca OriGen+
Power Supply Enermax Platimax 850w
Mouse Nixeus REVEL-X
Keyboard Tesoro Excalibur
Software Windows 10 Home 64-bit
Benchmark Scores Faster than the tortoise; slower than the hare.
L2 is hugely faster than HBM. The likely reason there is no L3 cache is HBM's performance. Modern CPUs have an L3, some even an L4, because DDR3 was so slow compared to the L2. The L4 went away (at least for now) with the transition to DDR4.

My understanding is that what is unique to HBCC is this: previous generations of GPUs would only track where the data they need is. They would constantly overwrite that data with new data, and all the GPU knows about what is in that memory is what is in use and what is not. HBCC maintains not only usage but context. Imagine an asset sitting in system memory, like a texture. One frame uses that texture, so the GPU pulls it from RAM and sticks it in the HBC, then takes a tile of it from the HBC and moves it to L2, where the GPU continues to pull what is necessary from that tile to do actual work on it in the L1 caches. In the next frame the same texture is used; instead of having to go to system RAM again to fetch it (because the developer was an idiot and didn't precache it), the HBCC sees that asset already sitting in the HBC and starts using it instead of waiting to get it from system RAM. That saves a few milliseconds of render time.

I think it will help hugely with tessellation, for example.
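The "saves a few milliseconds" claim above can be put into rough numbers. All figures here are hypothetical, chosen only to show the shape of the win when an asset is fetched once and then reused from HBC across frames:

```python
# Hypothetical costs: a 64 MB texture over a ~16 GB/s PCIe link vs. from HBM.
pcie_fetch_ms = 4.0    # refetch over the bus each frame
hbc_fetch_ms = 0.25    # reuse from HBC after the first fetch

frames = 60  # one second of frames at 60 FPS

naive = frames * pcie_fetch_ms                        # refetch every frame
cached = pcie_fetch_ms + (frames - 1) * hbc_fetch_ms  # fetch once, then reuse

print(f"refetch every frame: {naive} ms")   # 240.0 ms
print(f"fetch once + reuse:  {cached} ms")  # 18.75 ms
```

Even with made-up constants, the shape is clear: the bus cost is paid once instead of sixty times, which is exactly the stutter the HBC residency is meant to remove.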
 
Joined
Feb 9, 2009
Messages
1,618 (0.28/day)
VRAM hasn't been the bottleneck for a long time now. Even crappy video cards come with 4GB as standard, which is more than enough for 1080p gaming. GPU horsepower is the decisive factor in 99.9% of cases.
depends on the game. 4gb has not been enough in many cases for the highest (usually texture) settings, especially at higher resolutions like 1440p or uhd.
this is even more relevant now that gpus have the horsepower to run games at these resolutions.
it's not about resolution or horsepower; a dev can choose to cram in high-density textures that require 6gb minimum, show a minor blur at 4gb, & stutter with more blur at 3gb https://www.computerbase.de/2016-09/grafikkarten-speicher-vram-test/ with screenshots to see it for yourself

actually i'm surprised & disappointed that so many devs choose to fill up an entire 4gb. i prefer things to be streamed with maximum detail on nearby objects, aka megatextures (well, even some of the streaming games seem to do a poor job if there is loss or stutter at 4gb)

a few days ago i had afterburner open in call of duty 4... only 300-something mb used! windows at idle was like 100mb
 
Joined
Feb 3, 2017
Messages
3,863 (1.33/day)
Processor Ryzen 7800X3D
Motherboard ROG STRIX B650E-F GAMING WIFI
Memory 2x16GB G.Skill Flare X5 DDR5-6000 CL36 (F5-6000J3636F16GX2-FX5)
Video Card(s) INNO3D GeForce RTX™ 4070 Ti SUPER TWIN X2
Storage 2TB Samsung 980 PRO, 4TB WD Black SN850X
Display(s) 42" LG C2 OLED, 27" ASUS PG279Q
Case Thermaltake Core P5
Power Supply Fractal Design Ion+ Platinum 760W
Mouse Corsair Dark Core RGB Pro SE
Keyboard Corsair K100 RGB
VR HMD HTC Vive Cosmos
today, practically all games are streaming textures.
 