
GPU Memory Latency Tested on AMD's RDNA 2 and NVIDIA's Ampere Architecture

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,173 (2.79/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
I think he's talking about how much space it takes on the chip.
That's not really the only consideration though. Given the latency improvement and how it contributes less to heat than more CUs and/or faster or wider memory, I'd call it a win. This is a far better solution than the alternatives. Do you remember how much more power a 290/390 would consume when clocking up that 512-bit memory? Trust me, the Infinity Cache is a far better solution. This is actually why I advocate for HBM; power consumption figures are fantastic.
 

Space Lynx

Astronaut
Joined
Oct 17, 2014
Messages
17,444 (4.67/day)
Location
Kepler-186f
Processor 7800X3D -25 all core
Motherboard B650 Steel Legend
Cooling Frost Commander 140
Video Card(s) Merc 310 7900 XT @3100 core -.75v
Display(s) Agon 27" QD-OLED Glossy 240hz 1440p
Case NZXT H710 (Red/Black)
Audio Device(s) Asgard 2, Modi 3, HD58X
Power Supply Corsair RM850x Gold
*licks my rock solid stable 6800*
 
Joined
Apr 10, 2010
Messages
1,864 (0.35/day)
Location
London
System Name Jaspe
Processor Ryzen 1500X
Motherboard Asus ROG Strix X370-F Gaming
Cooling Stock
Memory 16Gb Corsair 3000mhz
Video Card(s) EVGA GTS 450
Storage Crucial M500
Display(s) Philips 1080 24'
Case NZXT
Audio Device(s) Onboard
Power Supply Enermax 425W
Software Windows 10 Pro
Do you remember how much more power a 290/390 would consume when clocking up that 512-bit memory?
Yes, I had one :D
I agree that 128 MB is plenty; I was talking about your reply to the other person. I believe he was referring to the space the cache takes; not the amount of cache.
 

Aquinus

Resident Wat-man
Joined
Jan 28, 2012
Messages
13,173 (2.79/day)
Location
Concord, NH, USA
System Name Apollo
Processor Intel Core i9 9880H
Motherboard Some proprietary Apple thing.
Memory 64GB DDR4-2667
Video Card(s) AMD Radeon Pro 5600M, 8GB HBM2
Storage 1TB Apple NVMe, 4TB External
Display(s) Laptop @ 3072x1920 + 2x LG 5k Ultrafine TB3 displays
Case MacBook Pro (16", 2019)
Audio Device(s) AirPods Pro, Sennheiser HD 380s w/ FIIO Alpen 2, or Logitech 2.1 Speakers
Power Supply 96w Power Adapter
Mouse Logitech MX Master 3
Keyboard Logitech G915, GL Clicky
Software MacOS 12.1
I believe he was referring to the space the cache takes; not the amount of cache.
I understand. I'm just saying that it's a worthwhile use of die space given the performance and power characteristics of it.
 
Joined
Apr 30, 2011
Messages
2,719 (0.54/day)
Location
Greece
Processor AMD Ryzen 5 5600@80W
Motherboard MSI B550 Tomahawk
Cooling ZALMAN CNPS9X OPTIMA
Memory 2*8GB PATRIOT PVS416G400C9K@3733MT_C16
Video Card(s) Sapphire Radeon RX 6750 XT Pulse 12GB
Storage Sandisk SSD 128GB, Kingston A2000 NVMe 1TB, Samsung F1 1TB, WD Black 10TB
Display(s) AOC 27G2U/BK IPS 144Hz
Case SHARKOON M25-W 7.1 BLACK
Audio Device(s) Realtek 7.1 onboard
Power Supply Seasonic Core GC 500W
Mouse Sharkoon SHARK Force Black
Keyboard Trust GXT280
Software Win 7 Ultimate 64bit/Win 10 pro 64bit/Manjaro Linux
Kudos to the AMD engineers who made it happen. And surely they had help from the Zen engineers; AMD said publicly almost 2 years ago that the teams were combining efforts on the Navi design. I had many laughs at the posts that try to downplay AMD's achievement in the memory department of their GPUs. That makes AMD's effort seem even more impressive. If AMD's arch is inferior to nVidia's and this cache made it win over nVidia at 1440p or lower resolutions, while having lower power consumption and a smaller die, that's what I call genius work from AMD's engineering department, who use a small % of what the nVidia engineers use. As always, haters gonna hate.
 
Joined
Oct 12, 2005
Messages
720 (0.10/day)
Wasn't that statement about the actual use of IC?
I never said to get rid of the whole IC, which was clearly stated in my post. What I wanted was to halve it (64MB instead of 128MB) and use the saved space for more CUs. BTW I would love to see a performance penalty graph for a smaller IC, to know if that much cache is really needed or if it can be smaller.
[Graph: rdna_2_deep_dive_infinity_cache_01.png]


With 64 MB, it would probably be fine in 1080p, but the hit rate would be much lower in 4K and the card would probably be memory starved. This graph also shows why the cards perform so well in 1440p but start to fall behind in 4K. Probably 256 MB would be the sweet spot for 4K.
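For a rough sense of scale, the pixel counts behind those resolutions can be worked out directly. This is only a sketch: it assumes the cache working set grows roughly with pixel count, which is a simplification and not something the graph itself states. The function names are made up for illustration.

```python
# Pixel counts for the resolutions discussed in the thread.
RESOLUTIONS = {
    "1080p": (1920, 1080),
    "1440p": (2560, 1440),
    "4K": (3840, 2160),
}


def pixel_count(name):
    """Total pixels per frame for a named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


def pixel_ratio(higher, lower):
    """How many times more pixels `higher` has than `lower`."""
    return pixel_count(higher) / pixel_count(lower)


if __name__ == "__main__":
    for name in RESOLUTIONS:
        print(f"{name}: {pixel_count(name):,} pixels")
    print(f"1440p vs 1080p: {pixel_ratio('1440p', '1080p'):.2f}x")
    print(f"4K vs 1440p:    {pixel_ratio('4K', '1440p'):.2f}x")
    print(f"4K vs 1080p:    {pixel_ratio('4K', '1080p'):.2f}x")
```

4K has 2.25x the pixels of 1440p and 4x those of 1080p, so under that assumption a cache that comfortably covers 1440p covers proportionally less of the frame data at 4K.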

The thing is, cache is much easier to manufacture (fewer defects per area) than compute units. It also consumes way less power. Also, the shorter the distance data has to travel, the less power it takes. The operation itself takes very little power; it's moving all the data around that uses power. Having a cache that limits the distance data has to travel greatly reduces power consumption.

Also, Infinity Cache is there to prepare for the next step, multi-chip GPUs, but that is another story.
 
Joined
Nov 4, 2005
Messages
12,036 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
Smoothness


See the effects of the Infinity Cache in the charts: the 3090 and 6900 trade blows on FPS, but the 6900 has consistently higher frame rates and fewer low-FPS frames, which equates to a less laggy feel, i.e. smoothness.
 
Joined
Feb 20, 2019
Messages
8,399 (3.91/day)
System Name Bragging Rights
Processor Atom Z3735F 1.33GHz
Motherboard It has no markings but it's green
Cooling No, it's a 2.2W processor
Memory 2GB DDR3L-1333
Video Card(s) Gen7 Intel HD (4EU @ 311MHz)
Storage 32GB eMMC and 128GB Sandisk Extreme U3
Display(s) 10" IPS 1280x800 60Hz
Case Veddha T2
Audio Device(s) Apparently, yes
Power Supply Samsung 18W 5V fast-charger
Mouse MX Anywhere 2
Keyboard Logitech MX Keys (not Cherry MX at all)
VR HMD Samsung Oddyssey, not that I'd plug it into this though....
Software W10 21H1, barely
Benchmark Scores I once clocked a Celeron-300A to 564MHz on an Abit BE6 and it scored over 9000.
TBH the cache in RDNA2 is less about performance this gen and more about setting up for chiplets. It's not 100% useless but IPC differences between the 6700XT and similar 5700XT without the cache are really low. Sometimes zero, sometimes negligible. The performance uplift is almost entirely down to the 25-30% increase in clockspeeds.


It's a marketing point for now, that will lay the groundwork for MCM GPUs next gen. Presumably it makes things smoother for raytracing too, as the calculations now involve lookups for more data than just that relevant to the pixels any particular CU is working on, ergo more data being required - but for traditional raster-based stuff the HWU video above proves how little benefit it is to this generation.
 
Joined
Oct 12, 2005
Messages
720 (0.10/day)
TBH the cache in RDNA2 is less about performance this gen and more about setting up for chiplets. It's not 100% useless but IPC differences between the 6700XT and similar 5700XT without the cache are really low. Sometimes zero, sometimes negligible. The performance uplift is almost entirely down to the 25-30% increase in clockspeeds.

It's a marketing point for now, that will lay the groundwork for MCM GPUs next gen. Presumably it makes things smoother for raytracing too, as the calculations now involve lookups for more data than just that relevant to the pixels any particular CU is working on, ergo more data being required - but for traditional raster-based stuff the HWU video above proves how little benefit it is to this generation.
Well, the 5700 XT has a 256-bit bus where the 6700 XT has a 192-bit bus. The fact that they both maintain similar performance means that the 96 MB cache here is the "equivalent" of a 64-bit wider bus, more or less.

That is still significant, since a smaller memory bus means less space used by the memory controller, fewer pins on the chip, fewer traces on the card, a simpler card layout, etc.
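To put a number on that bus difference, raw DRAM bandwidth is just bus width in bytes times the per-pin data rate. A minimal sketch follows; the per-pin GDDR6 rates (14 Gbps for the 5700 XT, 16 Gbps for the 6700 XT) are not stated anywhere in this thread and are assumed here from the cards' published specs.

```python
def raw_bandwidth_gb_s(bus_width_bits, data_rate_gbps):
    """Peak DRAM bandwidth in GB/s: bytes moved per transfer times per-pin rate."""
    return bus_width_bits / 8 * data_rate_gbps


# Assumed per-pin GDDR6 data rates (not given in the thread).
RX_5700_XT = raw_bandwidth_gb_s(bus_width_bits=256, data_rate_gbps=14.0)
RX_6700_XT = raw_bandwidth_gb_s(bus_width_bits=192, data_rate_gbps=16.0)

if __name__ == "__main__":
    print(f"5700 XT: {RX_5700_XT:.0f} GB/s")
    print(f"6700 XT: {RX_6700_XT:.0f} GB/s")
    print(f"6700 XT deficit: {(1 - RX_6700_XT / RX_5700_XT) * 100:.1f}%")
```

With those assumed rates the 6700 XT has roughly 14% less raw memory bandwidth, which is the gap the cache has to cover for the two cards to perform alike at equal clocks.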
 
Joined
Dec 22, 2011
Messages
289 (0.06/day)
Processor Ryzen 7 5800X3D
Motherboard Asus Prime X570 Pro
Cooling Deepcool LS-720
Memory 32 GB (4x 8GB) DDR4-3600 CL16
Video Card(s) PowerColor Radeon RX 7900 XTX Red Devil
Storage Samsung PM9A1 (980 Pro OEM) + 960 Evo NVMe SSD + 830 SATA SSD + Toshiba & WD HDD's
Display(s) Samsung C32HG70
Case Lian Li O11D Evo
Audio Device(s) Sound Blaster Zx
Power Supply Seasonic 750W Focus+ Platinum
Mouse Logitech G703 Lightspeed
Keyboard SteelSeries Apex Pro
Software Windows 11 Pro
I think Nvidia adding FP32 functionality to its INT units is a pretty good idea. Although I don't know how many transistors or how much power it cost, gaming performance increased by ~25% and then there is the advantage in compute workloads. I wouldn't mind if AMD did the same thing.
You got it reversed.
NVIDIA's "INT units" were stripped-down CUDA cores; with Ampere they just cut out slightly fewer features (FP32 capability was stripped out before, but not on Ampere) and started calling them CUDA cores again.
AMD runs everything on the same units, just like NVIDIA's full CUDA cores do.
 

MxPhenom 216

ASIC Engineer
Joined
Aug 31, 2010
Messages
13,020 (2.48/day)
Location
Loveland, CO
System Name Ryzen Reflection
Processor AMD Ryzen 9 5900x
Motherboard Gigabyte X570S Aorus Master
Cooling 2x EK PE360 | TechN AM4 AMD Block Black | EK Quantum Vector Trinity GPU Nickel + Plexi
Memory Teamgroup T-Force Xtreem 2x16GB B-Die 3600 @ 14-14-14-28-42-288-2T 1.45v
Video Card(s) Zotac AMP HoloBlack RTX 3080Ti 12G | 950mV 1950Mhz
Storage WD SN850 500GB (OS) | Samsung 980 Pro 1TB (Games_1) | Samsung 970 Evo 1TB (Games_2)
Display(s) Asus XG27AQM 240Hz G-Sync Fast-IPS | Gigabyte M27Q-P 165Hz 1440P IPS | LG 24" IPS 1440p
Case Lian Li PC-011D XL | Custom cables by Cablemodz
Audio Device(s) FiiO K7 | Sennheiser HD650 + Beyerdynamic FOX Mic
Power Supply Seasonic Prime Ultra Platinum 850
Mouse Razer Viper v2 Pro
Keyboard Corsair K65 Plus 75% Wireless - USB Mode
Software Windows 11 Pro 64-Bit
AMD should thank TSMC a lot for allowing them to add that much cache in so little space.
Using cache is in general the lazy man's way of solving things.

TSMC wouldn't care. As long as AMD's design conforms to all the DRC requirements of TSMC 7nm and closes on timing, congestion, etc., TSMC doesn't care what's actually in the chip.

Also, I don't know where you get the idea that cache is the lazy man's way of solving things, but it is not. It's actually one of the more critical parts of a chip's performance.
 
Joined
Dec 22, 2011
Messages
289 (0.06/day)
Processor Ryzen 7 5800X3D
Motherboard Asus Prime X570 Pro
Cooling Deepcool LS-720
Memory 32 GB (4x 8GB) DDR4-3600 CL16
Video Card(s) PowerColor Radeon RX 7900 XTX Red Devil
Storage Samsung PM9A1 (980 Pro OEM) + 960 Evo NVMe SSD + 830 SATA SSD + Toshiba & WD HDD's
Display(s) Samsung C32HG70
Case Lian Li O11D Evo
Audio Device(s) Sound Blaster Zx
Power Supply Seasonic 750W Focus+ Platinum
Mouse Logitech G703 Lightspeed
Keyboard SteelSeries Apex Pro
Software Windows 11 Pro
TBH the cache in RDNA2 is less about performance this gen and more about setting up for chiplets. It's not 100% useless but IPC differences between the 6700XT and similar 5700XT without the cache are really low. Sometimes zero, sometimes negligible. The performance uplift is almost entirely down to the 25-30% increase in clockspeeds.
It's all about performance, not chiplets.
RDNA2 doesn't offer any IPC increases over RDNA in the Compute Unit department (new units like Ray Accelerators aside) except for what Infinity Cache brings to the table.
They said it outright on release that RDNA2 offers more performance thanks to three things: Higher clocks, Lower power (per clock) and Infinity Cache bandwidth.
 
Joined
Jan 8, 2020
Messages
834 (0.46/day)
Location
Maryland, USA
Processor Ryzen 5 5600X
Motherboard MSI MPG X570S Carbon Max Wifi
Cooling CPU: bequiet! Dark Rock 4. Case fans: 2x bequiet Silent Wings 3 140s, 2x Silent Wings 3 120s
Memory 2 x 8 GB Patriot Viper Steel DDR4-4400 C19
Video Card(s) Sapphire NITRO+ RX 5700 XT
Storage 2TB Mushkin Pilot-E M.2, 1 TB SK Hynix P31 M.2, 1 TB Inland Professional, 500 GB Samsung 860 Evo
Display(s) MSI Optix MAG271CQR 1440p 144Hz, MSI Optix MAG241C 1080p 144Hz
Case Lian Li Lancool III
Audio Device(s) Philips SHP9500, V-Moda BoomPro, Sybasonic Better Connectivity USB DAC/Amp
Power Supply EVGA SuperNOVA G3 80+ Gold 750W
Mouse Glorious Model D Wireless
Keyboard Custom Qwertykeys Navy QK80: Sarokeys Strawberry Wine switches, GMK CYL DMG3 keycaps
Smoothness


See the effects of the Infinity Cache in the charts: the 3090 and 6900 trade blows on FPS, but the 6900 has consistently higher frame rates and fewer low-FPS frames, which equates to a less laggy feel, i.e. smoothness.
@1d10t This is what @TheinsanegamerN is talking about. More consistent frametimes (and therefore more consistent FPS) results in a smoother experience. The charts Steevo linked show examples of that.

Think about it like case fan hysteresis. If you have a fan curve set up so that they run really quiet and then ramp up significantly once a certain temperature is hit, that's a noticeable change in noise. The change in fan speed is noticeable. However, if you just set a curve that's maybe a little louder initially but a much smoother curve, there's a slow change in RPM, and therefore a less drastic change in noise, which makes it less noticeable to the ear. The same principle applies to frametimes and FPS.
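The same point can be shown with numbers. Below is a minimal sketch using two made-up frametime traces (not data from the charts in this thread): both average exactly 100 FPS, but one has occasional long frames, and only the 99th-percentile frametime reveals it. The nearest-rank percentile used here is one common definition, chosen for simplicity.

```python
import math


def average_fps(frametimes_ms):
    """Average FPS over the run: frames rendered divided by total seconds."""
    return 1000.0 * len(frametimes_ms) / sum(frametimes_ms)


def percentile(values, pct):
    """Nearest-rank percentile (pct between 0 and 100) of a list of numbers."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic traces in milliseconds per frame; both sum to 1000 ms over 100 frames.
STEADY = [10.0] * 100
SPIKY = [8.0] * 90 + [28.0] * 10

if __name__ == "__main__":
    for name, trace in (("steady", STEADY), ("spiky", SPIKY)):
        print(
            f"{name}: avg {average_fps(trace):.0f} FPS, "
            f"99th percentile frametime {percentile(trace, 99):.0f} ms"
        )
```

An average-FPS bar chart would show these two runs as identical, while the spiky one would feel noticeably worse because one frame in ten takes nearly three times as long.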
 
Joined
Oct 12, 2005
Messages
720 (0.10/day)
TSMC wouldn't care. As long as AMD's design conforms to all the DRC requirements of TSMC 7nm and closes on timing, congestion, etc., TSMC doesn't care what's actually in the chip.

Also, I don't know where you get the idea that cache is the lazy man's way of solving things, but it is not. It's actually one of the more critical parts of a chip's performance.
You are right. Actually, making a fast cache is way more complicated than it looks. You have to have a mechanism that checks the cache to know whether the data you are trying to access is there. The larger the cache, the more work you have to do to figure out whether it contains the data you are looking for.

This can add latency. The fact that even with more layers of cache AMD is able to get lower latency shows how well they have mastered caching. They purposely put a lot of effort in there because this is a key thing for multi-chip modules.
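As an illustration of that "is the data there?" check, here is a toy direct-mapped cache. It is a deliberately simplified sketch, not how Infinity Cache is organised (nothing in this thread describes its internals): real GPU caches are set-associative and compare several tags in parallel, which is where the extra lookup work comes from. The class and its parameters are hypothetical.

```python
class DirectMappedCache:
    """Toy cache: each address maps to exactly one line, identified by a tag."""

    def __init__(self, num_lines, line_size):
        self.num_lines = num_lines
        self.line_size = line_size
        self.tags = [None] * num_lines  # tag currently stored in each line
        self.hits = 0
        self.misses = 0

    def access(self, address):
        """Return True on a hit; on a miss, fill the line and return False."""
        block = address // self.line_size  # which memory block the address is in
        index = block % self.num_lines     # which cache line that block maps to
        tag = block // self.num_lines      # distinguishes blocks sharing a line
        if self.tags[index] == tag:
            self.hits += 1
            return True
        self.tags[index] = tag             # evict whatever was there
        self.misses += 1
        return False


if __name__ == "__main__":
    cache = DirectMappedCache(num_lines=4, line_size=64)
    for addr in (0, 32, 256, 0):
        print(addr, "hit" if cache.access(addr) else "miss")
```

Address 32 hits because it shares a 64-byte line with address 0, while address 256 maps to the same line with a different tag and evicts it, so the final access to 0 misses again.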
 
Joined
Jun 11, 2020
Messages
574 (0.34/day)
Location
Florida
Processor 5800x3d
Motherboard MSI Tomahawk x570
Cooling Thermalright
Memory 32 gb 3200mhz E die
Video Card(s) 3080
Storage 2tb nvme
Display(s) 165hz 1440p
Case Fractal Define R5
Power Supply Toughpower 850 platium
Mouse HyperX Hyperfire Pulse
Keyboard EVGA Z15
TBH the cache in RDNA2 is less about performance this gen and more about setting up for chiplets. It's not 100% useless but IPC differences between the 6700XT and similar 5700XT without the cache are really low. Sometimes zero, sometimes negligible. The performance uplift is almost entirely down to the 25-30% increase in clockspeeds.

But there must be some architectural difference that allows RDNA2 to clock that much higher at the same voltage. It can't just be that AMD got that good at 7nm, can it?
 
Joined
Jan 24, 2011
Messages
183 (0.04/day)
You got it reversed.
NVIDIA's "INT units" were stripped-down CUDA cores; with Ampere they just cut out slightly fewer features (FP32 capability was stripped out before, but not on Ampere) and started calling them CUDA cores again.
AMD runs everything on the same units, just like NVIDIA's full CUDA cores do.
As far as I know, Turing had 64 FP32 and 64 INT32 units per SM. Now those 64 INT32 units are capable of either INT32 or FP32, but the original CUDA cores were not and are not capable of INT32, as far as I know.
 
Joined
Sep 28, 2012
Messages
983 (0.22/day)
System Name Poor Man's PC
Processor Ryzen 7 9800X3D
Motherboard MSI B650M Mortar WiFi
Cooling Thermalright Phantom Spirit 120 with Arctic P12 Max fan
Memory 32GB GSkill Flare X5 DDR5 6000Mhz
Video Card(s) XFX Merc 310 Radeon RX 7900 XT
Storage XPG Gammix S70 Blade 2TB + 8 TB WD Ultrastar DC HC320
Display(s) Xiaomi G Pro 27i MiniLED
Case Asus A21 Case
Audio Device(s) MPow Air Wireless + Mi Soundbar
Power Supply Enermax Revolution DF 650W Gold
Mouse Logitech MX Anywhere 3
Keyboard Logitech Pro X + Kailh box heavy pale blue switch + Durock stabilizers
VR HMD Meta Quest 2
Benchmark Scores Who need bench when everything already fast?
You presented an opinion, an opinion that is objectively incorrect. You presented the argument; if you can't prove your argument then all you are doing is shitting up the thread. "smoothness" IS a noun, per the Oxford Learner's Dictionaries, and can be measured via frametime measurement.

Oxford: https://www.oxfordlearnersdictionaries.com/us/definition/english/smoothness#:~:text=smoothness-,noun,any rough areas or holes

I can present a new topic for debate too: "Does 1d10t live up to his username?".

"Ad hominem (Latin for 'to the person'), short for argumentum ad hominem, refers to several types of arguments, some but not all of which are fallacious. Typically this term refers to a rhetorical strategy where the speaker attacks the character, motive, or some other attribute of the person making an argument rather than attacking the substance of the argument itself. This avoids genuine debate by creating a diversion to some irrelevant but often highly charged issue. The most common form of this fallacy is "A makes a claim x, B asserts that A holds a property that is unwelcome, and hence B concludes that argument x is wrong".

I'm drawing conclusions based on claims by Linus and Anthony, who both say things in the video like "I feel like", "if my memory serves", etc., hence (my) term "placebo effect".
I'm not going to stray far from the topic, and if you want to establish dominance over my username, by all means, you are welcome.
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Something is up with these results. L0 is something Nvidia integrated first; notice their texture cache is also read-only. So you misrepresented the results, maybe in part by not including the texture cache's results.
 
Joined
Jan 24, 2011
Messages
183 (0.04/day)
With 64 MB, it would probably be fine in 1080p, but the hit rate would be much lower in 4K and the card would probably be memory starved. This graph also shows why the cards perform so well in 1440p but start to fall behind in 4K. Probably 256 MB would be the sweet spot for 4K.

The thing is, cache is much easier to manufacture (fewer defects per area) than compute units. It also consumes way less power. Also, the shorter the distance data has to travel, the less power it takes. The operation itself takes very little power; it's moving all the data around that uses power. Having a cache that limits the distance data has to travel greatly reduces power consumption.

Also, Infinity Cache is there to prepare for the next step, multi-chip GPUs, but that is another story.
1. If N21 is fine with 128MB for 4K or N23 with 32MB for 1080p, I don't see why 64MB shouldn't be fine for 1440p. Each higher resolution needs 2x more IC to keep a similar hit rate, as shown in that graph.
3. It starts to fall behind? N21's lead over Navi10 grows the higher the resolution is.
If you meant against Ampere, then isn't it actually because Ampere has a lot more CUDA cores and has a problem with utilization at lower resolutions?
4. Infinity Cache has its advantages, but it also uses up a lot of space. I think it would have been better if N22 had shaved off 32MB, kept only 64MB of IC and added 8 CUs instead. N22 is quite inefficient for an RDNA2 GPU because its clocks are too high, and adding more CUs would mean you can clock it lower.
Here is a nice graph of N22 GPU power consumption at different clockspeeds made by uzzi38. Link and another Link
Increasing the clocks from 2295MHz to 2565MHz caused the power consumption to increase by 59W!
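For scale, the arithmetic on those two figures is below. It uses only the numbers quoted above; the baseline power at 2295MHz isn't given in the post, so the percentage power increase can't be computed from this alone.

```python
def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100.0


CLOCK_LOW_MHZ = 2295
CLOCK_HIGH_MHZ = 2565
EXTRA_POWER_W = 59

CLOCK_GAIN_PCT = percent_increase(CLOCK_LOW_MHZ, CLOCK_HIGH_MHZ)
WATTS_PER_MHZ = EXTRA_POWER_W / (CLOCK_HIGH_MHZ - CLOCK_LOW_MHZ)

if __name__ == "__main__":
    print(f"Clock increase: {CLOCK_GAIN_PCT:.1f}%")
    print(f"Marginal cost:  {WATTS_PER_MHZ:.2f} W per extra MHz")
```

That is an 11.8% clock bump costing about 0.22 W per additional MHz over that range.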
 
Joined
Nov 4, 2005
Messages
12,036 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
Something is up with these results. L0 is something Nvidia integrated first; notice their texture cache is also read-only. So you misrepresented the results, maybe in part by not including the texture cache's results.

It's a simple call to cache with a timer set to count ticks of the clock, so no; unless there is some fundamental misunderstanding that you can explain better as an engineer, the results are correct. Ford made the first mass-produced automobile, but doesn't make the best one, so your thought process is flawed.
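For anyone curious what such a test looks like, latency tests of this kind are usually built as a pointer chase: every load depends on the result of the previous one, so the time per step approximates the access latency of whichever cache level the array fits in. The sketch below shows only the structure, on the CPU and in Python, where interpreter overhead swamps any real cache effect; an actual GPU test would presumably run the same loop inside a compute kernel and sweep the array size. The function names are made up, and this is not the code used for the article.

```python
import random
import time


def build_chain(num_slots, seed=0):
    """Return a list where chain[i] is the next index to visit.

    The slots form a single random cycle, so each load depends on the
    previous one and simple prefetching cannot predict the next address.
    """
    order = list(range(num_slots))
    random.Random(seed).shuffle(order)
    chain = [0] * num_slots
    for here, there in zip(order, order[1:] + order[:1]):
        chain[here] = there
    return chain


def chase(chain, steps):
    """Follow the chain for `steps` dependent loads.

    Returns (final index, average nanoseconds per load).
    """
    index = 0
    start = time.perf_counter_ns()
    for _ in range(steps):
        index = chain[index]
    elapsed = time.perf_counter_ns() - start
    return index, elapsed / steps


if __name__ == "__main__":
    # Sweep the working-set size; on real hardware the time per load
    # steps up each time the array outgrows a cache level.
    for num_slots in (1_000, 100_000, 1_000_000):
        _, ns_per_load = chase(build_chain(num_slots), steps=1_000_000)
        print(f"{num_slots:>9} slots: {ns_per_load:.1f} ns per load")
```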

Also, who specifically "misrepresented" what, based on your limited understanding?
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
Also, who specifically "misrepresented" what, based on your limited understanding?
That means your code didn't use the texture cache.

unless there is some fundamental misunderstanding that you can explain better as an engineer
Since I'm not an engineer, your loophole disregards anything else put forward, but I do indeed want to see AMD on top as a tech enthusiast. I just know that is not present at this time.
 
Joined
Oct 12, 2005
Messages
720 (0.10/day)
1. If N21 is fine with 128MB for 4K or N23 with 32MB for 1080p, I don't see why 64MB shouldn't be fine for 1440p. Each higher resolution needs 2x more IC to keep a similar hit rate, as shown in that graph.
3. It starts to fall behind? N21's lead over Navi10 grows the higher the resolution is.
If you meant against Ampere, then isn't it actually because Ampere has a lot more CUDA cores and has a problem with utilization at lower resolutions?
4. Infinity Cache has its advantages, but it also uses up a lot of space. I think it would have been better if N22 had shaved off 32MB, kept only 64MB of IC and added 8 CUs instead. N22 is quite inefficient for an RDNA2 GPU because its clocks are too high, and adding more CUs would mean you can clock it lower.
Here is a nice graph of N22 GPU power consumption at different clockspeeds made by uzzi38. Link and another Link
Increasing the clocks from 2295MHz to 2565MHz caused the power consumption to increase by 59W!
Having 64 MB instead of 96 MB could have meant that the card would end up with an 8 GB memory buffer instead of 12; it would also have meant either a 256-bit or a 128-bit bus. There is a relation between the amount of memory on the card and the amount of Infinity Cache. This is also probably one of the tricks AMD uses to lower memory latency: caching a specific amount of memory per MB of Infinity Cache. This simplifies the caching algorithm (meaning it takes less time to run, i.e. lower latency).

Also, something that I don't have the data on: since the relation to memory bus/memory size seems clear, it's quite possible that the 96 MB block on Navi 22 has less bandwidth than the 128 MB on Navi 21.

Also, all chip makers have simulators in house. They probably already tested the scenario you propose against the scenario they chose in simulation and decided that it was not worth it. Navi 22 aims for 1440p and not 1080p, too.
 
Joined
Nov 4, 2005
Messages
12,036 (1.72/day)
System Name Compy 386
Processor 7800X3D
Motherboard Asus
Cooling Air for now.....
Memory 64 GB DDR5 6400Mhz
Video Card(s) 7900XTX 310 Merc
Storage Samsung 990 2TB, 2 SP 2TB SSDs, 24TB Enterprise drives
Display(s) 55" Samsung 4K HDR
Audio Device(s) ATI HDMI
Mouse Logitech MX518
Keyboard Razer
Software A lot.
Benchmark Scores Its fast. Enough.
That means your code didn't use the texture cache.


Since I'm not an engineer, your loophole disregards anything else put forward, but I do indeed want to see AMD on top as a tech enthusiast. I just know that is not present at this time.

I think AMD is going to leverage Infinity Cache to compete with Nvidia because they have been behind in the cache bandwidth race since Maxwell.
AMD had been successively expanding the chip resources, albeit never found the medium to express what it can do unequivocally.

My code wasn't involved.

I appreciate the effort to look unbiased, but the facts are the 6900 XT is on par with the 3090, both of which are unavailable to the masses, and the worst part is that the few who have/can get them will probably use them to mine crypto instead of gaming on them or testing hypotheses like ours, or yours.

Also, if a texture cache is read-only, how is data ever written to it beyond a driver call for a texture, which I am assuming the programmer knew about, since they know how to test the cache hierarchy latency on modern GPUs?

Plus, Samsung's node and lower clock rate for higher TDP may mean Nvidia had to sacrifice latency for stability at higher clock speeds with capacitive rolloff effects.

And at the end of the day, AMD is making a product that succeeds despite being 1nm less in process size (if that actually means anything in the real world) and that has significantly better performance in the 99th percentile, meaning less stuttering and better-feeling overall performance.
 
Joined
Jun 3, 2010
Messages
2,540 (0.48/day)
My code wasn't involved.

I appreciate the effort to look unbiased, but the facts are the 6900 XT is on par with the 3090, both of which are unavailable to the masses, and the worst part is that the few who have/can get them will probably use them to mine crypto instead of gaming on them or testing hypotheses like ours, or yours.

Also, if a texture cache is read-only, how is data ever written to it beyond a driver call for a texture, which I am assuming the programmer knew about, since they know how to test the cache hierarchy latency on modern GPUs?

Plus, Samsung's node and lower clock rate for higher TDP may mean Nvidia had to sacrifice latency for stability at higher clock speeds with capacitive rolloff effects.

And at the end of the day, AMD is making a product that succeeds despite being 1nm less in process size (if that actually means anything in the real world) and that has significantly better performance in the 99th percentile, meaning less stuttering and better-feeling overall performance.
I never said Nvidia didn't have the same cache though, you are taking my last quote very differently.

AMD can go up and down, Nvidia still keeps their own. Adding +1 to AMD is not subtracting from the other, per se.

Truth be told, I would like to chime in on the developer side of the equation, but the two of us aren't the who's who of CUDA development.

PS: I won't sacrifice my integrity for clout.
 
Joined
Jun 11, 2020
Messages
574 (0.34/day)
Location
Florida
Processor 5800x3d
Motherboard MSI Tomahawk x570
Cooling Thermalright
Memory 32 gb 3200mhz E die
Video Card(s) 3080
Storage 2tb nvme
Display(s) 165hz 1440p
Case Fractal Define R5
Power Supply Toughpower 850 platium
Mouse HyperX Hyperfire Pulse
Keyboard EVGA Z15
And at the end of the day, AMD is making a product that succeeds despite being 1nm less in process size (if that actually means anything in the real world) and that has significantly better performance in the 99th percentile, meaning less stuttering and better-feeling overall performance.

did you mean Nvidia?
 