
B580 faster than the A770, but has less compute hardware?


Tia

New Member
Joined
Dec 28, 2024
Messages
4 (1.00/day)

The B580 has fewer Shading Units, TMUs, ROPs, Execution Units, Tensor Cores, and RT Cores than the A770, but is still faster?

My current working theory is that they focused too much on compute with the A770 and A750 and too little on memory bandwidth.
I am starting to wonder if an A770 can outperform a B580 in a purely compute-bound benchmark.

Does anyone have benchmarks comparing the compute performance of the A770 to the B580?
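For a rough sense of the theoretical gap, here is a back-of-the-envelope FP32 throughput comparison (a minimal sketch; the shader counts and boost clocks are approximate spec-sheet values, not measured):

```python
# Theoretical FP32 throughput: shading units * 2 ops per clock (FMA) * clock.
# Spec-sheet approximations; real-world utilization differs considerably.
def fp32_tflops(shading_units: int, boost_ghz: float) -> float:
    return shading_units * 2 * boost_ghz / 1000.0

a770 = fp32_tflops(4096, 2.10)  # Arc A770: 4096 shading units, ~2.1 GHz boost
b580 = fp32_tflops(2560, 2.85)  # Arc B580: 2560 shading units, ~2.85 GHz boost
print(f"A770: {a770:.1f} TFLOPS, B580: {b580:.1f} TFLOPS")
# -> A770: 17.2 TFLOPS, B580: 14.6 TFLOPS
```

So on paper the A770 does lead on raw compute; the question is whether anything real ever hits that peak.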
 

Solaris17

Super Dainty Moderator
Staff member
Joined
Aug 16, 2005
Messages
27,149 (3.84/day)
Location
Alabama
System Name RogueOne
Processor Xeon W9-3495x
Motherboard ASUS w790E Sage SE
Cooling SilverStone XE360-4677
Memory 128gb Gskill Zeta R5 DDR5 RDIMMs
Video Card(s) MSI SUPRIM Liquid X 4090
Storage 1x 2TB WD SN850X | 2x 8TB GAMMIX S70
Display(s) 49" Philips Evnia OLED (49M2C8900)
Case Thermaltake Core P3 Pro Snow
Audio Device(s) Moondrop S8's on Schiit Gunnr
Power Supply Seasonic Prime TX-1600
Mouse Razer Viper mini signature edition (mercury white)
Keyboard Monsgeek M3 Lavender, Moondrop Luna lights
VR HMD Quest 3
Software Windows 11 Pro Workstation
Benchmark Scores I don't have time for that.
I don't think it's that simple. It seems maybe, idk, underreported? The B series is not an iteration on Alchemist. Battlemage is a completely different architecture.

Here is our B580 review which goes over this in text format.


Here is a video GN did with Tom explaining some of it, if video is more your speed.

 

Tia

New Member
Joined
Dec 28, 2024
Messages
4 (1.00/day)
I don't think it's that simple. It seems maybe, idk, underreported? The B series is not an iteration on Alchemist. Battlemage is a completely different architecture.

Here is our B580 review which goes over this in text format.


Here is a video GN did with Tom explaining some of it, if video is more your speed.

Well that answers my question and then some.
Thanks.
 
Joined
Nov 27, 2023
Messages
2,566 (6.40/day)
System Name The Workhorse
Processor AMD Ryzen R9 5900X
Motherboard Gigabyte Aorus B550 Pro
Cooling CPU - Noctua NH-D15S Case - 3 Noctua NF-A14 PWM at the bottom, 2 Fractal Design 180mm at the front
Memory GSkill Trident Z 3200CL14
Video Card(s) NVidia GTX 1070 MSI QuickSilver
Storage Adata SX8200Pro
Display(s) LG 32GK850G
Case Fractal Design Torrent (Solid)
Audio Device(s) FiiO E-10K DAC/Amp, Samson Meteorite USB Microphone
Power Supply Corsair RMx850 (2018)
Mouse Razer Viper (Original) on a X-Raypad Equate Plus V2
Keyboard Cooler Master QuickFire Rapid TKL keyboard (Cherry MX Black)
Software Windows 11 Pro (24H2)
Generally, comparing raw specs works only between GPUs of the same architecture/generation anyway. This is a rabbit hole many enthusiasts end up falling into whenever leaks or rumors for unreleased GPUs come out, but it's pointless. The fact that a card has less of X and Y than a card of a previous architecture means very little, and no real performance can be ascertained that way. For example, a 1070 had less of… everything (except VRAM) than a 980 Ti and was still equal or faster. Or, the generation before that, a 970 versus a 780 Ti. You get the picture.

Just wanted to expand a bit on what Solaris said above, as a PSA.
 
Joined
May 25, 2022
Messages
133 (0.14/day)
The B580 has fewer Shading Units, TMUs, ROPs, Execution Units, Tensor Cores, and RT Cores than the A770, but is still faster?

My current working theory is that they focused too much on compute with the A770 and A750 and too little on memory bandwidth.
I am starting to wonder if an A770 can outperform a B580 in a purely compute-bound benchmark.

Does anyone have benchmarks comparing the compute performance of the A770 to the B580?
The B580 has less physical memory bandwidth: even though it uses 20 GT/s memory modules, it's only 192-bit, while the A770 is 16 GT/s on a 256-bit bus.
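Worked out explicitly (a quick sketch using the figures above; note that some spec sheets list the B580 at 19 GT/s rather than 20):

```python
# Peak memory bandwidth = (bus width in bits / 8) * transfer rate in GT/s.
# Transfer rates as given in this post; treat them as approximate.
def peak_bw_gb_s(bus_bits: int, gt_s: float) -> float:
    return bus_bits / 8 * gt_s

a770 = peak_bw_gb_s(256, 16)  # Arc A770: 256-bit @ 16 GT/s
b580 = peak_bw_gb_s(192, 20)  # Arc B580: 192-bit @ 20 GT/s
print(f"A770: {a770:.0f} GB/s, B580: {b580:.0f} GB/s")
# -> A770: 512 GB/s, B580: 480 GB/s
```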

Alchemist has a lot of architectural imbalances that prevent it from being fully utilized: https://chipsandcheese.com/p/microbenchmarking-intels-arc-a770

- It needs a heavy workload, otherwise it sits idle.
- The 512 GB/s of memory bandwidth is wasted on Alchemist because it has a hard time utilizing it. The C&C tests also show that it can't even reach 512 GB/s except in exceptional circumstances, and it behaves more like a 250-300 GB/s device (see the bandwidth sketch after this list).
- Battlemage has other advances such as Fast Clear, a technique that has existed in AMD/Nvidia GPUs for more than a decade but that Intel is only using now. Fast Clear increases utilization of the whole memory subsystem, from the private caches to the large shared caches to the VRAM itself.
- Battlemage no longer emulates critical instructions that had to be emulated on Alchemist, so it's faster there too.
- Battlemage also clocks quite a bit higher, reducing the gap further.
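In the spirit of those C&C microbenchmarks, here is a minimal PyOpenCL copy-bandwidth sketch (my own toy, not C&C's methodology; it assumes pyopencl and an OpenCL runtime for the card are installed, and the kernel name and buffer sizes are arbitrary):

```python
import time
import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

n = 64 * 1024 * 1024  # 64M float32 = 256 MB per buffer
host = np.random.rand(n).astype(np.float32)
mf = cl.mem_flags
src = cl.Buffer(ctx, mf.READ_ONLY | mf.COPY_HOST_PTR, hostbuf=host)
dst = cl.Buffer(ctx, mf.WRITE_ONLY, host.nbytes)

prg = cl.Program(ctx, """
__kernel void copy_kernel(__global const float *src, __global float *dst) {
    size_t i = get_global_id(0);
    dst[i] = src[i];
}
""").build()

prg.copy_kernel(queue, (n,), None, src, dst)  # warm-up
queue.finish()

reps = 20
t0 = time.perf_counter()
for _ in range(reps):
    prg.copy_kernel(queue, (n,), None, src, dst)
queue.finish()
dt = time.perf_counter() - t0

moved = 2 * host.nbytes * reps  # each copy reads and writes the buffer once
print(f"Achieved copy bandwidth: {moved / dt / 1e9:.0f} GB/s")
```

Achieved numbers from something like this are what the "250-300 GB/s device" observation refers to, as opposed to the 512 GB/s on the spec sheet.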

The compute "advantage" you are talking about is only theoretical. It basically doesn't lose a single test over Alchemist, meaning in the real world it's absolutely worthless.

Comparing based on shaders, fillrate, and memory bandwidth is like looking at the number of cylinders in a car and declaring one higher-performing than the other. You could have a V8 with less horsepower and torque than a V6. The V6 car might also have a more efficient transmission, be more aerodynamic, and weigh less. Further, the driver behind the wheel affects performance too. And if you are in the middle of New York, you'll never be able to go full speed. Complex systems require complex analysis to fully understand.
 
Joined
Jul 20, 2020
Messages
1,158 (0.71/day)
System Name Gamey #1 / #3
Processor Ryzen 7 5800X3D / Ryzen 7 5700X3D
Motherboard Asrock B450M P4 / MSi B450 ProVDH M
Cooling IDCool SE-226-XT / IDCool SE-224-XTS
Memory 32GB 3200 CL16 / 16GB 3200 CL16
Video Card(s) PColor 6800 XT / GByte RTX 3070
Storage 4TB Team MP34 / 2TB WD SN570
Display(s) LG 32GK650F 1440p 144Hz VA
Case Corsair 4000Air / TT Versa H18
Power Supply EVGA 650 G3 / EVGA BQ 500
Generally, comparing raw specs works only between GPUs of the same architecture/generation anyway. This is a rabbit hole many enthusiasts end up falling into whenever leaks or rumors for unreleased GPUs come out, but it's pointless. The fact that a card has less of X and Y than a card of a previous architecture means very little, and no real performance can be ascertained that way. For example, a 1070 had less of… everything (except VRAM) than a 980 Ti and was still equal or faster.

The 1070 has a higher fillrate and more GFLOPS than the 980 Ti, so it's little surprise it's faster. But how could that be if the 1070 has "less of... everything"?

Because the 1070 has considerably more of the most important thing: clock speed, 1822 vs. 1140 MHz. A 60% higher clock speed is a sledgehammer. You gotta look at all the specs.
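Putting numbers on that (the clocks are the ones quoted above; the shader counts are the usual spec-sheet figures, so treat this as a sketch):

```python
# FP32 GFLOPS = cores * 2 ops per clock (FMA) * clock in MHz / 1000.
# Fewer units at a much higher clock can still come out ahead.
def gflops(cores: int, clock_mhz: float) -> float:
    return cores * 2 * clock_mhz / 1000.0

gtx_980ti = gflops(2816, 1140)  # GTX 980 Ti: 2816 CUDA cores @ 1140 MHz
gtx_1070  = gflops(1920, 1822)  # GTX 1070:  1920 CUDA cores @ 1822 MHz
print(f"980 Ti: {gtx_980ti:.0f} GFLOPS, 1070: {gtx_1070:.0f} GFLOPS")
# -> 980 Ti: 6420 GFLOPS, 1070: 6996 GFLOPS
```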
 
Joined
Nov 27, 2023
Messages
2,566 (6.40/day)
System Name The Workhorse
Processor AMD Ryzen R9 5900X
Motherboard Gigabyte Aorus B550 Pro
Cooling CPU - Noctua NH-D15S Case - 3 Noctua NF-A14 PWM at the bottom, 2 Fractal Design 180mm at the front
Memory GSkill Trident Z 3200CL14
Video Card(s) NVidia GTX 1070 MSI QuickSilver
Storage Adata SX8200Pro
Display(s) LG 32GK850G
Case Fractal Design Torrent (Solid)
Audio Device(s) FiiO E-10K DAC/Amp, Samson Meteorite USB Microphone
Power Supply Corsair RMx850 (2018)
Mouse Razer Viper (Original) on a X-Raypad Equate Plus V2
Keyboard Cooler Master QuickFire Rapid TKL keyboard (Cherry MX Black)
Software Windows 11 Pro (24H2)
@Lew Zealand
I… am aware? Yes? That wasn't the only example I had. I would hope the 1070 saw a jump in clocks, considering it was on a new node. Frequency is absolutely important (though it can also be architecturally dependent), but the OP's question was specifically about compute resources here:
The B580 has fewer Shading Units, TMUs, ROPs, Execution Units, Tensor Cores, and RT Cores than the A770, but is still faster?
Though yeah, you are right that, at least for gaming workloads, one can brute-force quite a bit with frequency. Pure compute accelerators tend to be oriented toward efficiency and constant use, and tend to run lower clocks. But those are a different kettle of fish altogether. Good shout though, yeah, I should have mentioned frequency as a factor too.
 
Joined
May 25, 2022
Messages
133 (0.14/day)
Though yeah, you are right that, at least for gaming workloads, one can brute-force quite a bit with frequency. Pure compute accelerators tend to be oriented toward efficiency and constant use, and tend to run lower clocks. But those are a different kettle of fish altogether. Good shout though, yeah, I should have mentioned frequency as a factor too.
The B580 has fewer theoretical resources even taking frequency into account.

And frequency itself is not an easy thing to achieve either. Often the losses come down to frequency differences. It's quite an execution marvel for modern 400 mm²+ GPUs to run at 2.5-3 GHz. In the GeForce 10-series generation you guys are talking about, Nvidia rearchitected the circuitry to reach higher frequencies, so a lot went into that as well.
Pure compute accelerators tend to be oriented toward efficiency and constant use, and tend to run lower clocks. But those are a different kettle of fish altogether.
The enthusiast products have quite high requirements. On top of the market accepting only a fraction of the price of those GPUs, there are hundreds of thousands of games out there, and the GPU has to work with thousands of different configurations across dozens of CPU generations.

In fact, I think a company that figures out the enthusiast segment basically guarantees success in every other market, because the requirements are so stringent.
 