
New NVIDIA Tesla GPUs Reduce Cost Of Supercomputing By A Factor Of 10

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.48/day)
Location
Reaching your left retina.
I know this is getting off topic, but what exactly is this?
It comes with CCC suite 9.10. :confused:
http://img.techpowerup.org/091116/Capture004.jpg

The free AMD video transcoding application, I guess. In its first iterations it was extremely buggy and useless, because it produced massive artifacts in videos. I haven't heard anything about it since, so I don't know if it has improved.

PS. I don't even know for sure if it's that, TBH. :laugh:
 
Joined
Oct 1, 2006
Messages
4,934 (0.74/day)
Location
Hong Kong
The free AMD video transcoding application, I guess. In its first iterations it was extremely buggy and useless, because it produced massive artifacts in videos. I haven't heard anything about it since, so I don't know if it has improved.

PS. I don't even know for sure if it's that, TBH. :laugh:
It's not that PoS.
I wouldn't touch that Avivo transcoder with a 10-foot pole, don't tempt me :laugh:

Edit: you tempted me to download that thing lol :roll:
Interestingly enough, that PoS finally does what it claims to do: it actually loads the GPU at 11~17%, in pulses.
 
Joined
Nov 13, 2009
Messages
5,614 (1.02/day)
Location
San Diego, CA
You'd better hope it's not Q3, judging by the way the 40nm yields look :shadedshu

You're speaking of the laughable article written by the giant tool Charlie Demerjian ( http://www.semiaccurate.com/2009/09/15/nvidia-gt300-yeilds-under-2/ ). Even if it is true, it's far from uncommon for early fab results to be poor. It happens to all MC (microcircuitry) companies. Back in 1995 I remember hearing of AMD's K5 ( http://en.wikipedia.org/wiki/AMD_K5 ) processors hitting an all-time low yield of 2 good dies out of a 250-die wafer! Not to mention that they were basically just reengineered Pentiums. ZOMG, that's less than 1%, let's write an article about it! Then there's Charlie Demerjian's complete lack of credibility and objectivity. The essence of what I'm saying is that he writes articles that rarely cite any facts and contain little more than jaded, pessimistic, unobjective opinion.
 
Joined
Oct 1, 2006
Messages
4,934 (0.74/day)
Location
Hong Kong
You're speaking of the laughable article written by the giant tool Charlie Demerjian ( http://www.semiaccurate.com/2009/09/15/nvidia-gt300-yeilds-under-2/ ). Even if it is true, it's far from uncommon for early fab results to be poor. It happens to all MC (microcircuitry) companies. Back in 1995 I remember hearing of AMD's K5 ( http://en.wikipedia.org/wiki/AMD_K5 ) processors hitting an all-time low yield of 2 good dies out of a 250-die wafer! Not to mention that they were basically just reengineered Pentiums. ZOMG, that's less than 1%, let's write an article about it! Then there's Charlie Demerjian's complete lack of credibility and objectivity. The essence of what I'm saying is that he writes articles that rarely cite any facts and contain little more than jaded, pessimistic, unobjective opinion.
I have never read that site, to be honest. :slap:
It's common sense that the 40nm yields are not good; just look at the supply (or lack of it) of the 5800 series.
 
Joined
Nov 13, 2009
Messages
5,614 (1.02/day)
Location
San Diego, CA
I have never read that site, to be honest. :slap:
It's common sense that the 40nm yields are not good; just look at the supply (or lack of it) of the 5800 series.

The lack of 5800s is due more to the fact that AMD's fabs / manufacturing are separate entities / companies. While that cuts costs and kept AMD out of bankruptcy, it prevents them from producing their high-end products in any large quantities. Hence their budget-minded approach to sales: it really isn't a choice, it's all they can do to keep money in their coffers and hope to expand come 2012. Their other choice is to try to compete directly with Intel and fade into irrelevance even faster than they are now. :slap:
 
Joined
Oct 1, 2006
Messages
4,934 (0.74/day)
Location
Hong Kong
The lack of 5800s is due more to the fact that AMD's fabs / manufacturing are separate entities / companies. While that cuts costs and kept AMD out of bankruptcy, it prevents them from producing their high-end products in any large quantities. Hence their budget-minded approach to sales: it really isn't a choice, it's all they can do to keep money in their coffers and hope to expand come 2012. Their other choice is to try to compete directly with Intel and fade into irrelevance even faster than they are now. :slap:
First of all, AMD doesn't own any fabs anymore, and their graphics chips were never manufactured in their fabs. :pimp:

It is TSMC that makes their graphics chips, and it is the same company that makes graphics chips for nVidia. :slap:
The actual cards are made by their AIBs; companies like Sapphire (PC Partner) are the ones that actually make the cards.

AMD is now a fabless company, just like nVidia.
Globalfoundries and its fabs were never involved. :pimp:

What we can tell from this is that Fermi's larger die won't yield any better than Cypress.
So unless TSMC gets its yields up, don't expect a sufficient supply of Fermi(s).
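To illustrate why a bigger die on the same process tends to yield worse, here is a minimal sketch using the textbook Poisson yield model, Y = exp(-A * D0). The model is a stand-in chosen for illustration, not anything TSMC has published. The defect density is a hypothetical value, and the die areas (~334 mm^2 for Cypress, ~529 mm^2 for Fermi) are approximate, commonly reported figures assumed here rather than taken from this thread.

```python
import math

def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """Expected fraction of defect-free dies under a simple Poisson yield model."""
    area_cm2 = die_area_mm2 / 100.0  # 100 mm^2 = 1 cm^2
    return math.exp(-area_cm2 * defects_per_cm2)

# Hypothetical defect density for an immature process; NOT a published TSMC figure.
DEFECTS_PER_CM2 = 0.5

# Approximate, commonly reported die sizes (assumptions, not from this thread).
for name, area_mm2 in [("Cypress", 334.0), ("Fermi", 529.0)]:
    share = poisson_yield(area_mm2, DEFECTS_PER_CM2)
    print(f"{name} (~{area_mm2:.0f} mm^2): {share:.1%} defect-free")
```

Under this model yield drops exponentially with die area, which is the intuition behind the post. Real yields also depend on redundancy and on selling partially disabled dies, which this sketch ignores.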
 

W1zzard

Administrator
Staff member
Joined
May 14, 2004
Messages
27,972 (3.71/day)
This is proof there is a GT300. So where are our desktop cards, huh, nvidia?

proof that they took a photo of something and wrote a press release
edit: it's not even a photo... it's a render

The card can use up to 1TB of system memory?

AFAIK it means that the GPU architecture is able to address up to 1 TB of memory, like going from 32-bit to 64-bit addressing
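To put the 1 TB figure in perspective, here is a quick sketch of how many address bits a given capacity needs. It assumes byte-addressable memory and 1 TB = 2^40 bytes; it says nothing about how Fermi actually lays out its address space.

```python
GiB = 2**30
TiB = 2**40

def address_bits_needed(capacity_bytes: int) -> int:
    """Minimum number of address bits to give every byte its own address."""
    if capacity_bytes < 1:
        raise ValueError("capacity must be at least one byte")
    # Addresses run from 0 to capacity_bytes - 1; count the bits in the largest one.
    return max(1, (capacity_bytes - 1).bit_length())

print(address_bits_needed(4 * GiB))  # 32: the ceiling of a 32-bit address space
print(address_bits_needed(1 * TiB))  # 40
```

So a 32-bit address tops out at 4 GiB, and reaching 1 TB needs at least 40 address bits.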
 
Joined
Aug 15, 2008
Messages
5,941 (0.99/day)
Location
Watauga, Texas
I just read Q2, and that's when most of us are expecting the GT300. Guess I shoulda read a little more. Me ->:slap:<- me
 

3volvedcombat

New Member
Joined
May 10, 2009
Messages
1,514 (0.27/day)
Location
South California, The desert.
Wow, they finally HAVE A WORKING MODEL OF THEIR NEW CORE! THIS MEANS THAT HOPEFULLY THE WORLD WILL SEE SOME FERMI SLAPPED INTO THE WORLD >.<.

*ATI LOLs while they release the HD 5870x2 and hold all the market share in the highest-end segment while everybody goes broke for shiat*
 
Joined
Jul 19, 2006
Messages
43,609 (6.48/day)
The free AMD video transcoding application, I guess. In its first iterations it was extremely buggy and useless, because it produced massive artifacts in videos. I haven't heard anything about it since, so I don't know if it has improved.

PS. I don't even know for sure if it's that, TBH. :laugh:

I use it all the time for YouTube stuff. MPEG-2 720p works great, since the 9.8 drivers anyway.
 
Joined
Jul 2, 2008
Messages
3,638 (0.60/day)
Location
California
Joined
Aug 15, 2008
Messages
5,941 (0.99/day)
Location
Watauga, Texas
With 2 GPUs. I think somebody is BSing somewhere.
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.48/day)
Location
Reaching your left retina.
http://forums.techpowerup.com/showpost.php?p=1638012&postcount=14

The ratio between single and double precision performance is ~0.083
And :

Is no one surprised that this card's single-precision performance is ~4.7 TFLOPS!? (570GFLOPS*0.083)

http://forums.techpowerup.com/showpost.php?p=1638260&postcount=114

And HD5970 has the same compute performance!

>.>

The DP:SP ratio in Fermi is 0.5, so these Tesla cards will have 1040-1260 single-precision GFLOPS. Don't let the "low" number fool you anyway; these Fermi cards will trounce the Ati cards when it comes to general computing.
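The conversion is a single division: if double precision runs at a fixed fraction of the single-precision rate, the SP peak is the DP peak divided by that fraction. A minimal sketch, using the 0.5 ratio and the 520-630 GFLOPS DP range from this post (the function name is just illustrative):

```python
def sp_from_dp(dp_gflops: float, dp_per_sp: float) -> float:
    """Single-precision peak implied by a DP peak and a DP:SP throughput ratio.

    A ratio of 0.5 means DP runs at half the SP rate, so SP = DP / 0.5.
    """
    if not 0 < dp_per_sp <= 1:
        raise ValueError("dp_per_sp should be in (0, 1]")
    return dp_gflops / dp_per_sp

for dp in (520, 630):  # Tesla DP range quoted in the thread
    print(f"{dp} DP GFLOPS -> {sp_from_dp(dp, 0.5):.0f} SP GFLOPS")
```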

Don't let the numbers fool you in comparison to the GTX285 or Ati cards either. The GTX285 numbers are based on dual-issue, something that was never usable; real FP throughput was more like 650 GFLOPS on the GTX285. Nvidia and Ati GFLOPS numbers don't correlate either: if a 650 GFLOPS GTX285 is still significantly faster than a 1360 GFLOPS HD4890, a Fermi with 1260 is going to be significantly faster than the 2700 GFLOPS HD5870. Tesla cards are usually underclocked compared to desktop GPUs AFAIK, so these 520-630 GFLOPS DP figures on the Teslas could be a testament to the power of the GTX380.
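The peak figures being compared all come from the same formula: ALU count × shader clock × FLOPs per ALU per clock. A minimal sketch follows; the ALU counts and reference clocks are assumed from the cards' published specs rather than stated in this thread. Counting the extra MUL (3 FLOPs per clock) gives the GTX 285's marketed ~1063 GFLOPS, while MAD-only lands near 708, in the same ballpark as the ~650 estimate above. As the post says, these peaks still don't compare directly across vendors.

```python
def peak_gflops(alus: int, shader_clock_mhz: float, flops_per_alu_per_clock: int) -> float:
    """Theoretical peak: ALU count x shader clock x FLOPs each ALU retires per clock."""
    return alus * shader_clock_mhz * flops_per_alu_per_clock / 1000.0  # MFLOPS -> GFLOPS

# (ALUs, shader clock in MHz, FLOPs per ALU per clock) -- reference-clock specs, assumed here.
CARDS = {
    "GTX 285, MAD + MUL (dual-issue)": (240, 1476, 3),
    "GTX 285, MAD only": (240, 1476, 2),
    "HD 4890": (800, 850, 2),
    "HD 5870": (1600, 850, 2),
}

for name, spec in CARDS.items():
    print(f"{name}: {peak_gflops(*spec):.0f} GFLOPS")
```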
 
Joined
Oct 1, 2006
Messages
4,934 (0.74/day)
Location
Hong Kong
Processor Core i7-12700k
Motherboard Z690 Aero G D4
Cooling Custom loop water, 3x 420 Rad
Video Card(s) RX 7900 XTX Phantom Gaming
Storage Plextor M10P 2TB
Display(s) InnoCN 27M2V
Case Thermaltake Level 20 XT
Audio Device(s) Soundblaster AE-5 Plus
Power Supply FSP Aurum PT 1200W
Software Windows 11 Pro 64-bit
The DP:SP ratio in Fermi is 0.5, so these Tesla cards will have 1040-1260 single-precision GFLOPS. Don't let the "low" number fool you anyway; these Fermi cards will trounce the Ati cards when it comes to general computing.

Don't let the numbers fool you in comparison to the GTX285 or Ati cards either. The GTX285 numbers are based on dual-issue, something that was never usable; real FP throughput was more like 650 GFLOPS on the GTX285. Nvidia and Ati GFLOPS numbers don't correlate either: if a 650 GFLOPS GTX285 is still significantly faster than a 1360 GFLOPS HD4890, a Fermi with 1260 is going to be significantly faster than the 2700 GFLOPS HD5870. Tesla cards are usually underclocked compared to desktop GPUs AFAIK, so these 520-630 GFLOPS DP figures on the Teslas could be a testament to the power of the GTX380.
We don't know what kind of architecture Fermi is built on anyway.
It is still too early to say before we even see an engineering sample in action.

If nVidia somehow, for some reason, goes for a SIMD architecture, the theoretical peak will skyrocket just like the RV7X0's. :rolleyes:
All we have are some vague numbers that don't mean much yet.

It is quite possible that Fermi is better optimized for GPGPU than its predecessors; after all, this is where the big bucks are.
I am more interested in the graphics performance of a GPU, but this thread is about the new Tesla, so I guess I am off topic.
 

@RaXxaa@

New Member
Joined
Jun 29, 2009
Messages
473 (0.08/day)
Location
Pakistan & US
Way too overpriced. Sure, it's good, but for gaming, seriously, I would never pay a couple of Gs for a GPU... Sure, even in Crysis some Tesla might give 350 fps, but plzz, I would buy a GPU that can just give me 35 fps; that's good enough gaming for me.
 
Joined
Sep 24, 2008
Messages
2,697 (0.45/day)
Way too overpriced. Sure, it's good, but for gaming, seriously, I would never pay a couple of Gs for a GPU... Sure, even in Crysis some Tesla might give 350 fps, but plzz, I would buy a GPU that can just give me 35 fps; that's good enough gaming for me.

Of course, Tesla GPUs have nothing to do with gaming.
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.48/day)
Location
Reaching your left retina.
We don't know what kind of architecture Fermi is built on anyway.
It is still too early to say before we even see an engineering sample in action.
If nVidia somehow, for some reason, goes for a SIMD architecture, the theoretical peak will skyrocket just like the RV7X0's. :rolleyes:

We do know the architecture. The white papers have been out for a while, and the architecture is more scalar than it ever was. Nvidia has always used a SIMD architecture anyway, but they have not used 5-ALU-wide VLIW shader processors. That's the biggest lie AMD has ever told: they really have only 160 SPs on the RV770. That's the "problem" with Ati cards: because of it, the effective single-precision GFLOPS on the HD4870 ranges between 240 and 1200, depending on how many ALUs per SP can be used in a given scenario. In a general computing application you will be closer to the low end, and that's why in F@H you can see Nvidia cards beating Ati cards that are supposed to be much faster.
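The 240-1200 GFLOPS range falls straight out of scaling the peak by how many of the 5 VLIW slots do useful work. A minimal sketch, assuming the HD4870's 800 ALUs at 750 MHz with one MAD (2 FLOPs) per ALU per clock; the 4.2 value is the typical graphics figure cited later in this thread, and the function name is hypothetical:

```python
def effective_gflops(peak_gflops: float, slots_filled: float, vliw_width: int = 5) -> float:
    """Scale peak throughput by the average number of VLIW slots doing useful work."""
    if not 0 <= slots_filled <= vliw_width:
        raise ValueError("slots_filled must be between 0 and the VLIW width")
    return peak_gflops * slots_filled / vliw_width

HD4870_PEAK = 800 * 750 * 2 / 1000  # 800 ALUs x 750 MHz x 2 FLOPs (MAD) = 1200 GFLOPS

for filled in (1, 2, 4.2, 5):
    print(f"{filled} of 5 slots filled: {effective_gflops(HD4870_PEAK, filled):.0f} GFLOPS")
```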
 
Joined
Oct 1, 2006
Messages
4,934 (0.74/day)
Location
Hong Kong
Processor Core i7-12700k
Motherboard Z690 Aero G D4
Cooling Custom loop water, 3x 420 Rad
Video Card(s) RX 7900 XTX Phantom Gaming
Storage Plextor M10P 2TB
Display(s) InnoCN 27M2V
Case Thermaltake Level 20 XT
Audio Device(s) Soundblaster AE-5 Plus
Power Supply FSP Aurum PT 1200W
Software Windows 11 Pro 64-bit
We do know the architecture. The white papers have been out for a while, and the architecture is more scalar than it ever was. Nvidia has always used a SIMD architecture anyway, but they have not used 5-ALU-wide VLIW shader processors. That's the biggest lie AMD has ever told: they really have only 160 SPs on the RV770. That's the "problem" with Ati cards: because of it, the effective single-precision GFLOPS on the HD4870 ranges between 240 and 1200, depending on how many ALUs per SP can be used in a given scenario. In a general computing application you will be closer to the low end, and that's why in F@H you can see Nvidia cards beating Ati cards that are supposed to be much faster.
Well, the shader processor count is more marketing than anything.
The thing is, out of 100 people, how many know what a "5-ALU-wide VLIW SP" means?
Very, very few companies are totally honest in their marketing.
A white paper tells you what a product is supposed to do, but it won't tell you exactly how it executes things at the hardware level.
The specific design of the chip is worth millions if not billions of dollars.

Since you mentioned the GTX380: GPGPU performance doesn't directly translate to gaming performance.
 
Joined
May 4, 2009
Messages
1,972 (0.35/day)
Location
Bulgaria
We do know the architecture. The white papers have been out for a while, and the architecture is more scalar than it ever was. Nvidia has always used a SIMD architecture anyway, but they have not used 5-ALU-wide VLIW shader processors. That's the biggest lie AMD has ever told: they really have only 160 SPs on the RV770. That's the "problem" with Ati cards: because of it, the effective single-precision GFLOPS on the HD4870 ranges between 240 and 1200, depending on how many ALUs per SP can be used in a given scenario. In a general computing application you will be closer to the low end, and that's why in F@H you can see Nvidia cards beating Ati cards that are supposed to be much faster.

You're trying to compare the SPs to x86 cores, and the two are obviously not comparable... If you do indeed want to do so, you must at least say that those 800 "cores" consist of 160 physical and 640 logical ones. And that would still be wrong, because you don't have a dedicated pipeline that has to be filled for a second or third thread to be inserted... You can still run 800 "threads" on them as long as your software is coded properly.

It's not Ati's fault the F@H team can't put their thinking caps on and write a half-decent client program...
 

wolf

Better Than Native
Joined
May 7, 2007
Messages
8,247 (1.28/day)
Yum yum, Fermi. If this is going to be the length of the GeForce card, watch out ATi: 13.5 inches of dual GPU to go toe to toe with this slim baby.
 

vaiopup

New Member
Joined
Feb 3, 2008
Messages
840 (0.14/day)
Location
England
It's not Ati's fault the F@H team can't put their thinking caps on and write a half-decent client program...

They've had long enough :D
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.48/day)
Location
Reaching your left retina.
You're trying to compare the SPs to x86 cores, and the two are obviously not comparable... If you do indeed want to do so, you must at least say that those 800 "cores" consist of 160 physical and 640 logical ones. And that would still be wrong, because you don't have a dedicated pipeline that has to be filled for a second or third thread to be inserted... You can still run 800 "threads" on them as long as your software is coded properly.

It's not Ati's fault the F@H team can't put their thinking caps on and write a half-decent client program...

Nope, that's the case. There are only 160 pipelines, so you can have 160 threads feeding those 800 "cores" as long as the program can pack their operations together into a VLIW instruction, but it's not exactly the same and requires a lot of anticipation, which is not always possible. In fact, it's almost never possible.

I'm not comparing the SPs to x86 cores in any way; I don't know how you came to that conclusion.

Because of the VLIW nature of the SPs, you could potentially make an engine that only works with 5-wide VLIW instructions, and then you could potentially fill all the "cores". But that engine would not work on Nvidia cards or pre-R600 Ati cards, not to mention it would not be profitable to do so, and DirectX has no such functionality, so you would have to write your engine entirely in HLSL. Even then, filling the 5 ALUs with something relevant to do would be very, very difficult.
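Why packing is hard comes down to dependencies: a 5-wide bundle can only hold operations that don't need each other's results. The toy greedy list scheduler below is a simplified illustration, not AMD's actual shader compiler. It packs up to 5 ready operations per bundle; the two example programs are hypothetical.

```python
from typing import Dict, List, Set

def pack_vliw(deps: Dict[str, List[str]], width: int = 5) -> List[List[str]]:
    """Toy list scheduler: each cycle, bundle up to `width` ops whose inputs are all ready.

    `deps` maps each op to the ops it depends on; dict order is program order.
    Only results from earlier bundles count as ready, so ops in one bundle are independent.
    """
    done: Set[str] = set()
    remaining = list(deps)
    bundles: List[List[str]] = []
    while remaining:
        ready = [op for op in remaining if all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle or unknown dependency")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in done]
    return bundles

# Graphics-like: five independent operations (e.g. the x, y, z, w and t parts of a shader step).
independent = {f"op{i}": [] for i in range(5)}
# Compute-like: a dependency chain where each step needs the previous result.
chain = {"a": [], "b": ["a"], "c": ["b"], "d": ["c"], "e": ["d"]}

for name, program in [("independent", independent), ("chain", chain)]:
    bundles = pack_vliw(program)
    filled = sum(len(b) for b in bundles) / (5 * len(bundles))
    print(f"{name}: {len(bundles)} bundle(s), {filled:.0%} of ALU slots filled")
```

The independent ops fit in one full bundle, while the chain needs five bundles with a single op each: the same work at a fifth of the slot usage.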

http://perspectives.mvdirona.com/2009/03/18/HeterogeneousComputingUsingGPGPUsAMDATIRV770.aspx

Unlike NVidia’s design which executes 1 instruction per thread, each SP on the RV770 executes packed 5-wide VLIW-style instructions. For graphics and visualization workloads, floating point intensity is high enough to average about 4.2 useful operations per cycle. On dense data parallel operations (ex. dense matrix multiply), all 5 ALUs can easily be used.

From this information, we can see that when people are talking about 800 “shader cores” or “threads” or “streaming processors”, they are actually referring to the 10*16*5 = 800 xyzwt ALUs. This can be confusing, because there are really only 160 simultaneous instruction pipelines.
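The article's 10*16*5 breakdown, spelled out. The constants come from the quoted article; the variable names are illustrative.

```python
# RV770 shader layout as described in the quoted article (10 * 16 * 5).
SIMD_CORES = 10      # SIMD engines
SPS_PER_SIMD = 16    # VLIW thread processors per SIMD engine
ALUS_PER_SP = 5      # x, y, z, w and t ALU slots per thread processor

instruction_pipelines = SIMD_CORES * SPS_PER_SIMD    # simultaneous instruction pipelines
marketed_alus = instruction_pipelines * ALUS_PER_SP  # the "800 stream processors" figure

print("instruction pipelines:", instruction_pipelines)
print("ALUs counted as 'shader cores':", marketed_alus)
```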

In general computing you will not see that typical usage of 4.2; more often than not it will be closer to 1, and hence the real GFLOPS on Ati cards with this design are 1/5 to 2/5 of the peak throughput.

Also, when a special function must be calculated you lose one of those ALUs (the fat one) for many clocks (you probably lose the entire SP), whereas the Nvidia card can do both the SF and the ALU operation. This is not the famous dual-issue: it can always be done as long as the SF and the thread being executed in the ALUs were issued on different clocks.
 
Joined
Apr 21, 2008
Messages
5,250 (0.86/day)
Location
IRAQ-Baghdad
Joined
May 4, 2009
Messages
1,972 (0.35/day)
Location
Bulgaria
Nope, that's the case. There are only 160 pipelines, so you can have 160 threads feeding those 800 "cores" as long as the program can pack their operations together into a VLIW instruction, but it's not exactly the same and requires a lot of anticipation, which is not always possible. In fact, it's almost never possible.

I'm not comparing the SPs to x86 cores in any way; I don't know how you came to that conclusion.

Because of the VLIW nature of the SPs, you could potentially make an engine that only works with 5-wide VLIW instructions, and then you could potentially fill all the "cores". But that engine would not work on Nvidia cards or pre-R600 Ati cards, not to mention it would not be profitable to do so, and DirectX has no such functionality, so you would have to write your engine entirely in HLSL. Even then, filling the 5 ALUs with something relevant to do would be very, very difficult.

http://perspectives.mvdirona.com/2009/03/18/HeterogeneousComputingUsingGPGPUsAMDATIRV770.aspx





In general computing you will not see that typical usage of 4.2; more often than not it will be closer to 1, and hence the real GFLOPS on Ati cards with this design are 1/5 to 2/5 of the peak throughput.

Also, when a special function must be calculated you lose one of those ALUs (the fat one) for many clocks (you probably lose the entire SP), whereas the Nvidia card can do both the SF and the ALU operation. This is not the famous dual-issue: it can always be done as long as the SF and the thread being executed in the ALUs were issued on different clocks.

For graphics and visualization workloads, floating point intensity is high enough to average about 4.2 useful operations per cycle

I don't want to derail the topic, but in general computing you will see the benefit if you're using the simpler single-precision calculations; you may not see the benefit if you are using double, though.

Also, both NVidia and AMD use symmetric single issue streaming multiprocessor architectures, so branches are handled very differently from CPUs.
You were right here, though. There is a single pipeline, but it doesn't have to be flushed for a second thread to be loaded! :)

That was a really insightful article, thanks. Still, what I was trying to say is that if there's a will, there's always a way. As you said yourself, you need to code specifically for Ati's architecture, and that could mean a separate executable. I'm not saying that game companies should invest their own time and money to code a game specifically for Ati users; no, they shouldn't. If Ati wants better support for their cards, they should sponsor game developers just like Nvidia does. Still, there is nothing stopping a non-profit organisation like F@H from actually trying to use all that computing power available to them...
 

Benetanegia

New Member
Joined
Sep 11, 2009
Messages
2,680 (0.48/day)
Location
Reaching your left retina.


I don't want to derail the topic, but in general computing you will see the benefit if you're using the simpler single-precision calculations; you may not see the benefit if you are using double, though.

That depends entirely on how linear* the code is. In graphics you can always use most of the shaders, because both the data and the instruction mix are parallel enough. General computing is quite the opposite: although the chip might be able to run all that code in parallel in theory (i.e., there's no physical limitation), there is a limitation in the code itself. That's not because of a lack of optimization but because of the nature of the code: its self-dependencies. A lot has been said about this on the CPU side too, that programmers are lazy for not writing their code for multi-core, but the reality is that a lot of code simply can't be split into many threads.

A lot can be said about a 50-seat bus being a more efficient and powerful means of transportation than a 12-seat minibus, but if your workflow is go to town A -> pick up 10 people -> go to B -> 10 people off / another 10 on -> go to C -> 10 off / 10 on, and so on, your 50-seat bus is much less efficient than the minibus, and there's very little you can do about that. And there's very little the passengers (= software) can do on their end either.

* I'm talking about ILP (instruction-level parallelism) and TLP (thread-level parallelism), both at the same time. Ati's architecture needs both to be effective (because it's SIMD + VLIW), and that's a luxury you won't find very often in general computing.
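A toy model of the "needs both" point: the fraction of busy ALUs is roughly lane occupancy (TLP) times VLIW slot occupancy (ILP). This sketch treats the two factors as independent, which is a simplifying assumption. The 16-lane width follows the article's 16 thread processors per SIMD engine, the example workloads are hypothetical, and the function name is illustrative.

```python
def alu_utilization(active_lanes: int, simd_width: int, slots_filled: float, vliw_width: int = 5) -> float:
    """Fraction of ALUs doing useful work in a SIMD + VLIW design (toy model).

    Lane occupancy (how many SIMD lanes have live threads) and slot occupancy
    (how many VLIW slots the compiler filled) are treated as independent factors.
    """
    if not (0 <= active_lanes <= simd_width and 0 <= slots_filled <= vliw_width):
        raise ValueError("occupancy out of range")
    return (active_lanes / simd_width) * (slots_filled / vliw_width)

# Assumed 16-lane SIMD, per the article's 16 thread processors per SIMD engine.
print(f"graphics-like: {alu_utilization(16, 16, 4.2):.0%}")  # all lanes busy, ~4.2 slots filled
print(f"compute-like:  {alu_utilization(8, 16, 1):.0%}")     # half the lanes idle, 1 slot filled
print(f"scalar design: {alu_utilization(8, 16, 1, vliw_width=1):.0%}")  # same idle lanes, no VLIW slots to waste
```

With `vliw_width=1`, a scalar design in the sense used earlier in the thread, only the lane term remains, so poor ILP stops costing anything.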
 