not until i get his 5850 first
then i'll trade my 9600gt to him for GT240 for physx
i don't see there's any point adding ridiculous number of shader on exist 40nm fab..based on my previous calculation if cayman is double of barts even except rops/bus increase as you were mention it will turn out to be like below if the spec is 2560:128:32 and 256bit bus
shader die space in cypress is 60% and 4D shader is 80% of 5D shader in size and SIMD controller and TMU took about 15% then here will be 2(334x 0.6 x0.8)+2(334x0.15)+334x0.25 = 320.64 + 100.2 +83.5 = 504.34mm^2 + hard wiring = 510mm^2
that is huge die and such 510mm^2 only has 32 rops????and i don't see any reason why we'd need 640ALU for? folding@home?
and you expect a 510mm^2 chip using a narrow 256bit bus on it?
if the shader turn out to be 5120(1280ALU) then the die size will be:
4(334x 0.6 x0.8)+4(334x0.15)+334x0.25 = 641.28 + 200.4 + 83.5 = 925.18mm^2 + hard wiring = 940mm^2......
shader like this are pointless if you don't have more rops to push it. like g92 was bottleneck by its 16 rop while it had 128 ALU. and now cayman that has 1280 ALU but 32 rops....that is a big joke...
if the specification turn out to be 1920:96:64 512bit story will be vastly different from above
1.5(334x0.6x0.8)+1.5(334x0.15)+2(334x0.25) = 240.48 + 75.15 + 167 = 482.64mm^2 + hard wiring = 484mm^2
480ALU is what we need in existed 40nm..no go further....
First of all, I read an interview with an AMD engineer in which he stated that the shaders of Cypress take up 80% of the Cypress die. This was within the context of discussing SIMD pipelines, so he might have meant SIMD pipelines, which would be shaders plus TMUs plus SIMD logic, but even your 60% for shaders plus 15% for TMUs and SIMD logic do not add up to the 80% stated by this engineer. Where do you get your figures.
Second, while it is common to quote 1600 for the number of shaders in Cypress, Cypress actually has 1600 ALUs organized as 320 shaders, that are arranged in 20 SIMD pipelines having 16 shaders and 4 TMUs each. Each shader has 4 simple ALUs and 1 complex ALU. Barts/Cayman is supposed to have 4 moderate complexity ALUs per shader.
Barts/Cayman are not derivatives of Juniper or Cypress. They were designed in parallel with Evergreen by the team(s) that designed RV7xx, including RV740. The engineer that was interviewed stated that the 4 ALU per shader design of Northern Islands took up slightly less space per shader than the 4+1 ALU design of Cypress while delivering between 1.5x to 1.8x the performance per shader of Cypress. The engineer might have meant 1.5x to 1.8x the performance per ALU, deliberately using the wrong term to make things clearer to the interviewer that often mentioned the 1600 shaders of Cypress.
The Radeon HD 5830 has the same number of ROPs and memory controllers as the Radeon HD 4870/4890, and falls between the two of them in average performance despite having 1.4x the number of SIMD pipelines. Chances are that it is not the performance of the individual shaders/TMUs that is crippling Cypress, but the SIMD control logic. My guess is that the NI design team went with a 4 moderate complexity ALU design for NI to simplify the control logic, thus enabling them to achieve at least the per shader performance of RV770 while implementing double precision floating point, as well as the DX11 features. Just getting NI to RV770 level per ALU performance would have given NI 12% higher performance per shader than Cypress. And it is possible that other improvements, including higher utilization of the ALUs due to fewer of them per shader and the number of ALUs per shader being a power of two, increased performance per shader to within 95% of the 4+1 ALU shaders. Thus the 1.5x to 1.8x figure quoted.
My guess is that, since the small die size strategy was well established at the time NI was being designed, and 32nm allows for just a bit over 56% more transistors per mm2 versus 40nm, and the 4 ALU shader design is only slightly smaller than the 4+1 ALU shader design, Turks was to be 1.6x Redwood, Barts 1.6x Juniper, and Cayman 1.6x Cypress with regards to shaders/SIMD pipelines. This would make Turks 128 shaders(512 ALUs), Barts 256 shaders (1024 ALUs), and Cayman 512 shaders (2048 ALUs). When 40nm was cancelled, only Cayman had to be cut down, and this was only to keep the TDP within the limits of what was needed to produce a dual GPU "Cayman".
Bus width is primarily a function of die size, and since Barts would have had about the same die size as Juniper at 32nm, Barts would have started with a 128-bit bus. But with Barts having over 50% more core performance than Juniper, there would have been a push towards either increasing the number of ROPs per memory controller by at least 50% or increasing the memory width by 50%. If they went with the memory width solution, Barts would have had a 192-bit wide bus at 32nm. Cayman was probably not large enough for a 384-bit memory bus at 32nm, so my guess is that the number of ROPs per memory controller was increased.
If indeed the Radeon HD 2900 GT had 12 ROPs (persumably 16 total with 4 disabled) it is Cayman might have had 12 ROPs per memory controller at 32nm. Well actually 16 ROPs per memory controller organized as four clusters of 4 ROPs each, with one ROP cluster per memory controller serving as a spare. I estimate that, at the time the GTX 480 was introduced, approximately 14% of all Radeon HD 5850/5870 yield was being lost to defective ROP clusters. At the time the Radeon HD 5830 was introduced this yield loss to defective ROP clusters would have been higher, thus the need to salvage a part with one ROP cluster per memory controller disabled. ATI probably anticipated similar yield problems at 32nm, and at least wanted one spare ROP cluster per memory controller available to improve yields, so the design could have been three ROP clusters per memory contoller with the third serving only as a spare, but more likely, with the need for 50% higher ROP performance to match the 50% higher core performance, ROP clusters per memory controller were doubled, with the fourth ROP cluster per memory controller serving as a spare.
With 32nm being cancelled and NI reimplemented at 40nm, die size grew, and there was increased perimeter on which to implement edge pads, enabling Barts to grow from 192-bits to 256-bits, and perhaps Cayman can now be 384-bit instead of 256-bit. If not however, I do expect Cayman to have at least 50% more ROPs per memory controller.