Monday, November 22nd 2010
Cayman Confirmed To Be Using VLIW4 SP Arrangement, Redesigned ROPs
With the introduction of AMD's Radeon HD 6000 series GPUs, we were made to expect a massive architectural change in the way AMD arranges its unified shaders. That, however, didn't happen with the Radeon HD 6800 series based on the 40 nm "Barts" GPU, which continued to maintain the VLIW5 configuration (comprising of SIMD units with 4 simple and 1 complex stream processing units). A recent presentation leaked to the internet reveals that the much talked about architectural change was saved for Cayman, the company's upcoming high-end GPU, on which will be based Radeon HD 6900 series graphics cards.
In VLIW4 architecture, equipotent stream processing units are arranged in groups of four along with general purpose registers. Although the four have equal capabilities, two out of four of these (occupying 3 and 4 issue slots) are assigned with some special functions. AMD looks to be conservative with the benefits of the new SIMD architecture. It claims that VLIW4 gives similar computational power as VLIW5, with 10% reduction in die area. It also simplifies scheduling.The presentation also provided a glimpse of the overall architecture schematic of Cayman, which reveals a greater level of parallelization compared to Cypress (Radeon HD 5800 series, 5970). While Barts was a step up from Cypress architecture in assigning individual dispatch processors for each of the two SIMD Engine blocks (read further here), Cayman looks to take that a step further with two graphics processing engines (GPEs), and assigning each to an SIMD engine block. That effectively means that there are two physical tessellation units on Cayman. Barts, while using a single tessellation unit, improved its efficiency to increase tessellation performance by up to 2x compared to previous generation (or so claimed AMD). With Cayman having two of these, it could mean a tessellation performance increase by 3-4x compared to previous generation.Cayman also features reworked render backends consisting of 128 Z/Stencil ROPs, and 32 color ROPs, with up to 2x faster 16-bit integer operations and 2-4x faster 32-bit floating point operations.
Source:
NGOHQ
In VLIW4 architecture, equipotent stream processing units are arranged in groups of four along with general purpose registers. Although the four have equal capabilities, two out of four of these (occupying 3 and 4 issue slots) are assigned with some special functions. AMD looks to be conservative with the benefits of the new SIMD architecture. It claims that VLIW4 gives similar computational power as VLIW5, with 10% reduction in die area. It also simplifies scheduling.The presentation also provided a glimpse of the overall architecture schematic of Cayman, which reveals a greater level of parallelization compared to Cypress (Radeon HD 5800 series, 5970). While Barts was a step up from Cypress architecture in assigning individual dispatch processors for each of the two SIMD Engine blocks (read further here), Cayman looks to take that a step further with two graphics processing engines (GPEs), and assigning each to an SIMD engine block. That effectively means that there are two physical tessellation units on Cayman. Barts, while using a single tessellation unit, improved its efficiency to increase tessellation performance by up to 2x compared to previous generation (or so claimed AMD). With Cayman having two of these, it could mean a tessellation performance increase by 3-4x compared to previous generation.Cayman also features reworked render backends consisting of 128 Z/Stencil ROPs, and 32 color ROPs, with up to 2x faster 16-bit integer operations and 2-4x faster 32-bit floating point operations.
54 Comments on Cayman Confirmed To Be Using VLIW4 SP Arrangement, Redesigned ROPs
low shader numbers people said that the 6800 shaders are new design
well in here it says 6800 shaders are not new design
Looks like we're finally gonna see some head-on competition from the big boys and possibly a price war. Bring it on!
6800 = x2 ; 6900 = x4
!! PLUS the decent overall benefit of THE NEW architecture !!
InGame the 6870 ist a bit slower than the 5870, does that mean
the 6970 will be a bit slower than the 5970 ??
I know it's laymens terms and is undercomplicating what is a complicated architecture, but if that's true it is going to stomp all over the 580, even if it kept the same or last over shaders as the 5870, so the 1536 shaders/96 tmu number could be correct and this would still be loads faster than the 5870. Though peak frames may be similar, minimum and thus average go up making for a better gaming experience all around.
AMDs aforementioned TDP limiter will play a big role here tho. It will be a make or break feature.
I really hope this isn't another 2900 with epic specs and disappointing results :laugh:
Cos I'm excited to see AMD potentially competing and maybe even having the top end spot for once.
I mean they could pull an extra 12% performance out of their arse ( for lack of a better phrase) Just from going from 800 to 900 core.
Would the increased triangle per clock mean improved benefits for tessellation with over-clocking? ( I.E is it like overclocking gddr5? you get 4x the powa!, only with this 2x the powa! )
When do we see the actual cards again?