Monday, November 22nd 2010
Cayman Confirmed To Be Using VLIW4 SP Arrangement, Redesigned ROPs
With the introduction of AMD's Radeon HD 6000 series GPUs, we were made to expect a massive architectural change in the way AMD arranges its unified shaders. That, however, didn't happen with the Radeon HD 6800 series based on the 40 nm "Barts" GPU, which continued to maintain the VLIW5 configuration (comprising of SIMD units with 4 simple and 1 complex stream processing units). A recent presentation leaked to the internet reveals that the much talked about architectural change was saved for Cayman, the company's upcoming high-end GPU, on which will be based Radeon HD 6900 series graphics cards.
In VLIW4 architecture, equipotent stream processing units are arranged in groups of four along with general purpose registers. Although the four have equal capabilities, two out of four of these (occupying 3 and 4 issue slots) are assigned with some special functions. AMD looks to be conservative with the benefits of the new SIMD architecture. It claims that VLIW4 gives similar computational power as VLIW5, with 10% reduction in die area. It also simplifies scheduling.The presentation also provided a glimpse of the overall architecture schematic of Cayman, which reveals a greater level of parallelization compared to Cypress (Radeon HD 5800 series, 5970). While Barts was a step up from Cypress architecture in assigning individual dispatch processors for each of the two SIMD Engine blocks (read further here), Cayman looks to take that a step further with two graphics processing engines (GPEs), and assigning each to an SIMD engine block. That effectively means that there are two physical tessellation units on Cayman. Barts, while using a single tessellation unit, improved its efficiency to increase tessellation performance by up to 2x compared to previous generation (or so claimed AMD). With Cayman having two of these, it could mean a tessellation performance increase by 3-4x compared to previous generation.Cayman also features reworked render backends consisting of 128 Z/Stencil ROPs, and 32 color ROPs, with up to 2x faster 16-bit integer operations and 2-4x faster 32-bit floating point operations.
Source:
NGOHQ
In VLIW4 architecture, equipotent stream processing units are arranged in groups of four along with general purpose registers. Although the four have equal capabilities, two out of four of these (occupying 3 and 4 issue slots) are assigned with some special functions. AMD looks to be conservative with the benefits of the new SIMD architecture. It claims that VLIW4 gives similar computational power as VLIW5, with 10% reduction in die area. It also simplifies scheduling.The presentation also provided a glimpse of the overall architecture schematic of Cayman, which reveals a greater level of parallelization compared to Cypress (Radeon HD 5800 series, 5970). While Barts was a step up from Cypress architecture in assigning individual dispatch processors for each of the two SIMD Engine blocks (read further here), Cayman looks to take that a step further with two graphics processing engines (GPEs), and assigning each to an SIMD engine block. That effectively means that there are two physical tessellation units on Cayman. Barts, while using a single tessellation unit, improved its efficiency to increase tessellation performance by up to 2x compared to previous generation (or so claimed AMD). With Cayman having two of these, it could mean a tessellation performance increase by 3-4x compared to previous generation.Cayman also features reworked render backends consisting of 128 Z/Stencil ROPs, and 32 color ROPs, with up to 2x faster 16-bit integer operations and 2-4x faster 32-bit floating point operations.
54 Comments on Cayman Confirmed To Be Using VLIW4 SP Arrangement, Redesigned ROPs
Some additional pics:
And new type of AA. This is interesting:
There better be a review with crossfire too :mad:
It's basically like an advanced turbo core technology and the principle is pretty straight forward: If you can change frequency fast enough and if your workload is varied, there are certain moments where you can raise frequency for "free" while at the same time still be under the TDP limit.
The only problem here would of course be uneven performance spikes you would get, so that's probably a bad idea for games, where you want steady fps, but a great idea for HPC programs where you aim for maximum efficiency...
I intend to sell my 6870's and buy a 6970 so i can get back to using a single card/chip, even before knowing the performance of it the 6970 is the only real option for me as i want a single card/chip to run 3 monitors and the fact that a 6870 is almost enough it just needs to be slightly faster with more memory for my needs..
I want an nvidia gpu for my htpc just so i can do a useful amount of folding.
hopefully 6870s in CFX do respectably vs the price they cost
Also after a quick google there is some sites showing 6870 crossfire vs 580 reviews and the 6870 crossfire beats the 580 in multiple games, not in every game but enough to say that 6870 crossfire is around the performance of a 580 and going by prices form scan (I'm on the uk) it's cheaper £365 for 6870 crossfire and £400 for a 580 so I'm pretty sure you will be happy with your purchase.
if they success with this old weak points it will be great, but it's still one more thing they should think about it which is physics
As resolution increases the need for AA decreases, and the form in which AA is performed needs to evolve, no more overdrawing, cause at anything above 1680 it becomes too burdensome to perform with any sort of performance.