Monday, September 27th 2010
AMD Radeon HD 6700 Series "Barts" Specs Sheet Surfaces
Here is the slide we've been waiting for: the specs sheet of AMD's next-generation Radeon HD 6700 series GPUs, based on a new, radically redesigned core codenamed "Barts". The XT variant denotes the Radeon HD 6770, and Pro denotes the HD 6750. AMD claims that the HD 6700 series will pack "Twice the Horsepower" of the previous-generation HD 5700 series. Compared to the "Juniper" die that went into making the Radeon HD 5700 series, Barts features twice the memory bandwidth thanks to its 256-bit wide high-speed memory interface, has key components such as the SIMD arrays split into two blocks (as on Cypress), and, we now learn, uses a more efficient 4-D stream processor design. The HD 6770 (Barts XT) gets 1280 stream processors, and the HD 6750 (Barts Pro) gets 1120. Both SKUs use the full 256-bit memory bus width.
The most interesting specification here is the shader compute power. Barts XT churns out 2.3 TFLOP/s with 1280 stream processors at a 900 MHz GPU clock, while the Radeon HD 5870 manages 2.72 TFLOP/s with 1600 stream processors at 850 MHz. So indeed, the redesigned SIMD core is working its magic. Z/Stencil performance also shot up more than 100% over the Radeon HD 5700 series. Both the HD 6770 and HD 6750 will be equipped with 5 GT/s memory chips, at least on the reference-design cards. These are technically capable of running at 1250 MHz (5 GHz effective), though they are clocked at 1050 MHz (4.2 GHz effective) on the HD 6770 and 1000 MHz (4 GHz effective) on the HD 6750. Although these design changes will inevitably result in a larger die compared to Juniper, it could still be smaller than Cypress, and hence more energy-efficient.
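The peak-throughput figures are easy to verify. A minimal sketch of the arithmetic (assuming the standard 2 FLOPs, i.e. one multiply-add, per stream processor per clock):

```python
# Peak single-precision throughput: 2 FLOPs (one multiply-add) per
# stream processor per clock.
def peak_tflops(stream_processors: int, core_mhz: int) -> float:
    return 2 * stream_processors * core_mhz * 1e6 / 1e12

print(f"Barts XT (1280 SPs @ 900 MHz): {peak_tflops(1280, 900):.2f} TFLOP/s")  # ~2.30
print(f"HD 5870 (1600 SPs @ 850 MHz):  {peak_tflops(1600, 850):.2f} TFLOP/s")  # 2.72
```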
Source:
PCinLife
245 Comments on AMD Radeon HD 6700 Series "Barts" Specs Sheet Surfaces
This is the last AMD card I will have until they put out some decent drivers. The last good one was 10.4a. The last good one before that? 8.10
The hardware is great, software is garbage.
The rumour of 640 ALUs with 32 ROPs and a narrow 256-bit bus will just sound stupid if everything doubles up except the ROPs/bus, since the original R600 design was already unbalanced. If they haven't learned from what happened with RV770, then they are pretty much hopeless... which I doubt; AMD is a smart company and wouldn't go with such an unprofitable plan.
The fact is that adding ROPs/bus width is more profitable than adding ALUs; it's the most profitable route if you can't shrink the die size because you've already put too many ALUs on it.
Hard fact: a 480:96:64 part with a 512-bit bus would make a 640:128:32, 256-bit one look like shit in terms of die space/power consumption/performance.
16 ROPs over a 512-bit bus : ]
64/256 is possible, I'm sure of it.
Crap example, I know, but I just had to!
I sure hope AMD doesn't go the NV route of changing naming conventions so early :(
"Developing (New) technology is simply offering more (performance) ....... for less (production costs).... to increase profit (margins)". Where retail costs increase because of that development is usually down to one of two factors...... they didn't quite get it right or simply just greed :laugh:
It is the Radeon HD 4730 versus the Radeon HD 4830 that demonstrates just what impact changing the number of ROPs can have. They were virtually identical in specs, except the Radeon HD 4730 had half the ROPs and was clocked at 750 MHz versus 575 MHz for the Radeon HD 4830. On average, the Radeon HD 4830 beat the Radeon HD 4730 by a small margin. According to my estimate, the Radeon HD 5830 could be clocked at about 700 MHz and provide equivalent performance if it had its full complement of ROPs.
Assuming Barts is as described and provides the same per-shader performance as Cypress, it should provide about a 4.3% increase in performance over Cypress LE despite being clocked 9.375% lower. This would be due to Barts Pro having 81.25% more ROP performance than Cypress LE. And Barts XT should provide about an 18.685% increase over Cypress Pro despite having just a 10.345% increase in shader performance, because of a 24.138% increase in ROP performance.
With these assumptions, doubling the ROPs would increase the performance of Barts XT by about 10.25%. I don't think doubling the ROPs would be cost-effective with regard to die size, but if they could, increasing the ROPs per memory controller by 50% probably would be.
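For what it's worth, those percentages check out if you plug the leaked Barts figures against the shipping Cypress specs. A quick sketch of the arithmetic; the Barts clocks (725 MHz Pro, 900 MHz XT) are the figures implied by the post and the leak, not confirmed specs:

```python
# Throughput ratios behind the percentages quoted above. Cypress figures
# are shipping specs; the Barts figures are rumoured (clocks assumed).
cards = {
    "Cypress LE":  dict(shaders=1120, rops=16, mhz=800),  # HD 5830
    "Cypress Pro": dict(shaders=1440, rops=32, mhz=725),  # HD 5850
    "Barts Pro":   dict(shaders=1120, rops=32, mhz=725),
    "Barts XT":    dict(shaders=1280, rops=32, mhz=900),
}

def gain(new: str, old: str, unit: str) -> float:
    """Relative throughput gain of `new` over `old` for the given unit type."""
    a, b = cards[new], cards[old]
    return a[unit] * a["mhz"] / (b[unit] * b["mhz"]) - 1

print(f"Barts Pro vs Cypress LE, ROPs:    {gain('Barts Pro', 'Cypress LE', 'rops'):+.3%}")     # +81.250%
print(f"Barts XT vs Cypress Pro, shaders: {gain('Barts XT', 'Cypress Pro', 'shaders'):+.3%}")  # +10.345%
print(f"Barts XT vs Cypress Pro, ROPs:    {gain('Barts XT', 'Cypress Pro', 'rops'):+.3%}")     # +24.138%
```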
Let's see if the ROP ratio does increase in relation to bus size on the new 6xxx series; if it does, you're sure to be right! :) And if it doesn't, you will probably just say that they chose not to.
While 64 ROPs and 512-bit memory are a little ridiculous cost-wise, the idea of 384-bit and 48 ROPs isn't, imo. So... running down that line:
spec    | Barts XT | Cayman XT
ROPs    | 32       | 48
memory  | 256-bit  | 384-bit
shaders | 1280     | 1920
TMUs    | 64       | 96
Additionally, your comparison between "Barts" and "Cayman" is therefore little more than the comparison between the 5850 and the 5870, surely, less the bus size and therefore ROP count? The typical performance differences within the market (let's say 15% between two models) can often be attained without having to increase bus size and/or ROP count, as Cypress has shown.
Although this does not deal with any of the limitations between bus width and ROP count that we have mentioned, it does explain very well how segments with the same bus size and ROP count can differ a fair bit in performance through other means. I know the link is from SemiAccurate, but this piece is not about speculation; it actually makes comparisons with real hardware and its architecture. If you scroll down to the chart about the 9600 and 9800 and read to the end of the page, it is quite interesting:
www.semiaccurate.com/2010/09/20/northern-islands-barts/
I stand to be corrected in what follows, as I'm in no way an expert, but it's what I understand from the things I do know or have heard about. Let's explain it with an example, taking the HD 5870 numbers from the chart in the OP.
Memory bandwidth: 153.6 GB/s == 1228.8 Gb/s
Pixel fillrate: 27.2 GPixel/s
Z/stencil: 108.8 GSamples/s
Now, for stencil the most commonly used value is one byte per pixel, while the Z sample in modern games is either 24-bit or 32-bit, because 16-bit creates artifacts.
Thus the average bit-length of samples is going to be somewhere between 8 and 24/32; let's settle on 16-bit samples. Simple math from the specs tells us that 108.8 GSamples/s x 16-bit samples = 1740.8 Gb/s.
As you can see, the bandwidth required to write Z/stencil-only scenarios already exceeds the memory bandwidth limit, and it's worse in the cases where it's doing a Z test. Of course, the ROPs also have to write out pixels, which I understand is less taxing and makes up for the difference, because typical HDR pixels are 16 bits wide (per channel), so 27.2 GPixel/s x 16 bit* = 435.2 Gb/s, and the output of current games is 32-bit, so 870.4 Gb/s.
* Here I have to admit I don't know if the pixels are blended and written separately by channel or all together. In the latter case, the figure jumps to 1740.8 Gb/s (64 bit x 27.2 GPixel/s) again, and may actually reflect the relation better, as the average of the 32-bit and 64-bit outputs is 1305.6 Gb/s, quite close to the actual memory bandwidth.
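In code, the write-bandwidth arithmetic above looks like this (a back-of-the-envelope sketch; the 16-bit average sample size is the assumption settled on above):

```python
# HD 5870 write-bandwidth back-of-the-envelope, numbers from the OP's chart.
mem_gbps = 153.6 * 8  # 153.6 GB/s -> 1228.8 Gb/s
z_rate   = 108.8      # z/stencil rate, GSamples/s
px_rate  = 27.2       # pixel fillrate, GPixel/s

print(f"memory bandwidth:   {mem_gbps:7.1f} Gb/s")       # 1228.8
print(f"z/stencil @ 16 bit: {z_rate * 16:7.1f} Gb/s")    # 1740.8
print(f"colour @ 32 bit/px: {px_rate * 32:7.1f} Gb/s")   #  870.4
print(f"colour @ 64 bit/px: {px_rate * 64:7.1f} Gb/s")   # 1740.8 (16 bit x 4 channels)
```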
As you might have guessed already, doubling (or even just increasing) the ROPs is not going to yield any substantial gains, even with the 25% increased GDDR5 speed of 7 GT/s modules, especially considering that the numbers above cover only write operations (and not all of them); you still have to take read operations into account.
That being said, the above only covers theoretical throughput and the effective balance. In practice, I think 32 ROPs are more than enough for the kind of performance we can expect from Cayman, and putting in 64 would be a waste of die area for little or no gain (something I could see Nvidia doing**, but not AMD). 48 would be ideal, I guess, but I don't think AMD is willing to use odd numbers, or they would have done it in the past with crippled 256-bit parts instead of making them 128-bit...
** This is another story, but the reason Nvidia "wastes" die area on a 384-bit bus / 48 ROPs is that they are critical in the professional Quadro/Tesla cards, not because they pose any dramatic improvement or necessity on the desktop cards.
PS: You quoted my deleted post, lol (you will see I say the ROP count is the same). I was answering two different threads at the same time and messed one up... I am too old to multitask these days!
For example, I know that 90% of my issues are either related to Crossfire, Eyefinity, or both. According to your system specs, you have neither, so would probably never see any of the issues I have.
And because of these issues, I will be focusing entirely on how the 6-series behaves under similar conditions.
And this is important, specifically when dealing with Eyefinity: AMD has lauded how they chose a hardware solution for multi-monitor... yet handing the cursor off from one monitor to the next often corrupts the cursor.
The cursor issue has been around since day one, and AMD has said that they fixed it, that it's a known issue, etc... with a driver. I'm not too sure that a driver can really fix a hardware problem, but AMD seems pretty confident, even though it's been a year without any real fix.
Until AMD starts being honest about issues like this (need I mention my cards overheat due to the fan not spinning up correctly, because of the driver?), there are some real legitimate claims that AMD's drivers are steadily declining in quality.
Better yet, guess how I can avoid the cursor corruption? Two ways...either use a single monitor...or not use the DisplayPort connector...
Granted, maybe I just got some bad cards. I'll be mailing yet another one away for RMA later today, and hopefully that might sort it...time will tell.
so the gap has to be larger between the two to make sense in pricing and market positioning.
Second, overclock a 5850 to the 5870's clocks and it'll bench just a hair lower; overclock a 5850 past a 5870 and it'll bench higher. So while shaders do help, there are plenty of them on all modern GPUs. This is exactly why far more 5850s sold than 5870s: the performance was similar, but the prices were not.
Plus, with the swap from 4 simple + 1 complex to 4 moderately complex, we're likely going to see more frames per shader out of the 6k series. So if we're talking the same ROPs and more shaders, it's unlikely that Cayman would be that much better than Barts; after all, the chart shows Barts with 1280 medium-complexity shaders, which should be a stark contrast with the 320 complex and 1280 simple on Cypress XT.
If you take a look at the 5770 vs the 5830, where both have 16 ROPs, clocks are close (with the exception of the memory clock), and the memory bus width is different, the main difference is 800 shaders vs 1120 shaders (40% more), and the difference averages out to 13% in W1z's reviews. Now, while I feel 256-bit vs 128-bit accounts for at least a couple of those frames, it's more than easy enough to make up that amount with overclocking.
So if Cayman is only increasing shaders and TMUs by 50% while keeping the same ROPs, the performance won't scale the way the 5770-to-5870 did, and we'll have a 6770 capable of taking sales away from the 6870, not just in price/performance but in performance in general.
Imo, it would be a bad, bad move when they have the chance to repeat the success of the 5xxx series.
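To put a crude number on that scaling argument, here is a quick sketch using the 5770-vs-5830 figures from the post above; the extrapolation to Cayman at the end is purely illustrative, not a prediction:

```python
# 5770 -> 5830: 40% more shaders bought ~13% average performance
# (per W1zzard's reviews, as quoted above).
shader_gain = 1120 / 800 - 1           # 0.40
perf_gain   = 0.13                     # observed average gain
efficiency  = perf_gain / shader_gain  # about a third

# Applying that (very crude) efficiency to a hypothetical Cayman with 50%
# more shaders than Barts and the same ROP count:
cayman_estimate = 0.50 * efficiency
print(f"scaling efficiency: {efficiency:.2f}")                 # ~0.33
print(f"naive Cayman-over-Barts gain: {cayman_estimate:.0%}")  # ~16%
```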
Those cards serve as the basis for judging how bad my current cards actually are... the 4-series shows AMD can do better. Wonderful gen for AMD, that one... effective, and CHEAP. On the other hand, they also serve as the basis for my interest in the 6-series... I hope it's another 4-series.