AMD Radeon HD 6700 Series ''Barts'' Specs Sheet Surfaces

cadaveca · Sep 29, 2010

TheMailMan78 said:
Its just you have bad cards. I ran crossfire with 4850s for a very long time without issue.

Not like you'd know, running a single card. :laugh:

I had few problems too, with 4850, 4870, or 4890...I actually kinda miss those cards... :cry:

Alas, they don't support Eyefinity.

Those cards are my yardstick for how bad my current cards actually are...the 4-series shows AMD can do better. Wonderful gen for AMD, that one...effective, and CHEAP. On the other hand, they are also the reason for my interest in the 6-series...I hope it's another 4-series.

Tatty_One · Sep 29, 2010

yogurt_21 said:
well, first, that's the 5850 vs the 5870, yet you used it as a reason why the 6770 and 6870 would have the same number of ROPs. If you're going to use Cypress you have to bring in Juniper as the comparison for Barts to Cayman, not Cypress Pro vs Cypress XT. Again, we're talking mid range to high end, not lower high end to higher high end.

so the gap has to be larger between the two to make sense in pricing and market positioning.

second, overclock a 5850 to the 5870's clocks and it'll bench just a hair lower. Overclock a 5850 past a 5870 and it'll bench higher. So while shaders do help, there are plenty of them on all modern GPUs. This is exactly why far more 5850s sold than 5870s: the performance was similar but the prices were not.

plus, with the swap from 4 simple + 1 complex ALUs to 4 moderately complex ones, we're likely going to see more frames per shader out of the 6k series. So if we're talking the same ROPs and more shaders, it's unlikely that Cayman would be that much better than Barts. After all, the chart shows Barts at 1280 medium complexity shaders; that should be a stark contrast with the 320 complex and 1280 simple on Cypress XT.

if you take a look at 5770 vs 5830 where both have 16 rop's, clocks are close with the exception of memory clock and the memory bit is different, but the main difference is 800 shaders vs 1120 shaders (40% more) the difference averages to 13% in W1z's reviews. Now while I feel 256bit vs 128 bit accounts for at least a couple of those frames It's more than easy enough to make up that amount with overclocking.
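To put a number on that comparison, here is a minimal Python sketch using only the figures quoted in the paragraph above (800 vs 1120 shaders, 13% average gap). The variable names are made up for illustration, and the "scaling efficiency" ratio is just one rough way to read the data, not a measured property of the hardware.

```python
# Rough scaling check using only the figures quoted in the post.
shaders_5770 = 800
shaders_5830 = 1120
measured_gain = 0.13  # average performance gap cited in the post

# Relative increase in shader count (about 0.40, i.e. 40% more shaders).
shader_gain = shaders_5830 / shaders_5770 - 1.0

# Fraction of the extra shaders that shows up as extra performance.
scaling_efficiency = measured_gain / shader_gain

print(f"shader increase: {shader_gain:.0%}")
print(f"scaling efficiency: {scaling_efficiency:.1%}")
```

On these numbers only about a third of the added shader count turns into frames, which is the point being argued; bus width and clocks differ between the two cards, so the ratio is indicative at best.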

so if Cayman is only increasing shaders by 50%, plus TMUs, while keeping the same ROPs, the performance won't scale the way it does from the 5770 to the 5870, and we'll have a 6770 capable of taking sales away from the 6870 not just on price/performance but on performance in general.

imo it would be a bad bad move when they have the chance to repeat the success of the 5xxx series.

Lol, I didn't use the comparison as a "reason" they should be compared to Barts etc. I used it because in your previous post you said that you found it difficult to believe that mid and high end cards would have the same ROP count, and my example clearly shows that it does happen: the 5850 and 5870 share one. All that you have said does not change the fact that currently, in order for the ROP count to be increased, the memory bus must also be increased. So unless you are sure that we will see some 512-bit bus versions, whichever way you want to look at it you are going to pay a huge premium for that. One of the main reasons ATI have been so competitive price-wise recently is because they have gone for the 256-bit bus; NVIDIA's 384-bit and wider buses cost more to produce, in PCB terms alone.
Using the comparison between the 5830 and the 5770 throws up some odd results as well as what you have mentioned. Despite having double the memory bus, the 5830 has the same ROP count as the 5770, but were you aware that, despite the wider bus, the 5830 is actually SLOWER in pixel fill rate than the 5770? Now that's for a couple of reasons, but my point is that bus and ROP count are just ingredients in the overall performance. People seem to get too hung up on them; you can get to a point where too many ROPs actually strangle performance and show little improvement where other ingredients can give a greater boost.

Now if we do see a 512-bit bus... and I am not saying we won't, then as you have said, there is more potential there, but with that comes a fairly large hike in prices. I have some doubts that AMD want to go down that route personally, although maybe on just the one top end card. My point all along has simply been twofold:

1. Currently I believe there are limitations on ROP count against memory bus size; you ain't gonna get 64 ROPs on a 256-bit wide bus.
2. There are a lot more factors to overall performance than just bus size and ROP count.

Simple as that really.

dalelaroy · Sep 29, 2010

Increased ROPs Without Increased Memory bus

Tatty_One said:
All that you have said does not change the fact that currently, in order for the ROP count to be increased, the memory bus must also be increased,

Note that although both Redwood and Juniper have 128-bit memory buses, Redwood has 8 ROPs versus Juniper's 16 ROPs. It would not violate the pattern for Cayman to have twice the ROPs of Barts without an increase in memory width. It would simply be applying the Evergreen 128-bit pattern, in which Redwood and Juniper are the only families sharing a bus width, to the Northern Islands 256-bit width, with Barts and Cayman being the only families that share the same bus width.

I think it is more likely that Cayman will have 384-bit memory, but I also think that it might take less board real estate to simply double the ROPs per memory controller. As for the bandwidth argument, even with GF104 having less bandwidth than Cypress, it seems to have greater ROP performance. Doubling the ROPs may be overkill, but Cayman needs at least double the ROP performance of Barts to take on GF100 in those applications where ROPs are the limitation.
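The Redwood/Juniper pattern can be sketched as a ROPs-per-memory-controller calculation. This is only an illustration: the 64-bit controller width is an assumption (it is not stated in this thread), the Cypress figures are the 32 ROPs on 256 bits discussed elsewhere in the thread, and the last line is a purely hypothetical Cayman configuration, not a leaked spec.

```python
CONTROLLER_WIDTH_BITS = 64  # assumption: one memory controller per 64 bits of bus

# (bus width in bits, ROP count) as quoted in the thread
chips = {
    "Redwood": (128, 8),
    "Juniper": (128, 16),
    "Cypress": (256, 32),
}

def rops_per_controller(bus_bits, rops):
    """ROPs attached to each memory controller for a given bus width."""
    controllers = bus_bits // CONTROLLER_WIDTH_BITS
    return rops / controllers

for name, (bus_bits, rops) in chips.items():
    print(name, rops_per_controller(bus_bits, rops))

# Hypothetical: doubling ROPs while keeping a 256-bit bus.
print("Cayman (hypothetical)", rops_per_controller(256, 64))
```

Under that assumption Juniper already carries twice the ROPs per controller that Redwood does, so a 64-ROP part on 256 bits would mean a second doubling rather than a wider bus.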

meran · Sep 29, 2010

did u see the news on http://guru3d.com/news/radeon-hd-6800-series/

:eek:

btarunr · Sep 29, 2010

Nah, no Barts launch on 18th ± 2 days, AFAIK. Also, I'd dismiss that new "we are right, they all were wrong" specs sheet some sites are sharing as an encore of "RV770 has 480 stream processors, not 800, as rumors claimed". If Hilbert got those specs from AMD (because that article is written more like a statement of facts than speculation), he'd also be under an NDA.

In no way am I giving credibility to the information we have, but just saying that at this point that specs sheet is not one bit more credible.

bear jesus · Sep 29, 2010

btarunr said:
Nah, no Barts launch on 18th ± 2 days, AFAIK. Also, I'd dismiss that new "we are right, they all were wrong" specs sheet some sites are sharing as an encore of "RV770 has 480 stream processors, not 800, as rumors claimed".

I have given up on trying to make sense of all the "information" on all the different tech sites, it's all lies :laugh:

to be honest as it gets closer to release (whenever it may be) it's time to ignore all the "leaks" and just wait for amd to say something official.

meran · Sep 29, 2010

so, it makes more sense to build 2x Barts on one board than one huge chip, am I right? :toast:

dalelaroy · Sep 29, 2010

meran said:
so, it makes more sense to build 2x Barts on one board than one huge chip, am I right?

Only from the point of view of marketing. Unless....

I still think that Barts will have 1024 shaders, with Barts XT shipping with 960 shaders active. I think yields of Barts XT will be too low to justify completely replacing Cypress Pro with Barts without a defect tolerant design. However, the yield of defect-free Barts GPUs would be adequate for fully functional GPUs to be used in a dual GPU product. Along those same lines of logic, there should be too few Barts GPUs with defective ROPs to justify a mass market product like Cypress LE, but these GPUs could be salvaged for a dual GPU product.

This could also explain the Radeon HD 6990. If Cayman XT is, like GTX 480, a cut down Cayman, and called the Radeon HD 6870, then if the dual GPU variant uses fully functional GPUs, it would make sense to call it a Radeon HD 6990 to signify it is more than a dual Radeon HD 6870.

cheezburger · Sep 29, 2010

wahdangun said:
are you afraid to bet? lets see who are the winner,

no, I'm not afraid of betting, it's just that I can't ignore the stupidity, that's all. AMD is not going to make a 500mm^2 GPU die just to add more ALUs while keeping a 256-bit bus and 32 ROPs, when adding ALUs costs more die space. that is hard fact!

just a question: what do you need so many shaders for if your frame rate won't increase from 200 fps to 800 fps... just to be feature rich? folding@home is generally garbage to the vast majority of high end gamers, and "NO ONE WILL BUY A GFX CARD JUST TO RUN FOLDING@HOME TO SAVE MANKIND WHILE IT CAN'T DO SHIT FOR FRAME RATE". if humans would die then let them all die....simple.

i would personally rather throw 500 dollars into the water than save the human race

anyway, read the post below before you start thinking 32 ROPs and a 256-bit bus with a ridiculous 2560 shaders will hit the market with such a bad scaling design.

shader die space in Cypress (334mm^2) is 60%, a 4D shader is 80% of the size of a 5D shader, and the SIMD controllers and TMUs take about 15%, so the die will be 2(334 x 0.6 x 0.8) + 2(334 x 0.15) + (334 x 0.25) = 320.64 + 100.2 + 83.5 = 504.34mm^2 + hard wiring = 510mm^2

that is a huge die, and such a 510mm^2 chip only has 32 ROPs???? and I don't see any reason why we'd need 640 ALUs. folding@home?
and you expect a 510mm^2 chip to use a narrow 256-bit bus?

if the shader count turns out to be 5120 (1280 ALUs) then the die size will be:

4(334 x 0.6 x 0.8) + 4(334 x 0.15) + (334 x 0.25) = 641.28 + 200.4 + 83.5 = 925.18mm^2 + hard wiring = 940mm^2......

shaders like this are pointless if you don't have more ROPs to push them. G92 was bottlenecked by its 16 ROPs while it had 128 ALUs. and now Cayman has 1280 ALUs but 32 ROPs....that is a big joke...

if the specification turns out to be 1920:96:64 with a 512-bit bus, the story will be vastly different from above

1.5(334 x 0.6 x 0.8) + 1.5(334 x 0.15) + 2(334 x 0.25) = 240.48 + 75.15 + 167 = 482.63mm^2 + hard wiring = 484mm^2

480 ALUs is what we need on the existing 40nm process..go no further....
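For anyone who wants to rerun those three estimates, here is a minimal Python sketch of the same model. The area shares (60% shaders, 15% SIMD control and TMUs, 25% everything else) and the 0.8 factor for a 4D shader are the poster's own claims, not confirmed figures; the function and constant names are invented for illustration, and the "hard wiring" allowance is left out.

```python
CYPRESS_DIE_MM2 = 334.0   # baseline die area used in the post
SHADER_SHARE = 0.60       # claimed share of the die taken by shaders
SIMD_TMU_SHARE = 0.15     # claimed share for SIMD control logic and TMUs
REST_SHARE = 0.25         # remainder (ROPs, memory bus, everything else)
SHADER_4D_SCALE = 0.80    # claimed size of a 4D shader relative to a 5D one

def estimate_die_mm2(shader_mult, simd_tmu_mult, rest_mult=1.0):
    """Scale each block of the Cypress die by its own multiplier and sum."""
    shaders = shader_mult * CYPRESS_DIE_MM2 * SHADER_SHARE * SHADER_4D_SCALE
    simd_tmu = simd_tmu_mult * CYPRESS_DIE_MM2 * SIMD_TMU_SHARE
    rest = rest_mult * CYPRESS_DIE_MM2 * REST_SHARE
    return shaders + simd_tmu + rest

# The three configurations worked through in the post.
scenarios = {
    "2560 shaders, 32 ROPs, 256-bit": (2, 2, 1),
    "5120 shaders, 32 ROPs, 256-bit": (4, 4, 1),
    "1920:96:64, 512-bit": (1.5, 1.5, 2),
}

for label, mults in scenarios.items():
    print(f"{label}: {estimate_die_mm2(*mults):.2f} mm^2")
```

The model is linear in every block, so the result is only as good as the assumed shares; change the 60/15/25 split and the conclusion moves with it.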

yogurt_21 said:
so if Cayman is only increasing shaders by 50%, plus TMUs, while keeping the same ROPs, the performance won't scale the way it does from the 5770 to the 5870, and we'll have a 6770 capable of taking sales away from the 6870 not just on price/performance but on performance in general.

imo it would be a bad bad move when they have the chance to repeat the success of the 5xxx series.

hard fact, however people just don't listen

Tatty_One said:
1. Currently I believe there are limitations on ROP count against memory bus size; you ain't gonna get 64 ROPs on a 256-bit wide bus.
2. There are a lot more factors to overall performance than just bus size and ROP count.

Simple as that really.

of course you cannot boost performance just by adding ROPs/bus. you also can't just add ALUs without a major increase in ROPs/bus

wahdangun · Sep 30, 2010

cheezburger said:
no, I'm not afraid of betting, it's just that I can't ignore the stupidity, that's all. AMD is not going to make a 500mm^2 GPU die just to add more ALUs while keeping a 256-bit bus and 32 ROPs, when adding ALUs costs more die space. that is hard fact!

just a question: what do you need so many shaders for if your frame rate won't increase from 200 fps to 800 fps... just to be feature rich? folding@home is generally garbage to the vast majority of high end gamers, and "NO ONE WILL BUY A GFX CARD JUST TO RUN FOLDING@HOME TO SAVE MANKIND WHILE IT CAN'T DO SHIT FOR FRAME RATE". if humans would die then let them all die....simple.

i would personally rather throw 500 dollars into the water than save the human race

anyway, read the post below before you start thinking 32 ROPs and a 256-bit bus with a ridiculous 2560 shaders will hit the market with such a bad scaling design.

hard fact, however people just don't listen

of course you cannot boost performance just by adding ROPs/bus. you also can't just add ALUs without a major increase in ROPs/bus

first of all, I don't give a shit about F@H, and second, we don't know for sure. it's useless to speculate right now. just look at the HD 4870 launch: people speculated it would have 480 shaders, but in the end we got 800 shaders, more than twice the shaders of the HD 3870. and btw maybe Cayman will be just 20% faster than Barts, and if this is a big GPU like NVIDIA's, ATI will want to cut the cost and use 256-bit instead. maybe that's why Barts launches earlier, to wait for the high speed GDDR5 to be ready, just like the HD 4850 was launched earlier.

cheezburger · Sep 30, 2010

wahdangun said:
first of all, I don't give a shit about F@H, and second, we don't know for sure. it's useless to speculate right now. just look at the HD 4870 launch: people speculated it would have 480 shaders, but in the end we got 800 shaders, more than twice the shaders of the HD 3870. and btw maybe Cayman will be just 20% faster than Barts, and if this is a big GPU like NVIDIA's, ATI will want to cut the cost and use 256-bit instead. maybe that's why Barts launches earlier, to wait for the high speed GDDR5 to be ready, just like the HD 4850 was launched earlier.

you haven't answered my question: why would AMD want to make a huge die GPU by adding more ALUs/shaders if they knew adding more shaders would cost more? why don't they simply optimize their ALUs more and add ROPs/bus instead?

this is no longer speculation, this is fact! we all know shaders cost 60% of the die space in the current Evergreen design, and adding more than twice the shaders is nonsense; making a GPU as big as Fermi with no frame rate gain and bad scaling is just plain stupid. you could add more shaders after the 3870 because RV670 only has a die size of 179mm^2, against 282mm^2 in the 4870: an increase of roughly 60% while adding an extra 100 ALU/24 TMU & SIMD cluster. but if we apply this to Cayman it will be 534mm^2 if you design it to add more ALUs the way RV770 did. you miss one thing: if Cayman is ONLY a 20% gain in performance over Barts, then why would AMD bother to make it, if it's only 20% over a mid range card while having a die size of 500mm^2?? a 480:96:64 will have better scaling and a bigger frame rate boost than a 1280:(128)64:32.

guess you don't know anything about how a GPU works. the ALUs in a GPU act as program decoder and material generator, while the ROPs (Raster Operations Pipelines, also called Render Output Units) handle material/texture loading and finalize the instructions processed by the shaders/ALUs. more ALUs don't ensure a performance boost; in extreme cases like highest detail/AA/AF they keep the frame rate from dropping by a serious margin. for example, RV670 and RV770 don't show much difference at lower detail/lighting, where frame rates are mostly identical. but at extreme detail RV770 takes advantage of its shaders and drops less than RV670. however, RV670 and RV770 differ little in pixel fill rate, except that RV770 has a higher clock, which gives it a little more fps. so if you want more frame rate, you will need more ROPs.

yogurt_21 · Sep 30, 2010

Tatty_One said:
Lol, I didn't use the comparison as a "reason" they should be compared to Barts etc. I used it because in your previous post you said that you found it difficult to believe that mid and high end cards would have the same ROP count, and my example clearly shows that it does happen: the 5850 and 5870 share one. All that you have said does not change the fact that currently, in order for the ROP count to be increased, the memory bus must also be increased. So unless you are sure that we will see some 512-bit bus versions, whichever way you want to look at it you are going to pay a huge premium for that. One of the main reasons ATI have been so competitive price-wise recently is because they have gone for the 256-bit bus; NVIDIA's 384-bit and wider buses cost more to produce, in PCB terms alone.
Using the comparison between the 5830 and the 5770 throws up some odd results as well as what you have mentioned. Despite having double the memory bus, the 5830 has the same ROP count as the 5770, but were you aware that, despite the wider bus, the 5830 is actually SLOWER in pixel fill rate than the 5770? Now that's for a couple of reasons, but my point is that bus and ROP count are just ingredients in the overall performance. People seem to get too hung up on them; you can get to a point where too many ROPs actually strangle performance and show little improvement where other ingredients can give a greater boost.

Now if we do see a 512-bit bus... and I am not saying we won't, then as you have said, there is more potential there, but with that comes a fairly large hike in prices. I have some doubts that AMD want to go down that route personally, although maybe on just the one top end card. My point all along has simply been twofold:

1. Currently I believe there are limitations on ROP count against memory bus size; you ain't gonna get 64 ROPs on a 256-bit wide bus.
2. There are a lot more factors to overall performance than just bus size and ROP count.

Simple as that really.

again, the 5850 and 5870 are in the same range. to actually separate mid from high or high from enthusiast, ATI/AMD has given vast spec differences, in fact double in the case of 5770 > 5870 > 5970. so I think the thing you're missing here is the fact that I consider the 5850 a high end part, not a midrange one. to me midrange spans the $100-200 price point at launch, high end $300-500 and enthusiast $500+. if you read that correctly, Fermi has no enthusiast single part in my mind, and only enters that realm in SLI.

and again, the 5850 and the 5870 have the same config, only different shaders and clocks. what I referred to in my above post is that clocks make up 99% of the performance difference between the two cards, and when you match their clock speeds on the same rig, the 5870 will barely edge out the 5850, proving that the shader difference between the two doesn't affect performance significantly.

now doubling the shader count might, but not likely enough to grant as much of a performance difference as there is between the 5870 and 5770, which regardless will skew purchase decisions away from the high end parts. Being that high end parts already sell less than midrange and are more expensive to manufacture, it could be a costly decision.

despite having double the memory bus it has the same ROP count as the 5770 but were you aware, despite it having double the memory bus, the 5830 is actually SLOWER in pixel fill rate than the 5770

don't know why you posted this as it proves my point. since the 5770 has the same ROP/TMU/memory-bit-per-shader balance as the 5870, it has a nice scalable architecture that, as you pointed out, has a better fill rate than the 5830 despite the fact that the 5830 has 40% more shaders. so...shaders again aren't enough on their own. They need the raw hp of the ROPs combined with the TMUs to get the job done. And no, your conclusion based on the data is incorrect: the 5830 has a SHADER bottleneck, not a ROP/TMU one. that's why the 5770, with roughly 29% fewer shaders and TMUs (the 5830 has 40% more of each), can have a higher fill rate (granted, the 200MHz memory and 50MHz core clock advantage on the 5770 might be helping the fill rate).
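The fill rate gap follows from the usual back-of-envelope formula, theoretical pixel fill rate = ROPs x core clock. A minimal Python sketch is below. The thread only gives the 16 ROPs on each card and the 50MHz core clock gap; the absolute clocks used here (850MHz and 800MHz) are assumed reference clocks, and the function name is made up for illustration.

```python
def pixel_fill_gpixels(rops, core_mhz):
    """Theoretical peak pixel fill rate in gigapixels per second."""
    return rops * core_mhz / 1000.0

# Both cards have 16 ROPs; clocks are assumed reference values.
hd5770 = pixel_fill_gpixels(16, 850)
hd5830 = pixel_fill_gpixels(16, 800)

print(f"HD 5770: {hd5770:.1f} GP/s")
print(f"HD 5830: {hd5830:.1f} GP/s")
```

With equal ROP counts the theoretical peak depends only on core clock, so shader count and bus width do not enter this particular figure at all; real-world fill rate can still differ from the peak.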

based on what we know about ATI, though they cannot increase the ROP count per memory bit within a series, they can disable ROPs. the second thing we know is that Cypress was essentially two separate cores on a single die and Juniper was a single one of those cores.

it is possible that ATI/AMD already have a working core with 64 ROPs on a 256-bit bus and we're seeing half of that on Barts. another thing to keep in mind is that a few years ago 16 ROPs were the max ATI could do on a 256-bit bus, so at the time I could have argued that they couldn't put 32 ROPs on that bus width, and I would have been wrong.

besides, I don't care if they have to go to a 384-bit bus width with 48 ROPs; Cayman needs to increase the ROP count as well as shaders and TMUs to fit in with Barts in the lineup, otherwise Barts will be the odd man out and steal the sales.

bear jesus · Sep 30, 2010

yogurt_21 said:
besides, I don't care if they have to go to a 384-bit bus width with 48 ROPs; Cayman needs to increase the ROP count as well as shaders and TMUs to fit in with Barts in the lineup, otherwise Barts will be the odd man out and steal the sales.

After learning a little more about the limitations in gpu core design I'm kind of hoping it would be a 384bit bus as it looks like it would be the best option for increasing everything but not pushing the die size too far, but then again i am just a noob when it comes to gpu chip design :laugh:

dalelaroy · Sep 30, 2010

cheezburger said:
not until i get his 5850 first then i'll trade my 9600gt to him for GT240 for physx

i don't see there's any point adding ridiculous number of shader on exist 40nm fab..based on my previous calculation if cayman is double of barts even except rops/bus increase as you were mention it will turn out to be like below if the spec is 2560:128:32 and 256bit bus

shader die space in cypress is 60% and 4D shader is 80% of 5D shader in size and SIMD controller and TMU took about 15% then here will be 2(334x 0.6 x0.8)+2(334x0.15)+334x0.25 = 320.64 + 100.2 +83.5 = 504.34mm^2 + hard wiring = 510mm^2

that is huge die and such 510mm^2 only has 32 rops????and i don't see any reason why we'd need 640ALU for? folding@home?
and you expect a 510mm^2 chip using a narrow 256bit bus on it?

if the shader turn out to be 5120(1280ALU) then the die size will be:

4(334x 0.6 x0.8)+4(334x0.15)+334x0.25 = 641.28 + 200.4 + 83.5 = 925.18mm^2 + hard wiring = 940mm^2......

shader like this are pointless if you don't have more rops to push it. like g92 was bottleneck by its 16 rop while it had 128 ALU. and now cayman that has 1280 ALU but 32 rops....that is a big joke...

if the specification turn out to be 1920:96:64 512bit story will be vastly different from above

1.5(334 x 0.6 x 0.8) + 1.5(334 x 0.15) + 2(334 x 0.25) = 240.48 + 75.15 + 167 = 482.63mm^2 + hard wiring = 484mm^2

480ALU is what we need in existed 40nm..no go further....

First of all, I read an interview with an AMD engineer in which he stated that the shaders of Cypress take up 80% of the Cypress die. This was within the context of discussing SIMD pipelines, so he might have meant SIMD pipelines, which would be shaders plus TMUs plus SIMD logic, but even your 60% for shaders plus 15% for TMUs and SIMD logic does not add up to the 80% stated by this engineer. Where do you get your figures?

Second, while it is common to quote 1600 for the number of shaders in Cypress, Cypress actually has 1600 ALUs organized as 320 shaders, that are arranged in 20 SIMD pipelines having 16 shaders and 4 TMUs each. Each shader has 4 simple ALUs and 1 complex ALU. Barts/Cayman is supposed to have 4 moderate complexity ALUs per shader.
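The counts in that breakdown can be checked with a few lines of Python. This is only a sketch of the arithmetic described above; the constant names are invented for illustration.

```python
SIMD_PIPELINES = 20      # SIMD pipelines in Cypress, per the post
SHADERS_PER_SIMD = 16    # shaders in each SIMD pipeline
TMUS_PER_SIMD = 4        # texture units in each SIMD pipeline
ALUS_PER_SHADER = 4 + 1  # 4 simple ALUs plus 1 complex ALU

shaders = SIMD_PIPELINES * SHADERS_PER_SIMD  # shader (5-wide unit) count
alus = shaders * ALUS_PER_SHADER             # the commonly quoted "shader" figure
tmus = SIMD_PIPELINES * TMUS_PER_SIMD

print(f"{shaders} shaders, {alus} ALUs, {tmus} TMUs")
```

The commonly quoted 1600 figure counts individual ALUs, while the 320 figure counts the 5-wide units, so the same chip can be described with either number.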

Barts/Cayman are not derivatives of Juniper or Cypress. They were designed in parallel with Evergreen by the team(s) that designed RV7xx, including RV740. The engineer that was interviewed stated that the 4 ALU per shader design of Northern Islands took up slightly less space per shader than the 4+1 ALU design of Cypress while delivering between 1.5x to 1.8x the performance per shader of Cypress. The engineer might have meant 1.5x to 1.8x the performance per ALU, deliberately using the wrong term to make things clearer to the interviewer that often mentioned the 1600 shaders of Cypress.

The Radeon HD 5830 has the same number of ROPs and memory controllers as the Radeon HD 4870/4890, and falls between the two of them in average performance despite having 1.4x the number of SIMD pipelines. Chances are that it is not the performance of the individual shaders/TMUs that is crippling Cypress, but the SIMD control logic. My guess is that the NI design team went with a 4 moderate complexity ALU design for NI to simplify the control logic, thus enabling them to achieve at least the per shader performance of RV770 while implementing double precision floating point, as well as the DX11 features. Just getting NI to RV770 level per ALU performance would have given NI 12% higher performance per shader than Cypress. And it is possible that other improvements, including higher utilization of the ALUs due to fewer of them per shader and the number of ALUs per shader being a power of two, increased performance per shader to within 95% of the 4+1 ALU shaders. Thus the 1.5x to 1.8x figure quoted.

My guess is that, since the small die size strategy was well established at the time NI was being designed, and 32nm allows for just a bit over 56% more transistors per mm^2 versus 40nm, and the 4 ALU shader design is only slightly smaller than the 4+1 ALU shader design, Turks was to be 1.6x Redwood, Barts 1.6x Juniper, and Cayman 1.6x Cypress with regard to shaders/SIMD pipelines. This would make Turks 128 shaders (512 ALUs), Barts 256 shaders (1024 ALUs), and Cayman 512 shaders (2048 ALUs). When 32nm was cancelled, only Cayman had to be cut down, and this was only to keep the TDP within the limits of what was needed to produce a dual GPU "Cayman".
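That guess can be reproduced with a short Python sketch. The 1.6x factor and the 4 ALUs per NI shader come from the post; Redwood's 80 shaders (400 ALUs) is an assumption taken from published specs rather than from this thread, and the density figure assumes ideal area scaling with the square of the feature size. All names are illustrative.

```python
# Evergreen shader counts (5-wide units, i.e. ALUs / 5).
# Redwood = 80 is an assumption from published specs, not from the thread.
evergreen_shaders = {"Redwood": 80, "Juniper": 160, "Cypress": 320}
successor = {"Redwood": "Turks", "Juniper": "Barts", "Cypress": "Cayman"}

SCALE = 1.6             # guessed shader scaling per family
ALUS_PER_NI_SHADER = 4  # 4 moderate complexity ALUs per shader

# Ideal density gain going from 40nm to 32nm (area scales with length squared).
density_gain = (40 / 32) ** 2 - 1

projected = {}
for chip, shaders in evergreen_shaders.items():
    ni_shaders = round(shaders * SCALE)
    projected[successor[chip]] = (ni_shaders, ni_shaders * ALUS_PER_NI_SHADER)

print(f"density gain: {density_gain:.2%}")
for name, (ni_shaders, ni_alus) in projected.items():
    print(f"{name}: {ni_shaders} shaders ({ni_alus} ALUs)")
```

The 1.6x factor sits just above the ideal density gain, which is consistent with the post's remark that the 4 ALU shader is slightly smaller than the 4+1 design.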

Bus width is primarily a function of die size, and since Barts would have had about the same die size as Juniper at 32nm, Barts would have started with a 128-bit bus. But with Barts having over 50% more core performance than Juniper, there would have been a push towards either increasing the number of ROPs per memory controller by at least 50% or increasing the memory width by 50%. If they went with the memory width solution, Barts would have had a 192-bit wide bus at 32nm. Cayman was probably not large enough for a 384-bit memory bus at 32nm, so my guess is that the number of ROPs per memory controller was increased.

If indeed the Radeon HD 2900 GT had 12 ROPs (presumably 16 total with 4 disabled), it is possible Cayman might have had 12 ROPs per memory controller at 32nm. Well, actually 16 ROPs per memory controller organized as four clusters of 4 ROPs each, with one ROP cluster per memory controller serving as a spare. I estimate that, at the time the GTX 480 was introduced, approximately 14% of all Radeon HD 5850/5870 yield was being lost to defective ROP clusters. At the time the Radeon HD 5830 was introduced this yield loss to defective ROP clusters would have been higher, thus the need to salvage a part with one ROP cluster per memory controller disabled. ATI probably anticipated similar yield problems at 32nm, and at least wanted one spare ROP cluster per memory controller available to improve yields, so the design could have been three ROP clusters per memory controller with the third serving only as a spare. More likely, with the need for 50% higher ROP performance to match the 50% higher core performance, ROP clusters per memory controller were doubled, with the fourth ROP cluster per memory controller serving as a spare.

With 32nm being cancelled and NI reimplemented at 40nm, die size grew, and there was increased perimeter on which to implement edge pads, enabling Barts to grow from 192-bits to 256-bits, and perhaps Cayman can now be 384-bit instead of 256-bit. If not however, I do expect Cayman to have at least 50% more ROPs per memory controller.

cadaveca · Sep 30, 2010

dalelaroy, I gotta agree with your thoughts about control logic. Given that nVidia has now said that this exact thing is what went wrong with Fermi in development, and given Huang's explanation, I feel it's safe to say that this is definitely a sore spot for the 40nm process. Also, AMD has previously mentioned that the dispatch processor would get a serious revamp.

wahdangun · Sep 30, 2010

cheezburger said:
you haven't answered my question: why would AMD want to make a huge die GPU by adding more ALUs/shaders if they knew adding more shaders would cost more? why don't they simply optimize their ALUs more and add ROPs/bus instead?

this is no longer speculation, this is fact! we all know shaders cost 60% of the die space in the current Evergreen design, and adding more than twice the shaders is nonsense; making a GPU as big as Fermi with no frame rate gain and bad scaling is just plain stupid. you could add more shaders after the 3870 because RV670 only has a die size of 179mm^2, against 282mm^2 in the 4870: an increase of roughly 60% while adding an extra 100 ALU/24 TMU & SIMD cluster. but if we apply this to Cayman it will be 534mm^2 if you design it to add more ALUs the way RV770 did. you miss one thing: if Cayman is ONLY a 20% gain in performance over Barts, then why would AMD bother to make it, if it's only 20% over a mid range card while having a die size of 500mm^2?? a 480:96:64 will have better scaling and a bigger frame rate boost than a 1280:(128)64:32.

guess you don't know anything about how a GPU works. the ALUs in a GPU act as program decoder and material generator, while the ROPs (Raster Operations Pipelines, also called Render Output Units) handle material/texture loading and finalize the instructions processed by the shaders/ALUs. more ALUs don't ensure a performance boost; in extreme cases like highest detail/AA/AF they keep the frame rate from dropping by a serious margin. for example, RV670 and RV770 don't show much difference at lower detail/lighting, where frame rates are mostly identical. but at extreme detail RV770 takes advantage of its shaders and drops less than RV670. however, RV670 and RV770 differ little in pixel fill rate, except that RV770 has a higher clock, which gives it a little more fps. so if you want more frame rate, you will need more ROPs.

sorry, I don't know how to design a GPU, I'm just going by the correlation between each GPU design.

bear jesus · Sep 30, 2010

I have to admit all this is getting so confusing, i wish AMD would hurry up and start telling us something official about the cards.

jasper1605 · Sep 30, 2010

bear jesus said:
I have to admit all this is getting so confusing, i wish AMD would hurry up and start telling us something official about the cards.

Amen to that! For someone who doesn't understand ultra tech lingo to begin with and then reading conflicting views on ROPS SIMD lanes ALUs MEOW (just for kix) it gets very confusing

bear jesus · Sep 30, 2010

jasper1605 said:
Amen to that! For someone who doesn't understand ultra tech lingo to begin with and then reading conflicting views on ROPS SIMD lanes ALUs MEOW (just for kix) it gets very confusing

I have almost given up on trying to understand this all, although i admit it was a good excuse to read up on gpu design but really i'm only that interested in how powerful a card is and how that translates into high fps at high resolution and detail within a reasonable cost.

I damn AMD for being so quiet about it all. I guess all we can do is wait for the release, as I'm not expecting much official information before then. Hopefully AMD has a nice surprise for us all.

Tatty_One · Sep 30, 2010

yogurt_21 said:
it is possible that ATI/AMD already have a working core with 64 ROPs on a 256-bit bus and we're seeing half of that on Barts. another thing to keep in mind is that a few years ago 16 ROPs were the max ATI could do on a 256-bit bus, so at the time I could have argued that they couldn't put 32 ROPs on that bus width, and I would have been wrong.

besides, I don't care if they have to go to a 384-bit bus width with 48 ROPs; Cayman needs to increase the ROP count as well as shaders and TMUs to fit in with Barts in the lineup, otherwise Barts will be the odd man out and steal the sales.

We could disagree over individual points on this all day... as it seems we are, and to be honest, I have lost the will to live! So I will just reiterate my original point which instigated this lengthy discussion, not just with you but with one or two others: current architecture prohibits more than 32 ROPs on a 256-bit memory bus. Not being an engineer or whatever, I don't know if that's because it's technically impossible (because of the interlinked technology) or whether it is just totally impractical, which is precisely why NVIDIA have had to raise said bus to 384-bit to fit more ROPs on. Don't you or anyone else think that if 64 ROPs could be linked to a cheaper 256-bit bus without too much grief then manufacturers would adopt that higher performance, lower cost option (assuming the cost would be lower as no additional PCB layers would need to be added)? I am not saying it is impossible; I am saying that both AMD's and NVIDIA's architecture, and the relationship between their memory controllers and ROPs, suggest strongly to me that this will not happen.

As I said earlier, I am quite prepared to stand up and proclaim I am wrong if more than 32 appear on a 256-bit bus. I don't argue and have never argued against the benefits of a wider bus with a greater ROP count, just the point that there are many more elements to performance than just that. If the 5870/5850 only show that to a small degree, that is probably simply due to the fact that in retail, AMD's easiest and cheapest option is just to raise core clocks. I am sure if they wanted to they could have increased the performance some more without increasing the bus/ROP count... but why would they want to with the cards' positioning? I simply think that Cayman may well have more than 32 ROPs; I suppose I just don't think that they will be on a 256-bit bus.

just my thoughts and opinions.

yogurt_21 · Sep 30, 2010

Tatty_One said:
We could disagree over individual points on this all day... as it seems we are, and to be honest, I have lost the will to live! So I will just reiterate my original point, which instigated this lengthy discussion, not just with you but with one or two others: current architecture prohibits more than 32 ROPs on a 256-bit memory bus. Not being an engineer or whatever, I don't know if that's because it's technically impossible (because of the interlinked technology) or whether it is just totally impractical, which is precisely why NVidia have had to raise said bus to 384-bit to fit more ROPs on. Don't you or anyone else think that if 64 ROPs could be linked to a cheaper 256-bit bus without too much grief, then manufacturers would adopt that higher-performance, lower-cost option (assuming the cost would be lower as no additional PCB layers would need to be added)? I am not saying it is impossible; I am saying that both AMD's and NVidia's architecture, and the relationship between their memory controllers and ROPs, suggest strongly to me that this will not happen.

As I said earlier, I am quite prepared to stand up and proclaim I am wrong if more than 32 appear on a 256-bit bus. I don't argue, and have never argued, against the benefits of a wider bus with a greater ROP count, just the point that there are many more elements to performance than just that. If the 5870/5850 only show that to a small degree, that is probably simply due to the fact that in retail, AMD's easiest and cheapest option is just to raise core clocks. I am sure if they wanted to they could have increased the performance some more without increasing the bus/ROP count... but why would they want to, given the card's positioning? I simply think that Cayman may well have more than 32 ROPs; I suppose I just don't think that they will be on a 256-bit bus. Just my thoughts and opinions.

As always, none of us are engineers, so it's all speculation (and if there is an AMD engineer watching this thread, wtf? Get back to work!). We'll see how it comes out; they could very well prove us all wrong and show such a strong improvement in shader power that we start seeing NVidia-style shader counts, for all we know. lol

cheezburger · Sep 30, 2010

dalelaroy said:
First of all, I read an interview with an AMD engineer in which he stated that the shaders of Cypress take up 80% of the Cypress die. This was within the context of discussing SIMD pipelines, so he might have meant SIMD pipelines, which would be shaders plus TMUs plus SIMD logic, but even your 60% for shaders plus 15% for TMUs and SIMD logic do not add up to the 80% stated by this engineer. Where do you get your figures?

Second, while it is common to quote 1600 for the number of shaders in Cypress, Cypress actually has 1600 ALUs organized as 320 shaders, that are arranged in 20 SIMD pipelines having 16 shaders and 4 TMUs each. Each shader has 4 simple ALUs and 1 complex ALU. Barts/Cayman is supposed to have 4 moderate complexity ALUs per shader.

Barts/Cayman are not derivatives of Juniper or Cypress. They were designed in parallel with Evergreen by the team(s) that designed RV7xx, including RV740. The engineer that was interviewed stated that the 4 ALU per shader design of Northern Islands took up slightly less space per shader than the 4+1 ALU design of Cypress while delivering between 1.5x and 1.8x the performance per shader of Cypress. The engineer might have meant 1.5x to 1.8x the performance per ALU, deliberately using the wrong term to make things clearer to the interviewer, who often mentioned the 1600 shaders of Cypress.

The Radeon HD 5830 has the same number of ROPs and memory controllers as the Radeon HD 4870/4890, and falls between the two of them in average performance despite having 1.4x the number of SIMD pipelines. Chances are that it is not the performance of the individual shaders/TMUs that is crippling Cypress, but the SIMD control logic. My guess is that the NI design team went with a 4 moderate complexity ALU design for NI to simplify the control logic, thus enabling them to achieve at least the per shader performance of RV770 while implementing double precision floating point, as well as the DX11 features. Just getting NI to RV770 level per ALU performance would have given NI 12% higher performance per shader than Cypress. And it is possible that other improvements, including higher utilization of the ALUs due to fewer of them per shader and the number of ALUs per shader being a power of two, increased performance per shader to within 95% of the 4+1 ALU shaders. Thus the 1.5x to 1.8x figure quoted.

My guess is that, since the small die size strategy was well established at the time NI was being designed, and 32nm allows for just a bit over 56% more transistors per mm² versus 40nm, and the 4 ALU shader design is only slightly smaller than the 4+1 ALU shader design, Turks was to be 1.6x Redwood, Barts 1.6x Juniper, and Cayman 1.6x Cypress with regards to shaders/SIMD pipelines. This would make Turks 128 shaders (512 ALUs), Barts 256 shaders (1024 ALUs), and Cayman 512 shaders (2048 ALUs). When 32nm was cancelled, only Cayman had to be cut down, and this was only to keep the TDP within the limits of what was needed to produce a dual GPU "Cayman".

Bus width is primarily a function of die size, and since Barts would have had about the same die size as Juniper at 32nm, Barts would have started with a 128-bit bus. But with Barts having over 50% more core performance than Juniper, there would have been a push towards either increasing the number of ROPs per memory controller by at least 50% or increasing the memory width by 50%. If they went with the memory width solution, Barts would have had a 192-bit wide bus at 32nm. Cayman was probably not large enough for a 384-bit memory bus at 32nm, so my guess is that the number of ROPs per memory controller was increased.

If indeed the Radeon HD 2900 GT had 12 ROPs (presumably 16 total with 4 disabled), it is possible Cayman might have had 12 ROPs per memory controller at 32nm. Well, actually 16 ROPs per memory controller organized as four clusters of 4 ROPs each, with one ROP cluster per memory controller serving as a spare. I estimate that, at the time the GTX 480 was introduced, approximately 14% of all Radeon HD 5850/5870 yield was being lost to defective ROP clusters. At the time the Radeon HD 5830 was introduced this yield loss to defective ROP clusters would have been higher, thus the need to salvage a part with one ROP cluster per memory controller disabled. ATI probably anticipated similar yield problems at 32nm, and at least wanted one spare ROP cluster per memory controller available to improve yields, so the design could have been three ROP clusters per memory controller with the third serving only as a spare. More likely, with the need for 50% higher ROP performance to match the 50% higher core performance, ROP clusters per memory controller were doubled, with the fourth ROP cluster per memory controller serving as a spare.

With 32nm being cancelled and NI reimplemented at 40nm, die size grew, and there was increased perimeter on which to implement edge pads, enabling Barts to grow from 192-bits to 256-bits, and perhaps Cayman can now be 384-bit instead of 256-bit. If not however, I do expect Cayman to have at least 50% more ROPs per memory controller.
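For anyone trying to follow the numbers in the quoted post, here is a rough sanity check of the arithmetic, not anything official. The 20 SIMD pipelines, 16 shaders per SIMD, 4+1 and 4 ALUs per shader, and the 1.6x factor all come from the post. The Redwood and Juniper ALU counts (400 and 800) are the commonly quoted figures and are an assumption here, as is the idealised area scaling (density goes with the square of the feature size ratio). The function names are made up for illustration.

```python
# Back-of-the-envelope check of the shader/ALU figures in the quoted post.
ALUS_PER_SHADER_EVERGREEN = 5  # 4 simple + 1 complex ALU (from the post)
ALUS_PER_SHADER_NI = 4         # 4 moderate-complexity ALUs (rumoured, from the post)


def cypress_layout(simd_pipelines=20, shaders_per_simd=16):
    """Return (shaders, ALUs) for a Cypress-style layout."""
    shaders = simd_pipelines * shaders_per_simd
    return shaders, shaders * ALUS_PER_SHADER_EVERGREEN


def density_gain(old_nm, new_nm):
    """Idealised transistor density gain from a shrink (assumption: pure area scaling)."""
    return (old_nm / new_nm) ** 2


def projected_ni(evergreen_alus, scale=1.6):
    """Scale an Evergreen part's shader count by `scale`; return (shaders, ALUs)."""
    shaders = evergreen_alus // ALUS_PER_SHADER_EVERGREEN
    ni_shaders = round(shaders * scale)
    return ni_shaders, ni_shaders * ALUS_PER_SHADER_NI


if __name__ == "__main__":
    print("Cypress (shaders, ALUs):", cypress_layout())
    print("40nm -> 32nm density gain:", density_gain(40, 32))
    # ALU counts for Redwood and Juniper are assumed, not taken from the post.
    for name, alus in [("Redwood -> Turks", 400),
                       ("Juniper -> Barts", 800),
                       ("Cypress -> Cayman", 1600)]:
        print(name, projected_ni(alus))
```

Under those assumptions the shrink gives 1.5625x, which matches the "just a bit over 56%" in the post, and the 1.6x projections land on the 128/256/512 shader figures quoted.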

That 80% already includes the TMUs and SIMD controller. Consider that in AMD's architecture the shaders/ALUs are tied up with the TMUs and SIMD control in the same module, while the ROPs and bus are separated into another section. So basically my calculation is close to it.

The HD 2900 GT was indeed 16 total with 4 disabled, considering the die size and yield are completely identical to the XT/Pro versions. However, like the 5830, its bad scaling ends up generating more heat and far less performance than expected. Any cut-down version that goes to 3/4, or to an odd number like Fermi, will cause bad scaling and performance loss. Especially with AMD's bus design, it is impossible to go to a 6/12 configuration rather than 8/16; their SIMD cluster and instruction pipeline prevent that from happening. So it would be logical to either stay the same or double it. 40 ROPs/320-bit or 48 ROPs/384-bit will not be possible on AMD's line, at least not in this generation.
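To make the ROP/bus combinations being argued over easier to compare, here is a small sketch that just multiplies them out. It assumes 64-bit memory controllers (not stated anywhere in the thread) and 4 ROPs per cluster (from the quoted post); `rop_count` is a made-up helper, and none of this says which configurations AMD can actually build.

```python
# Tabulate ROP counts for the bus/cluster configurations discussed in the thread.
CONTROLLER_WIDTH_BITS = 64  # assumption: one memory controller per 64 bits of bus
ROPS_PER_CLUSTER = 4        # from the quoted post


def rop_count(bus_bits, clusters_per_controller, spare_clusters=0):
    """Active ROPs for a bus width, given clusters per controller and disabled spares."""
    controllers = bus_bits // CONTROLLER_WIDTH_BITS
    active_clusters = clusters_per_controller - spare_clusters
    return controllers * active_clusters * ROPS_PER_CLUSTER


if __name__ == "__main__":
    configs = [
        ("256-bit, 2 clusters/controller", 256, 2, 0),
        ("256-bit, 2 clusters, 1 disabled", 256, 2, 1),
        ("256-bit, 4 clusters, 1 spare", 256, 4, 1),
        ("256-bit, 4 clusters, no spare", 256, 4, 0),
        ("320-bit, 2 clusters/controller", 320, 2, 0),
        ("384-bit, 2 clusters/controller", 384, 2, 0),
    ]
    for label, bus, clusters, spare in configs:
        print(f"{label}: {rop_count(bus, clusters, spare)} ROPs")
```

With those assumptions the first two rows reproduce the 32 and 16 ROPs of the 5870 and 5830, and 48 ROPs shows up both on a 384-bit bus with the current layout and on a 256-bit bus with a fourth spare cluster per controller, which is really the point of disagreement.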

Wile E · Oct 1, 2010

TheMailMan78 said:
Its just you have bad cards. I ran crossfire with 4850s for a very long time without issue.

I've run single 4850, single 4870, crossfire 4850's, 4870+4850, crossfire 4870, 4870x2 + 4870, and finally just 4870X2.

Bugs in every single release past 8.10. Even on completely clean OS installs.

bear jesus · Oct 1, 2010

Wile E said:
I've run single 4850, single 4870, crossfire 4850's, 4870+4850, crossfire 4870, 4870x2 + 4870, and finally just 4870X2.

Bugs in every single release past 8.10. Even on completely clean OS installs.

To be honest, I'm sure one major reason why some people seem to have bugs and others don't is different hardware/OS setups and also different choices in games.

Widjaja · Oct 1, 2010

Wile E said:
I've run single 4850, single 4870, crossfire 4850's, 4870+4850, crossfire 4870, 4870x2 + 4870, and finally just 4870X2.

Bugs in every single release past 8.10. Even on completely clean OS installs.

Bugs?

If there are I have not noticed them with my HD4850.

