Thursday, April 24th 2008

ATI Radeon HD 4800 Series Video Cards Specs Leaked

Apr 24th, 2008 06:49 Discuss (278 Comments)

Thanks to TG Daily, we can now discuss the soon-to-be-released ATI Radeon HD 4800 series of graphics cards in more detail. One week ahead of the presumed release date, general specifications of the new cards have been revealed. All Radeon HD 4800 cards will use the RV770 GPU, produced by TSMC on a 55nm process, which includes over 800 million transistors, 480 stream processors or shader units (96+384), 32 texture units, 16 ROPs, a 256-bit memory controller (512-bit for the Radeon HD 4870 X2) and native GDDR3/4/5 support, as reported before. At first, AMD's graphics division will launch three new cards: the Radeon HD 4850, 4870 and 4870 X2:

ATI Radeon HD 4850 - 650MHz/850MHz/1140MHz core/shader/memory clock speeds, 20.8 GTexel/s (32 TMU x 0.65 GHz) fill-rate, available in 256MB/512MB of GDDR3 memory or 512MB of GDDR5 memory clocked at 1.73GHz
ATI Radeon HD 4870 - 850MHz/1050MHz/1940MHz core/shader/memory clock speeds, 27.2 GTexel/s (32 TMU x 0.85 GHz) fill-rate, available in 1GB GDDR5 version only
ATI Radeon HD 4870 X2 - unknown core/shader clock speeds, available with 2048MB of GDDR5 memory clocked at 1730MHz
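
The fill-rate figures quoted above are simply the texture unit count multiplied by the core clock. A minimal Python sketch of that arithmetic, using only the leaked numbers (the function name is illustrative, not from the source):

```python
def texel_fill_rate(tmus: int, core_clock_mhz: float) -> float:
    """Peak texel fill rate in GTexel/s: texture units x core clock."""
    return tmus * core_clock_mhz / 1000.0


# Core clocks as listed in the leak; both cards are listed with 32 texture units.
leaked_core_clocks_mhz = {"Radeon HD 4850": 650, "Radeon HD 4870": 850}

for card, clock_mhz in leaked_core_clocks_mhz.items():
    print(f"{card}: {texel_fill_rate(32, clock_mhz):.1f} GTexel/s")
```

These are theoretical peaks; real-world texturing throughput also depends on memory bandwidth and the workload.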

The 4850 256MB GDDR3 version will arrive as the successor to the 3850 256MB, priced in the sub-$200 range. The 4850 512MB GDDR3 should retail for $229, while the 4850 512MB GDDR5 will set you back about $249-269. The 1GB GDDR5-powered 4870 will retail for $329-349. The flagship Radeon HD 4870 X2 will ship later this year for $499.

Source: TG Daily


278 Comments on ATI Radeon HD 4800 Series Video Cards Specs Leaked

#251

Morgoth

Fueled by Sapphire

rwar more power consumption = more heat = less overclock = nuclear meltdown when overclocking

#252

imperialreign

Morgoth wrote: rwar more power consumption = more heat = less overclock = nuclear meltdown when overclocking

yep - Intel has proven this (except for the less OC bit) with the Prescott lineup :p

#253

HAL7000

Morgoth wrote: rwar more power consumption = more heat = less overclock = nuclear meltdown when overclocking

Understood, though that of course depends on whether you plan on overclocking and how you cool your system. The energy I was referring to would not even come near what the Prescott consumed.
My point was simple: I don't care about saving energy to power my system up.
I would just like AMD to release something worth building (for myself). I hope the 4870 X2, and whatever else they decide to release, is not just a play on words. I've been really close to joining the dark side many times, but I will hold off this one last time.

#254

MrMilli

www.tgdaily.com/html_tmp/content-view-37453-135.html

quote:
In terms of performance, we heard some interesting claims. A 4870 should perform on par with or better than a dual-chip 3870 X2.

lemonadesoda? reading this? ;)

#255

lemonadesoda

From your link: In terms of performance, we heard some interesting claims. A 4870 should perform on par with or better than a dual-chip 3870 X2. Our sources explained to us that using a PCIe Gen1 controller 3870 X2 was a mistake, since the board was hungry for data and didn't sync well with this interface.

I'd be delighted if the 4870 really was as fast as a 2x 3870 in crossfire. (A 3870X2 is actually clocked as a 3850 and should really be called 3850X2)

But I don't think the reasons they give will result in such a performance gain:

#1. 480 vs. 320 shaders = 50% improvement in the BEST POSSIBLE situation, i.e. purely shader limited.

#2. 32 vs. 16 TMU = 100% improvement... now I actually think THIS is going to have a bigger impact.

#3. 16 ROPS vs. 16 ROPS = no change here or to architecture.

#4. PCIe v1.0 controller? Well, check my benchies... my AGP is as fast as a PCIe16 card... given similar processor and proc speed. No. The interface is irrelevant UNLESS the graphics assets are in memory and not on the card.

#5. As I have always said, there will be increases associated with increased clocks, but points 1-4 refer to clock-for-clock gains.

Net net? 50%-100% improvement IN THE BEST situation (clock for clock) depending on where the limit was, i.e. shader limit or resolution limit.

On average? Less than 50%.

In practice? For the average person, FPS at, say, 1280x1024 will not improve by more than 20-30%. But you WILL BE ABLE to dial up much higher FSAA and AA without performance penalty. (And PLEASE read that as "much performance penalty". It's a relative comment, not supposed to mean exactly 100% same performance :rolleyes:)

#256

MrMilli

lemonadesoda wrote: I'd be delighted if the 4870 really was as fast as a 2x 3870 in crossfire. (A 3870X2 is actually clocked as a 3850 and should really be called 3850X2)

But I dont think the reasons they give will result in such a performance gain:

#1. 480 vs. 320 shaders = 50% improvement in the BEST POSSSIBLE situation, ie. purely shader limited.

#2. 32 vs. 16 TMU = 100% improvement... now I actually think THIS is going to have a bigger impact.

#3. 16 ROPS vs. 16 ROPS = no change here or to architecture.

#4. PCIe v1.0 controller? Well, check my benchies... my AGP is as fast as a PCIe16 card... given similar processor and proc speed. No. The interface is irrelevant UNLESS the graphics assets are in memory and not on the card.

#5. As I have always said, there will be increases associated with increased clocks, but points 1-4 refer to clock for clock gainst.

Net net? 50%-100% improvement IN THE BEST sitation (clock for clock) depending on where the limit was, ie shader limit or resolution limit.

On average? Less than 50%.

In practice. For the average person, FPS at, say, 1280x1024 will not improve by more than 20-30%. But you WILL BE ABLE to dial up much higher FSAA and AA without performance penalty. (And PLEASE read that as "much performance penalty". Its a relative comment, not supposed to mean exactly 100% same performance :rolleyes:)

lemonadesoda, if you don't mind, I have to correct you.

First off, the 3870X2 GPUs are clocked at 825MHz. The 3870 is clocked at 777MHz. So I don't know why you compare it with a 3850 (670MHz btw).
Check this out for reference: techreport.com/articles.x/14284/5
So a 3870X2 is only 4% slower than 3870 CF. The reason it's slower is that only one CF bridge is connected onboard and one is still free for CF-X. Normal CF uses two bridges.
All things aside, 4% is nothing. So if they say it's as fast as or faster than a 3870X2, that means around 70% faster than a 3870. That's what that sentence means. Nothing more, nothing less.

A couple of things I need to rectify (again):
You shouldn't compare the number of shaders but the GFlops they can compute.
RV670 = 497GFlop
RV770 = 1008GFlop
So shader power is increased more than 100%.
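
For anyone who wants to check those two figures: they can be reproduced if you assume each stream processor does one multiply-add (2 FLOPs) per clock, take the 3870's 777MHz clock mentioned above, and assume the leaked 1050MHz shader clock for the 4870 holds. A rough sketch under those assumptions (the function name is made up for illustration):

```python
# Assumption: one multiply-add (2 floating-point ops) per stream processor per clock.
FLOPS_PER_SP_PER_CLOCK = 2


def peak_gflops(stream_processors: int, shader_clock_mhz: float) -> float:
    """Theoretical peak shader throughput in GFLOPS."""
    return stream_processors * FLOPS_PER_SP_PER_CLOCK * shader_clock_mhz / 1000.0


rv670 = peak_gflops(320, 777)    # HD 3870: 320 SPs at 777MHz
rv770 = peak_gflops(480, 1050)   # HD 4870: 480 SPs at the leaked 1050MHz shader clock

print(f"RV670: {rv670:.0f} GFLOPS")
print(f"RV770: {rv770:.0f} GFLOPS")
print(f"Ratio: {rv770 / rv670:.2f}x")
```

These are paper numbers only; how much of that peak a game can actually use is a separate question.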

Also, I don't care what a GPU can do at 1280x1024. That resolution is mostly CPU-bound.
If you want to compare GPUs, you need to go above 1600x1200. That's just the way it is.

#257

magibeg

Children, no fighting until the cards are released, please. Then we can find out what they're actually capable of.

#258

DarkMatter

lemonadesoda wrote: I'd be delighted if the 4870 really was as fast as a 2x 3870 in crossfire. (A 3870X2 is actually clocked as a 3850 and should really be called 3850X2)

But I dont think the reasons they give will result in such a performance gain:

#1. 480 vs. 320 shaders = 50% improvement in the BEST POSSSIBLE situation, ie. purely shader limited.

#2. 32 vs. 16 TMU = 100% improvement... now I actually think THIS is going to have a bigger impact.

#3. 16 ROPS vs. 16 ROPS = no change here or to architecture.

#4. PCIe v1.0 controller? Well, check my benchies... my AGP is as fast as a PCIe16 card... given similar processor and proc speed. No. The interface is irrelevant UNLESS the graphics assets are in memory and not on the card.

#5. As I have always said, there will be increases associated with increased clocks, but points 1-4 refer to clock for clock gainst.

Net net? 50%-100% improvement IN THE BEST sitation (clock for clock) depending on where the limit was, ie shader limit or resolution limit.

On average? Less than 50%.

In practice. For the average person, FPS at, say, 1280x1024 will not improve by more than 20-30%. But you WILL BE ABLE to dial up much higher FSAA and AA without performance penalty. (And PLEASE read that as "much performance penalty". Its a relative comment, not supposed to mean exactly 100% same performance :rolleyes:)

Every time you post about this, you demonstrate your lack of knowledge on the matter. The X2 with HD3850 clocks? My God. Whatever, I don't want to fight again, I will only try to explain why they could say PCIe 1 wasn't enough and why it is so important.

The interface between the card and the system is one thing; the one inside the card, the PCIe bridge, is a completely different one. The one they are referring to is the bridge chip between the two RV670 cores. They are using PCIe just as they could have used HyperTransport or another link. They used it for driver compatibility, I'm 99.99% sure. I guess they are using it to communicate between the cores (obvious), but most importantly to get some kind of cache coherency between them.

Why is this coherency important? Because that way one core can use the info calculated by the other. AFAIK normal Crossfire (and SLI) does little of this: each card renders odd frames or lines or something (pixel quads, clusters, whatever, let's call them pixel arrays), while the other renders the even parts. If the array in core 1 takes 10 times longer to render than the one in core 2, you lose a lot of time waiting. You need some kind of communication between them to let core 2 continue the work of core 1 without making a mess.

PCIe bandwidth, while more than enough for texture and general data transfers between main memory and the card, is pretty slow for that kind of work. For comparison, PCIe 1.1 has a maximum bandwidth of 4 GB/s (x16, per direction), while typical CPU caches are around 30-50 GB/s. PCIe 2.0 increases that to 8 GB/s, which is still far away, but definitely better. Only ATI knows why they used PCIe 1 in the first place, knowing this, but they learned and moved ahead. Let's hope it turns out better this time around.
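
The 4 GB/s and 8 GB/s figures can be derived from the per-lane signalling rates. A small sketch, assuming a full x16 link (the thread does not say how wide the bridge link on the 3870 X2 actually is) and the 8b/10b line coding that PCIe 1.x and 2.0 use:

```python
def pcie_bandwidth_gb_s(transfer_rate_gt_s: float, lanes: int = 16) -> float:
    """Usable bandwidth per direction, in GB/s, for a PCIe 1.x or 2.0 link."""
    encoding_efficiency = 8 / 10  # 8b/10b: 8 payload bits per 10 bits on the wire
    bits_per_byte = 8
    return transfer_rate_gt_s * encoding_efficiency / bits_per_byte * lanes


# Per-lane signalling rates: 2.5 GT/s for PCIe 1.x, 5.0 GT/s for PCIe 2.0.
print(f"PCIe 1.x x16: {pcie_bandwidth_gb_s(2.5):.1f} GB/s per direction")
print(f"PCIe 2.0 x16: {pcie_bandwidth_gb_s(5.0):.1f} GB/s per direction")
```

Either way, both numbers sit well below the 30-50 GB/s cache figure quoted above, which is the point of the comparison.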

#259

lemonadesoda

@milli,

My bad, I read elsewhere that the 3870X2 "is actually 3850X2 but marketed as 3870X2". Yep, I should be more careful about what info I pick up and pass on. Thanks for the correction.

RV670 = 497GFlop
RV770 = 1008GFlop
So shader power is increased more than 100%.

If that is true, then GREAT! But hasn't RV770 been advertised as no architectural change? For a 50% increase in shaders to give a 100% increase in power *must* require quite a different architectural approach. If that's true, then 4870 will be a winner, baby!

@darkmatter,

Every time, you prove your lack of diplomacy. Man, they must have been tough on you at school.

#260

DarkMatter

lemonadesoda wrote: @milli,

My bad, i read elsewhere that the 3870X2 "is actually 3850X2 but marketed as 3870X2". Yep, I should be more careful about what info I pick up and pass on. Thanks for the correction.

RV670 = 497GFlop
RV770 = 1008GFlop
So shader power is increased more than 100%.

If that is true, then GREAT! But hasnt RV770 been advertised as no architectural change? For 50% increase in shaders to give 100% increase in power *must* require quite a different architectural approach. If that's true, then 4870 will be a winner, baby!

@darkmatter,

everytime you prove your lack of diplomacy. Man, they must have been tough on you at school.

:shadedshu I explained why it has double the shader power in the very first post that sparked our discussion. Maybe I lack diplomacy, but at least I listen to (read) others and learn. I don't talk too much while knowing zero about the matter, or argue against others' opinions with arbitrary numbers pulled out of thin air.

#261

lemonadesoda

Refer to point #5. You are taking my comments out of context. I'm talking about performance increases clock/clock. Until the boards are out in the channels, we don't know the clocks, so we can only make assessments on KNOWN architectural changes, while the clock effects are guesswork until we know what they are. My comments have always been very clearly stated as changes on same clocks... so please go back and "read" before getting so hot under the collar!

3870X2 = 3850X2 with overclock. Fact. Both use GDDR3. Put the 3870X2 at the same core clocks as 2x 3870 in crossfire (on GDDR4) and which will win?

#262

btarunr

Editor & Senior Moderator

lemonadesoda wrote: @milli,

My bad, i read elsewhere that the 3870X2 "is actually 3850X2 but marketed as 3870X2". Yep, I should be more careful about what info I pick up and pass on. Thanks for the correction.

RV670 = 497GFlop
RV770 = 1008GFlop
So shader power is increased more than 100%.

If that is true, then GREAT! But hasnt RV770 been advertised as no architectural change? For 50% increase in shaders to give 100% increase in power *must* require quite a different architectural approach. If that's true, then 4870 will be a winner, baby!


The 'different' architecture comes in the form of the shaders having their own clock generator: the shaders are clocked well above 1 GHz, while the geometry domain stays below 800 MHz.

#263

DarkMatter

lemonadesoda wrote: Refer to point #5. You are taking my comments out of context. I'm talking about performance increases clock/clock. Until the boards are out in the channels, we dont know the clocks, so we can only make assessments on KNOWN architectural changes, while the clock effects are guesswork until we know what they are. My comments have always been very clearly stated as changes on same clocks... so please go back and "read" before getting so hot under the collar!

3870X2 = 3850X2 with overclock. Fact. Both use GDDR3. Put the 3870X2 at the same core clocks as 2x 3870 in crossfire (on GDDR4) and which will win?

Point is, you can't use clock-for-clock comparisons because RV770 will run faster, and that's in fact one of the advancements of new chips. Minor changes in internal units can affect how far they reach, and improvements in the fab process (within the same process) can help obtain higher stable clocks. It's like saying that the HD3850 is as fast as the HD3870 with the argument that if run at the same speeds they will be equally fast. Wait, you did. Well, it's essentially true, but the HD3850 can only dream of reaching as high as the HD3870; it's pointless to compare them clock for clock and claim no difference in performance.

lemonadesoda wrote: Given the same architecture, higher clocks, and more shaders, I think these are the performance implications:

1./ Broadly similar performance at standard resolutions e.g. 1280x1024 and with no AA FSAA effects since no architectural changes
2./ General improvement in line with clock-for-clock increases 10-20%
3./ The increase to 32 TMU will mean that the cards wont CHOKE at higher resolutions. It will be able to handle 1920x1200 without hitting the wall
4./ Currently you can dial up 4x AA without any performance hit. With the extra shaders you can do the same at 1920x1200 now
5./ With the extra shaders, you will be able to dial up 8x or 16x at 1280x1024 without a significant hit.
6./ The GPU will run hotter and require more power
7./ Compensated by using GDDR5 memory that will require less power and run a bit cooler

Net net... get the GDDR5 model.

Will there be a "jump" in performance like we saw between the x19xx series and hd38xx? No.

Tell me where you stated you were talking about clock for clock. You didn't up until now; in fact the post above makes me think you had taken clocks into account, since it's the only thing you say will improve performance in the HD4000 series. Neither can we read anything about a clock-for-clock comparison in the following posts, until post #256. Even then you overlook the fact that the shaders are running a lot faster and say that 50% is THE BEST POSSIBLE improvement in this area. But that's not the worst part. The worst part is that after saying there's a 50% improvement in shaders and a 100% improvement in textures, you come to the conclusion that the performance gain will be LESS than 50%, in fact around 20-30%! :eek:

How can that be? Well, since GDDR5 would make memory bandwidth double that of the HD3000 series, there's only raster power left. You could have argued about the weight of ROPs in the final performance, and said that my thoughts about them were wrong, which could be true AT HIGHER RESOLUTIONS, and not at 1280x as you are saying. If shader and texture power is double that of RV670, there is no reason to say it won't be 2x faster, especially at lower resolutions where ROPs don't count as much. Everything you said is incongruous, and that's why I say you don't know about this. That, and the fact that in your posts you seem to act as if functional units (ROP, SP, TMU) and clocks were independent and had nothing to do with each other, or something of the like. As if shaders only did AA, as if extra TMUs only work when bigger textures are loaded and are idle otherwise, etc. An example of this is when MrMilli said RV770 has double the GFlops and you said:

If that is true, then GREAT! But hasnt RV770 been advertised as no architectural change? For 50% increase in shaders to give 100% increase in power *must* require quite a different architectural approach. If that's true, then 4870 will be a winner, baby!

Maybe I'm understanding this badly, but it seems as if you took that as magic. Like there *must* be something underlying there, something shady. It demonstrates your lack of knowledge IMO.

#264

lemonadesoda

It's a basic analytic approach. To separate independent factors, to understand where the gains are coming from. An analogue is with CPUs.

Comparing P4 to Core2 you can just go, Chip A vs. Chip B. Oh look, chip B is faster. Or you can break it down, and analyse the performance on things that you can set independently. Much better to compare A and B at the same clock first, to see architectural gains, then observe the additional gain/loss through different clock speeds. Likewise (in the CPU world) with amount of cache, or number of cores.

With the RV770, you can break it down to:

1./ Increase in shaders ---> impact, and in what situations
2./ Increase in TMU ---> impact, and in what situations
3./ Increase in ROP ---> impact, and in what situations
4./ Change in memory type ---> impact
5./ Increase in clocks ---> impact (unknown at start of thread, although strong speculation now about what they will be, but until in retail channels, we really don't know what is "consumer stable" from the product manufacturers).
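
As a sketch of that kind of breakdown for the shader stage only: the unit-count gain and the clock gain can be computed separately and then multiplied. The clocks here are assumptions taken from elsewhere in this thread (777MHz for the 3870, the leaked 1050MHz shader clock for the 4870), and the helper name is made up for illustration.

```python
def gain(new: float, old: float) -> float:
    """Relative gain of one factor, e.g. 1.5 means 50% more."""
    return new / old


# Architectural factor: number of shader units, RV770 vs. RV670.
unit_gain = gain(480, 320)
# Clock factor (assumed): leaked 4870 shader clock vs. 3870 clock, in MHz.
clock_gain = gain(1050, 777)
# Combined theoretical shader throughput gain.
total_gain = unit_gain * clock_gain

print(f"Units only (clock for clock): {unit_gain:.2f}x")
print(f"Clock only:                   {clock_gain:.2f}x")
print(f"Combined:                     {total_gain:.2f}x")
```

Seen this way, the "50% clock for clock" and "more than 100% in GFlops" figures quoted in this thread describe the same hardware; the gap between them is the clock factor.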

With the RV770, what is ATi trying to address? The shader and texture "wall" at high resolutions, for greater FPS. For regular resolutions? The benefit is being able to dial up higher AA and FSAA. I still hold the view that at a regular 1280x1024 with no (or low) AA/FSAA, the performance gains will be relatively small. At high resolutions like 1920x1600, or at 8x and 16x FSAA/AA, that's where the gain will be.

It's quite clear from the benchmarks that the RV770 *will* be very fast HD.4870.3Dmark06benchmark.leak.html=21,223 :D.

I'm very happy to listen to any argument except the lame "you demonstrate lack of knowledge", "you're not very versed, are you?". I find it insulting, and your continued use of it demonstrates a major lack of politeness and a bellicose attitude.

Refer to post #218. Please do not try to kindle old flames. This was dealt with. Turn off your microphone. The jury's out until the cards are in.

#265

Mussels

Freshwater Moderator

hey guys... we can stop. you're both offering advice/insight here, and conflicting. why don't we just take bets until the first reviews come out, and see if it's 20-30% faster, or 50%+

Either way, you can buy me cards and I'll do an independent review for you

#266

btarunr

Editor & Senior Moderator

lemonadesoda wrote: It's quite clear from the benchmarks that the RV770 *will* be very fast HD.4870.3Dmark06benchmark.leak.html=21,223 :D.

Hey you rick-rolled us with that link to benchmarks :shadedshu

#267

DarkMatter

lemonadesoda wrote: It's a basic analytic approach. To separate independent factors, to understand where the gains are coming from. An analogue is with CPUs.

Comparing P4 to Core2 you can just go, Chip A vs. Chip B. Oh look chip B is faster. Or you can break it down, and analyse the performance on things what you can set independently. Much better to compare A and B at-the-same-clock first, to see architectural gains, then observe the additional gain/loss through different clockspeeds. Likewise (in the CPU world) with amount of cache, or number of cores.

With the RV770, you can break it down to:

1./ Increase in shaders ---> impact, and in what situations
2./ Increase in TMU ---> impact, and in what situations
3./ Increase in ROP ---> impact, and in what situations
4./ Change in memory type ---> impact
5./ Increase in clocks ---> impact (unknown at start of thread, although strong speculation now about what they will be, but until in retail channels, we really dont know what is "consumer stable" from the product manufacturers).

With the R770 what is ATi trying to address? The shader and texture "wall" at high resolutions for greater FPS. For regular resolutions? The benefit is being able to dial up higher AA and FSAA. I still hold the view that at a regular 1280x1024 without (or low) AA, FSAA, the performance gains will be relatively small. At high resolutions like 1920x1600, or when at 8xx and 16xx FSAA, AA, thats where the gain will be.

It's quite clear from the benchmarks that the RV770 *will* be very fast HD.4870.3Dmark06benchmark.leak.html=21,223 :D.

I'm very happy to listen to any argument except the lame "you demonstrate lack of knowledge", "you're not very versed, are you?". I find it insulting, and your continued use of it demonstrates a major lack of politeness and bellicose attitude.

Refer to post #218. Please do not try to kindle old flames. This was dealt with. Turn off your microphone. The jury's out until the cards are in.

Man, that's good and all, but you then forget that if you analyse it and have an increase (2X in fact) in EVERY STAGE, then performance will increase in every (or most) situations!! Until you understand this, I feel I have to continue. I'll give an example:

You have three guys making cheeseburgers: one does the meat (A), the second does the cheese (B) and the third takes those plus the bread and puts it all together (C).

Analytically:

- If we put two guys in place of A, we won't get any benefit, except in those situations where you need more than one guy, i.e. if you want to put two meat sticks per burger (sorry, I don't know their actual names).

- If we use two B guys the same thing happens, unless you want more cheese in the mix.

- Same with C, with the difference that there is evidence that C is, indeed, more than enough to put together more burgers than A and B can provide.

A is the SPs, B is the TMUs and C is the ROPs. We could add a D guy who provides the ingredients and carries away the finished burgers: the memory subsystem and platform, including chipset and CPU. So if we double A, B or C independently we won't get any benefit, but if we improve both A and B, and C can truly handle the new influx of products (and again, we have evidence that it can), we will be able to provide either more burgers or the same number of burgers with more meat/cheese in each. Comparably, in the graphics card we will be able to provide either a more complex image (1920x1200 4xAA 16xAF) at the same speed, or more frames of lesser complexity. BOTH!

EDIT: And following on with the example: you say we won't see as much of an improvement at low resolutions and AA/AF levels, and that's right, but not for the reasons you give. It's not because of the A, B or C guys; it's because D is not able to provide the resources and carry away the large number of finished burgers that the others are generating! They have told you so already: it's because of the CPUs that you won't see such an improvement at those settings...
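
The burger example boils down to a simple bottleneck model: output is limited by the slowest stage. A toy sketch of that idea follows; all the rates are invented purely for illustration and are not measurements of any card.

```python
def burgers_per_hour(stage_rates: dict) -> float:
    """Pipeline output is capped by the slowest stage."""
    return min(stage_rates.values())


# Hypothetical baseline: A and B are the limit, C and D have headroom.
base = {"A (SPs)": 100, "B (TMUs)": 100, "C (ROPs)": 200, "D (platform)": 400}

double_a = {**base, "A (SPs)": 200}              # double the meat guys only
double_ab = {**double_a, "B (TMUs)": 200}        # double meat and cheese guys
cpu_bound = {**double_ab, "D (platform)": 150}   # same, but D cannot keep up

for label, stages in [("baseline", base), ("2x A", double_a),
                      ("2x A and B", double_ab), ("2x A and B, slow D", cpu_bound)]:
    print(f"{label}: {burgers_per_hour(stages):.0f} burgers/hour")
```

Doubling A alone changes nothing, doubling A and B together doubles output as long as C and D have headroom, and a slow D caps the gain regardless of what A, B and C can do.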

#268

lemonadesoda

Oh man, you are the King!

"Have it your way!"
"Can you taste the fire?"

:roll:

Yes, it's all down to where the bottleneck is. I guess we have different positions on where the bottleneck is... AND... we are looking at different points of the spectrum where gains (or roadblocks) will be.

When is the GPU shader constrained... RV770 fixes this
When is the GPU texture fill constrained... RV770 fixes this
When is the GPU memory constrained... RV770 fixes this in the XT version with GDDR5, but not in the RV770 Pro, and although GDDR5 has higher clocks and lower power consumption (important given the GPU core will need more power), it also has higher latency. We need to see benchmarks for the net net.
When is the GPU vertex, polygon, z-plane, ROP constrained... RV770 does not fix this, except for the core "overclock", which is, in fact, pretty much the same as a regular overclocked 3870.

For each resolution, the impact of the above will be different.
For each FSAA, AA setting, the impact of the above will be different.

In some situations, it will be very low improvement, in others up to 100% improvement. But the 100% improvement will only exist if current performance is limited by THAT specific bottleneck.

It's going to be mixed results. 1920x1600 users will be the REAL winners. But if you are on 1280x1024, I still maintain it won't be worth the upgrade UNLESS you are trying to get to 16xAA 16xFSAA. At 0x, 2x or 4x, I'm not convinced that, at 1280x1024, the performance improvement will be that great. Why? Because at that resolution and those FSAA/AA settings the GPU *is not* shader or TMU constrained. Anyway, I await with interest the first benchmarks that come out.

#269

DarkMatter

lemonadesoda wrote: Oh man, you are the King!

"Have it your way!"
"Can you taste the fire?"

:roll:

Yes, its all down to where the bottleneck is. I guess we have different positions on where the bottleneck is... AND... we are looking at different points of the spectrum where gains (or roadblocks) will be.
When is the GPU shader constrained... RV770 fixes this
When is the GPU texture fill constrained... RV770 fixes this
When is the GPU memory constrained... RV770 fixes this in the XT version with GDDR5, but nor RV770 Pro, and although GDDR5 has higher clocks, lowerer power consumption (important given the GPU core will need more power), it is also higher latency. We need to see benchmarks for the net net.
When is the GPU vertex, polygon, z-plane, ROP contrained... RV770 does not fix this, except for the core "overclock", which is, in fact, pretty much the same as a regular overclock 3870.
For each resolution, the impact of the above will be different.
For each FSAA, AA setting, the impact of the above will be different.

In some situations, it will be very low improvement, in others up to 100% improvement. But the 100% improvement will only exist if current performance is limited by THAT specific bottleneck.

It's going to be mixed results. 1920x1600 will be REAL winners. But if you are on 1280x1024, I still maintain it wont be worth the upgrade UNLESS you are trying to get to 16xAA 16xFSAA. At 0x, 2x or 4x, I'm not convinced, at 1280x1024, the performance improvement will be that great? Why, because at that resolution and those FSAA AA settings the GPU *is not* shader or TMU constrained.

We are getting somewhere in the end. :toast:
But first of all, we are not discussing the impact these cards will have on the PCs or games of today (i.e. whether someone who wants to upgrade will see a big improvement), but the actual power of the card. We have already said why you won't see a big improvement now, but you will when new CPUs/chipsets launch, a jump you won't get with the HD3870 because it's not as platform-bottlenecked as the HD4000 will be.
And second, vertex and poly data are processed in the SPs and not in the ROPs, and I would also assume that since AA is done in the shaders, z-depth, or at least z-culling, is done in the SPs on the Radeons as well. So they don't have to repeat the work, you know. Anyway, since geometry complexity is not going up, according to the trend followed lately, that won't be a problem. If there's going to be any improvement in geometry complexity, it will basically be done by geometry shaders and tessellation. Shaders once again.

EDIT: Just to be clear. My point is there's nothing to fix in ROP arena. Reasons for that are given in previous posts, but basically because:

1- Nvidia cards have less raster power: 16 ROPs @ 600MHz vs. 16 ROPs @ 800MHz. And despite having to perform AA in them, the 9800GTX is still almost 50% faster.

2- You just don't increase everything, and I mean everything just to let that be a bottleneck...

#270

lemonadesoda

If "geometry" = "bump mapping" (in its broadest sense, and including the auto-tessellation concept first introduced by ATi as "TruForm"... yes, I owned a Radeon 8500) then yes, shaders can do this, and = great for games.

If "geometry" = "more complex objects" then no, shaders won't help, and = not so great for CAD.

TBH, I don't know how to interpret the Stream Processor (SP) comment in the RV770 architecture. How has the SP changed from R600 to R700? I really don't know. With the comments about "no architectural change" with RV770, I assumed the SP was the same. I could well be wrong on this one.

#271

DarkMatter

lemonadesoda wrote: If "geometry" = "bump mapping" (in its broadest sense, and including the auto-tesselation concept first introduced by ATi as "TruForm"... yes, I owned a Radeon 8500) then yes, shaders can do this, and = great for games.

If "geometry" = "more complex objects" then no, shaders wont help, and = not so great for CAD.

TBH, I don't know how to interpret the Stream Processor comment (SP) in the RV770 architecture. How has SP changed R600 to R700? I really dont know. With the comments about "no architectural change" with RV770, I assumed SP was the same. I could well be wrong on this one.

Vertex (geometry) data has ALWAYS been processed in vertex shaders. Since R500 (Xenos), G80 and R600 and their unified shaders, this is done in the shader or stream processors, which pack vertex, pixel and geometry shaders into the same unit, so to speak.
More complex objects require more SPs, not more ROPs, in any way. You do need more ROPs for Z calculations, unless this is done in the SPs as I suggested. But AGAIN, vertex data is processed in the SPs, not the ROPs.

Also, tessellation is taking a simple model and making it more complex, in the sense of more vertices and polygons. It has nothing to do with bump mapping, except that it may use bump maps to have some sort of control over what that NEW vertex data looks like, instead of just doing the same as TurboSmooth does in 3ds Max, for example.

#272

MrMilli

lemonadesoda wrote: If "geometry" = "bump mapping" (in its broadest sense, and including the auto-tesselation concept first introduced by ATi as "TruForm"... yes, I owned a Radeon 8500) then yes, shaders can do this, and = great for games.

If "geometry" = "more complex objects" then no, shaders wont help, and = not so great for CAD.

TBH, I don't know how to interpret the Stream Processor comment (SP) in the RV770 architecture. How has SP changed R600 to R700? I really dont know. With the comments about "no architectural change" with RV770, I assumed SP was the same. I could well be wrong on this one.

lemonadesoda, i firmly believe you must be pulling our leg.
If not, then please (i'm asking nicely), stop. Just stop because everything you say is wrong.
For the sake of all of us and for your own embarrassment, please stop.

<strike>Bumb</strike>(lol) Bump mapping (MAPPING: the word says it already) has nothing to do with geometry.
You are still connecting geometry to a T&L unit, which doesn't exist anymore in modern GPUs. It's emulated on the shaders.

About the shaders on the RV770: they run at 1050MHz. That's why the GFlops increase so much.

That's the last time I'm going to correct you, and I'm not coming back to this thread. You ruined it.

#273

btarunr

Editor & Senior Moderator

MrMilli wrote: lemonadesoda, i firmly believe you must be pulling our leg.
If not, then please (i'm asking nicely), stop. Just stop because everything you say is wrong.
For the sake of all of us and for your own embarrassment, please stop.

Bumb mapping (MAPPING: the word says it already) has nothing to do with geometry.
You are still connecting geometry to a T&L unit which doesn't exist anymore in modern GPU's. It's emulated on the shaders.

About the shaders on the RV770: they run at 1050Mhz. That why the GFlop increases so much.

That's the last time i'm going to correct you and i'm not comming back to this thread. You ruined it.

Ehm, that's bump-mapping. Bumbs are the heavy things we all carry, there's not much to map, really, except occasional goose-pimples, hair and a deep gorge in the middle.

#274

lemonadesoda

[Diagrams: traditional pipeline vs. unified shader architecture]

If you had a "screen render" that fitted into the existing pipeline "4 cycles", single pass for each cycle in the rendering stage... as shown in the diagram, then increasing the number of shaders doesn't change anything. The spare capacity doesn't help. A low FSAA, AA, 1280x1024 can "fit in" the "4 cycle" path, single pass for each stage.

If you have a scene that is 1920x1200 with 16x, 16x, then a screen render will require more than one pass through each stage.

In instance A, clock speed will get you faster FPS. More shaders don't help much.

In instance B, increasing the shaders means more can be done in each pass, meaning fewer passes, ultimately getting to just one single pass through each stage. Here, gains are from increased shaders in addition to increased clocks.

That's how I've always understood it. If there is a fallacy with the logic... let me know.
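
To make that reasoning concrete, here is a toy model of it: the number of passes is the work per frame divided by the shader capacity, rounded up. The work figures are invented for illustration, and this only sketches the mental model described above; it is not a claim about how a real GPU schedules work.

```python
import math


def passes_needed(work_units: int, shader_units: int) -> int:
    """Passes through a stage if each pass can handle `shader_units` of work."""
    return math.ceil(work_units / shader_units)


# Hypothetical workloads: instance A fits in one pass, instance B does not.
light_scene = 300   # e.g. 1280x1024 with low AA
heavy_scene = 900   # e.g. 1920x1200 with 16x AA

for label, work in [("instance A", light_scene), ("instance B", heavy_scene)]:
    old = passes_needed(work, 320)
    new = passes_needed(work, 480)
    print(f"{label}: {old} pass(es) with 320 shaders, {new} with 480, "
          f"speedup at equal clocks {old / new:.2f}x")
```

In this model instance A gains nothing from the extra shaders, while instance B drops from three passes to two.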

#275

lemonadesoda

MrMilli wrote: Bump mapping has nothing to do with geometry. You are still connecting geometry to a T&L unit which doesn't exist anymore in modern GPU's. It's emulated on the shaders.

Please note the word "If" meaning that, under the situation you might be calling bump mapping geometry effects (which they are)... then all well and true. I did not SAY geometry=bump mapping.

As for the second statement I made, If "geometry" = "more complex objects" then no, shaders won't help, and = not so great for CAD, then YES, I withdraw that statement. It is wrong for the unified shader architecture of DirectX 10 / Shader Model 4.0. It is only true for previous-generation GPUs.
