Thursday, April 24th 2008
ATI Radeon HD 4800 Series Video Cards Specs Leaked
Thanks to TG Daily, we can now discuss the soon-to-be-released ATI Radeon HD 4800 series of graphics cards in more detail. One week ahead of the presumed release date, general specifications of the new cards have been revealed. All Radeon HD 4800 cards will use the 55nm TSMC-produced RV770 GPU, which includes over 800 million transistors, 480 stream processors or shader units (96+384), 32 texture units, 16 ROPs, a 256-bit memory controller (512-bit for the Radeon HD 4870 X2) and native GDDR3/4/5 support, as reported before. AMD's graphics division will initially launch three new cards, the Radeon HD 4850, 4870 and 4870 X2:
- ATI Radeon HD 4850 - 650MHz/850MHz/1140MHz core/shader/memory clock speeds, 20.8 GTexel/s (32 TMU x 0.65 GHz) fill-rate, available with 256MB/512MB of GDDR3 memory or 512MB of GDDR5 memory clocked at 1.73GHz
- ATI Radeon HD 4870 - 850MHz/1050MHz/1940MHz core/shader/memory clock speeds, 27.2 GTexel/s (32 TMU x 0.85 GHz) fill-rate, available in a 1GB GDDR5 version only
- ATI Radeon HD 4870 X2 - unknown core/shader clock speeds, available with 2048MB of GDDR5 memory clocked at 1730MHz
Source: TG Daily
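The fill-rate figures in the spec list are just the TMU count multiplied by the core clock. A minimal sketch of that arithmetic, assuming one texel per TMU per clock; the function name is illustrative and not taken from any vendor tool:

```python
def texel_fill_rate_gtexels(tmus: int, core_clock_mhz: float) -> float:
    """Peak texel fill rate in GTexel/s, assuming one texel per TMU per clock."""
    return tmus * core_clock_mhz / 1000.0


# Core clocks as quoted in the leaked specs; both cards have 32 TMUs.
leaked_core_clocks_mhz = {"HD 4850": 650, "HD 4870": 850}

fill_rates = {
    name: texel_fill_rate_gtexels(32, clock)
    for name, clock in leaked_core_clocks_mhz.items()
}

for name, rate in fill_rates.items():
    print(f"{name}: {rate:.1f} GTexel/s")
```

These are theoretical peaks; real texture throughput depends on the workload.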
278 Comments on ATI Radeon HD 4800 Series Video Cards Specs Leaked
My point was simple, I don't care about saving energy to power my system up.
I would just like AMD to release something worth building (for myself). I hope the 4870 X2, and whatever else they decide to release, is not just a play on words. I've come really close to joining the dark side many times, but I will hold off this one last time.
quote:
In terms of performance, we heard some interesting claims. A 4870 should perform on par with or better than a dual-chip 3870 X2.
lemonadesoda? reading this? ;)
But I don't think the reasons they give will result in such a performance gain:
#1. 480 vs. 320 shaders = 50% improvement in the BEST POSSIBLE situation, i.e. purely shader limited.
#2. 32 vs. 16 TMU = 100% improvement... now I actually think THIS is going to have a bigger impact.
#3. 16 ROPS vs. 16 ROPS = no change here or to architecture.
#4. PCIe v1.0 controller? Well, check my benchies... my AGP is as fast as a PCIe16 card... given similar processor and proc speed. No. The interface is irrelevant UNLESS the graphics assets are in memory and not on the card.
#5. As I have always said, there will be increases associated with increased clocks, but points 1-4 refer to clock-for-clock gains.
Net net? 50%-100% improvement IN THE BEST situation (clock for clock) depending on where the limit was, i.e. shader limit or resolution limit.
On average? Less than 50%.
In practice. For the average person, FPS at, say, 1280x1024 will not improve by more than 20-30%. But you WILL BE ABLE to dial up much higher FSAA and AA without performance penalty. (And PLEASE read that as "much performance penalty". It's a relative comment, not supposed to mean exactly 100% same performance :rolleyes:)
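The clock-for-clock percentages in points #1 to #3 follow directly from the unit counts. A small sketch of that comparison, using the counts quoted in this thread (the dictionary layout and function name are illustrative only):

```python
# Functional unit counts as quoted in the thread.
rv670 = {"shaders": 320, "tmus": 16, "rops": 16}
rv770 = {"shaders": 480, "tmus": 32, "rops": 16}


def clock_for_clock_gain_percent(old: dict, new: dict) -> dict:
    """Best-case gain per unit type at equal clocks, in percent."""
    return {unit: (new[unit] / old[unit] - 1) * 100 for unit in old}


gains = clock_for_clock_gain_percent(rv670, rv770)
for unit, gain in gains.items():
    print(f"{unit}: +{gain:.0f}%")
```

Each figure is an upper bound that only applies when that unit type is the limiting factor; it says nothing about clock differences.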
First off, the 3870X2 GPUs are clocked at 825MHz. The 3870 is clocked at 777MHz. So I don't know why you compare it with a 3850 (670MHz btw).
Check this out for reference: techreport.com/articles.x/14284/5
So a 3870X2 is only 4% slower than a 3870 CF. The reason it's slower is that only one CF bridge is connected onboard and one is still free for CF-X. Normal CF uses two bridges.
All things aside, 4% is nothing. So if they say it's as fast as or faster than a 3870X2, then that means around 70% faster than a 3870. That's what that sentence means. Nothing more, nothing less.
A couple of things I need to rectify (again):
You shouldn't compare the number of shaders but the GFlop they can compute.
RV670 = 497GFlop
RV770 = 1008GFlop
So shader power is increased more than 100%.
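The two GFlop figures can be reproduced from shader count and shader clock. A rough sketch, assuming 2 FLOPs per stream processor per clock (one multiply-add), the 777MHz HD 3870 clock mentioned above, and the 1050MHz HD 4870 shader clock from the leaked specs; the function name is made up for illustration:

```python
def peak_gflops(stream_processors: int, shader_clock_mhz: float,
                flops_per_clock: int = 2) -> float:
    """Theoretical peak GFLOPS; flops_per_clock=2 assumes one multiply-add per SP per clock."""
    return stream_processors * flops_per_clock * shader_clock_mhz / 1000.0


rv670_gflops = peak_gflops(320, 777)    # HD 3870: shaders run at core clock
rv770_gflops = peak_gflops(480, 1050)   # HD 4870: leaked separate shader clock

print(f"RV670: {rv670_gflops:.0f} GFlop")
print(f"RV770: {rv770_gflops:.0f} GFlop")
print(f"Ratio: {rv770_gflops / rv670_gflops:.2f}x")
```

Under these assumptions the doubling comes from 50% more shaders combined with a roughly 35% higher shader clock, not from the shader count alone.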
Also, I don't care what a GPU can do at 1280x1024. That resolution is mostly CPU-bound.
If you want to compare GPUs, you need to go over 1600x1200. That's just the way it is.
The interface between the card and the system is one thing; the one inside the card, the PCIe bridge, is a completely different one. The one they are referring to is the bridge chip between the two RV670 cores. They used PCIe where they could have used HyperTransport or another link; I'm 99.99% sure they chose it for driver compatibility. I guess they use it to communicate between the cores (obviously), but most importantly to get some kind of cache coherency between them. Why is this coherency important? Because that way one core can use the data calculated by the other. AFAIK normal CrossFire (and SLI) does little of this: each card renders odd frames or lines or something (pixel quads, clusters, whatever; let's call them pixel arrays), while the other renders the even parts. If the array in core 1 takes 10 times longer to render than the one in core 2, you lose a lot of time waiting. You need some kind of communication between them to let core 2 continue the work of core 1 without making a mess. PCIe bandwidth, while more than enough for texture and general data transfers between main memory and the card, is pretty slow for that kind of work. For comparison, PCIe 1.1 has a maximum bandwidth of 4 GB/s, while typical CPU caches are around 30-50 GB/s. PCIe 2.0 increases that to 8 GB/s, which is still far off, but definitely better. Only ATI knows why they used PCIe 1 in the first place, knowing this, but they have learned and moved ahead. Let's hope it turns out better this time around.
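For reference, the 4 GB/s and 8 GB/s figures correspond to a x16 link in one direction. A quick sketch of where they come from, using the published per-lane signalling rates (2.5 GT/s for PCIe 1.x, 5 GT/s for PCIe 2.0) and 8b/10b encoding; the function and parameter names are illustrative:

```python
def pcie_bandwidth_gb_s(transfer_rate_gt_s: float, lanes: int = 16,
                        encoding_efficiency: float = 8 / 10) -> float:
    """Usable bandwidth per direction in GB/s.

    Each transfer moves one bit per lane; 8b/10b encoding leaves 80% as
    payload, and dividing by 8 converts bits to bytes.
    """
    return transfer_rate_gt_s * encoding_efficiency * lanes / 8


pcie1_x16 = pcie_bandwidth_gb_s(2.5)  # PCIe 1.x
pcie2_x16 = pcie_bandwidth_gb_s(5.0)  # PCIe 2.0

print(f"PCIe 1.x x16: {pcie1_x16:.1f} GB/s per direction")
print(f"PCIe 2.0 x16: {pcie2_x16:.1f} GB/s per direction")
```

Whether the bridge on the X2 card actually runs a full x16 link between the cores is not stated in the thread, so the lane count here is an assumption.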
My bad, I read elsewhere that the 3870X2 "is actually 3850X2 but marketed as 3870X2". Yep, I should be more careful about what info I pick up and pass on. Thanks for the correction.
RV670 = 497GFlop
RV770 = 1008GFlop
So shader power is increased more than 100%.
If that is true, then GREAT! But hasn't RV770 been advertised as having no architectural change? For a 50% increase in shaders to give a 100% increase in power, it *must* require quite a different architectural approach. If that's true, then the 4870 will be a winner, baby!
@darkmatter,
every time, you prove your lack of diplomacy. Man, they must have been tough on you at school.
3870X2 = 3850X2 with overclock. Fact. Both use GDDR3. Put the 3870X2 at the same core clocks as 2x 3870 in crossfire (on GDDR4) and which will win?
The 'different' architecture comes in the form of the shaders having their own clock generator: the shaders are clocked well above 1 GHz while the geometry domain stays below 800 MHz.
How can that be? Well, since GDDR5 would make memory bandwidth double that of the HD 3000 series, there's only raster power left. You could have argued about the weight of ROPs in the final performance and said that my thoughts about them were wrong, which could be true AT HIGHER RESOLUTIONS, and not at 1280x as you are saying. If shader and texture power is double that of RV670, there is no reason to say it won't be 2x faster, especially at lower resolutions where ROPs don't count as much. Everything you said is inconsistent, and that's why I say you don't know about this. That, and the fact that, judging by your posts, you act as if functional units (ROP, SP, TMU) and clocks were independent and had nothing to do with each other, or something like that. As if shaders only did AA, as if extra TMUs only worked when bigger textures are loaded and sat idle otherwise, etc. An example of this is your reply when MrMilli said RV770 has double the GFlops: maybe I'm misunderstanding, but it seems you took that as magic, as if there *must* be something underlying there, something shady. It demonstrates your lack of knowledge IMO.
Comparing P4 to Core2, you can just go, Chip A vs. Chip B. Oh look, chip B is faster. Or you can break it down and analyse the performance of the things you can set independently. Much better to compare A and B at the same clock first, to see architectural gains, then observe the additional gain/loss through different clock speeds. Likewise (in the CPU world) with amount of cache, or number of cores.
With the RV770, you can break it down to:
1./ Increase in shaders ---> impact, and in what situations
2./ Increase in TMU ---> impact, and in what situations
3./ Increase in ROP ---> impact, and in what situations
4./ Change in memory type ---> impact
5./ Increase in clocks ---> impact (unknown at start of thread, although there is strong speculation now about what they will be, but until the cards are in retail channels, we really don't know what is "consumer stable" from the product manufacturers).
With the RV770, what is ATi trying to address? The shader and texture "wall" at high resolutions, for greater FPS. For regular resolutions? The benefit is being able to dial up higher AA and FSAA. I still hold the view that at a regular 1280x1024 with no (or low) AA/FSAA, the performance gains will be relatively small. At high resolutions like 1920x1600, or at 8x and 16x FSAA/AA, that's where the gain will be.
It's quite clear from the benchmarks that the RV770 *will* be very fast HD.4870.3Dmark06benchmark.leak.html=21,223 :D.
I'm very happy to listen to any argument except the lame "you demonstrate lack of knowledge", "you're not very versed, are you?". I find it insulting, and your continued use of it demonstrates a major lack of politeness and a bellicose attitude.
Refer to post #218. Please do not try to kindle old flames. This was dealt with. Turn off your microphone. The jury's out until the cards are in.
Either way, you can buy me cards and I'll do an independent review for you.
You have three guys making cheeseburgers: one does the meat (A), the second does the cheese (B), and the third takes those plus the bread and puts it all together (C).
Analytically:
- If instead of one A we put two guys, we won't get any benefit, except in those situations where you need more than one guy, i.e. if you want to put two meat sticks per burger (sorry, I don't know their actual names).
- If we use two B guys, the same happens, unless you want more cheese in the mix.
- Same with C, with the difference that there is evidence that C is, indeed, more than capable of putting together more burgers than A and B can provide.
A is SPs, B is TMUs and C is ROPs. We could add a D guy who would supply the ingredients as well as carry away the finished burgers: the memory subsystem and platform, including chipset and CPU. So if we double A, B or C independently we won't get any benefit, but if we improve A and B, and C can truly handle the new flow of products (and again, we have evidence it could be that way), we will be able to provide either more burgers, or the same number of burgers with more meat/cheese in each. By comparison, in the graphics card we will be able to provide either a more complex image (1920x1200 4xAA 16xAF) at the same speed, or more frames of lesser complexity. BOTH!
EDIT: And continuing with the example. You say we won't see as much of an improvement at low resolutions and AA/AF levels, and that's right, but not for the reasons you give. It's not because of the A, B or C guys, it's because D is not able to provide the resources and carry away the large number of finished burgers that the others are generating! They have told you so already: it's because of the CPUs that you won't see such an improvement at those settings...
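The burger analogy boils down to one rule: output is limited by the slowest worker in the chain. A toy sketch of that rule, with entirely made-up rates (burgers per minute) chosen only to illustrate the argument:

```python
def burgers_per_minute(stage_rates: dict) -> int:
    """Throughput of a chain of stages is set by the slowest stage."""
    return min(stage_rates.values())


# Hypothetical rates: A = meat (SPs), B = cheese (TMUs),
# C = assembly (ROPs), D = supply and delivery (memory, chipset, CPU).
baseline = {"A": 10, "B": 10, "C": 25, "D": 12}

double_a_only = {**baseline, "A": 20}
double_a_and_b = {**baseline, "A": 20, "B": 20}
faster_platform = {**double_a_and_b, "D": 40}

print(burgers_per_minute(baseline))         # limited by A and B
print(burgers_per_minute(double_a_only))    # still limited by B
print(burgers_per_minute(double_a_and_b))   # now limited by D
print(burgers_per_minute(faster_platform))  # A and B are the limit again
```

With these numbers, doubling A alone changes nothing, doubling A and B together helps only until D becomes the limit, and the full gain shows up once D is faster as well.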
"Have it your way!"
"Can you taste the fire?"
:roll:
Yes, it's all down to where the bottleneck is. I guess we have different positions on where the bottleneck is... AND... we are looking at different points of the spectrum where gains (or roadblocks) will be.
- When is the GPU shader constrained... RV770 fixes this
- When is the GPU texture fill constrained... RV770 fixes this
- When is the GPU memory constrained... RV770 fixes this in the XT version with GDDR5, but not in the RV770 Pro; and although GDDR5 has higher clocks and lower power consumption (important given the GPU core will need more power), it also has higher latency. We need to see benchmarks for the net net.
- When is the GPU vertex, polygon, z-plane, ROP constrained... RV770 does not fix this, except for the core "overclock", which is, in fact, pretty much the same as a regular overclocked 3870.
For each resolution, the impact of the above will be different. For each FSAA, AA setting, the impact of the above will be different.
In some situations, it will be very low improvement, in others up to 100% improvement. But the 100% improvement will only exist if current performance is limited by THAT specific bottleneck.
It's going to be mixed results. 1920x1600 will be the REAL winner. But if you are on 1280x1024, I still maintain it won't be worth the upgrade UNLESS you are trying to get to 16xAA 16xFSAA. At 0x, 2x or 4x, I'm not convinced that, at 1280x1024, the performance improvement will be that great. Why? Because at that resolution and those FSAA/AA settings the GPU *is not* shader or TMU constrained. Anyway, I await with interest the first benchmarks that come out.
But first of all, we are not discussing the impact these cards will have on the PCs or games of today (i.e. whether someone who upgrades will see a big improvement), but the actual power of the card. We have already said why you won't see a big improvement now, but you will when new CPUs/chipsets launch, a jump you won't get with the HD3870 because it's not as platform-bottlenecked as the HD4000 will be.
And second, vertex and poly data are handled in the SPs and not in the ROPs, and I would also assume that, since AA is done in shaders, z-depth (or at least z-culling) is done in the SPs on the Radeons as well, so they don't have to repeat the work, you know. Anyway, since geometry complexity has not been going up, according to the recent trend, that won't be a problem. If there's going to be any improvement in geometry complexity, it will basically be done by geometry shaders and tessellation. Shaders once again.
EDIT: Just to be clear. My point is there's nothing to fix in ROP arena. Reasons for that are given in previous posts, but basically because:
1- Nvidia cards have less raster power: 16 ROPs @ 600 MHz vs. 16 ROPs @ 800 MHz. And despite having to perform AA in them, the 9800GTX is still almost 50% faster.
2- You just don't increase everything, and I mean everything just to let that be a bottleneck...
If "geometry" = "more complex objects" then no, shaders won't help, and = not so great for CAD.
TBH, I don't know how to interpret the Stream Processor (SP) comment in the RV770 architecture. How has the SP changed from R600 to R700? I really don't know. With the comments about "no architectural change" with RV770, I assumed the SP was the same. I could well be wrong on this one.
More complex objects require more SPs, not more ROPs, in any way. You do need more ROPs for Z calculations, unless this is done in the SPs as I suggested. But AGAIN, vertex data is processed in the SPs, not the ROPs.
Also, tessellation is taking a simple model and making it more complex, in the sense of more vertices and polygons. It has nothing to do with bump mapping, except that it may use bump maps to have some control over what that NEW vertex data looks like, instead of just doing the same as TurboSmooth does in 3ds Max, for example.
If not, then please (I'm asking nicely), stop. Just stop because everything you say is wrong.
For the sake of all of us, and to spare your own embarrassment, please stop.
<strike>Bumb</strike>(lol) Bump mapping (MAPPING: the word says it already) has nothing to do with geometry.
You are still connecting geometry to a T&L unit, which doesn't exist anymore in modern GPUs. It's emulated on the shaders.
About the shaders on the RV770: they run at 1050MHz. That's why the GFlop increases so much.
That's the last time I'm going to correct you and I'm not coming back to this thread. You ruined it.
Unified Shader
If you had a "screen render" that fitted into the existing pipeline's "4 cycles", with a single pass for each cycle in the rendering stage, as shown in the diagram, then increasing the number of shaders doesn't change anything. The spare capacity doesn't help. A low-FSAA/AA 1280x1024 frame can "fit in" the "4 cycle" path, with a single pass for each stage.
If you have a scene that is 1920x1200 with 16x, 16x, then a screen render will require more than one pass through each stage.
In instance A, clock speed will get you faster FPS. Shaders don't help much.
In instance B, increasing the shaders means more can be done in each pass, meaning fewer passes, ultimately getting to just one single pass through each stage. Here, gains are from increased shaders in addition to increased clocks.
That's how I've always understood it. If there is a fallacy with the logic... let me know.
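The "passes" reasoning above can be written down as a toy model. This is only a sketch of the poster's mental model, not of how real GPUs schedule work; the work-item counts are invented and the function names are illustrative:

```python
import math


def passes_needed(work_items: int, shader_units: int) -> int:
    """Passes through a stage if each pass handles at most shader_units items."""
    return math.ceil(work_items / shader_units)


def relative_frame_time(work_items: int, shader_units: int, clock_mhz: float) -> float:
    """Frame time in arbitrary units: more passes cost time, higher clocks save it."""
    return passes_needed(work_items, shader_units) / clock_mhz


# Instance A: a light frame that already fits in a single pass.
light_old = passes_needed(300, 320)
light_new = passes_needed(300, 480)

# Instance B: a heavy frame that needs several passes.
heavy_old = passes_needed(1200, 320)
heavy_new = passes_needed(1200, 480)

print(light_old, light_new)  # extra shaders change nothing here
print(heavy_old, heavy_new)  # extra shaders cut the number of passes
```

In this model the light frame only benefits from clock speed, while the heavy frame benefits from both more shaders and higher clocks, which is the distinction drawn between instance A and instance B.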
As for the second statement I made, 'If "geometry" = "more complex objects" then no, shaders won't help, and = not so great for CAD', then YES, I withdraw that statement. It is wrong for the Unified Shader architecture of DirectX10 Shader Model 4.0. It is only true for previous-generation GPUs.