Thursday, November 18th 2010

AMD Zambezi ''Bulldozer'' Desktop CPU Roadmap Revealed
AMD's next-generation PC processor architecture, codenamed "Bulldozer", which seeks to challenge the best Intel has to offer, is set to make its desktop PC debut in Q2 next year with a desktop processor die codenamed "Zambezi". AMD is targeting all market segments, including an enthusiast-grade 8-core segment, a performance 6-core segment, and a mainstream 4-core segment. The roadmap reveals that Zambezi will make its entry with the enthusiast-grade 8-core models first, starting with 125 W and 95 W models, trailed by 6-core and 4-core ones.
A couple more architectural details have been revealed: Zambezi's integrated memory controller (IMC) supports DDR3-1866 as its standard memory type, just as Deneb supports DDR3-1333 as its standard. DDR3-1866, or PC3-14900 as it's technically known, churns out 29.8 GB/s in dual-channel mode, higher than triple-channel DDR3-1066 (25.6 GB/s), the official memory standard of Intel's Core i7 LGA1366 processors. The 8-core and 6-core Zambezi models feature 8 MB of L3 cache, while the 4-core ones feature 4 MB. Another tidbit you probably already knew: existing socket AM3 processors are forward-compatible with AM3+ (Zambezi's socket), but Zambezi processors won't work on older AM3/AM2(+) motherboards.
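For reference, those peak figures fall out of simple arithmetic: effective transfer rate (MT/s) × 8 bytes per 64-bit channel × number of channels. A minimal sketch of that calculation (theoretical peaks only, before any controller-efficiency losses):

```c
#include <stdio.h>

/* Theoretical peak: transfer rate (MT/s) x 8 bytes per 64-bit channel x channels. */
static double peak_gbs(double mt_per_s, int channels)
{
    return mt_per_s * 8.0 * channels / 1000.0;   /* GB/s, decimal units */
}

int main(void)
{
    printf("DDR3-1866 dual channel:   %.1f GB/s\n", peak_gbs(1866.0, 2));  /* ~29.8-29.9, as above */
    printf("DDR3-1066 triple channel: %.1f GB/s\n", peak_gbs(1066.0, 3));  /* ~25.6 */
    printf("DDR3-1333 dual channel:   %.1f GB/s\n", peak_gbs(1333.0, 2));  /* ~21.3 */
    return 0;
}
```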
Source:
ATI Forum
123 Comments on AMD Zambezi ''Bulldozer'' Desktop CPU Roadmap Revealed
Already tried NVIDIA encoding, and it's the same deal: a little faster, but still no Adobe and no point in it.
CPU encoding is the best and will always be the best, as there are no driver issues. We need a CPU which is a TRUE fusion of a CPU and a GPU, like Cell, but not such a failure and with better single-threading. Nah, the AMD plugin for Premiere only works on AMD CPU + ATI Radeon <- it will never be an AMD Radeon. Even AMD themselves seem to think so when you look at the download drivers section: www.amd.com/au/Pages/AMDHomePage.aspx
And IIRC from a review the Escalade was all plastic from inside (as most US cars are).
I have AMD/ATI and the stream encoder/decoder don't work with the program they are supposed to work with. So AMD can kiss my ass with their digital dream. Once they make good on the promises I have paid for with my last five cards from them, I might consider them again, but really I am moving to NVIDIA once they get their issues fixed.
It's not installed by default so a lot of things don't work if you just try running them.
You have to download the sdk to get it working.
Shit flies when decoding on my gpus!
2.) Like it or not, they are one of Cadillac's top of the line premium models.
3.) I didn't compare it to other makers. I used Cadillac as a parallel to AMD. I didn't say it was the best car in the world; I used the Escalade to prove the point that people know what brand it is because of the premium models. I never once mentioned the value of the car, or compared it to other makers.
You have completely missed the point of the exercise.
Certain cities even charge fees if people wish to drive through the downtown district (to discourage the use of cars and encourage people to go by bike, scooter, mass transit or whatever), because of the traffic congestion. So SUV drivers are quite frowned upon when using such vehicles downtown, because they hinder the other traffic so much.
And in Asia the traffic is often even worse than in Europe. I should have realised that. My bad. If you want to make such a point, better pick a more globally known car brand as a parallel (Asian or Euro brands, like Hyundai, Toyota, Mercedes, BMW, etc.). Cars like the Escalade are barely known here; I actually only know it from TV. The only US brands selling well in Europe are Ford and General Motors (under the names of Chevrolet (mostly former Daewoo models) and Opel/Vauxhall). Not completely my fault, IMHO, if your parallels don't work that well for non-Americans. I mean, I can try my best of course, but there is some chance of "failure". ;)
So either I choose to shoot at a lower resolution than a few-year-old, common-format Canon high-def camcorder can manage and go back to '90s formats, or I suck it up and continue to spend days on projects.
Yeah, ATI/AMD can suck it.
Just wait.
(I apologize ahead of time for any meanderings that may be momentary rants, lol. I mean to mostly be informative, and hopefully with some humor, but atm I'm a bit upset with AMD ;) )
People can't saturate their memory bandwidth because it can't be done. The bus is fine; the access is what is lacking. The problem is hardware limitations when you try to address the same target with too many cores, and greater RAM speed is almost moot: fewer dedicated memory paths than the number of cores causes contention among cores, among many other things I mention below. You can still fill your bucket (memory) slowly, with a slow hose (a low number of RAM channels at higher speed = higher memory controller & strap latencies, memory latencies, core contention, bank, rank and controller interleaving, all while refreshing, strange ratios, etc.), but this is not 'performance' when it is time that we want to save. I can't just be excited that I can run 6 things 'OK.'
Case in point:
Look at this study performed by Sandia National Laboratories in Albuquerque, N.M. (please note that they are terming multi-core systems 'supercomputers'):
share.sandia.gov/news/resources/news_releases/more-chip-cores-can-mean-slower-supercomputing-sandia-simulation-shows/
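Beyond that study, a rough way to see the same contention on an ordinary desktop is to have N threads each stream through their own large buffer and check whether aggregate throughput actually scales with N. A minimal pthreads sketch (buffer sizes, names and the read-only access pattern are my own choices, purely illustrative):

```c
/* Build: gcc -O2 -pthread stream_contend.c -o stream_contend
 * Usage: ./stream_contend <threads>
 * Each thread sums its own 256 MB buffer; if aggregate GB/s stops rising as
 * threads are added, the memory subsystem, not the cores, is the limit. */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define BUF_BYTES (256UL * 1024 * 1024)
#define PASSES    4

static void *worker(void *arg)
{
    const unsigned long *buf = arg;
    size_t n = BUF_BYTES / sizeof(unsigned long);
    volatile unsigned long sink = 0;     /* keeps the reads from being optimized away */

    for (int p = 0; p < PASSES; p++)
        for (size_t i = 0; i < n; i++)
            sink += buf[i];              /* pure streaming reads */
    return NULL;
}

int main(int argc, char **argv)
{
    int nthreads = argc > 1 ? atoi(argv[1]) : 2;
    if (nthreads < 1 || nthreads > 64)
        return 1;

    pthread_t tid[64];
    unsigned long *bufs[64];
    for (int t = 0; t < nthreads; t++) {
        bufs[t] = malloc(BUF_BYTES);
        memset(bufs[t], 1, BUF_BYTES);   /* touch pages up front so timing is read-only */
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int t = 0; t < nthreads; t++)
        pthread_create(&tid[t], NULL, worker, bufs[t]);
    for (int t = 0; t < nthreads; t++)
        pthread_join(tid[t], NULL);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double sec = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    double gb  = (double)nthreads * PASSES * BUF_BYTES / 1e9;
    printf("%d thread(s): %.1f GB read in %.2f s = %.1f GB/s aggregate\n",
           nthreads, gb, sec, gb / sec);

    for (int t = 0; t < nthreads; t++)
        free(bufs[t]);
    return 0;
}
```

On a typical dual-channel system the aggregate figure stops climbing after two or three threads, which is exactly the 'slow hose' effect described above.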
In a related occurrence, look at how the 'CPU race' topped out. More speed in a single core just resulted in a performance-to-consumption ratio that made a really good shin-cooker. Instead, the answer was a smaller die process with low-power, moderate-speed SMP cores, much like NVIDIA's CUDA or ATI's Stream. Memory controllers/RAM are no different.
What good is a ridiculously fast DDR HT-bus when you can't send solid, concurrent dedicated data streams down it to memory because of all the turn-taking and latencies? It's like taking turns with water spigots down a huge hose that has a small nozzle on it.
You can't achieve high throughput with current (and near-future) configurations. Notice that as soon as we said "Yay! Dual-core AND dual-channel RAM!" it quickly became, "What do you mean, 'controller interleave'?" ...But then came the fix: ganged memory at half the data width.
...And there was (not) much rejoicing. (yay.)
Why did they do this?
(My opinion only): It looks more and more like it's because they won't ever, ever give you what you want, and it doesn't matter who the manufacturer is. They need you NEEDING the next model after only 6 months. Look at these graphics cards today. I almost spit out my coffee when I read the benchmarks for some of the most recent cards, priced from $384 to $1100. At 800 MHz and up, some sporting dual GPUs and 1-2 GB of high-speed GDDR5, getting LESS THAN HALF of the framerates of my five-year-old 600 MHz, 512 MB DDR3, PCI-E card; same software, with all eye candy on, and my processor is both older and slower than those showcased. It's obviously not a polygon-math issue. What's going on? Are we going backwards? I can only guess that CUDA and Stream have cores that fight over the memory with a bit-width that is still behind.
I also do 3D animation on the same system that I game with, transcode movies, etc., etc. So far, I have tested both Intel and AMD multi-core multi-threading with real-world, specifically compiled software only (thanks to Intel's filthy compiler tricks). Engaging additional cores just results in them starving for memory access in a linear fashion. In addition, so far all my tests suggest that no more than 4 GB of DDR3-1066 @ 5-5-5-15-30 can be filled on dual channel... at all, on 2- through 6-core systems. (On a side note: WOW, my Intel C2D machine, tested with non-Intel-compiled (lying) software, performs like an armless drunk trying to juggle corded telephones with his face.)
Anyway, the memory speed needed for current configurations to even match parallel processing performance at the 3:1 [core : mem controller] ratio would be well over what is currently available, once you're done accounting for latencies, scheduling, northbridge strap, throughput reduction due to spread spectrum at high frequencies, etc., etc.
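As a back-of-the-envelope check on that 3:1 point (my own theoretical-peak numbers, not measurements from any of these systems):

```c
#include <stdio.h>

int main(void)
{
    /* Hypothetical 6-core part on dual-channel DDR3-1866, i.e. a 3:1 core:channel ratio. */
    double peak     = 1866.0 * 8.0 * 2 / 1000.0;   /* ~29.9 GB/s theoretical peak    */
    int    cores    = 6;
    double per_core = peak / cores;                /* fair share if all cores stream */

    printf("Dual-channel peak:            %.1f GB/s\n", peak);
    printf("Per-core share with %d cores:  %.1f GB/s\n", cores, per_core);

    /* Data rate needed for every core to see a full channel's worth (~14.9 GB/s),
       before any losses to latency, refresh or contention: */
    double needed_mts = 14.9 * cores / 2.0 * 1000.0 / 8.0;
    printf("DDR3 rate needed for that:    ~%.0f MT/s per channel\n", needed_mts);
    return 0;
}
```

That works out to roughly 5 GB/s of fair share per core, and something north of DDR3-5500 per channel before every core could stream as though it had a channel to itself, which is nowhere near anything on current roadmaps.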
So in conclusion, more parallel, lower-speed, low-latency controllers and memory modules (with additional, appropriate hardware) could generate a system with a far greater level of real, usable throughput. Because I would much prefer, but will most likely not be able to afford, a Valencia quad-core (a server core... for gaming too? And then only if it had concurrent memory access), it looks like I'm giving up before Bulldozer even gets here.
6 cores and 2 channels for Zambezi? No thanks.
I'm tired of waiting, both for my renders, and a pc that renders 30 seconds in less than 3 days. ;)
(One final aside): About these 'modules' on Bulldozer: wouldn't that chafe even more if it's the 'core A passes throughput to core B' design that appeared in some of the first dual-cores? Time will tell.
~Peace
Your eye candy is due to DX rendering paths: your older card will render in DX8 or DX9 at OK framerates, but the newer cards will struggle to render all the advanced DX11 features that make the small differences.
Try HL2 CM 10 with all high settings. I can do it; why can't you?
But yes, you are right about core starvation; I see it happen on my system. Increasing the core speed helps alleviate the problem to a small degree, just from the effect of reduced latencies. Adding more RAM won't help, and higher RAM speed won't help. We are entering the era of needing one 2 GB stick of RAM per core on its own memory path, or of being able to read and write to different parts of RAM and track what becomes available to read and write to, and where it is, for the next step in the process. Almost like RAM RAID.
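There's no way to get that per-core granularity on current desktop hardware, but the software half of the 'RAM RAID' idea already exists on multi-socket NUMA machines: give each thread memory allocated on the node it runs on, so threads mostly stay on their own memory path. A minimal libnuma sketch (the per-node loop, buffer size and the memset stand-in for real work are my own choices, purely to illustrate the concept):

```c
/* Build: gcc -O2 -pthread numa_local.c -o numa_local -lnuma
 * Gives each worker thread memory on the NUMA node it runs on, so threads
 * mostly use "their own" memory path instead of contending for a shared one. */
#include <numa.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define BUF_BYTES (64UL * 1024 * 1024)

struct job { int node; void *buf; };

static void *worker(void *arg)
{
    struct job *j = arg;
    numa_run_on_node(j->node);            /* keep the thread near its memory */
    if (j->buf)
        memset(j->buf, 0, BUF_BYTES);     /* stand-in for real per-core work */
    return NULL;
}

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "no NUMA support on this system\n");
        return 1;
    }

    int nodes = numa_max_node() + 1;
    if (nodes > 16) nodes = 16;
    pthread_t tid[16];
    struct job jobs[16];

    for (int n = 0; n < nodes; n++) {
        jobs[n].node = n;
        jobs[n].buf  = numa_alloc_onnode(BUF_BYTES, n);   /* node-local allocation */
        pthread_create(&tid[n], NULL, worker, &jobs[n]);
    }
    for (int n = 0; n < nodes; n++) {
        pthread_join(tid[n], NULL);
        numa_free(jobs[n].buf, BUF_BYTES);
    }
    return 0;
}
```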
I feel I need to clear some things up though, before firing off with 'bullshit':
1) I developed no such theory; it was purely a facetious 'speculation' based upon a real-world observation. As usual, lack of inflection in writing causes these problems. I will annotate such comments in the future, seeing as the words 'I can only guess' don't appear to get that idea across.
2) DirectX 11 rendering paths have nothing to do with DX9 benchmarks of DX9 games just because they're run on DX11-compatible hardware.
3) I am only familiar with HL2. I don't know why you're asking why I can't run something I neither mentioned nor tried. But now that you bring it up, DX11, like DX10, was touted as being easier for compatible hardware to render and thus requiring less processing power. Why then, if you want to compare DX9 to DX10 or DX11, are the respective framerates much lower in these title add-ons, even with more powerful hardware at the same resolutions?
Thanks.
Have a nice day.
AMD is delivering dual channel with up to 50% greater throughput than current products. Isn't that a better option?
As to quad channel on the desktop, if triple channel was not clearly a product differentiator, why would quad be any better? Sometimes people get caught up in the specs but they don't focus on the output.
If dual channel and quad channel were about the same in throughput would you rather have dual channel with lower cost and lower power or quad with higher cost and higher power?
AMD already has quad-channel, just not for the desktop.
What would make a proper difference is if dual-channel could be coupled to a per-core setup.