...and rightfully they should. '50% more processing power' was the description given for the Opteron replacement, and even then, that says nothing at all about the rest of the hardware. More channels of lower-speed, lower-latency memory dedicated per core are what give real-world throughput, not just bench numbers.
(I apologize ahead of time for any meanderings that may be momentary rants, lol. I mean to be mostly informative, and hopefully with some humor, but atm I'm a bit upset with AMD.)
People can't saturate their memory bandwidth because it can't be done. The bus is fine; the access is what is lacking. The problem is hardware limitations when you try to address the same target with too many cores, and greater RAM speed is almost moot: having fewer dedicated memory paths than cores causes contention among cores, along with the other things I mention below. You can still fill your bucket (memory) slowly, with a slow hose (low # of RAM channels @ higher speed = higher memory controller & strap latencies, memory latencies, core contention, bank, rank and controller interleaving, all while refreshing, strange ratios, etc.), but this is not 'performance' when it is time that we want to save. I can't just be excited that I can run 6 things 'ok.'
Case in point:
Look at this study performed by Sandia National Laboratories in Albuquerque, N.M. (Please note that they refer to multi-core systems as 'supercomputers'):
https://share.sandia.gov/news/resou...lower-supercomputing-sandia-simulation-shows/
On a related note, look at how the 'CPU race' topped out. More speed in a single core just resulted in a performance-to-consumption ratio that made a really good shin-cooker. Instead, the answer was a smaller die process with low-power, moderate-speed SMP cores, much like NVIDIA's CUDA or ATI's Stream. Memory controllers and RAM are no different.
What good is a ridiculously fast DDR HT-bus when you can't send solid, concurrent dedicated data streams down it to memory because of all the turn-taking and latencies? It's like taking turns with water spigots down a huge hose that has a small nozzle on it.
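To put rough numbers on the turn-taking, here's a back-of-the-envelope sketch. It is a toy model only: it assumes every active core gets an even share of the channels, a 64-bit (8-byte) bus per channel, and an `efficiency` fudge factor that I made up to stand in for all the latency and interleave overhead. The function name and parameters are mine, not from any vendor tool.

```python
def per_core_bandwidth_gbs(channels, cores, mega_transfers=1066,
                           bus_bytes=8, efficiency=1.0):
    """Toy estimate of the memory bandwidth (GB/s) each core gets.

    Assumes an even split among all active cores. `efficiency` is a
    hypothetical 0..1 factor for latency/refresh/interleave losses.
    """
    if channels < 1 or cores < 1:
        raise ValueError("channels and cores must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    # Theoretical peak across all channels, in GB/s.
    peak_gbs = channels * mega_transfers * 1e6 * bus_bytes / 1e9
    return peak_gbs * efficiency / cores


if __name__ == "__main__":
    # Dual-channel DDR3-1066, with 1 to 6 cores all hammering memory.
    for cores in (1, 2, 4, 6):
        share = per_core_bandwidth_gbs(channels=2, cores=cores)
        print(f"{cores} core(s): {share:.2f} GB/s each")
```

Even with zero overhead, each core's slice shrinks in direct proportion to the core count, and that's before any of the real-world latencies get their cut.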
You can't achieve high throughput with current (and near-future) configurations. Notice that as soon as we said, "Yay! Dual-core AND dual-channel RAM!" it quickly became, "What do you mean, 'controller interleave'?" ...But then, the fix: ganged memory at half the data width.
...And there was (not) much rejoicing. (yay.)
Why did they do this?
(My opinion only): It looks more and more like it's because they won't ever, ever give you what you want, and it doesn't matter who the manufacturer is. They need you NEEDING the next model after only 6 months. Look at these graphics cards today. I almost spit out my coffee when I read the benchmarks for some of the most recent cards, priced from $384 to $1100. At 800 MHz and up, some sporting dual GPUs and high-speed 1GB-2GB GDDR5, they are getting LESS THAN HALF of the framerates of my 5-year-old 600 MHz, DDR3 512MB, PCI-E card; same software, with all eye-candy on, and my processor is both older and slower than those showcased. It's obviously not a polygon-math issue. What's going on? Are we going backwards? I can only guess that CUDA and Stream have cores that fight over the memory with a bit-width that is still behind.
I also do 3D animation on the same system that I game with, transcode movies, etc, etc, etc. So far, I have tested both Intel and AMD multi-core multi-threading with real-world, specifically compiled software only (thanks to Intel's filthy compiler tricks.) Engaging additional cores just results in them starving for memory access in a linear fashion. In addition, so far all my tests suggest that no more than 4GB of DDR3-1066 @ 5-5-5-15-30 can be filled on dual channel... at all, on 2 through 6 core systems. (On a side note: WOW, my Intel C2D machine, tested with non-Intel compiled (lying) software, performs like an armless drunk trying to juggle corded telephones with his face.)
Anyway, to even match parallel processing performance at the 3:1 [core : mem controller] ratio, current configurations would need memory speeds well beyond what is currently available, once you're done accounting for latencies, scheduling, northbridge strap, throughput reduction due to spread-spectrum because of the high frequency, etc, etc.
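Here's the arithmetic behind that, as a rough sketch. The assumption (mine, and a simplistic one) is that "matching" means each core should see what it would get from its own dedicated channel at the base speed, so the shared channels have to run faster by the core:channel ratio, divided by a hypothetical `efficiency` factor for all the overhead listed above. The 0.7 used in the example is a made-up illustration, not a measurement.

```python
def required_transfer_rate(base_mt_s, cores, channels, efficiency=1.0):
    """Transfer rate (MT/s) each shared channel would need so that every
    core sees the bandwidth of one dedicated channel at `base_mt_s`.

    `efficiency` is a hypothetical 0..1 factor for latency and
    scheduling losses; it is an assumption, not measured data.
    """
    if cores < 1 or channels < 1:
        raise ValueError("cores and channels must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return base_mt_s * (cores / channels) / efficiency


if __name__ == "__main__":
    # 6 cores on 2 channels (3:1), DDR3-1066 as the per-core baseline.
    ideal = required_transfer_rate(1066, cores=6, channels=2)
    lossy = required_transfer_rate(1066, cores=6, channels=2, efficiency=0.7)
    print(f"no overhead:    {ideal:.0f} MT/s per channel")
    print(f"70% efficiency: {lossy:.0f} MT/s per channel")
```

Under these assumptions the requirement is already about three times the base rate before any overhead is counted, which is the point: adding channels scales where clock speed can't.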
So in conclusion, more parallel, lower-speed, low-latency controllers and memory modules (with additional, appropriate hardware) could generate a system with a far greater level of real, usable throughput. I would much prefer a Valencia quad-core (a server core... for gaming too? And then only if it had concurrent memory access), but will most likely not be able to afford one, so it looks like I'm giving up before Bulldozer even gets here.
6 cores and 2 channels for Zambezi? No thanks.
I'm tired of waiting, both for my renders and for a PC that renders 30 seconds in less than 3 days.
(One final aside): About these 'modules' on Bulldozer: wouldn't that rub additionally if it's the same "core 'a' passes throughput to core 'b'" design that was used in some of the first dual cores? Time will tell.
~Peace