Tuesday, August 24th 2010

AMD Details Bulldozer Processor Architecture
AMD is finally going to embrace a truly next-generation x86 processor architecture that is built from the ground up. AMD's current architecture, the K10(.5) "Stars", is an evolution of the more successful K8 architecture, but it did not enjoy the same market success because it was overshadowed by competing Intel architectures. AMD codenamed its latest design "Bulldozer", and it features an x86 core design that is radically different from anything we've seen from either processor giant. With this design, AMD thinks it can outdo both the HyperThreading and multi-core approaches to parallelism in one shot, as well as "bulldoze" through serial workloads with a broad set of 8 integer pipelines per core (compared to 3 on K10, and 4 on Westmere). Two almost-individual blocks of integer processing units share a common floating point unit with two 128-bit FMACs.
AMD is also working on a multi-threading technology of its own to rival Intel's HyperThreading. It exploits Bulldozer's branched integer processing backed by a shared floating point design, which AMD believes to be so efficient that each SMT worker thread can be deemed a core in its own right, and further be backed by competing threads per "core". AMD is working on another micro-architecture codenamed "Bobcat", which is a scaled-down implementation of Bulldozer, with which it will take on low-power and high performance-per-watt segments that extend from all-in-one PCs all the way down to hand-held devices and 8-inch tablets. We will explore the Bulldozer architecture in some detail.
Bulldozer: The Turbo Diesel Engine
In many respects, the Bulldozer architecture is comparable to a diesel engine: lower RPM (clock speeds), higher torque (instructions per clock). When implemented, Bulldozer-based processors could outperform competing processor architectures at much lower clock speeds, due to one critical area AMD seems to have finally addressed: instructions per clock (IPC). Unlike the 65 nm "Barcelona" or 45 nm "Shanghai" architectures, which upped IPC synthetically by other means (such as backing the cores up with a level-3 cache, or upping the uncore/northbridge clock speeds), the 32 nm Bulldozer actually features a broad integer unit with eight integer pipelines split into two portions, each portion having its own scheduler and L1 Data cache.
Parallelism: A Radical Approach?
Back when analysts were pinning high hopes on the Barcelona architecture, their hopes were fueled by early reports suggesting that AMD was using 128-bit wide floating point units, leading analysts to believe that AMD may have conquered its biggest nemesis - floating point performance, and in turn its pure math crunching abilities. However, that wasn't exactly to be. That's because the processor's overall number crunching abilities were pegged to its floating point performance, ignoring the integer units.
AMD split the 8 integer pipelines per core into two blocks, each block having four integer pipelines, an integer scheduler for those, and an L1 Data cache. These constitute the lowest level of "dedicated components", dedicated to processor threads. There is a shared floating point unit between the two, with two 128-bit FMACs, arbitrated by a floating point scheduler. The Fetch/Decode, an L2 cache, and the FPU constitute "shared" components.
AMD is implementing a simultaneous multithreading (SMT) technology that can assign each set of "dedicated" components (in this case, the integer unit) to a thread of its own, while sharing certain components with the other integer unit, effectively making each set of dedicated components a "core" in its own right. This way, the actual core of the Bulldozer die is deemed a "module", a superset of two cores, and the Bulldozer die (chip) features n modules depending on the model.
So now you have a chip with eight cores with a much smaller die size and transistor count compared to a hypothetical 32 nm K10 8-core processor. It is unclear whether AMD wants to further push down SMT to the "core" level and run two threads simultaneously over dedicated components, but one thing for sure is that AMD has embraced SMT in some form or another.
In all this, the chip-level parallelism is transparent to the operating system; it will only see a fixed number of logical processors, without any special software or driver requirement.
So in one go, AMD shot up its integer performance. Either a thread makes use of one integer unit with its four pipelines, or deals with both the integer units arbitrated by the fetch/decode, and the shared FPU.
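As a rough illustration of the resource counts described above, here is a minimal Python sketch of one module. The class and method names are hypothetical, and the floating point figure rests on an assumption the article does not state: that each FMAC can issue one fused multiply-accumulate per lane per cycle, counted as two operations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BulldozerModule:
    """Toy model of one module, using the figures quoted in the article."""

    integer_cores: int = 2       # two dedicated integer blocks ("cores")
    pipelines_per_core: int = 4  # four integer pipelines per block
    fmac_units: int = 2          # shared FPU with two FMACs
    fmac_width_bits: int = 128   # each FMAC is 128 bits wide

    def integer_pipelines(self) -> int:
        """Total integer pipelines across both dedicated blocks."""
        return self.integer_cores * self.pipelines_per_core

    def fp_lanes(self, element_bits: int) -> int:
        """Number of elements of the given width the shared FPU holds."""
        return self.fmac_units * (self.fmac_width_bits // element_bits)

    def peak_flops_per_cycle(self, element_bits: int) -> int:
        """Assumed peak: one multiply-accumulate (2 ops) per lane per cycle."""
        return 2 * self.fp_lanes(element_bits)


module = BulldozerModule()
print(module.integer_pipelines())       # 8 integer pipelines per module
print(module.fp_lanes(64))              # 4 double-precision lanes
print(module.peak_flops_per_cycle(64))  # 8 under the stated assumption
print(module.peak_flops_per_cycle(32))  # 16 under the stated assumption
```

These are theoretical ceilings for a single module, not measured performance; the shared FPU means the two integer cores compete for the same floating point lanes.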
Outside the modules
At the chip level, there's a large L3 cache, a northbridge that integrates the PCI-Express root complex, and an integrated memory controller. Since the northbridge is completely on the chip, the processor does not need to communicate with the rest of the system over a HyperTransport link. It connects to the chipset (which is now relegated to a southbridge, much like Intel's Ibex Peak) using A-Link Express, which, like DMI, is essentially a PCI-Express link. It is important to note that all modules and extra-modular components are present on the same piece of silicon die. Because of this design change, Bulldozer processors will come in totally new packages that are not backwards compatible with older AMD sockets such as AM3 or AM2(+).
Expectations
Not surprisingly, AMD isn't talking about Bulldozer as the next big thing since dual-core processors (something it did with Barcelona). AMD currently has 8-core and 12-core processors codenamed "Magny-Cours", which are multichip modules of Shanghai (4-core) and Istanbul (6-core) dies. AMD expects an 8-core Bulldozer implementation (built with four modules) to have 50% higher performance-per-watt compared to Magny-Cours.
Market Segments
As mentioned in the graphic before, AMD's modular design allows it to create different products by simply controlling the number of modules on the die (by whichever method). With this, AMD will have processors ready for most PC and server market segments, all the way from desktop PCs, enthusiast-grade PCs, and notebooks, to servers. AMD expects to have a full-fledged lineup in 2011. The first Bulldozer CPUs will be sold to the server market.
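To make the scaling concrete, here is a small Python sketch of how core counts would follow from module counts. The function name and the one- and two-module examples are hypothetical; the article only confirms the four-module, 8-core configuration.

```python
CORES_PER_MODULE = 2  # per the article, each module is treated as two cores


def cores_for(modules: int) -> int:
    """Return the core count for a chip built from the given number of modules."""
    if modules < 1:
        raise ValueError("a chip needs at least one module")
    return modules * CORES_PER_MODULE


# Only the four-module part is mentioned in the article; the others are illustrative.
for modules in (1, 2, 4):
    print(f"{modules} module(s) -> {cores_for(modules)} cores")
```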
283 Comments on AMD Details Bulldozer Processor Architecture
I didn't expect this board to actually last this long for me, and I could drop in a higher end X4 part and still be happy. I have been building AMD for the last few years now, and the fact that I can still get chips to fit older boards for a cheap performance upgrade, or a board to fit an older chip, is amazing compared to my days as an Intel man.
I have quite a few Intel chips at home from when a board died, but nothing to do with them, and they are now worthless as the boards don't work. I still have a S939 board and chip at home.
$300 amd beating $1000 intel. well if it did beat it wouldn't be $300 rofl.
dont think people remember the Athlon FX, and unlike intel extreme a new one used to come out every few months
its why AMD is still around today they offer good enough and close enough at a lower price they dont beat intels counterparts very often but they put up a damn good show of it and a 1090T at 4ghz+ is a damn good chip for $300.
And sure we remember the athlon FX but it was out in a time when intel was still shitting out P4s and AMD had the performance crown. having the crown means you can charge more money for your shit especially if you have good marketing :toast:
It depends what you want. AMD definitely offers the best bang for buck, but if you're trying to reap all-out performance, that's where Intel shines. In multithreaded games and benchmarks, nothing can touch the 980X (especially when overclocked). The current AMD CPUs will always give you good enough performance. And whether or not that's what you want depends on the person.
If Bulldozer is cheap and can actually beat an i7 this time, then i'll be moving to that platform.
www.anandtech.com/bench/Product/146?vs=46
as shown here 1090T vs i7 940 both trade blows off and on and the i7s only dominance comes in Far Cry 2 and let me tell you its a LANDSLIDE in favor of the i7 in that game :roll: but point is if i drop down to the 920 it becomes even more in favor of the 1090 but thats more due to the higher clock rate helping make up for lesser clock to clock performance.
but at the end of the day its what fits the bill for whats needed... with the $50 rebate + bing cash back tiger direct had awhile ago a 1090T could be grabbed for $225 which is a damn good deal but if i had to go intel now id go 1156 due to the fact the 860 tends to perform a tad bit better then the 920 the mobo is cheaper and dual channel ddr3 is more then enough and it will still hang with the 1090T or any amd cpu as we all know
and yes same if bulldozer is revolutionary and performs great ill switch out as well but im holding my breath ;) call me a skeptic
Xeon just about keeps me satisfied, and the 5770 is okay.... for now....
The newest fanciest games I get between 22-35 fps D:
Hopefully getting a job soon at CEX ( Computer shop! WOOOO) so expect my rig to have lots of upgrades if I get the job XD
And the $700 was worth every penny to me. ;) I still fail to understand why people use games to test cpu performance. 4 year old cpus still game just fine. It's kind of a pointless test for cpu power.
But, to be honest, I was looking forward to Thuban at first, then I found out it only matches i7 quads clock for clock in the stuff I do. That's when I skipped on a gfx upgrade, and went with the 980X instead. 4870X2 is still plenty for most games, but my QX wasn't doing the trick for me anymore.
I really hope Bulldozer lives up to expectations tho. Competition at the high end will do us some justice. I'll sell my rig and go AMD if they can pull ahead.
ROFL
then lets not forget the switch to global foundries etc whos to say they wont hit a snag fact is by the time bulldozer is in full swing it will be 2012 in my honest opinion and in that situation its still nearly 2 years for 890fx and honestly im not butt hurt i didnt by an 890fx i paid $110 for a 790fx u get what u pay for and in the tech world u pay the price for the lastest and greatest
its ati that skipped 32nm going to 28nm.
While I'd personally love to see TSMC lose ATi's business, I doubt GloFo actually has the capacity to produce 32nm vga chips, without affecting cpu or chipset outputs.
And heres the link.
www.xbitlabs.com/news/other/display/20100401144643_Globalfoundries_Scraps_32nm_Bulk_Fabrication_Process.html
It's a simple case of copy-paste, which doesn't make it any more right or wrong than the rest, nor adds or subtracts from the level of misinformation out there.:p
@Bloodcrazz: trolling is never right, even if he was trying to prove some point.
@Everyone Else: AMD sure is taking long to announce the HD6000s officially if they're coming out as soon as Oct/Nov '10; do you think they're trying to squeeze the announcement as close as they can to the Bulldozer for the sake of the Scorpius platform?
I know ATi have that rep but they did good with HD 4XXX series drivers, so makes me think a good portion of their time is spent on the HD 6XXX series; like mad optimizations, improved loaders for shaders, more accurate CCC overdrive with voltage control, at least 85% xfire scaling. I do look for some of all i've said to happen but i def don't think it all will; that would be too perfect n makes nvidia shit themselves lol
Just as I spoke though: ATI Radeon HD 6000 Series GPU Codenames Surface
Still unofficially however I take it... :-/
TIME FOR SOME MARKETING AMD! That's where Intel and nVidia trump AMD/ATi almost every time, and I say almost, not because there's been a time I've seen otherwise, but I just assume they must have marketed better at least once!
so nvidia make it like Havok but optimize it further by supporting more than quad core,
It still uses old instruction sets mind you.
an 8600gt for example has twice the performance of my CPU at the moment.
I'd expect my CPU to be as good as 8600gt if PhysX used newer instruction sets for the cpu.