Tuesday, August 24th 2010
AMD Details Bulldozer Processor Architecture
AMD is finally going to embrace a truly next generation x86 processor architecture that is built from ground up. AMD's current architecture, the K10(.5) "Stars" is an evolution of the more market-successful K8 architecture, but it didn't face the kind of market success as it was overshadowed by competing Intel architectures. AMD codenamed its latest design "Bulldozer", and it features an x86 core design that is radically different from anything we've seen from either processor giants. With this design, AMD thinks it can outdo both HyperThreading and Multi-Core approaches to parallelism, in one shot, as well as "bulldoze" through serial workloads with a broad 8 integer pipeline per core, (compared to 3 on K10, and 4 on Westmere). Two almost-individual blocks of integer processing units share a common floating point unit with two 128-bit FMACs.
AMD is also working on a multi-threading technology of its own to rival Intel's HyperThreading, that exploits Bulldozer's branched integer processing backed by shared floating point design, which AMD believes to be so efficient, that each SMT worker thread can be deemed a core in its own merit, and further be backed by competing threads per "core". AMD is working on another micro-architecture codenamed "Bobcat", which is a downscale implementation of Bulldozer, with which it will take on low-power and high performance per Watt segments that extend from all-in-One PCs all the way down to hand-held devices and 8-inch tablets. We will explore the Bulldozer architecture in some detail.Bulldozer: The Turbo Diesel Engine
In many respects, the Bulldozer architecture is comparable to a diesel engine. Lower RPM (clock-speeds), high torque (instructions per second). When implemented, Bulldozer-based processors could outperform competing processor architectures at much lower clock speeds, due to one critical area AMD seems to have finally addressed: instructions per clock (IPC), unlike with the 65 nm "Barcelona" or 45 nm "Shanghai" architectures that upped IPC synthetically by using other means (such as backing the cores up with a level-3 cache, upping the uncore/northbridge clock speeds), the 32 nm Bulldozer actually features a broad integer unit with eight integer pipelines split into two portions, each portion having its own scheduler and L1 Data cache.Parallelism: A Radical Approach?
Back when analysts were pinning high hopes on the Barcelona architecture, their hopes were fueled by early reports suggesting that AMD was using wide 128-bit wide floating point units, leading analysts to believe that AMD may have conquered its biggest nemesis - floating point performance, in turn its pure math crunching abilities. However, that wasn't exactly to be. That's because the processor's overall number crunching abilities were pegged to its floating point performance, ignoring the integer units.AMD split 8 integers per core into two blocks, each block having four integer pipelines, an integer scheduler for those, and an L1 Data cache. These constitute the lowest level of "dedicated components", dedicated to processor threads. There is a shared floating point unit between the two, with two 128-bit FMACs, arbitrated by a floating point scheduler. The Fetch/Decode, an L2 cache, and the FPU constitute "shared" components.AMD is implementing a simultaneous multithreading (SMT) technology, it can split each of the "dedicated" components (in this case, the integer unit) to deal with a thread of its own, while sharing certain components with the other integer unit, and effectively make each set of dedicated components a "core" in its own merit of efficiency. This way, the actual core of the Bulldozer die is deemed a "module", a superlative of two cores, and the Bulldozer die (chip) features n-number of modules depending on the model.So now you have a chip with eight cores with much lower die sizes and transistor counts compared to a hypothetical 32 nm K10 8-core processor. It is unclear whether AMD wants to further push down SMT to the "core" level and run two threads simultaneously over dedicated components, but one thing for sure is that AMD has embraced SMT in some form or another. In all this, the chip-level parallelism is transparent to the operating system, it will only see a fixed number of logical processors, without any special software or driver requirement.
So in one go, AMD shot up its integer performance. Either a thread makes use of one integer unit with its four pipelines, or deals with both the integer units arbitrated by the fetch/decode, and the shared FPU.
Outside the modules
At the chip-level, there's a large L3 cache, a northbridge that integrates the PCI-Express root complex, and an integrated memory controller. Since the northbridge is completely on the chip, the processor does not need to deal with the rest of the system with a HyperTransport link. It connects to the chipset (which is now relegated to a southbridge, much like Intel's Ibex Peak), using A-Link Express, which like DMI, is essentially a PCI-Express link. It is important to note that all modules and extra-modular components are present on the same piece of silicon die. Because of this design change, Bulldozer processors will come in totally new packages that are not backwards compatible with older AMD sockets such as AM3 or AM2(+).Expectations
Not surprisingly, AMD isn't talking about Bulldozer as the next big thing since dual-core processors (something it did with Barcelona). AMD currently does have an 8-core and 12-core processors codenamed "Magny-Cours", which are multichip modules of Shanghai (4-core) and Istanbul (6-core) dies. AMD expects an 8-core Bulldozer implementation (built with four modules), to have 50% higher performance-per-watt compared to Magny-Cours.Market Segments
As mentioned in the graphic before, AMD's modular design allows it to create different products by simply controlling the number of modules on the die (by whichever method). With this, AMD will have processors ready with most PC and server market segments, all the way from desktop PCs, enthusiast-grade PCs, notebooks, to servers. AMD expects to have a full-fledged lineup in 2011. The first Bulldozer CPUs will be sold to the server market.
AMD is also working on a multi-threading technology of its own to rival Intel's HyperThreading, that exploits Bulldozer's branched integer processing backed by shared floating point design, which AMD believes to be so efficient, that each SMT worker thread can be deemed a core in its own merit, and further be backed by competing threads per "core". AMD is working on another micro-architecture codenamed "Bobcat", which is a downscale implementation of Bulldozer, with which it will take on low-power and high performance per Watt segments that extend from all-in-One PCs all the way down to hand-held devices and 8-inch tablets. We will explore the Bulldozer architecture in some detail.Bulldozer: The Turbo Diesel Engine
In many respects, the Bulldozer architecture is comparable to a diesel engine. Lower RPM (clock-speeds), high torque (instructions per second). When implemented, Bulldozer-based processors could outperform competing processor architectures at much lower clock speeds, due to one critical area AMD seems to have finally addressed: instructions per clock (IPC), unlike with the 65 nm "Barcelona" or 45 nm "Shanghai" architectures that upped IPC synthetically by using other means (such as backing the cores up with a level-3 cache, upping the uncore/northbridge clock speeds), the 32 nm Bulldozer actually features a broad integer unit with eight integer pipelines split into two portions, each portion having its own scheduler and L1 Data cache.Parallelism: A Radical Approach?
Back when analysts were pinning high hopes on the Barcelona architecture, their hopes were fueled by early reports suggesting that AMD was using wide 128-bit wide floating point units, leading analysts to believe that AMD may have conquered its biggest nemesis - floating point performance, in turn its pure math crunching abilities. However, that wasn't exactly to be. That's because the processor's overall number crunching abilities were pegged to its floating point performance, ignoring the integer units.AMD split 8 integers per core into two blocks, each block having four integer pipelines, an integer scheduler for those, and an L1 Data cache. These constitute the lowest level of "dedicated components", dedicated to processor threads. There is a shared floating point unit between the two, with two 128-bit FMACs, arbitrated by a floating point scheduler. The Fetch/Decode, an L2 cache, and the FPU constitute "shared" components.AMD is implementing a simultaneous multithreading (SMT) technology, it can split each of the "dedicated" components (in this case, the integer unit) to deal with a thread of its own, while sharing certain components with the other integer unit, and effectively make each set of dedicated components a "core" in its own merit of efficiency. This way, the actual core of the Bulldozer die is deemed a "module", a superlative of two cores, and the Bulldozer die (chip) features n-number of modules depending on the model.So now you have a chip with eight cores with much lower die sizes and transistor counts compared to a hypothetical 32 nm K10 8-core processor. It is unclear whether AMD wants to further push down SMT to the "core" level and run two threads simultaneously over dedicated components, but one thing for sure is that AMD has embraced SMT in some form or another. In all this, the chip-level parallelism is transparent to the operating system, it will only see a fixed number of logical processors, without any special software or driver requirement.
So in one go, AMD shot up its integer performance. Either a thread makes use of one integer unit with its four pipelines, or deals with both the integer units arbitrated by the fetch/decode, and the shared FPU.
Outside the modules
At the chip-level, there's a large L3 cache, a northbridge that integrates the PCI-Express root complex, and an integrated memory controller. Since the northbridge is completely on the chip, the processor does not need to deal with the rest of the system with a HyperTransport link. It connects to the chipset (which is now relegated to a southbridge, much like Intel's Ibex Peak), using A-Link Express, which like DMI, is essentially a PCI-Express link. It is important to note that all modules and extra-modular components are present on the same piece of silicon die. Because of this design change, Bulldozer processors will come in totally new packages that are not backwards compatible with older AMD sockets such as AM3 or AM2(+).Expectations
Not surprisingly, AMD isn't talking about Bulldozer as the next big thing since dual-core processors (something it did with Barcelona). AMD currently does have an 8-core and 12-core processors codenamed "Magny-Cours", which are multichip modules of Shanghai (4-core) and Istanbul (6-core) dies. AMD expects an 8-core Bulldozer implementation (built with four modules), to have 50% higher performance-per-watt compared to Magny-Cours.Market Segments
As mentioned in the graphic before, AMD's modular design allows it to create different products by simply controlling the number of modules on the die (by whichever method). With this, AMD will have processors ready with most PC and server market segments, all the way from desktop PCs, enthusiast-grade PCs, notebooks, to servers. AMD expects to have a full-fledged lineup in 2011. The first Bulldozer CPUs will be sold to the server market.
283 Comments on AMD Details Bulldozer Processor Architecture
After overtaking Nvidia in the gfx department, that really would be something...
And am I alone in thinking these will be big chips? What with everything on them...
Magny-cours mean Phenom II?
However, it seems they were kinda broke, and maybe it is only the 5800 series success that put Bulldozer back on the drawing board. I thought they abandoned the project for lack of resources or something.
Whatever the reality is, I hope it is worth waiting... AMD deserves a high-end CPU that will kick Intel line in the arse... it's maybe time for another performance crown switch like in good old Athlon vs Pentium days...
Really why did AMD bring out 890fx chipset, just for the shitty 1090t it was nothing the 790fx couldn't handle. you ripped off customers and you don't have enuf to do that . i hope your shitty chip sucks.
FUCK YOU AMD
AMD never forced anyone to buy 890FX boards.
You can drop a Phenom II X6 in a 790FX board and it works fine. :slap:
The fact is many 890FX boards (with more features) are released at a lower price than the original 790FX boards, so you comment makes no sense at all.
This one sounds really fresh and new, and I think that only with new ideas and new tech can AMD kick anything/anyone... but the bucket...
9 month support and only 3 new chips evan if i pay $20 for my crosshair iv it would still be a rip off
I think this would bear any meaning if it was a lament on LGA1155 and the "other guy".. you know? with an "I"? :D
just joking
My friends Gigabyte 790GX board (from the Phenom I days) still supports the new X6s.
The 790FX chipsets is even older than that.
The 890FX chipset is just an update to replace the old 790FX, and there is nothing wrong with that.
The same way that the X48 replaced the X38. The difference is the 790FX is already well over 2 years old when the 890FX is released.
^ Magny-Cours, not "Cores"
If they can't that would mean the end of nVidia chipsets on AMD (and all pc related from here forward). Wonder if they will allow SLI on AMD boards? My guess would be no since they are a graphics competitor, but only time will tell.
i own a X48 and have nothing bad to say about it.
X48 was board i had befor upgrading to my 890fx, i could have gone 1366 drop a i7 930 and later i7 970.
He's entitled to say what he likes after all :laugh:
You might just want to wait until later today when AMD makes their speech at Hot Chips to see if they verify this.
I think I can already see a new problem here, with the drop-by readers. HT for Hyper Threading and not Hyper Transport... :D
IMHO, but I don't think they should cram to much of their "legacy" tech inside this kind of novelty... I think that Bulldozer deserves a new platform from scratch...
EDIT: I agree on the first part. :D
On the second part, this is a kind of "paper (re)roll-out". So, only number that could come out is probably a number of slides from that presentation. :roll:
But, I get you. I am also eager to see how (and where) this baby goes...