Friday, September 24th 2010
AMD Orochi "Bulldozer" Die Holds 16 MB Cache
Documents related to "Orochi", AMD's 8-core processor based on its next-generation Bulldozer architecture, reveal a cache hierarchy that comes as a bit of a surprise. Earlier this month, at a conference hosted by GlobalFoundries, AMD displayed the first die-shot of Orochi, which legibly showed key features including the four Bulldozer modules, which hold two cores each, and large L2 caches. On coarse visual inspection, the L2 cache of each module seems to cover about 35% of the module's area. The L3 cache is located along the center of the die. The documents seen by X-bit Labs reveal that each Bulldozer module has its own 2 MB L2 cache shared between its two cores, and that an 8 MB L3 cache is shared among all four modules (8 cores).
This takes the total cache on Orochi all the way up to 16 MB: 8 MB of L2 (4 x 2 MB) plus 8 MB of L3. This hierarchy suggests that AMD wants to give individual cores access to a large amount of faster cache (each core can reach a whopping 2048 KB of L2, shared with the other core in its module, compared to 512 KB per core on Phenom and 256 KB per core on Core i7), which also facilitates faster communication between the two cores within a module. Inter-module communication is enhanced by the 8 MB L3 cache. Compared to the current "Istanbul" six-core K10-based die, that's a roughly 77% increase in total cache for a 33% increase in core count, and a 300% increase in the L2 cache each core can access (a 100% increase if each module's 2 MB is divided evenly between its two cores). Orochi is built on a 32 nm GlobalFoundries process and is sure to have a very high transistor count.
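The cache arithmetic above can be checked with a short sketch. The Orochi figures come from the article; the Istanbul figures (512 KB of L2 per core and a 6 MB L3) are an assumption based on that chip's published specifications, since the article does not state them. The function names are illustrative only.

```python
# Rough check of the cache figures quoted in the article. All sizes in KB.
KB_PER_MB = 1024


def total_cache_kb(l2_blocks, l2_kb_per_block, l3_kb):
    """Total L2 + L3 for a die with `l2_blocks` separate L2 caches."""
    return l2_blocks * l2_kb_per_block + l3_kb


def pct_increase(old, new):
    """Percentage increase going from `old` to `new`."""
    return (new - old) / old * 100


# Orochi (from the article): 4 modules, 2 MB L2 per module, 8 MB shared L3.
orochi_total = total_cache_kb(4, 2 * KB_PER_MB, 8 * KB_PER_MB)

# Istanbul (assumed, not stated in the article): 6 cores,
# 512 KB L2 per core, 6 MB shared L3.
istanbul_total = total_cache_kb(6, 512, 6 * KB_PER_MB)

total_gain = pct_increase(istanbul_total, orochi_total)
core_gain = pct_increase(6, 8)

# L2 a single core can reach: the whole 2 MB module cache vs. 512 KB.
l2_reachable_gain = pct_increase(512, 2 * KB_PER_MB)
# L2 if the module cache is split evenly between its two cores.
l2_even_split_gain = pct_increase(512, 2 * KB_PER_MB // 2)

print(f"Orochi total:   {orochi_total // KB_PER_MB} MB")
print(f"Istanbul total: {istanbul_total // KB_PER_MB} MB")
print(f"Total cache increase: {total_gain:.1f}%")
print(f"Core count increase:  {core_gain:.1f}%")
print(f"L2 reachable per core: +{l2_reachable_gain:.0f}%")
print(f"L2 per core, even split: +{l2_even_split_gain:.0f}%")
```

Under these assumptions the totals come to 16 MB against 9 MB, which is where the roughly 77% figure comes from. The 300% figure only holds if the full 2 MB module cache is counted for each core.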
Source: X-bit Labs
152 Comments on AMD Orochi "Bulldozer" Die Holds 16 MB Cache
I am sorry if it seemed like I was calling you a troll. I was not, and that was not my intention.
Anyway, maybe hearing something like that has led to the thought that this is going to be based on servers.
I'm curious to know if Bulldozer solutions would provide any substantial boost in performance when paired with a 6000 series card over an Intel solution? I'm not sure if you can comment on such a thing though.
Where can I see some reviews and/or benchmarks of this? From what I've seen on the web so far, ICH10/R has superior performance to any of AMD's southbridges.
That also includes faster USB 2.0 speeds on Intel chipsets as well.
I must admit I hope the 6xxx and 7xxx cards go well with Bulldozer, as I'm hoping all of them meet my needs and budget over the next year or so.
I was going to get an AM3 setup for Christmas, but I'll hold on to my dying Intel setup until Bulldozer is out.
Feel free to send me an AM3+ setup for free for testing and review purposes! XD
As to whether the Bulldozer core was designed as a client or a server core, someone else had it right. We have leveraged the same core for both products in the past and will continue to do so (Istanbul/Thuban was the only recent departure).
There are client features turned off in servers and vice versa.
Typically, because it is used for both, people assume that single-threaded performance and clock speed are not going to be good. I would not worry about either of those.
I admit that I really don't know much about server chips, how they differ from client chips, or whether there is any difference between the chips designed for, say, 1U up to 4U. Is there anything you could say to give a little insight into the differences?
We used the same die for 1000 through 8000 series. That is why we consolidated the line down to 4000/6000 and removed the 4P price premium. Today you can buy a top end AMD 2P or 4P for the same price, $1386, or you could buy the Intel 2P at $1663 and the 4P at ~$3682. Trust me, their 4P is not 2.5X faster than the 2P.
Plus, by having the same core top to bottom, when you write software and customize it, you aren't dealing with 3 different platforms, only 1. Even our chipsets are identical, top to bottom. Software people love that, network administrators love that, and if you are doing virtualization it makes things so much easier because you can easily move VMs around.
On the Intel side, their 4P is old technology and generally lags the 2P by a year. 3 different platforms, 3 different chipsets, lots of inconsistencies.
I'm intending to learn a lot more about the server side, as I intend to buy my first real home-made server instead of just making client hardware act as a server with missing server features (although I only really know of ECC as one of the major things I'm missing). It is one of the reasons I'm so interested in Bulldozer, as with Interlagos chips I am hoping to build a very powerful server to last me as long as the hardware does. But of course this won't be any time soon, as I am not expecting it to come cheap.
I expect it to be more expensive (even more so with fast, high-density ECC DDR3), but then of course it will all come down to whether the extra cost is worth the extra features and speed, and to be honest I am hoping it will be. Plus, I am in no hurry to spend several thousand £ on a server that won't be fully used for nearly a year :laugh: