Wednesday, August 26th 2009
AMD Demos 48-core "Magny-Cours" System, Details Architecture
Having earlier slated its 12-core "Magny-Cours" Opteron processors loosely for 2010, AMD has narrowed the expected release window to Q1 2010. The company seems to be ready with the processors, and has demonstrated a 4-socket, 48-core machine based on them. Magny-Cours is symbolic in being one of the last processor designs by AMD before it moves over to "Bulldozer", its next design built from the ground up. Its release will provide competition to Intel's multi-core processors available at that point.
AMD's Pat Conway presented the Magny-Cours design at the IEEE Hot Chips 21 conference. It includes several key changes that boost parallelism and efficiency in a high-density computing environment. Key features include: a move to Socket G34 (from Socket F), 12 cores, a multi-chip module (MCM) package housing two 6-core dies (nodes), a quad-channel DDR3 memory interface, and HyperTransport 3 at 6.4 GT/s with redesigned multi-node topologies. Let's put some of these under the magnifying glass.
Socket and Package
Loading 12 cores onto a single package while maintaining sufficient system and memory bandwidth was bound to be a challenge. With the Istanbul six-core monolithic die already measuring 346 mm² with a transistor count of 904 million, making something monolithic twice the size is impractical, at least on the existing 45 nm SOI process. The company finally dropped its contemptuous stance on multi-chip modules, which it ridiculed back in the days of the Pentium D, and designed one of its own. Since each die is a little more than a CPU (in having a dual-channel memory controller), AMD chooses to call it a "node": a cluster of six processing cores that connects to its neighbour on the same package using one of its four 16-bit HyperTransport links. The rest are available to connect to neighbouring sockets and the system in 2P and 4P multi-socket topologies.
The socket itself gets a revamp from the existing 1,207-pin Socket F to the 1,974-pin Socket G34. The high pin count provides connections for the HyperTransport links, four DDR3 memory channels, and other low-level IO.
Multi-Socket Topologies
A Magny-Cours Opteron processor can work in 2P and 4P systems, for up to 48 physical processing cores. The multi-socket topologies AMD devised ensure high inter-core and inter-node bandwidth without depending on the system chipset IO for the task. In the 2P topology, one node from each socket uses one of its 16-bit HyperTransport links to connect to the system, another to connect to the neighbouring node on the package, and the remaining links to connect to the nodes of the neighbouring socket. It is indicated that AMD will make use of 6.4 GT/s links (probably generation 3.1). In 4P systems, it uses 8-bit links instead to connect to the three other sockets, while ensuring each node is connected to every other either directly or indirectly over the MCM. With a total of 16 DDR3 DCTs in a 4P system, a staggering 170.4 GB/s of cumulative memory bandwidth is achieved.
Finally, AMD projects up to 100% scaling with Magny-Cours compared to Istanbul. Its "future silicon" slated for 2011 is projected to almost double that.
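As a rough sanity check on the link and memory figures above, the peak numbers can be worked out from width and transfer rate. The sketch below is only back-of-the-envelope arithmetic: the function names are made up for illustration, and the DDR3-1333 speed grade with 64-bit channels is an assumption, since the article does not state the memory speed. Under that assumption the 16 channels of a 4P system come to roughly 170.6 GB/s, close to the 170.4 GB/s quoted.

```python
# Theoretical peak bandwidth arithmetic; real-world throughput will be lower.

def ht_link_gbytes_per_s(width_bits: int, gtransfers_per_s: float) -> float:
    """Peak one-direction bandwidth of a HyperTransport link, in GB/s."""
    return width_bits / 8 * gtransfers_per_s


def ddr3_total_gbytes_per_s(channels: int, mtransfers_per_s: float,
                            bus_bits: int = 64) -> float:
    """Combined peak bandwidth of several DDR3 channels, in GB/s."""
    bytes_per_transfer = bus_bits / 8
    return channels * mtransfers_per_s * bytes_per_transfer / 1000


# 16-bit and 8-bit links at the 6.4 GT/s rate mentioned in the article.
full_link = ht_link_gbytes_per_s(16, 6.4)
half_link = ht_link_gbytes_per_s(8, 6.4)

# ASSUMPTION: DDR3-1333 (1333 MT/s); the article does not give the speed grade.
memory_4p = ddr3_total_gbytes_per_s(channels=16, mtransfers_per_s=1333)

print(f"16-bit HT link: {full_link:.1f} GB/s per direction")
print(f" 8-bit HT link: {half_link:.1f} GB/s per direction")
print(f"4P memory (16 channels): {memory_4p:.1f} GB/s")
```

Halving the link width in the 4P layout halves the per-link bandwidth, which is the trade-off for reaching more sockets with the same four links per node.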
Source:
INPAI
104 Comments on AMD Demos 48-core "Magny-Cours" System, Details Architecture
Massive multi-core processing is the way of the future, because there's a limit to how fast you can get a single core to go. I'm not being mean, but if you're not believing this by now, you're deluding yourself.
When they get it under 50W, it'll start appearing in Dells.
it's a when, not an if :)
Box stays out of the way, all you have are personal and purpose (kitchen, etc) terminals.
One PC in the home does all the work, the rest just get it streamed.
we can stream 1080p content from a PC to a 360 (re-encoding in a compatible format if needed), it won't be any harder for a game to be done the same way.
"but mussels, that would suck! if my brother started encoding a video while i was gaming i'd lag out!"
well, how much does it suck when he flushes the toilet when you're in the shower? people live with compromises for convenience/cheapness
This will run at 75W at around 2.3-2.5 GHz, nothing more at 45 nm.
These will definitely come with a 32 nm shrink.
These will run very cool for the number of cores.
Cache is reworked, memory latency decreased, quad memory channel per CPU meaning 8 memory slots, so they double the memory bandwidth.
This is possible when doubling the number of CPU dies per package, along with the IMCs that come with them.
*Wonders how Intel responds.*
on the topic, I believe future processors would present themselves as much simpler units than they really are.
A big help is fast storage... just built two of these for work:
Has an Adaptec 5805 with 8 x WD RE3. Does about 600/400 read/write and is hella responsive. I've run drive benchmarks in one VM while using another, and I couldn't perceive any loss of performance. All the while WCG is running (4T @ 100%). I love these machines :)
Designers and those who use them complain about memory and graphics power, not CPU power. They didn't complain about CPU power with 2x dual-core Xeons.
Heh, I hope there's no complaining now. :D
well, the 192 gb/92 gb provides no issues except gpu performance, snap in a 4870x2 and shut em up ;D
Well, 32 gb is an issue....
They could do:
Single socket quad. @ 3ghz.
4870x2.
32 gb memory, does a better job cause better videocard.
So to put it this way, they have never complained about CPU power, apparently.
Memory and GPU power is the issue.
But the chiefs don't want the high memory cap. comp, so they complained with the 32 GB comp.
To put it this way:
They use OVER TWO hours to load due to memory restrictions(16gb memory that is) for some cads and drawings. they complain cause they're tired of browsing through all the newspapers on the webby.
I'd love to bring up Task Manager in front of my m8s when they're looking at the screen lol
Simply put, if you got a nail and you need to hammer it in, would you rather have one really big hammer or 48 tiny hammers?
There are a few occasions where multiple cores are good but those few times are exactly that, a few (no more than four). Faster cores are preferred over SMT. Except the high bandwidth (186.624 MB/s for 1920x1080 24-bit color and 30 FPS) and, if wireless, latency.
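For reference, the 186.624 MB/s figure quoted above is what uncompressed frames work out to. A quick sketch of the arithmetic, with a function name invented for illustration and assuming decimal megabytes (1 MB = 1,000,000 bytes):

```python
def raw_video_mb_per_s(width: int, height: int, bits_per_pixel: int, fps: int) -> float:
    """Uncompressed video data rate in MB/s (decimal megabytes)."""
    bytes_per_frame = width * height * bits_per_pixel // 8
    return bytes_per_frame * fps / 1_000_000


# 1920x1080, 24-bit colour, 30 frames per second, as in the comment.
rate = raw_video_mb_per_s(1920, 1080, 24, 30)
print(f"{rate:.3f} MB/s uncompressed")
```

Streaming setups normally send compressed video, so the rate actually on the wire would be far lower than this raw figure.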
the problem with them was it all went over the FSB, so it was limited to 2 sockets because of lack of bandwidth between the cores. AMD doesn't have that problem because of its HyperTransport connections between all the cores. Intel will probably build a similar system soon with 6-core i7 Xeons, as they have QPI links that get over the FSB problems they had before.
it makes you think that they thought they made a mistake making their native quads, then saw Intel's multi-chip quads and thought, hmm, we could do that with our quads!
Oh... the bandwidth. I'm getting tingly.
just my two cents.
Intel has already shown working 4-way and 8-way Nehalem-EX systems. What's wrong with the diagram?
Of course having Rectal Hector in charge of AMD didn't hurt, but I'm giving the edge to Satan.
Regardless, this trend of adding more and more cores won't persist forever. Very few tasks benefit from SMT or even AMT.
Edit: Just remember, of the same architecture, a dual-core at 3.2 GHz is faster than a quad-core at 1.6 GHz. The more cores you have, the more overhead is involved in keeping them all busy.
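One way to put numbers on the dual-core versus quad-core comparison is Amdahl's law. The sketch below is a simplified model, not a benchmark: it assumes throughput scales linearly with clock speed and ignores memory and synchronization overhead, and the function names are made up for illustration. Under those assumptions the two chips only tie when the workload is perfectly parallel; any serial portion favours the faster dual-core.

```python
def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Amdahl's law: speedup over one core for a given parallel fraction."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def relative_throughput(clock_ghz: float, cores: int, parallel_fraction: float) -> float:
    """ASSUMPTION: performance scales linearly with clock on the same architecture."""
    return clock_ghz * amdahl_speedup(cores, parallel_fraction)


for p in (0.5, 0.9, 1.0):
    dual = relative_throughput(3.2, 2, p)
    quad = relative_throughput(1.6, 4, p)
    print(f"parallel fraction {p:.1f}: dual 3.2 GHz = {dual:.2f}, quad 1.6 GHz = {quad:.2f}")
```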
you seem to think it's hard, but it's a feature built into the latest Windows 7 - it can recode HD video and stream it to other PCs (or consoles/extenders) on the fly. you're seeing it as starting from nothing, i'm seeing it as an application for an existing tech.