# AMD to Unveil Next-Generation APUs on November 11



## btarunr (Nov 1, 2013)

As a follow-up to our older article on how December-January will play out for AMD's next-generation APU lineup, we have news that the company will unveil, or at least tease, its next-generation desktop APU, codenamed "Kaveri," on November 11, 2013. That is when the company will host its APU'13 event, modeled along the lines of GPU'13, held in Hawaii this September, where it unveiled its Radeon R9 200 and R7 200 GPU families. Against its backdrop, the company will also hold its 2013 AMD Developer Summit, which brings together developers making software that takes advantage of both CPUs and OpenCL-accelerated GPUs. APU'13 will be held in San Jose, USA, and like GPU'13, will be live-streamed to the web. In addition to new APUs, the company is expected to make some big announcements around its HSA (heterogeneous system architecture) initiative, which has brought some big names in the industry on board.





The agenda for APU'13 follows.



4:00 - 5:00 p.m. (PST), Monday, November 11:
- Lisa Su, senior vice president & general manager, Global Business Units, AMD: "Developers: The Heart of AMD Innovation"
- Phil Rogers, corporate fellow, AMD: "The Programmers Guide to Reaching for the Cloud"

8:30 - 9:30 a.m. (PST), Tuesday, November 12:
- Mike Muller, CTO, ARM: "Is There Anything New in Heterogeneous Computing?"
- Nandini Ramani, vice president, Java Platform, Oracle Solutions: "The Role of Java in Heterogeneous Computing, and How You Can Help"

1:15 - 2:15 p.m. (PST), Tuesday, November 12:
- Dr. Chien-Ping Lu, senior director, MediaTek USA: "How Many Cores Will We Need?"
- Tony King-Smith, executive vice president, Marketing, Imagination Technologies: "Silicon? Check. HSA? Check. All done? Wrong!"

8:30 - 9:30 a.m. (PST), Wednesday, November 13:
- Dominic Mallinson, senior vice president, Software, Sony: "Inside PlayStation 4: Building the Best Place to Play"
- Brendan Iribe, CEO, Oculus VR: "Virtual Reality - A New Frontier in Computing"

1:15 - 2:15 p.m. (PST), Wednesday, November 13:
- Johan Andersson, technical director, DICE: "Rendering Battlefield 4 with Mantle"
- Mark Papermaster, CTO, AMD: "Powering the Next Generation Surround Computing Experience"

Image Credit: VR-Zone

*View at TechPowerUp Main Site*


----------



## Slomo4shO (Nov 1, 2013)

After sitting through the debacle that was GPU'13, I don't think I will ever participate in another AMD streamed event; instead, I'll opt to read the overviews the next day.


----------



## ZetZet (Nov 1, 2013)

It would be so awesome if they brought back the Phenom X6 on FM2+, but they won't. Hopefully the new Athlons perform well.


----------



## Dent1 (Nov 1, 2013)

ZetZet said:


> It would be so awesome if they brought back the Phenom X6 on FM2+, but they won't. Hopefully the new Athlons perform well.



You mean bring back the Phenom name or the Phenom (Deneb/Thuban) architecture?


----------



## RCoon (Nov 1, 2013)

Dent1 said:


> You mean bring back the Phenom name or the Phenom (Deneb) architecture?



Bring back TRUE hex cores. As an FM2 guy, I don't want any more module business. Give me a true quad core without hardware hyperthreading. My 1055T still makes me depressed at how good it is after all these years in comparison to my FM2 setup.


----------



## SIGSEGV (Nov 1, 2013)

http://www.sisoftware.eu/rank2011d/...e0ddeccaa29faa8cf4c9f8debbdee3d3f586bb83&l=en
http://www.planet3dnow.de/cms/?p=3279&preview=true


----------



## Dent1 (Nov 1, 2013)

I think the modules will start to come into their own now. I especially want to see how the next generation of console ports fares.



SIGSEGV said:


> http://www.sisoftware.eu/rank2011d/...e0ddeccaa29faa8cf4c9f8debbdee3d3f586bb83&l=en



What am I looking at? Talk us through it.


----------



## ZetZet (Nov 1, 2013)

Dent1 said:


> You mean bring back the Phenom name or the Phenom (Deneb/Thuban) architecture?


Phenom name with 6 actual cores.


----------



## buildzoid (Nov 1, 2013)

ZetZet said:


> Phenom name with 6 actual cores.



You do realize that the FX 63XX and 83XX lines beat Phenoms in games and benchmarks (including SCII, which is the most single-threaded game ever made)?


----------



## Dent1 (Nov 1, 2013)

SIGSEGV said:


> http://www.sisoftware.eu/rank2011d/...e0ddeccaa29faa8cf4c9f8debbdee3d3f586bb83&l=en
> http://www.planet3dnow.de/cms/?p=3279&preview=true




Google Translate:

Benchmark results of AMD "Kaveri" chips are being discussed on the internet, after appearing on the bench portal of SiSoft's Sandra. The test results themselves are less interesting than the hardware configuration they report:

If the information is correct, "Kaveri" would have a remarkable 13 Compute Units (CUs), corresponding to 832 shaders, and would thus sit, in terms of raw compute power, between a Radeon HD 7770 (10 CUs / 640 shaders) and a Radeon HD 7790 (14 CUs / 896 shaders).

Translating this into graphics performance makes little sense because of the much lower memory bandwidth (128-bit DDR3 on socket FM2+ vs. 128-bit GDDR5 on the graphics cards, roughly a quarter as much). The raw compute power does show up in GPGPU workloads such as OpenCL benchmarks, however. In SiSoft's encryption test, this "Kaveri" achieved more than twice the performance of a "Richland" A10-6800K with its integrated GPU, internally called Radeon HD 8670D:

The "Kaveri" GPU was clocked at only 600 MHz and used DDR3-1600, while the A10-6800K system ran its GPU at 844 MHz with DDR3-2133. That would also explain the strong GPU gains attributed to later versions with DDR4.

Because of the odd figure of 13 CUs, the question naturally arises whether it might be a read-out error. Conveniently, the same user already uploaded an encryption result back in May, which according to the system info ran on a "Kaveri" with the expected 8 CUs / 512 shaders.

That 8-CU system scored only 5.8 GB/s, which is exactly 8/13 of the 9.4 GB/s of the 13-CU system. That would be a strong indication for the 13 CUs, though of course not 100% certain; maybe the May result was simply held back by a performance-robbing bug that has since been fixed, and the CUs are now being read out wrong. Apart from the Windows version (Windows 7 vs. Windows 8) and a slightly different Sandra build (19.35 vs. 19.44), the configurations are identical. The same user also has an A10-5800K system whose graphics part likewise runs at only 600 MHz; it achieved only 2.6 GB/s in this benchmark, less than a third of the putative 13-CU Kaveri.

Looking more closely at the Sandra results, we noticed that the system information shows 1 device with 2 threads. This is strange, because normally, even for large cards, only one thread per device is listed; 2 threads only appear with CrossFire setups, which then show 2 devices. However, we cannot say at the moment how a possible pairing of a "Kaveri" and a "Hainan" GPU (which has 5 CUs) would appear in the system information. In previous hybrid CrossFire setups both chips were always named and displayed by Sandra, but that could change for technical reasons (HSA) or simple marketing considerations (AMD's new naming scheme). A coupling of 8 "Kaveri" CUs and 5 "Hainan" CUs would naturally explain the elevated benchmark result.

The other benchmarks are unspectacular; the measured 15 GB/s memory bandwidth is in line with expectations and higher than the predecessor's. According to the leaked inside information, Kaveri is supposed to have an internal 256-bit bus per module, so a higher single-threaded memory throughput could be expected. Perhaps interesting is that the graphics unit is described as "AMD Radeon R5 M200 Series".




buildzoid said:


> You do realize that the FX 63XX and 83XX lines beat Phenoms in games and benchmarks (including SCII, which is the most single-threaded game ever made)?



But to be fair, in some apps "beat" translates into only a small boost.


----------



## RCoon (Nov 1, 2013)

buildzoid said:


> You do realize that the FX 63XX and 83XX lines beat Phenoms in games and benchmarks (including SCII, which is the most single-threaded game ever made)?



The 8150 barely beat the Phenom X6's.
http://www.techpowerup.com/reviews/AMD/FX8150/11.html
Sure the 8350 is faster, but it's a slight disgrace at how effectively the Phenom X6's keep up with AMD's current gen of Bulldozer CPU's.


----------



## Dent1 (Nov 1, 2013)

RCoon said:


> The 8150 barely beat the Phenom X6's.
> http://www.techpowerup.com/reviews/AMD/FX8150/11.html
> Sure the 8350 is faster, but it's a slight disgrace at how effectively the Phenom X6's keep up with AMD's current gen of Bulldozer CPU's.




TBH, as much as I love TPU reviews, those games were not stressing any of the CPUs listed. When you're seeing frame rates of up to 214 FPS, you know there is little "stress" in the stress testing.


I don't think the modules were the only problem, because I made an observation in another thread:
The FX-8150 (8 cores) is actually slower than the FX-6300 (6 cores) in a lot of applications too. In Dawn of War II and Dragon Age: Origins, 12 FPS and 15 FPS separate the two. The FX-6300 seems faster in some of the non-gaming apps too, despite the 2-core, 2MB-L2 and 100 MHz handicap. I can only put this down to the Piledriver refinements outweighing the Bulldozer architecture, and both are module-based.

http://www.anandtech.com/bench/product/434?vs=699


----------



## repman244 (Nov 1, 2013)

buildzoid said:


> You do realize that the FX 63XX and 83XX lines beat Phenoms in games and benchmarks (including SCII, which is the most single-threaded game ever made)?



The 63xx is hit or miss. In some areas it's better, but in some it's just horrible and trails behind the X6. Not really an upgrade from the X6.



RCoon said:


> The 8150 barely beat the Phenom X6's.
> http://www.techpowerup.com/reviews/AMD/FX8150/11.html
> Sure the 8350 is faster, but it's a slight disgrace at how effectively the Phenom X6's keep up with AMD's current gen of Bulldozer CPU's.



The CPU you should be comparing is the 8350 or 8320, not the old 8150. However, you can still find tasks that the X6 does better.


----------



## ZetZet (Nov 1, 2013)

buildzoid said:


> You do realize that the FX 63XX and 83XX lines beat Phenoms in games and benchmarks (including SCII, which is the most single-threaded game ever made)?


Yes, but they use too much power and am3+ mATX boards are terrible.


----------



## Recus (Nov 1, 2013)

I hope AMD will equip Kaveri with their exclusive technologies such as high TDP.


----------



## ensabrenoir (Nov 1, 2013)

Recus said:


> I hope AMD will equip Kaveri with their exclusive technologies such as high TDP.
> 
> http://cs419321.vk.me/v419321846/b9d3/Vxo-sx747dE.jpg



.....OK, it's early where I'm at and my contacts are fighting me... but is that a waffle iron made out of AMD CPUs?

EPIC


----------



## Jorge (Nov 1, 2013)

For those who don't understand the technical aspects of CPU architecture... a dual core module is nothing more than TWO CORES on one segment of a CPU. So if you have a 4 module CPU you have EIGHT genuine CPU cores, contrary to the foolishness perpetuated by some who claim that an 8-core FX processor is really a four core CPU. The fact that the front end decoder was feeding TWO CORES is where those who lack technical understanding went sideways. AMD's CPUs do in fact have exactly the number of cores that they are advertised to have. Steamroller and newer cores will also have more decoders and better fetch and this will increase overall performance.


----------



## RCoon (Nov 1, 2013)

Jorge said:


> For those who don't understand the technical aspects of CPU architecture... a dual core module is nothing more than TWO CORES on one segment of a CPU. So if you have a 4 module CPU you have EIGHT genuine CPU cores, contrary to the foolishness perpetuated by some who claim that an 8-core FX processor is really a four core CPU. The fact that the front end decoder was feeding TWO CORES is where those who lack technical understanding went sideways. AMD's CPUs do in fact have exactly the number of cores that they are advertised to have. Steamroller and newer cores will also have more decoders and better fetch and this will increase overall performance.



cores =/= modules
Also, 8 cores sharing 4 FPUs means those jazzy doubled-up modules mean jack to a great deal of programs.


----------



## Steevo (Nov 1, 2013)

RCoon said:


> cores =/= modules
> Also, 8 cores sharing 4 FPUs means those jazzy doubled-up modules mean jack to a great deal of programs.



Which is why many of us are still sitting here with our X6 running 4ghz and wondering where to go from here except to the blue camp. I built APU systems, Intel, 8350 with 32GB of 21xx RAM. 


Until there is a significant CPU performance improvement or I feel the need my X6 will stay.


----------



## ensabrenoir (Nov 1, 2013)

Constantine Yevseyev said:


> The picture says something like "Waffle iron based on all-new AMD FX-8350: 448 cores, 7K Watt TDP". Next phrase is an untranslatable play on words: it's a rude joke on fact that AMD gives you no choice (no more K10-like architecture based CPUs) using Russian slang verb ("to waffle").



Thanks..... Should've had the 9590's in there....then you really could've made waffles..


----------



## 1d10t (Nov 1, 2013)

Jorge said:


> For those who don't understand the technical aspects of CPU architecture... a dual core module is nothing more than TWO CORES on one segment of a CPU. So if you have a 4 module CPU you have EIGHT genuine CPU cores, contrary to the foolishness perpetuated by some who claim that an 8-core FX processor is really a four core CPU. The fact that the front end decoder was feeding TWO CORES is where those who lack technical understanding went sideways. AMD's CPUs do in fact have exactly the number of cores that they are advertised to have. Steamroller and newer cores will also have more decoders and better fetch and this will increase overall performance.



I didn't like what AMD did with the first FX, aka Bulldozer, but now I understand AMD is going in "another" direction. Just take a look at the blue side: even their traditional design and advanced 22nm process doesn't give them much advantage in daily use. For example, a $250 i5 + $50 GT 220 + $50 hard drive isn't noticeably faster than a $150 6800K + $100 SSD.
The first step was reducing floating point while keeping two integer units. The first batch was a gimped design; it performed under par versus the previous Phenom II X6. After a little tweaking, giving two decoders to feed the integer units and the ALU within the FPU, they finally caught up in single-threaded performance. The traditional concept was 1+1, so most folks claim AMD didn't give them a proper core. With such a modular design, AMD can tweak further: a more complex fetch stage (divider, multiplexer, brancher, shifter), maximizing the decoders with two microcode paths each, more data paths for load-store I/O within the integer and FP units, L2 cache coherency for local data, and eDRAM for a global L3 cache with advanced granularity shared with the graphics compute units. Notice this is why AMD "needs" HSA to take on both serial and parallel tasks in one die.
Pretty much I'm delirious, talking nonsense without any proof or link... so take this with a huge grain of salt.


----------



## FrustratedGarrett (Nov 1, 2013)

*Good News for Gaming!*

Those new APUs are going to be excellent for gaming, especially if developers decide to utilize the GCN logic on 'em through *Mantle* or *OpenCL*. The high-end APU, I believe, has 13 GCN vector cores, each with 64 ALUs.
Vector processors like GCN are the natural place to compute physics because, like graphics, physics processing is driven by thousands of parallel and mutually independent vector workloads.
Shared memory access and heterogeneous queuing make those APUs the best gaming CPUs, if utilized the way they should be.


----------



## flynnski (Nov 1, 2013)

ensabrenoir said:


> Thanks..... Should've had the 9590's in there....then you really could've made waffles..



Funny thing is that as bad as the reputation of the 9590s has been, Intel is no better when clocked to 5 GHz.









With Haswell things are not much better: +18% clock costs over 60% more power (3900 to 4600 MHz), and that last 400 MHz is going to cost another 50%+ in power consumption to reach 5 GHz.
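For what it's worth, those numbers line up with the textbook dynamic-power model; a rough sketch, under the simplifying assumption that voltage must rise roughly linearly with frequency (so power scales with about the cube of clock speed):

```python
# Dynamic power P ~ C * V^2 * f; if V scales ~linearly with f, then P ~ f^3.
def power_ratio(f_new, f_old):
    return (f_new / f_old) ** 3

# 3.9 GHz -> 4.6 GHz is the +18% clock bump mentioned above...
print(f"{power_ratio(4.6, 3.9) - 1:.0%}")  # ~64% more power
# ...and 4.6 -> 5.0 GHz costs another chunk even in this idealized model;
# real silicon near its voltage ceiling scales worse still.
print(f"{power_ratio(5.0, 4.6) - 1:.0%}")  # ~28% more on top
```

The cube-law model alone predicts the ~60% figure for the first step; the "another 50%+" for the last 400 MHz reflects real chips needing disproportionate voltage near their limits.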


----------



## flynnski (Nov 1, 2013)

Steevo said:


> Which is why many of us are still sitting here with our X6 running 4ghz and wondering where to go from here except to the blue camp. I built APU systems, Intel, 8350 with 32GB of 21xx RAM.
> 
> 
> Until there is a significant CPU performance improvement or I feel the need my X6 will stay.



In the same boat,.. X6 1100T @ 4.2Ghz, see no reason to upgrade..


----------



## dwade (Nov 1, 2013)

AMD needs to address TDP issues for both their CPU and GPU. It's way too high, which is why they are losing to nVidia badly in the mobile market.


----------



## flynnski (Nov 1, 2013)

dwade said:


> AMD needs to address TDP issues for both their CPU and GPU. It's way too high, which is why they are losing to nVidia badly in the mobile market.



Got any recent data that shows that "they are losing to nVidia badly in the mobile market" ?

John Peddie








> (AMD).. APUs declined 9.6% from Q1 and *increased an astounding 47.1% in notebooks*. The company’s overall PC graphics shipments increased 10.9%.





> Nvidia’s desktop discrete shipments *were down 8.9%* from last quarter; and, the company’s mobile discrete shipments *decreased 7.1%*


----------



## TheoneandonlyMrK (Nov 1, 2013)

dwade said:


> AMD needs to address TDP issues for both their CPU and GPU. It's way too high, which is why they are losing to nVidia badly in the mobile market.



I find it strange that members with little to say all year can be assed throwing a little bait into AMD threads.
Cheers for the input though, dwade. However, this being TPU, most readers here fecked efficiency right out the window on day 1 of their new build (or old rebuild) when they turned EIST / Cool'n'Quiet and all the other eco features off, then overclocked the snot out of it and left it like that eternally, or until instability showed up, only to step it back a bit.

MOOOOOAAAARR POWERSSSS, not less, pls


----------



## ensabrenoir (Nov 1, 2013)

flynnski said:


> funny thing is that as bad as the reputation for the 9590s have been, Intel are no better when clocked to 5Ghz..
> 
> http://pctuning.tyden.cz/ilustrace3/obermaier/4770K/scaling_sandy.pnghttp://i.imgur.com/gzBNZwN.png
> 
> With Haswell things are not much better.. +18% clock costs over 60% more power (3900 to 4600mhz).. and that last 400Mhz is going to cost another 50%+ in power consumption to reach 5Ghz.



Fair enough... gotta admit though, even if it was Intel or NVIDIA in that picture, it still would've been hilarious. And anyone who buys that level of CPU doesn't care about power consumption and tends to water cool, so it's all good.


----------



## HisDivineOrder (Nov 1, 2013)

Only San Jose?

Were all the tropical island resorts taken this time, AMD?


----------



## Dent1 (Nov 1, 2013)

flynnski said:


> In the same boat,.. X6 1100T @ 4.2Ghz, see no reason to upgrade..



I can say the same thing with an Athlon II X4 @ 3.6Ghz.



dwade said:


> AMD needs to address TDP issues for both their CPU and GPU. It's way too high, which is why they are losing to nVidia badly in the mobile market.



Hence why AMD/ATI currently have a larger market share than Nvidia in the mobile market.


----------



## Ralfies (Nov 1, 2013)

flynnski said:


> funny thing is that as bad as the reputation for the 9590s have been, Intel are no better when clocked to 5Ghz..
> 
> http://pctuning.tyden.cz/ilustrace3/obermaier/4770K/scaling_sandy.pnghttp://i.imgur.com/gzBNZwN.png
> 
> With Haswell things are not much better.. +18% clock costs over 60% more power (3900 to 4600mhz).. and that last 400Mhz is going to cost another 50%+ in power consumption to reach 5Ghz.



Not doubting the accuracy of this, but it's a little hard to take these charts seriously when they can't even spell what they're measuring.


----------



## DeOdView (Nov 1, 2013)

Dent1 said:


> I can say the same thing with an Athlon II X4 @ 3.6Ghz.


Didn't we (customers) want our CPUs to lastttt long? Funny enough, some complain about AMD sockets, too!
...


----------



## Dent1 (Nov 1, 2013)

DeOdView said:


> Didn't we (customers) want our CPUs to lastttt long? Funny enough, some complain about AMD sockets, too!
> ...



100% agree. When AMD was churning out CPUs, people were saying "slow down, we just upgraded; damn corporate greed milking the consumer". Then they give us hardware that lasts almost half a decade, and people complain there's nothing new.



Ralfies said:


> Not doubting the accuracy of this, but it's a little hard to take these charts seriously when they can't even spell what they're measuring.



Yeah, power "consuption". I think it's the author's second language.


----------



## flynnski (Nov 1, 2013)

Dent1 said:


> 100% agree. When AMD was churning out CPUs they are saying slow down we just upgraded, damn corporate greed milking the consumer. Then they give us hardware which almost half a decade and people complain for something new.
> 
> 
> 
> Yh power "consuption". Think its the authors second language.



IIRC, the author is Czech/Slovak. I assume you communicate in Czechoslovakian as well as they do in English?!


----------



## Ralfies (Nov 1, 2013)

flynnski said:


> Iirc, the author is Czech/Slovak.. I assume that you communicate Czechoslovakian as well as they do English ???!



I assumed as much, but that's beside the point. If I were publishing data in Czechoslovakian and trying to be taken seriously, you bet your ass I'd make sure there weren't any spelling errors. Once again, I'm not saying the data is false or inaccurate, just a little unprofessional.

Anyways, back on topic. 832 GCN cores seems like a waste of space/power if they're just going to be held back by memory bandwidth anyway. I'm thinking it'll be between 384 and 512 cores.


----------



## Dent1 (Nov 2, 2013)

flynnski said:


> Iirc, the author is Czech/Slovak.. I assume that you communicate Czechoslovakian as well as they do English ???!



Nope. But I wasn't criticising the author's spelling, Ralfies was. I was just pointing out the mistake so everyone knew what Ralfies was talking about.


----------



## flynnski (Nov 2, 2013)

Dent1 said:


> Nope. But I wasn't criticising the author's spelling, Ralfies was. I was just pointing out the mistake so everyone knew what Ralfies was talking about.



Ahh k apologies then


----------



## The Von Matrices (Nov 2, 2013)

I'm not expecting much of anything regarding hardware announcements, even during the press conference.  All the seminars involve software, and I expect that will be the theme of the conference.



Ralfies said:


> Anyways, back on topic. 832 GCN cores seems like a waste of space/power if they're just going to be held back by the memory bandwidth anyways. I'm thinking it'll be between 384 and 512 cores.



That bandwidth constraint is the issue.  I don't understand why mobile processors haven't gotten wider memory buses to compensate.  I can understand socketed desktop CPUs needing too many pins to support a wider memory bus, and also DIMM placement is an issue with a wide bus.  But modern notebooks use BGA CPUs and soldered down memory.  Theoretically a 256-bit DDR3 bus shouldn't require much more space in a laptop than a 128-bit bus, and the only increase in cost might be a PCB with a few more layers.  In exchange graphics performance would scale immensely.  Microsoft and Sony can do it for the APUs in their consoles and AMD's graphics division does it all the time for its GPUs, so I don't see why it isn't done for the mass market APUs.
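The bus-width trade-off described above is simple arithmetic; a sketch of peak theoretical bandwidth (the memory speeds are illustrative examples, not any specific product's):

```python
# Peak theoretical bandwidth = bus width (bytes) * transfer rate (MT/s).
def peak_bw_gbs(bus_bits, mts):
    return bus_bits / 8 * mts / 1000  # GB/s

print(peak_bw_gbs(128, 1600))  # 25.6 GB/s: 128-bit DDR3-1600, today's APUs
print(peak_bw_gbs(256, 1600))  # 51.2 GB/s: the same DDR3 on a 256-bit bus
print(peak_bw_gbs(128, 6000))  # 96.0 GB/s: e.g. a 128-bit GDDR5 card at 6 GT/s
```

Doubling the bus width doubles peak bandwidth at the same DRAM speed, which is why a 256-bit DDR3 bus on a soldered BGA platform looks so attractive for an iGPU.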


----------



## alwayssts (Nov 2, 2013)

The Von Matrices said:


> That bandwidth constraint is the issue.  I don't understand why mobile processors haven't gotten wider memory buses to compensate.  I can understand socketed desktop CPUs needing too many pins to support a wider memory bus, and also DIMM placement is an issue with a wide bus.  But modern notebooks use BGA CPUs and soldered down memory.  Theoretically a 256-bit DDR3 bus shouldn't require much more space in a laptop than a 128-bit bus, and the only increase in cost might be a PCB with a few more layers.  In exchange graphics performance would scale immensely.  Microsoft and Sony can do it for the APUs in their consoles and AMD's graphics division does it all the time for its GPUs, so I don't see why it isn't done for the mass market APUs.



You're asking the questions that have boggled my mind since the conception of the APU, and certainly what I find the most interesting challenge moving forward.

There are many conceivable answers, a wider bus among them (256-bit DDR3 would be sufficient for a ~512sp design), although that is perhaps less probable as we move to DDR4 with its 1-DIMM-per-channel restriction, and to larger, more demanding iGPUs that will quickly outpace a 128-bit DDR4 bus. Certainly there is BGA, but I wonder if AMD is really willing to take that leap with their larger designs (as a consumer platform, i.e. not the PS4 or iterations of Bobcat).

HyperTransport, if not a discrete (or optional) GDDR5 bus to a GPU cache (a la what used to be called Sideport memory in the discrete-IGP days), seemed like a realistic option even up to this generation. While only 32-bit, a link's maximum bandwidth rests somewhere near what GDDR5 is capable of on AMD's current GPU controllers, meshing fairly nicely with being around half of what a 32/28nm iGPU would need (and twice what a 128-bit DDR3 bus could deliver), so that would have more or less made sense. Obviously moving past this generation it would be less so, unless itself coupled with a DDR4 bus (i.e. DDR4 + GDDRx).

From there, we have the possibilities of larger/faster caches (like the Xbox One's on-die RAM) offsetting what is needed externally. There is also the possibility of on-package, off-die caches (not unlike Intel's Iris) as well as stacked DRAM like Volta's.

Whatever their solution, they need to do it yesterday. Their strength is (and has always been) the floating-point computation per mm (per process/cost) their designs deliver. While HSA capitalizes on this fact, as it should, with each passing node they lose that (realistic) advantage to Intel, who can ramp clocks higher until they reach parity in design (and then clock lower to save power) even while their priority lies in improving their CPU cores. With each passing GPU generation NVIDIA grows closer to parity, as they are clearly moving from thinking of their designs purely as efficient GPUs toward more or less a floating-point core (one that makes sense as such a unit with or without the shell of a CPU). The scary thing about all that is: Intel and NVIDIA, currently the least dependent on memory bandwidth, have shown their plans going forward. AMD, who is already restricted on all fronts by this reality, has not (outside the PS4).

I find that sincerely troubling. No doubt they have an answer... I just hope it comes sooner rather than later.


----------



## The Von Matrices (Nov 2, 2013)

alwayssts said:


> You're asking the questions that have boggled my mind since the conception of the APU, and certainly what I find the most interesting challenge moving forward.
> 
> There are many conceivable answers, a wider bus among them (256-bit ddr3 would be sufficient for a ~512sp design), although perhaps less probable as we move to ddr4 and it's 1dimm-per-channel restriction and larger, more demanding iGPUs that will quickly outpace a 128-bit ddr4 bus.  Certainly there is bga, but I wonder if amd is really willing to take that leap with their larger designs (as a consumer platform, ie not the ps4 or iterations of bobcat).
> 
> ...



From what I've read about AMD's goals, they don't want a heterogeneous memory pool like the XBOX One, where different memory addresses have different bandwidths and latencies.  AMD is pushing to have all memory addresses be the same speed and latency, in order to avoid the need for software to shuffle memory among different addresses to optimize bandwidth, sort of like what is done with a discrete GPU today.  This doesn't rule out a hardware-managed algorithm handling more levels of cache (like what Intel does with Crystalwell), but AMD wants this to be transparent to the developer.

As far as DDR4, the doubled bandwidth will stave off the bandwidth limitation for a while, but even without the need for more bandwidth, the 1 DIMM/channel limitation will encourage wider memory buses.  The people who want lots of memory on desktop or mobile will now need double the memory channels to achieve the same capacity with DDR4 as with DDR3.  The server market already moved in this direction with DDR3; the migration to 256-bit buses was more for the sheer memory capacity of that many channels than for the increased bandwidth.
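The capacity point can be made concrete with a bit of arithmetic (the DIMM size here is an illustrative assumption, and the 1-DIMM-per-channel limit is as assumed in the post above):

```python
# With a 1-DIMM-per-channel limit, matching DDR3-era capacity means
# doubling the number of channels (and thus bus width and pin count).
def max_capacity_gb(channels, dimms_per_channel, dimm_gb=8):
    return channels * dimms_per_channel * dimm_gb

print(max_capacity_gb(2, 2))  # 32 GB: dual-channel DDR3, 2 DIMMs per channel
print(max_capacity_gb(2, 1))  # 16 GB: dual-channel DDR4, 1 DIMM per channel
print(max_capacity_gb(4, 1))  # 32 GB: quad-channel DDR4 restores parity
```

So the same capacity-driven pressure that pushed servers to 256-bit memory buses would apply to DDR4 desktops and laptops, with wider iGPU bandwidth as a side benefit.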


----------



## NeoXF (Nov 2, 2013)

Again people with the same concerns and mentality... *sigh*

Let me put it simply... HSA > pure iGPU for games and crap.

HSA is meant as a revolution in x86... and possibly the only thing that can save it from a slow and painful death by ARM.

Seriously, while the iGPU part should be beastly, even with the new IMC and faster DDR3 support it'll still come up short of its potential... the great iGPU is far from the (only) point of Kaveri.

And I'm sure, on paper at least, adding an extra 192-256 ALUs makes much more performance sense to AMD than adding 2 extra cores.


----------



## Steevo (Nov 2, 2013)

The Von Matrices said:


> From what I've read about AMD's goals, they don't want a heterogeneous memory pool like the XBOX One where different memory addresses have different bandwidths and latencies.  AMD is pushing to have all memory addresses the same speed and latency in order to avoid the need for software to shuffle memory among different addresses in order to optimize bandwidth, sort of what is one with a discrete GPU today.  This doesn't eliminate an algorithm implemented in the core hardware managing more levels of cache (like what Intel does with Crystalwell), but AMD wants this to be transparent to the developer.
> 
> As far as DDR4, the doubled bandwidth will stave off the bandwidth limitation for a while but even without the need for more bandwidth the 1 DIMM/channel limitation will encourage wider memory buses.  The people who want lots of memory for the desktop or mobile will now need lots double the memory channels to achieve the same capacity with DDR4 as DDR3.  The server market already moved in this direction with DDR3; the reason for the migration to 256-bit buses were more for the sheer memory capacity of that many memory channels rather than the increased bandwidth.



No, they DO want it. It improves their performance in all facets.

http://arstechnica.com/information-...orm-memory-access-coming-this-year-in-kaveri/


Instead of having software decide where to run the process from, the hardware decides in real time which is more efficient, and then runs it. Addresses are the same, so no latency penalty for transporting it around. Hugely improved performance in DSP and other filtered data, serial data still run on the CPU cores.


----------



## NeoXF (Nov 3, 2013)

Apparently AMD has indirectly confirmed the naming scheme for desktop Kaveri.






So as I suspected, Ax-7x00x. Like A10-7800K for the next top-tier model.

Edit: As well as the existence of the next Athlon CPUs... like an Athlon II X4 770K or 850K, I guess.


----------



## The Von Matrices (Nov 3, 2013)

Steevo said:


> No, they DO want it. It improves their performance in all facets.
> 
> http://arstechnica.com/information-...orm-memory-access-coming-this-year-in-kaveri/
> 
> Instead of having software decide where to run the process from, the hardware decides in real time which is more efficient, and then runs it. Addresses are the same, so no latency penalty for transporting it around. Hugely improved performance in DSP and other filtered data, serial data still run on the CPU cores.



I don't understand how that article refutes what I said; I think you agree with me but didn't understand what I said.  I wasn't referring to dedicated memory for the GPU and CPU, which is obviously going away.  I was referring to AMD not wanting something like NUMA, where different memory addresses have different bandwidths and latencies.

When programming for the XBOX One, programmers have to write their code so that the most latency and bandwidth sensitive parts are sent to the small SRAM while the rest of the data is written to the larger but slower main memory.  AMD doesn't want to have developers worrying about swapping data between the SRAM versus main memory, so they want a unified memory architecture like the PS4.

This is why I don't see something like alwayssts said occurring, where there is a small, high speed, on chip cache managed by software.    The whole point of AMD's heterogeneous computing initiative is to make it as easy as possible for programmers to utilize heterogeneous computing.  If there is to be a large SRAM cache at all, AMD wants something more like Crystalwell where the cache is managed by hardware and it is transparent to the developer.


----------



## TheoneandonlyMrK (Nov 3, 2013)

The Von Matrices said:


> I don't understand how that article refutes what I said; I think you agree with me but didn't understand what I said.  I wasn't referring to dedicated memory for the GPU and GPU, which is obviously going away.  I was referring AMD not wanting something like a NUMA where different memory addresses have different bandwidths and latencies.
> 
> When programming for the XBOX One, programmers have to write their code so that the most latency and bandwidth sensitive parts are sent to the small SRAM while the rest of the data is written to the larger but slower main memory.  AMD doesn't want to have developers worrying about swapping data between the SRAM versus main memory, so they want a unified memory architecture like the PS4.
> 
> This is why I don't see something like alwayssts said occurring, where there is a small, high speed, on chip cache managed by software.    The whole point of AMD's heterogeneous computing initiative is to make it as easy as possible for programmers to utilize heterogeneous computing.  If there is to be a large SRAM cache at all, AMD wants something more like Crystalwell where the cache is managed by hardware and it is transparent to the developer.



That's exactly it, and exactly where I think they are all going: stacked chips with multi-layered memory. In centralising the memory resource, it only makes sense to up the bandwidth of each route to it and remove some of the intermediary caches to win back some latency.
I'm thinking quad-module for AMD, but per layer: effectively 16 logic cores from 8, with 4x DDR4 IMCs per layer (x2), tied across an 8-channel DDR4 interface to 8 GB of TSV-connected DRAM. Drop the system RAM too at that point, and the year is... likely 2015.


----------

