# AMD "Jaguar" Micro-architecture Takes the Fight to Atom with AVX, SSE4, Quad-Core



## btarunr (Feb 19, 2013)

AMD has staked its low-power CPU bets on the "Bobcat" micro-architecture for the past two years now. Intel's Atom line of low-power chips caught up in power-efficiency, CPU performance, and, to an extent, iGPU performance, and recent models even feature out-of-order execution. AMD unveiled its next-generation "Jaguar" low-power CPU micro-architecture for APUs in the 5 W - 25 W TDP range, targeting everything from tablets to entry-level notebooks and nettops. 

At its presentation at ISSCC 2013, the 60th International Solid-State Circuits Conference, AMD detailed "Jaguar," revealing a few killer features that could restore the company's competitiveness in the low-power CPU segment. To begin with, APUs with CPU cores based on this micro-architecture will be built on TSMC's 28-nanometer HKMG process. Jaguar allows for up to four x86-64 cores. The four cores, unlike Bulldozer modules, are completely independent, and share only a 2 MB L2 cache. 



 

 

 




"Jaguar" x86-64 cores feature a 40-bit physical address space (Bobcat's is 36-bit), 16-byte/cycle load/store bandwidth (double that of Bobcat), a 128-bit wide FPU data-path (again double that of Bobcat), and roughly 50 percent larger scheduler queues. The instruction set is where AMD is looking to rattle Atom. Not only does Jaguar feature out-of-order execution, it also supports ISA extensions found on mainstream CPUs, such as AVX (Advanced Vector Extensions) and the SIMD instruction sets SSSE3, SSE4.1, SSE4.2, and SSE4a, all of which are widely adopted by modern media applications. Also added is AES-NI, which accelerates AES data encryption. In the efficiency department, AMD claims to have improved its power-gating technology, which completely cuts power to inactive cores to conserve battery life. 
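As a practical aside for when these chips ship: on Linux, the kernel exposes each of these extensions as a lowercase flag (avx, ssse3, sse4_1, sse4_2, sse4a, aes) on the "flags" line of /proc/cpuinfo. The helper below is a minimal sketch of checking for them; the sample flags line is fabricated for illustration.

```python
# Minimal sketch: check which of the ISA extensions mentioned above a CPU
# reports. On Linux, the kernel lists them (lowercase) on the "flags" line
# of /proc/cpuinfo; pass that file's text in.
FEATURES = ("avx", "ssse3", "sse4_1", "sse4_2", "sse4a", "aes")

def supported_features(cpuinfo_text, wanted=FEATURES):
    """Return the subset of `wanted` flags present in a /proc/cpuinfo dump."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            return [f for f in wanted if f in flags]
    return []

# Fabricated example line; a real chip would report many more flags.
sample = "flags\t\t: fpu mmx sse ssse3 sse4_1 sse4_2 sse4a avx aes\n"
print(supported_features(sample))
```

On a real system you would pass in `open("/proc/cpuinfo").read()` instead of the sample string.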



 

 

 



*View at TechPowerUp Main Site*


----------



## xvi (Feb 19, 2013)

btarunr said:


> The four cores, unlike Bulldozer modules, are completely independent, and only share a 2 MB L2 cache.



I am *SO* glad to hear this.


----------



## TRWOV (Feb 19, 2013)

Hello new file server


----------



## TheoneandonlyMrK (Feb 19, 2013)

Sounds better than I expected on the feature-support side; now I really am expecting an 8-core version in an upcoming set-top gaming fandangle


----------



## Ikaruga (Feb 19, 2013)

Strange they do this before Sony's PS4 announcement tomorrow. Both new consoles from MS and Sony are going to have these new cores in their CPUs; I thought Sony would ask them for all the "flair" they can get. It's also strange that only four cores are allowed on the PC side while there will be more in the consoles (assuming all the leaks are correct, ofc).


----------



## Filiprino (Feb 19, 2013)

I suffered the shitty single-core Intel Atom N450 with the microstutters. The first iteration of Bobcat was already faster than the Intel Atoms, and you had a pair of cores; now with a quad-core and improved per-core performance these things should be quick enough, and even better with all of the "higher end" features like AVX, SSE4, AES and the like.

Sony and Microsoft made a good choice for their consoles (going with an octacore instead of a quadcore), and AMD made a good move.


> Jaguar allows for up to four x86-64 cores. The four cores, unlike Bulldozer modules, are completely independent, and only share a 2 MB L2 cache.





> "Jaguar" x86-64 cores feature a 40-bit wide physical address (Bobcat features 36-bit), 16-byte/cycle load/store bandwidth, which is double that of Bobcat, a 128-bit wide FPU data-path, which again is double that of Bobcat, and about 50 percent bigger scheduler queues. The instruction set is where AMD is looking to rattle Atom. Not only does Jaguar feature out-of-order execution, but also ISA instruction sets found on mainstream CPUs, such as AVX (advanced vector extensions), SIMD instruction sets such as SSSE3, SSE4.1, SSE4.2, and SSE4A, all of which are quite widely adopted by modern media applications. Also added is AES-NI, which accelerates AES data encryption. In the efficiency department, AMD claims to have improved its power-gating technology that completely cuts power to inactive cores, to conserve battery life.



Sweet.


----------



## Over_Lord (Feb 19, 2013)

Do these get HD 8000 IGP?


----------



## TheoneandonlyMrK (Feb 19, 2013)

thunderising said:


> Do these get HD 8000 IGP?



Good question, not sure, but I'd expect them to, and I have an inclination to believe I read they had GCN in


----------



## Aquinus (Feb 19, 2013)

btarunr said:


> To begin with, APUs with CPU cores based on this micro-architecture will be built on TSMC's 28-nanometer HKMG process. Jaguar allows for up to four x86-64 cores. The four cores, unlike Bulldozer modules, are completely independent, and only share a 2 MB L2 cache.



More cores in more places! Sounds good to me. This is how APUs started; I spy 28 nm APUs.


----------



## McSteel (Feb 19, 2013)

Wonder if it will go over 1.6 GHz, seeing how it's more efficient than the last generation and uses a 28 nm process...


----------



## PopcornMachine (Feb 19, 2013)

thunderising said:


> Do these get HD 8000 IGP?





theoneandonlymrk said:


> Good question not sure bit id expect them to and have an inclination to believe I read they had gcn in



I'm sure AMD will clear that up by saying that later this year it will have an unspecified upgrade that may or may not be called 7000 or 8000 and will maybe be bigger or smaller than previous chips, in the southern islands or solar system family, which may or may not be the same things.


----------



## Cataclysm_ZA (Feb 19, 2013)

Holy moly. I'd like to play with that! Looking forward to the future reviews where this crushes Atom. I'm not sure if it'll help stop the onslaught of Intel's low-power Core architecture as they drop it under 10W, but Jaguar is certainly going to be useful in a lot of applications.


----------



## NeoXF (Feb 19, 2013)

thunderising said:


> Do these get HD 8000 IGP?








Notice the "GCN" bit... obviously it's going to end up named HD 8000-something... even though it isn't known yet if it's GCN 1 or 2.


Also, you might be interested in this:

AMD Temash Prototype plays DiRT Showdown at 1080p ...


To be honest, I'm surprised these slides/videos aren't more known of...


----------



## Aquinus (Feb 19, 2013)

Atom: dual-core with hyper-threading.
Jaguar: Real quad-core with real cores.

We're not even talking modules. This is the real deal. So real they decided to keep using it. 



NeoXF said:


> Also, you might be interested in this:
> 
> AMD Temash Prototype plays DiRT Showdown at 1080p ...



I dig that displayport hub at the end of that video.


----------



## happita (Feb 19, 2013)

Aquinus said:


> Jaguar: Real quad-core with real cores.
> 
> We're not even talking modules. This is the real deal. So real they decided to keep using it.



It's about time. I was real close to getting a Bulldozer CPU when it dropped, but it flopped... so I didn't. I hope AMD gets its head out of its ass soon. Whether it's in the desktop/laptop/tablet or whatever other area they seem to be in, a few steps in the right direction and it could slowly turn around for them.


----------



## TRWOV (Feb 19, 2013)

*off topic* Anyone notice that cpu-world finally updated their front page timeline? Ivy and FX show up at last. Went to check on the C-70's IGP.


----------



## Jorge (Feb 19, 2013)

*Nice Job AMD*

Nice to see AMD moving forward with another excellent product that should sell well. This is definitely a win for consumers, as Jaguar has a lot of potential and widespread application. You can bet that the next round of desktop CPUs and APUs will have similar but improved features.


----------



## Nordic (Feb 19, 2013)

I want to see a jaguar vs atom vs via showdown.


----------



## Ikaruga (Feb 19, 2013)

james888 said:


> I want to see a jaguar vs atom vs via showdown.



I would rather see jaguar vs avoton


----------



## Nordic (Feb 19, 2013)

Ikaruga said:


> I would rather see jaguar vs avoton



avoton is atom. Best jaguar vs best atom vs best via.


----------



## DannibusX (Feb 19, 2013)

I wouldn't have named my product anything close to the Atari Jaguar.


----------



## Ikaruga (Feb 19, 2013)

james888 said:


> avoton is atom. Best jaguar vs best atom vs best via.



I know it's part of the atom family, but that's the one Intel wants to "use" against Jaguar, and with 8 cores and 4MB cache, they will probably win again.


----------



## Nordic (Feb 19, 2013)

Ikaruga said:


> I know it's part of the atom family, but that's the one Intel wants to "use" against Jaguar, and with 8 cores and 4MB cache, they will probably win again.



With 8 cores probably. 4 core vs 4 core vs 4 core then. I remember an older benchmark having via better than atom. I just wonder how it would be now.


----------



## Apocalypsee (Feb 20, 2013)

I've been eyeing the AMD Jaguar architecture since last year. Once any Windows 8 tablet comes out with one of these, I'll buy it in a heartbeat.


----------



## btarunr (Feb 20, 2013)

McSteel said:


> Wonder if it will go over 1.6 GHz, seeing how it's more efficient than the last generation, and uses 28nm process...



According to AMD, it will be clocked up to 1.80 GHz.


----------



## Mussels (Feb 20, 2013)

[George Takei] Oh Myyyyyy [/Takei]


this should shake up the low end market a decent amount. intel atom is just too damn slow. aint nobody got time for it.


----------



## Lionheart (Feb 20, 2013)

Mussels said:


> [George Takei] Oh Myyyyyy [/Takei]
> 
> 
> this should shake up the low end market a decent amount. intel atom is just too damn slow. aint nobody got time for it.



*AGREED*


----------



## sergionography (Feb 20, 2013)

McSteel said:


> Wonder if it will go over 1.6 GHz, seeing how it's more efficient than the last generation, and uses 28nm process...


yes it will be clocked higher


btarunr said:


> According to AMD, it will be clocked up to 1.80 GHz.


No, AMD said it will be clocked 10% higher than what Bobcat *would've* clocked at on the 28 nm node. That being said, 1.8 GHz is the worst-case scenario; the realistic scenario is probably about 20-30% higher due to the added stage in the pipeline and the 28 nm node, so 2-2.2 GHz is very likely. But seeing that they introduced a 25 W TDP part on these, I won't be surprised to see turbo clocks at over 2.4-2.8 GHz (considering Trinity 19 W TDP parts do 2.0-2.8 GHz).


xvi said:


> I am *SO* glad to hear this.



Yes, but then Bobcat/Jaguar is half the Bulldozer module; it has 2 decoders and a 128-bit FPU vs Bulldozer's 4 decoders and 256-bit FPU.


----------



## Aquinus (Feb 20, 2013)

sergionography said:


> Yes, but then Bobcat/Jaguar is half the Bulldozer module; it has 2 decoders and a 128-bit FPU vs Bulldozer's 4 decoders and 256-bit FPU.



...and without any shared resources to run the additional thread like a module would. Most software can't utilize the 256-bit FPU yet anyway. So it's not like this is a gimped BD chip; rather, it's a beefed-up Bobcat chip. There are a lot of CPU features and instructions on offer, which is pretty neat.

Also, you said something about the pipeline being longer. How do you figure? This CPU doesn't use modules or the module design, so why would the pipeline be longer? Shouldn't it be similar to the PII pipeline?


----------



## sergionography (Feb 20, 2013)

Aquinus said:


> ...and without any shared resources to run the additional thread like a module would. Most software can't utilize the 256-bit FPU yet anyways. So it's not like this is a gimped BD chip but rather it is a beefed up bobcat chip. There are a lot of CPU features and instructions that will be offered that is pretty neat.
> 
> Also you said something about the pipeline being larger. How do you figure? This CPU doesn't use modules or the module design so why would the pipeline be longer? Shouldn't it be similar to the PII pipeline?



Yes, but the Bulldozer core can max out a big portion of the module on a single thread, while Bobcat/Jaguar can't use up a second core for better single-thread performance. The fundamental idea behind Bulldozer is excellent, but the implementation was horrible; they shared way too much at once, and Steamroller un-sharing some of the parts, like the decoder, is proof of that. They should've started like Jaguar: share the L2 cache, then go from there to share prefetch, and then other parts if needed.

But now back to Jaguar, which is what this thread is about! When Jaguar was announced in the AMD presentation they mentioned adding a stage to the pipeline; it used to be 11 and now it's 12, I believe, or 10 became 11, I can't remember.

And I'm talking about the integer pipeline, which every CPU has. And no, PII had 13 stages if I'm not mistaken, so Bobcat had a new, redesigned one. Bulldozer has 19-22, also redesigned from PII.


----------



## Aquinus (Feb 20, 2013)

sergionography said:


> When Jaguar was announced in the AMD presentation they mentioned adding a stage to the pipeline; it used to be 11 and now it's 12, I believe, or 10 became 11, I can't remember.



Right, where did you read that because I can't find anything to confirm it.


----------



## sergionography (Feb 20, 2013)

Aquinus said:


> Right, where did you read that because I can't find anything to confirm it.


SemiAccurate goes briefly over the added stage in the pipeline and has an AMD slide about it too, but what I remember for sure is a YouTube video I saw where they presented Trinity and then Jaguar. I will send links later, as I'm replying from my phone right now.


----------



## ste2425 (Feb 20, 2013)

You got me all excited for nothing. I thought AMD was teaming up with Jaguar for something: 

AMD XJS


----------



## lilhasselhoffer (Feb 20, 2013)

An Atom-style chip that doesn't suck.  It's too bad that AMD didn't do this two years ago and completely curb-stomp Intel in the market.


As it stands, Intel is getting closer to making a viable Atom every revision.  They suck on the graphics side, but have the weight to push Atom forward.  AMD really caught the boat with the APU, but hasn't done enough (as yet) to close the market to Intel offerings.


Here's to the hope that Intel will get thoroughly beaten by an excellent APU string.  I'd get behind a quad core tablet running, ostensibly, 7xxx generation GCN graphics.  It beats the tar out of the crap Intel has phoned in with Atom.


----------



## Harlequin_uk (Feb 20, 2013)

hmmm make it a 95w part and 4ghz?


----------



## sergionography (Feb 21, 2013)

lilhasselhoffer said:


> An Atom style chip that doesn't suck.  It's too bad that AMD didn't do this two years ago, and completely curb stomp Intel in the market.
> 
> 
> As it stands, Intel is getting closer to making a viable Atom every revision.  They suck on the graphics side, but have the weight to push Atom forward.  AMD really caught the boat with an APU, but haven't done enough (as yet) to close the market to Intel offerings.
> ...



what are you talking about? bobcat stomped atom on so many levels


----------



## AlB80 (Feb 21, 2013)

*PS4*

I heard Jaguar will be inside the PS4.
ps. 1.6 GHz


----------



## NinkobEi (Feb 21, 2013)

the PS4 will have an 8-core version @ 1.84 ghz. Or will it just be two Jaguars? A mommy and a poppy. Hmm. Anyone seen the benchies for this puppy yet?


----------



## Ikaruga (Feb 21, 2013)

NinkobEi said:


> the PS4 will have an 8-core version @ 1.84 ghz. Or will it just be two Jaguars? A mommy and a poppy. Hmm. Anyone seen the benchies for this puppy yet?



What we "know" so far about Orbis's CPU (from the rumors/leaks) is this:

-   Orbis contains eight Jaguar cores at 1.6 GHz, arranged as two “clusters”
-   Each cluster contains 4 cores and a shared 2 MB L2 cache
-   256-bit SIMD operations, 128-bit SIMD ALU
-   SSE up to SSE4, as well as Advanced Vector Extensions (AVX)
-   One hardware thread per core
-   Decodes, executes and retires up to two instructions/cycle
-   Out-of-order execution
-   Per-core dedicated L1-I and L1-D cache (32 KB each)
-   Two pipes per core yield 12.8 GFLOPS of performance
-   102.4 GFLOPS for the system

1.6 GHz might get a little boost before the release, since they also doubled the RAM from 4 GB to 8 GB already.
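Those per-core and system figures are at least internally consistent. A quick back-of-the-envelope check (the lane/pipe breakdown below is my reading of the leak, not anything official):

```python
# Back-of-the-envelope check of the rumored FLOPS figures. Assumption (mine,
# not confirmed): a 128-bit SIMD ALU = 4 single-precision lanes, and each of
# the two pipes retires one op per lane per cycle.
clock_ghz = 1.6   # rumored core clock
pipes = 2         # "two pipes per core" from the list above
sp_lanes = 4      # 128 bits / 32-bit floats
cores = 8

per_core_gflops = clock_ghz * pipes * sp_lanes   # 12.8, matching the list
system_gflops = per_core_gflops * cores          # 102.4, matching the list
print(per_core_gflops, system_gflops)
```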

btw, a little off-topic: does anyone have any idea how the hell they are going to deal with the insane latency of GDDR5 as main memory? This has puzzled me since yesterday.


----------



## cadaveca (Feb 21, 2013)

Ikaruga said:


> What we "know" so far about Orbis's CPU (from the rumors/leaks) is this:
> 
> -   Orbis contains eight Jaguar cores at 1.6 GHz, arranged as two “clusters”
> -   Each cluster contains 4 cores and a shared 2 MB L2 cache
> ...



What Latency?


----------



## Aquinus (Feb 21, 2013)

Ikaruga said:


> btw a little off-toppic: anyone has any idea, how the hell are they going to deal with the insane amount of latency of the GDDR5 as main memory, this is something which puzzles me since yesterday?





cadaveca said:


> What Latency?



I don't think latency is going to be a problem. If they're using GDDR5 for main memory as well as video memory then I suspect that the CPU will directly access memory. It's not like a discrete GPU on a computer where you have to copy the data over the PCI-E bus where latency would be a very real issue, but I don't think that will be the case.


----------



## sergionography (Feb 21, 2013)

Ikaruga said:


> What we "know" so far about Orbis's CPU (from the rumors/leaks) is this:
> 
> -   Orbis contains eight Jaguar cores at 1.6 GHz, arranged as two “clusters”
> -   Each cluster contains 4 cores and a shared 2 MB L2 cache
> ...



We also know it will have 18 GCN clusters = 1152 GCN cores rated at 800 MHz, and it was rated at 1.84 TFLOPS, actually.

As for the latency, I guess it's up to the custom HSA memory controller; I would bet on that to handle things. After all, the chip is an APU, and it's interesting to see what a beefy APU can do. The latency between CPU and GPU is much lower, so GPGPU on an APU is much better than on a dedicated GPU with the same specs, and with GDDR5 the high bandwidth will cover up the latency, especially since on consoles developers will optimize specifically for the hardware, so it won't be too hard to tap into the FLOPS available.

And above all, the good news out of this is that AMD is smart to offer a multicore solution with high latency for developers to optimize around, because if anything this will only make their desktop solutions shine in future games, since developers will start to work around it. This might explain why with Steamroller AMD paid no attention to most of the higher-level cache subsystem (high latency on L3 and L2 cache).


----------



## AsRock (Feb 22, 2013)

DannibusX said:


> I wouldn't have named my product anything close to the Atari Jaguar.



LMAO, for some odd reason it was the first thing I thought of when I read AMD Jaguar, which must have been caused by that terrible console.


----------



## Mussels (Feb 22, 2013)

Sounds like they're planning CrossFired APUs: for media/2D use, drop back to a single CPU + GPU, then for games that require it, ramp it up to 8-core/dual GPU.


----------



## Ikaruga (Feb 22, 2013)

sergionography said:


> We also know it will have 18 GCN clusters = 1152 GCN cores rated at 800 MHz,
> and it was rated at 1.84 TFLOPS, actually.



yep, I forgot about those changes, thanks.



cadaveca said:


> What Latency?





sergionography said:


> We also know it will have 18 GCN clusters = 1152 GCN cores rated at 800 MHz,
> and it was rated at 1.84 TFLOPS, actually.
> 
> As for the latency, I guess it's up to the custom HSA memory controller; I would bet on that to handle things. After all, the chip is an APU, and it's interesting to see what a beefy APU can do. The latency between CPU and GPU is much lower, so GPGPU on an APU is much better than on a dedicated GPU with the same specs, and with GDDR5 the high bandwidth will cover up the latency, especially since on consoles developers will optimize specifically for the hardware, so it won't be too hard to tap into the FLOPS available.
> ...



Don't forget that it's not DDR5 but GDDR5! There is a significant difference. GDDR5 is basically a heavily tweaked DDR3 (well, not exactly, but let's just forget the little details for the sake of the subject). They sacrifice the low latency of DDR3 to boost the bandwidth. GPUs don't really need very low latencies, since their parallel nature "comes to the rescue" when a thread/calculation stalls, and what matters most is raw internal speed, being able to move large chunks of data as fast as possible. 

Don't get me wrong, I'm sure Sony knows what they are doing, and eight CPU cores apparently makes it parallel enough to use GDDR5 as system memory, but I'm still very curious how they are doing it, because if it's better, I sure want something like that on the PC side as well


----------



## de.das.dude (Feb 22, 2013)

this is good. way to go AMD. intel atom is seriously slow. even running windows 7 is a chore. aint nobody got time for that


----------



## Aquinus (Feb 22, 2013)

Ikaruga said:


> Don't forget that it's not DDR5 but GDDR5! There is a significant difference. GDDR5 is basically a heavily tweaked DDR3 (well, not exactly, but let's just forget the little details for the sake of the subject). They sacrifice the low latency of DDR3 to boost the bandwidth. GPUs don't really need very low latencies, since their parallel nature "comes to the rescue" when a thread/calculation stalls, and what matters most is raw internal speed, being able to move large chunks of data as fast as possible.



How do you figure? The actual timings might be higher but keep in mind that GDDR5 gets run at nutty high clock speeds. I think any issue with latency will be mitigated with proper pre-fetching and a large (and fast) CPU cache.


----------



## Ikaruga (Feb 22, 2013)

Aquinus said:


> How do you figure? The actual timings might be higher but keep in mind that GDDR5 gets run at nutty high clock speeds. I think any issue with latency will be mitigated with proper pre-fetching and a large (and fast) CPU cache.



It's probably the new 4 Gb Hynix or Samsung chips available from Q1 this year (they're going to use 16 pieces in clamshell mode, I assume), and both of those have 32 ns latency, fairly high for any kind of CPU... hence my technical curiosity.


----------



## Aquinus (Feb 22, 2013)

Ikaruga said:


> It's probably the new 4 Gb Hynix or Samsung chips available from Q1 this year (they're going to use 16 pieces in clamshell mode, I assume), and both of those have 32 ns latency, fairly high for any kind of CPU... hence my technical curiosity.



What? You're joking, right? The only CPUs out there that are even capable of getting close to accessing memory in 32 ns are IVB chips. I couldn't even get close to that with my SB-E 3820. There are a lot of CPUs with more latency than that.

I think it will be fine.


----------



## Ikaruga (Feb 22, 2013)

Aquinus said:


> What? You're joking, right? The only CPUs out there that are even capable of getting close to accessing memory in 32 ns are IVB chips. I couldn't even get close to that with my SB-E 3820. There are a lot of CPUs with more latency than that.
> 
> I think it will be fine.
> 
> http://www.techpowerup.com/forums/attachment.php?attachmentid=50174&stc=1&d=1361524743



No, and I don't really understand why I would joke about RAM timings on my favorite enthusiast site. Do you understand that I was citing the actual latency of the chip itself, and not the latency the MC will have to deal with when accessing the memory? 
For example, a typical DDR3-1600 module has about 12 ns latency in a modern PC.
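To show where that ~12 ns comes from: CAS latency in nanoseconds is just the CL cycle count divided by the command clock, which for DDR3-1600 is 800 MHz. A quick sketch:

```python
def cas_ns(cl_cycles, transfer_rate_mts):
    """CAS latency in ns: CL cycles divided by the command clock.
    For DDR memory the command clock in MHz is half the MT/s rate."""
    command_clock_mhz = transfer_rate_mts / 2
    return cl_cycles / command_clock_mhz * 1000.0

print(cas_ns(9, 1600))   # DDR3-1600 CL9, roughly the ~12 ns figure above
print(cas_ns(11, 1600))  # a looser CL11 kit
```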


----------



## McSteel (Feb 22, 2013)

I believe that AIDA measures round-trip latency, and Ikaruga (love that game, btw) probably means that the GDDR5 used has a CL of 32 ns. 1600 MT/s CL9 DDR3 has a CL of ~11.25 ns max, close to three times less.

Still, with some intelligent queues and cache management, this won't be too much of a problem.


## EDIT ##
Have I ever mentioned how I *hate it* when I get distracted when replying, only to find out I made myself look like an idiot by posting the exact same thing as the person before me? Well, I do.
Sorry Ikaruga.


----------



## Aquinus (Feb 22, 2013)

Ikaruga said:


> No, and I don't really understand why I would joke about RAM timings on my favorite enthusiast site. Do you understand that I was citing the actual latency of the chip itself, and not the latency the MC will have to deal with when accessing the memory?
> For example, a typical DDR3-1600 module has about 12 ns latency in a modern PC.



You mean the 32 ns refresh? That's not access speed, my friend; that is how often a bit in a DRAM cell is refreshed. All DRAM needs to be refreshed, since data is stored in a capacitor and needs to be replenished, as caps leak when they're disconnected from active power. Other than that, I see no mention of 32 ns there.

That "32 ns" sounds a lot like tRFC on DDR3 chips, not access latency.


----------



## Ikaruga (Feb 22, 2013)

McSteel said:


> I believe that AIDA measures round-trip latency, and Ikaruga (love that game, btw) probably means that the GDDR5 used has a CL of 32 ns. 1600 MT/s CL9 DDR3 has a CL of ~11.25 ns max, close to three times less.
> 
> Still, with some intelligent queues and cache management, this won't be too much of a problem.
> 
> ...



Yes, I meant that speed, sorry for my English :shadedshu


----------



## Harlequin_uk (Feb 22, 2013)

So Sony will be accessing it with their own version of LibGCM, along with OpenCL 1.2, which means some awesome and better control over the hardware, something they can't really do now in the PC world as the hardware is variable. We could see `on the fly` changes to core usage depending on whether it's a high physics load or a cut-scene movie.


----------



## Aquinus (Feb 22, 2013)

McSteel said:


> means that the GDDR5 used has a CL of 32 ns. 1600 MT/s CL9 DDR3 has a CL of ~11.25 ns max, close to three times less.



Isn't that kind of moot, since GDDR5 can run at clocks three times faster than DDR3-1600? It's the same thing that happened when moving from DDR to DDR2 and then DDR3: latencies in cycles increased, but access times remained about the same, because the memory frequency increase compensates for it while providing more bandwidth at the same time.

Yeah, there might be more latency, it's possible, but I don't think it will make that much of a difference. Also, with more bandwidth you can load more data into cache in one clock than with DDR3. So I think the benefits will far outweigh the costs.


----------



## Ikaruga (Feb 22, 2013)

Aquinus said:


> Isn't that kind of moot, since GDDR5 can run at clocks three times faster than DDR3-1600? It's the same thing that happened when moving from DDR to DDR2 and then DDR3: latencies in cycles increased, but access times remained about the same, because the memory frequency increase compensates for it while providing more bandwidth at the same time.
> 
> Yeah, there might be more latency, it's possible, but I don't think it will make that much of a difference. Also, with more bandwidth you can load more data into cache in one clock than with DDR3. So I think the benefits will far outweigh the costs.



I don't think price is the reason we still don't use GDDR5 as main memory in PCs; after all, they sell graphics cards for much more than a GDDR5 RAM kit or a supporting chipset/architecture would cost. I haven't really read anything about overcoming the GDDR5 latency issue in the past, so that's what made me curious.



McSteel said:


> .....and Ikaruga (love that game btw)


----------



## EpicShweetness (Feb 26, 2013)

sergionography said:


> we also know it will have 18gcn clusters = 1152 gcn cores rated at 800mhz
> and it was rated at 1.84gflops or something actually



18 GCN clusters! Can that be right? That would mean Jaguar would get 576, which is more than a 7750, and that alone is 40 W of power. Something is amiss here for me.

So 45 + 45 + say 45 again (CPU) is 135 W+!! Something has to be amiss.


----------



## Aquinus (Feb 26, 2013)

EpicShweetness said:


> 18 GCN clusters! Can that be right? That would mean Jaguar would get 576, which is more than a 7750, and that alone is 40 W of power. Something is amiss here for me.
> 
> So 45 + 45 + say 45 again (CPU) is 135 W+!! Something has to be amiss.



They already said that the graphics power is going to be similar to a 7870, didn't they?


----------



## sergionography (Feb 28, 2013)

Aquinus said:


> You mean the 32 ns refresh? That's not access speed, my friend; that is how often a bit in a DRAM cell is refreshed. All DRAM needs to be refreshed, since data is stored in a capacitor and needs to be replenished, as caps leak when they're disconnected from active power. Other than that, I see no mention of 32 ns there.
> 
> That "32 ns" sounds a lot like tRFC on DDR3 chips, not access latency.



And that is exactly what latency is, though: the RAM issues data to the CPU, and after 32 ns it refreshes to send the next batch. GPUs are highly parallel, so they aren't as affected by latency; most GPUs just need a certain amount of data to render while the RAM sends the next batch. CPUs are much more random and general-purpose than GPUs; for example, certain calculations would be issued from the RAM, but in order for the CPU to complete the process it must wait for a second batch of data, in which case the CPU would wait another 32 ns. This is a big issue with CPUs nowadays, but I think it can be easily masked with a large enough L2 cache for the Jaguar cores (I think having 8 of them means 4 MB of cache that can be shared, meaning one core can have all 4 MB if it needs to). Another thing I can think of is whether all 8 GB refresh at once, or whether Sony will allow the RAM to work in turns to feed the CPU/GPU more dynamically rather than in big chunks of data. (Note that Bulldozer/Piledriver have relatively large pools of L3 and L2 cache to mask their higher latency, and with Steamroller adding larger L1 cache as well, that says something; not to mention how much L3 cache affects Piledriver in Trinity, which is quite a bit slower than FX Piledriver, while Phenom II vs Athlon II barely showed any effect due to its lower latency.)
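To put rough numbers on that cache-masking idea, the usual average-memory-access-time formula works; the hit rates and latencies below are made up purely for illustration:

```python
def amat_ns(hit_rate, cache_ns, mem_ns):
    """Average memory access time: cache hits are cheap, misses pay the
    full trip out to DRAM."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * mem_ns

# Illustrative numbers only: a 3 ns L2 hit vs a 32 ns trip to GDDR5.
for hr in (0.90, 0.95, 0.99):
    print(f"hit rate {hr:.0%}: {amat_ns(hr, 3.0, 32.0):.2f} ns")
```

The point of the sketch: even a modest hit rate pulls the average access time most of the way down toward the cache latency, which is why a big shared L2 could hide a lot of GDDR5's sting.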



EpicShweetness said:


> 18 GCN clusters! Can that be right? That would mean Jaguar would get 576, which is more than a 7750, and that alone is 40 W of power. Something is amiss here for me.
> 
> So 45 + 45 + say 45 again (CPU) is 135 W+!! Something has to be amiss.



Jaguar gets 576 what?
The highest-end Jaguar APU, with its graphics cores (128 of them?), is rated at 25 W, and with a much higher clock speed than 1.6 GHz (AMD in their presentation said Jaguar will clock 10-15% higher than what Bobcat would've clocked at on 28 nm), so you're talking at least over 2 GHz.
And Llano had 400 outdated Radeon cores plus 4 K10.5 cores clocked at least 1.6 GHz before turbo, so expect Jaguar to be much more efficient on a new node and a power-efficient architecture, say 25 W max for the CPU cores only, if not less. That leaves them 75-100 W of headroom to work with (think HD 7970M, rated at 100 W; that's 1280 GCN cores at 800 MHz, while this would have 1152 GCN cores at 800 MHz, and after a year of optimization it's easily at 75 W), adding up to 100-125 W, which is very reasonable. And since it's an APU you just need one proper cooler; also think of graphics cards rated at 250 W only requiring one blower fan and a dual-slot cooler to cool both the GDDR5 chips and the GPU. In other words, the motherboard and the chip can be as big as an HD 7970 (but with 100-125 W you only need something the size of an HD 7850, which is rated at 110-130 W), though of course you then add the BR drive and other goodies. The main point is cooling is no problem unless multiple chips are involved, which would require cooling the case in general rather than the chip itself with a graphics-card-style cooler.


Aquinus said:


> They already said that the graphics power is going to be similar to a 7870, didn't they?


More like between an HD 7850 and an HD 7970M. It seems 800 MHz is the sweet spot in terms of performance/efficiency/die size, considering an HD 7970M with 1280 GCN cores at 800 MHz is at 100 W, versus 110 W measured / 130 W rated on the HD 7850 with 1024 cores at 860 MHz.
Not to mention the mobile Pitcairn loses 30 W measured / 75 W rated when clocked at 800 MHz (the advertised TDP on desktop Pitcairn is 175 W, but it measures at 130 W according to the link I have below).

http://www.guru3d.com/articles_pages/amd_radeon_hd_7850_and_7870_review,6.html
Here is a reference for the measured TDP. AMD's advertised TDP is higher, but it also has to account for other parts on the board and overclocking headroom, or whatever the case is.


----------



## Prima.Vera (Feb 28, 2013)

Ikaruga said:


> I don't think the price is the reason why we still don't use GDDR5 as main memory in PCs; after all, they are selling graphics cards for much more than a GDDR5 RAM kit or a supporting chipset/architecture would cost. I haven't really read anything about overcoming the GDDR5 latency issue in the past, so that's what made me curious.



Guys, you need to stop the confusion. You CANNOT use GDDR5 in your PC as main memory, because Graphics DDR5 is special RAM meant only for graphics. There is a very big difference between how a GPU and a CPU use RAM. Also, GDDR5 is based on DDR3, so you have already been using it for a long time. I don't know the exact specifics, but you can google it already...


----------



## Aquinus (Feb 28, 2013)

Prima.Vera said:


> Guys, you need to stop the confusion. You CANNOT use GDDR5 in your PC as main memory, because Graphics DDR5 is special RAM meant only for graphics. There is a very big difference between how a GPU and a CPU use RAM. Also, GDDR5 is based on DDR3, so you have already been using it for a long time. I don't know the exact specifics, but you can google it already...



Read the entire thread before you jump to conclusions; this mainly stems from the PS4 discussion.

GDDR5 itself can do whatever it wants; there are just no packages or CPU IMCs today that handle it, but that *does not mean that it can not be used*. The PS4 is lined up to use GDDR5 for both system and graphics memory, and I suspect Sony isn't saying that just for shits and giggles.

Also, it's not all that different: the latencies differ, and performance is (somewhat, not a ton) optimized for bandwidth over latency, but other than that communication is about the same, sans two control lines for reading and writing. It's a matter of how the data is transmitted, but your statement here is really actually wrong.

Just because devices don't use a particular bit of hardware for something doesn't mean that hardware can't be used for something else. For example, for the longest time phones and mobile devices used low-voltage DDR2, not DDR3. Did that mean DDR3 would never end up in smartphones? Most of us know the answer to that, and it's a solid no; GDDR5 is no different. Just because it works best on video cards doesn't mean it can't be used by a CPU built with a GDDR5 memory controller.


----------



## Ikaruga (Feb 28, 2013)

Prima.Vera said:


> Guys, you need to stop the confusion. You CANNOT use GDDR5 in your PC as main memory, because Graphics DDR5 is special RAM meant only for graphics. There is a very big difference between how a GPU and a CPU use RAM. Also, GDDR5 is based on DDR3, so you have already been using it for a long time. I don't know the exact specifics, but you can google it already...



Please consider switching from "write-only mode" on the forum, and read my comments if you reply to me:

many thanks


----------



## btarunr (Feb 28, 2013)

Prima.Vera said:


> Guys, you need to stop the confusion. You CANNOT use GDDR5 in your PC as main memory.



Oh but you can. PS4 uses GDDR5 as system memory.


----------



## Prima.Vera (Feb 28, 2013)

Ikaruga said:


> Please consider switching from "write-only mode" on the forum, and read my comments if you reply to me:
> 
> many thanks



Please don't tell me what to do, or what I am allowed to do or not.

many thanks



btarunr said:


> Oh but you can. PS4 uses GDDR5 as system memory.



The PS4 is NOT a PC... But if what you all say is true, then why has nobody introduced GDDR5 for the PC? It has been on video cards for a long time. And why is it called Graphics DDR then?


----------



## Frick (Feb 28, 2013)

Ps4 is pretty much a custom PC.

EDIT: With a custom OS.


----------



## Ikaruga (Feb 28, 2013)

Prima.Vera said:


> Please don't tell me what to do, or what I am allowed to do or not.



Again :shadedshu. I think if you started reading what I write, perhaps you would see that I did not tell you _"what to do, or what you are allowed to do or not."_ I only *asked you to consider* it:





Ikaruga said:


> Please consider....


----------



## btarunr (Feb 28, 2013)

Prima.Vera said:


> The PS4 is NOT a PC... But if what you all say is true, then why has nobody introduced GDDR5 for the PC? It has been on video cards for a long time. And why is it called Graphics DDR then?



The CPU and software are completely oblivious to memory type. The only component that really needs to know how the memory works at the physical level is the integrated memory controller. To every other component, memory type is irrelevant. It's the same "load" "store" "fetch" everywhere else.

Just because GDDR5 isn't a PC memory standard doesn't mean it can't be used as system main memory. It would have higher latency than DDR3, but it still yields high bandwidth. GDDR5 stores data in the same ones and zeroes as DDR3, SDR, and EDO.


----------



## Aquinus (Feb 28, 2013)

Prima.Vera said:


> And why is it called Graphic DDR then?



Because it is *optimized for graphics, not exclusively for graphics*.

You're just digging yourself into a hole.


----------



## tokyoduong (Feb 28, 2013)

Ikaruga said:


> Strange they do this before Sony's PS4 announcement tomorrow. Both new consoles from MS and Sony gonna have these new cores in their CPUs, I thought Sony would ask them for all the "flare" they can get. It's also strange only four cores allowed on the PC side while there will be more in the consoles (assuming that all the leaks are correct ofc).



Game consoles are not mobile, though. These APUs are designed for mobile devices with a 5-25 W power envelope.



btarunr said:


> The CPU and software are completely oblivious to memory type. The only component that really needs to know how the memory works at the physical level is the integrated memory controller. To every other component, memory type is irrelevant. It's the same "load" "store" "fetch" everywhere else.
> 
> Just because GDDR5 isn't a PC memory standard doesn't mean it can't be used as system main memory. It would have comparatively high latency to DDR3, but it still yields high bandwidth. GDDR5 stores data in the same ones and zeroes as DDR3, SDR, and EDO.



^^tru dat!



Aquinus said:


> Because it is *optimized for graphics, not exclusively for graphics*.
> 
> You're just digging yourself into a hole.



By "graphics" you probably meant bandwidth, which is correct.

I'm guessing latency is not as much of an issue when it comes to a specific design for a console rather than a broad compatibility design for PC.


----------



## Mussels (Mar 1, 2013)

tokyoduong said:


> I'm guessing latency is not as much of an issue when it comes to a specific design for a console rather than a broad compatibility design for PC.



my thoughts as well. these aren't meant to be generic multipurpose machines, they're meant to be gaming consoles with pre-set roles, and there's time to code each game/program to run specifically on them.


this gives game devs the ability to split that 8GB up at will, between CPU and GPU. that could really extend the life of the console, and its capabilities.


----------



## Aquinus (Mar 1, 2013)

tokyoduong said:


> By graphics you probably meant bandwidth which is correct.



That goes without saying. GDDR is optimized for graphics, which performs best under high-bandwidth, high(er)-latency conditions.


tokyoduong said:


> I'm guessing latency is not as much of an issue when it comes to a specific design for a console rather than a broad compatibility design for PC.


I'm not willing to go that far, but I'm sure they will have stuff to mitigate any slowdown it may cause such as intelligent caching and pre-fetching.


----------



## Ikaruga (Mar 1, 2013)

Mussels said:


> my thoughts as well. these aren't meant to be generic multipurpose machines, they're meant to be gaming consoles with pre-set roles, and there's time to code each game/program to run specifically on them.


Yes, that's one of the advantages of working on closed systems like consoles. It helps a lot with both development speed and efficiency, but the development procedure is still the same.



Mussels said:


> this gives game devs the ability to split that 8GB up at will, between CPU and GPU. that could really extend the life of the console, and its capabilities.


You can't do anything else but split unified memory (this is also the case with APUs and IGPs on the PC, of course); that's why it's called unified.
The N64 was a console and developers actually released titles on it, but this doesn't change how horrid the memory latency really was on that system, or how much extra effort and work the programmers had to put in to get over that huge limitation (probably the main reason why Nintendo introduced 1T-SRAM in the GameCube, which was basically eDRAM on die).
If you really want to split unified memory into CPU and GPU memory (you can't, by the way, but let's assume you could), it's extremely unlikely that developers will use more than 1-2 GB as "video memory" on the PS4, not only because the bandwidth wouldn't be enough to use more, but also because it's simply not needed. (OK, the PS4 is extremely powerful on the bandwidth side, and perhaps some future rendering technique we don't know about yet will change this, but current methods like deferred rendering, voxels, megatexturing, etc. will run just fine using only 1-2 GB for rendering.)


----------

