# AMD Debuts New 12- and 16-Core Opteron 6300 Series Processors



## Cristian_25H (Jan 22, 2014)

AMD today announced the immediate availability of its new 12- and 16-core AMD Opteron 6300 Series server processors, codenamed "Warsaw." Designed for enterprise workloads, the new AMD Opteron 6300 Series processors feature the "Piledriver" core and are fully socket- and software-compatible with the existing AMD Opteron 6300 Series. The power efficiency and cost effectiveness of the new products make them ideal for the AMD Open 3.0 Open Compute Platform - the industry's most cost-effective Open Compute platform.

Driven by customers' requests, the new AMD Opteron 6338P (12 core) and 6370P (16 core) processors are optimized to handle the heavily virtualized workloads found in enterprise environments, including the more complex compute needs of data analysis, xSQL and traditional databases, at optimal performance per-watt, per-dollar.






"With the continued move to virtualized environments for more efficient server utilization, more and more workloads are limited by memory capacity and I/O bandwidth," said Suresh Gopalakrishnan, corporate vice president and general manager, Server Business Unit, AMD. "The Opteron 6338P and 6370P processors are server CPUs optimized to deliver improved performance per-watt for virtualized private cloud deployments with less power and at lower cost points."

The new AMD Opteron 6338P and 6370P processors are available today through Penguin and Avnet system integrators and have been qualified for servers from Sugon and Supermicro at a starting price of $377 and $598, respectively. More information can be found on AMD's website.





*View at TechPowerUp Main Site*


----------



## fullinfusion (Jan 22, 2014)

Wow, nothing wrong with its price!

16 real cores


----------



## buildzoid (Jan 22, 2014)

If there were desktop boards for these I'd be all over the 12 core variant.


----------



## Pap1er (Jan 22, 2014)

buildzoid said:


> If there were desktop boards for these I'd be all over the 12 core variant.



I would also like to see desktop board for these meat grinders


----------



## ZetZet (Jan 22, 2014)

fullinfusion said:


> Wow, nothing wrong with its price!
> 
> 16 real cores


Not all that real.


----------



## buildzoid (Jan 22, 2014)

ZetZet said:


> Not all that real.


More real than Intel's 8 cores / 16 threads. The 8 extra threads only appear in specific scenarios, and in others they don't exist, whereas AMD's 16 cores are always capable of doing 16 tasks simultaneously. It just doesn't scale perfectly: 1 core does 100% single-core performance but 16 only do around 1260%, instead of the near-perfect scaling Intel has, where 1 core does 100%, 8 cores do 799%, and with hyper-threading it maxes out at 1038%. So in some scenarios (3D graphics rendering) the $2000 8-core Intel will beat the $600 16-core AMD, and the AMD will win in video encoding and similar dumb workloads like searching for stuff, so the AMD is a better server CPU than the Intel.
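Taking the post's scaling percentages at face value (they are the poster's figures, not measurements), the per-core efficiency works out like this:

```python
def per_core_efficiency(total_pct: float, n_cores: int) -> float:
    """Fraction of ideal linear scaling, given aggregate throughput
    expressed as a percentage of one core's throughput."""
    return total_pct / (n_cores * 100)

print(per_core_efficiency(1260, 16))  # AMD 16 cores: ~0.79 of ideal
print(per_core_efficiency(799, 8))    # Intel 8 physical cores: ~1.00 of ideal
print(1038 / 799)                     # HT on top of 8 cores: ~1.30, i.e. ~30% extra
```

That ~30% hyper-threading uplift matches the figure Aquinus reports from his own testing later in the thread.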


----------



## Assimilator (Jan 22, 2014)

And, sadly, the Xeons will still beat the ever living crap out of these.


----------



## NC37 (Jan 22, 2014)

Assimilator said:


> And, sadly, the Xeons will still beat the ever living crap out of these.



I dunno. In multithreading AMD was beating Intel. Xeons are another story, but when it comes to price for performance? That I'd be interested to see.


----------



## SIGSEGV (Jan 22, 2014)

Assimilator said:


> And, sadly, the Xeons will still beat the ever living crap out of these.




Sadly, there is no Opteron-based server available in my country.. 
So I have no choice but to use (buy) Xeon servers and workstations for my lab, which is very expensive.. 
That is very frustrating..


----------



## techy1 (Jan 22, 2014)

Soon there will be AMD marketing slides about a +400% performance increase over the "other competitor's" 4-core CPUs.


----------



## ensabrenoir (Jan 22, 2014)

...cool and at a great price..... but once again the lemming approach. A bunch of little... adequate cores. The best result would be a price reduction at Intel....bah hhhaaa hhhhaaaa yeah right. Maybe some day, but not because of this. Nevertheless, AMD is still moving in the right direction.


----------



## Aquinus (Jan 22, 2014)

I would like everyone to remember what the equivalent Xeon is at that price point. I'm willing to bet that the Opteron is more cost effective; considering a 10-core Xeon starts at 1600 USD, I think everything needs to be put into perspective. I would rather take two 16c Opterons than a single 10c Xeon, but that's just me.


----------



## buildzoid (Jan 22, 2014)

techy1 said:


> Soon there will be AMD marketing slides about a +400% performance increase over the "other competitor's" 4-core CPUs.


It'd be true for integer math capability, but not much else.


----------



## Fragman (Jan 22, 2014)

ZetZet said:


> Not all that real.



You're either too stupid or don't know anything about AMD CPUs. They are all independent cores with their own multiplier and voltage control, and if 1 core goes up in speed, all the others stay down until used.
That makes for better power usage.


----------



## Breit (Jan 22, 2014)

Fragman said:


> You're either too stupid or don't know anything about AMD CPUs. They are all independent cores with their own multiplier and voltage control, and if 1 core goes up in speed, all the others stay down until used.
> That makes for better power usage.



I don't get what the power characteristics have to do with the debate about what counts as a "real" core and what does not?!
The fact is that with the Bulldozer architecture, AMD chose to implement CMT in the form of modules rather than Hyper-Threading as implemented by Intel (a form of SMT). A module on an AMD CPU acts as 2 independent cores, but they nonetheless share certain functional units. So technically they are NOT 2 independent cores. It's more or less the same as with Intel's Hyper-Threading, where a core can run 2 threads simultaneously and is seen by the OS as 2 cores, but is actually only one core.
So maybe AMD's implementation of CMT in the form of modules is a step further in the direction of independent cores than Intel is with Hyper-Threading. But all that doesn't really matter. At the end of the day, what counts is the performance you get out of the CPU (or performance per dollar or performance per watt, whatever matters most to you).

As far as I'm concerned, they should advertise these as 6 modules / 12 threads and 8 modules / 16 threads, like Intel does with, for instance, the 8-core / 16-thread (8c/16t) nomenclature...


----------



## Prima.Vera (Jan 22, 2014)

Fragman said:


> You're either too stupid or don't know anything about AMD CPUs. They are all independent cores with their own multiplier and voltage control, and if 1 core goes up in speed, all the others stay down until used.
> That makes for better power usage.


Wow. You must be very smart for insulting and flaming users. Please, go on...


----------



## Aquinus (Jan 22, 2014)

Breit said:


> I don't get what the power characteristics have to do with the debate about what counts as a "real" core and what does not?!
> The fact is that with the Bulldozer architecture, AMD chose to implement CMT in the form of modules rather than Hyper-Threading as implemented by Intel (a form of SMT). A module on an AMD CPU acts as 2 independent cores, but they nonetheless share certain functional units. So technically they are NOT 2 independent cores. It's more or less the same as with Intel's Hyper-Threading, where a core can run 2 threads simultaneously and is seen by the OS as 2 cores, but is actually only one core.
> So maybe AMD's implementation of CMT in the form of modules is a step further in the direction of independent cores than Intel is with Hyper-Threading. But all that doesn't really matter. At the end of the day, what counts is the performance you get out of the CPU (or performance per dollar or performance per watt, whatever matters most to you).
> 
> As far as I'm concerned, they should advertise these as 6 modules / 12 threads and 8 modules / 16 threads, like Intel does with, for instance, the 8-core / 16-thread (8c/16t) nomenclature...


The problem with that statement is that a module has enough dedicated hardware to run two threads in tandem all the time, where Hyper-Threading won't always, because it depends on parts of the CPU that are not otherwise being used.

Intel uses unused resources in the CPU to get extra multi-threaded performance. AMD added extra hardware for multi-threaded performance, as opposed to using just the spare resources available. The performance of a module vs the performance of a single core with HT has costs and benefits of its own. With an Intel CPU, that second thread doesn't have nearly as much processing power as the first thread does, whereas with AMD, that second "thread" (or "core", if you will) sees much more tangible gains than the HT thread does.

It's worth mentioning that the integer units do have enough hardware to run two full threads side by side. It's the floating-point unit that doesn't, but even then, FMA is supposed to give some ability to decouple the 256-bit FP unit to do two 128-bit ops at once.

I think AMD's goal is to emphasize what CPUs do best, integer math, and let GPUs do what they do best, FP math. That's not to say a CPU shouldn't do any FP math, but if there is a lot of FP math to be done, a GPU is better optimized for those kinds of operations.

Also, I should add that I'm pretty sure AMD's clocks are controlled on a per-module basis, but parts of each module can be power-gated to improve power usage. One of the biggest benefits of a module is that you save die space to add that second thread without too much of a hit to single-threaded performance (relatively speaking).



Prima.Vera said:


> Wow. You must be very smart for insulting and flaming users. Please, go on...


Please don't feed the ducks trolls.


----------



## Prima.Vera (Jan 22, 2014)

Aquinus said:


> Also, I should add that I'm pretty sure that AMD clocks are controlled on a per-module basis but parts of each module can be power gated to improve power usage. One of the biggest benefits of having a module is that you save die space to add that second thread without too much of a hit on single-threaded performance (relatively speaking).



Aq, agree with you. 
However, I have a question. Don't you think this approach is somehow not ideal for AMD? Because this way a core has a lot fewer transistors than Intel's, hence the bad performance in single-threaded applications, like games for example.
I don't understand why AMD still goes for strong GPU performance, even on the so-called top CPUs, instead of having a GPU with only the basic stuff to run the Win 7 desktop and then using the freed-up space to increase the transistor count of each of the cores. That way I think they would finally have a CPU to compete with the i7. Just some thoughts.


----------



## Aquinus (Jan 22, 2014)

Well, AMD has always pushed the "future is fusion" motto. HSA has always been a constant theme of theirs. I will be thrilled when AMD has an APU where CPU and iGPU compute units are shared, further blurring the distinction between massively parallel workloads on GPUs and fast serial workloads on CPUs.

Either way, CPUs are fast enough that there definitely is a point of diminishing returns. A CPU will only go so fast, and you can only cram so many transistors into any given area. Also, in games that can utilize multi-core systems well, AMD isn't trailing behind all that much. Considering the upcoming consoles have 8-core CPUs in them, there will be more of a push to utilize that kind of hardware. It's completely realistic for a machine to have at least 4 logical threads now, and as many as 8 for a consumer CPU. That wasn't the case several years ago.


----------



## Breit (Jan 22, 2014)

Prima.Vera said:


> Aq, agree with you.
> However, I have a question. Don't you think this approach is somehow not ideal for AMD? Because this way a core has a lot fewer transistors than Intel's, hence the bad performance in single-threaded applications, like games for example.
> I don't understand why AMD still goes for strong GPU performance, even on the so-called top CPUs, instead of having a GPU with only the basic stuff to run the Win 7 desktop and then using the freed-up space to increase the transistor count of each of the cores. That way I think they would finally have a CPU to compete with the i7. Just some thoughts.



I guess that's because it's technically very challenging and AMD might simply not be able to come up with something better? Just a guess...


----------



## Steevo (Jan 22, 2014)

Dual socket with 16 cores can run 32 VMs in one rackmount tray. Company X has 320 employees running thin clients: that's 10 trays plus one spare, and assuming the same drive/memory/board cost, the AMD will win for $$$ reasons alone. Data entry jobs don't need Xeon-core performance for 10-key and typing.
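The sizing arithmetic behind that claim, sketched out (one VM per core, plus an N+1 spare, which is my reading of "plus one"):

```python
import math

employees = 320        # thin-client users, one VM each
vms_per_tray = 32      # dual-socket tray, 16 cores per CPU, 1 VM per core

trays = math.ceil(employees / vms_per_tray)
print(trays)           # 10 trays cover everyone
print(trays + 1)       # 11 with an N+1 spare for redundancy
```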


----------



## Breit (Jan 22, 2014)

Aquinus said:


> I think AMD's goal is to emphasize what CPUs do best, integer math, and let GPUs do what they do best, FP math. Not to say that a CPU shouldn't do any FP math, but if there is a lot of FP math to be done, a GPU is better optimized to do those kinds of operations.



Sure? In theory you might be right, but most consumer-grade hardware, at least, is not that great at FP math (I'm talking about DP-FP, of course).
An ordinary Core i7-4770K quad-core has a DP performance of about 177 GFLOPS. That's for an 84W CPU (talking TDP). NVIDIA's 780 Ti, though, is rated at 210 GFLOPS DP performance (DP is crippled on consumer chips, I know), but this comes at a cost of a whopping 250W TDP, which is about 3x the power draw! So simple math tells me that the Haswell i7 is about twice as efficient at DP-FP calculations as current-gen consumer GPU hardware...
Single precision might be a totally different story though.
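Plugging the post's own figures into a quick perf-per-watt calculation (TDP is only a rough proxy for actual draw, and these are rated peak GFLOPS, so treat the ratio as ballpark; it comes out closer to 2.5x than 2x):

```python
def gflops_per_watt(gflops: float, tdp_w: float) -> float:
    """Rated DP throughput divided by TDP, as a crude efficiency metric."""
    return gflops / tdp_w

cpu = gflops_per_watt(177, 84)   # i7-4770K, DP: ~2.11 GFLOPS/W
gpu = gflops_per_watt(210, 250)  # GTX 780 Ti, DP (crippled): ~0.84 GFLOPS/W
print(cpu, gpu, cpu / gpu)       # CPU roughly 2.5x more efficient on these numbers
```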


----------



## Nordic (Jan 22, 2014)

Breit said:


> Sure? In theory you might be right, but most consumer-grade hardware, at least, is not that great at FP math (I'm talking about DP-FP, of course).
> An ordinary Core i7-4770K quad-core has a DP performance of about 177 GFLOPS. That's for an 84W CPU (talking TDP). NVIDIA's 780 Ti, though, is rated at 210 GFLOPS DP performance (DP is crippled on consumer chips, I know), but this comes at a cost of a whopping 250W TDP, which is about 3x the power draw! So simple math tells me that the Haswell i7 is about twice as efficient at DP-FP calculations as current-gen consumer GPU hardware...
> Single precision might be a totally different story though.


An AMD 7970 has ~1060 GFLOPS DP performance at 225W TDP. AMD GPUs are pretty darn great at compute, and AMD APUs will use AMD GPUs, not NVIDIA GPUs. So your comparison with a 780 Ti is silly.


----------



## Breit (Jan 22, 2014)

james888 said:


> An AMD 7970 has ~1060 GFLOPS DP performance at 225W TDP. AMD GPUs are pretty darn great at compute, and AMD APUs will use AMD GPUs, not NVIDIA GPUs. So your comparison with a 780 Ti is silly.



Even if it's way off topic:
An NVIDIA Titan has ~1300 GFLOPS DP at 250W TDP, but that was not the point.
All that compute power on your GPU is pretty useless unless you have a task where you have to crunch numbers for an extended period of time AND the task can be scheduled in parallel, but I guess you know that. The latencies for copying data to the GPU, and after processing from the GPU back to main memory / the CPU, are way too high for any mixed workload to perform well, so strong single-threaded FP performance will always be important in some way.


----------



## Aquinus (Jan 22, 2014)

Breit said:


> Even if it's way off topic:
> An NVIDIA Titan has ~1300 GFLOPS DP at 250W TDP, but that was not the point.
> All that compute power on your GPU is pretty useless unless you have a task where you have to crunch numbers for an extended period of time AND the task can be scheduled in parallel, but I guess you know that. The latencies for copying data to the GPU, and after processing from the GPU back to main memory / the CPU, are way too high for any mixed workload to perform well, so strong single-threaded FP performance will always be important in some way.



Might read into APUs again. There are benefits to be had by having HUMA on an APU, which solves the memory copying problem. The simple point is that CPUs are good at serial processing and GPUs are good at massively parallel ops. Depending on your workload, one may be better than the other. More often than not though, CPUs are doing integer math and GPUs are doing floating point math (single or double).

Basically CPUs are good at working with data that changes a lot (relatively small amounts of data that change a lot). GPUs are good at processing (or transforming if you will) a lot of data in a relatively fixed way.

So a simple example of what GPUs do best would be something like:

```
add 9 and multiply by 2 to every element of [1 2 3 4 5 6 7 8 9 ... 1000]
```

Where a CPU would excel at something like adding all of those elements, or doing something that reduces those values, as opposed to transforming it to a set of the same size as the input.
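In code, the map/reduce distinction looks something like this (a toy Python sketch of the same example; a real GPU version would use something like OpenCL or CUDA):

```python
data = list(range(1, 1001))

# GPU-friendly "map": the same fixed transform applied to every element
# independently, so all 1000 operations could run in parallel.
mapped = [(x + 9) * 2 for x in data]

# CPU-friendly "reduce": each step depends on the running result,
# collapsing the whole set into a single value.
total = sum(data)

print(mapped[:3])  # [20, 22, 24]
print(total)       # 500500
```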



> GPUs can only process independent vertices and fragments, but can process many of them in parallel. This is especially effective when the programmer wants to process many vertices or fragments in the same way. In this sense, GPUs are stream processors – processors that can operate in parallel by running one kernel on many records in a stream at once.
> 
> A _stream_ is simply a set of records that require similar computation. Streams provide data parallelism. _Kernels_ are the functions that are applied to each element in the stream. In the GPUs, _vertices_ and _fragments_ are the elements in streams and vertex and fragment shaders are the kernels to be run on them. Since GPUs process elements independently there is no way to have shared or static data. For each element we can only read from the input, perform operations on it, and write to the output. It is permissible to have multiple inputs and multiple outputs, but never a piece of memory that is both readable and writable.
> 
> ...


See Stream Processing on Wikipedia.


----------



## Nordic (Jan 22, 2014)

Breit said:


> Even it its way of topic:
> A nVidia Titan has ~1300 GFLOPS DP at 250W TDP, but that was not the point.
> All that compute power on your GPU is pretty useless unless you have a task where you have to crunch numbers for an extended period of time AND your task can be scheduled in parallel, but I guess you know that. The latencies for copying data to the GPU and after processing there from the GPU back to the main memory / CPU are way to high for any mixed workload to perform well, so strong single-threaded FP performance will always be important in some way.


Isn't that what AMD's HSA and HUMA are meant to solve?
Edit: Aquinus, you speedy guy, beat me to it with more eloquence.


----------



## Breit (Jan 22, 2014)

Aquinus said:


> Might read into APUs again. There are benefits to be had by having HUMA on an APU, which solves the memory copying problem.



True, but the performance of a 7970 on an APU is not going to happen any time soon, I guess...


----------



## Nordic (Jan 22, 2014)

Breit said:


> True, but the performance of a 7970 on an APU is not going to happen any time soon, I guess...


I just used a 7970 since I have one and knew its DP off hand. The architectural potential is there.

I got curious and looked for what DP an APU can get. This is the only thing I can find on the current A10-7850K, but it's from WCCFtech, so who knows its validity. 5800K on the left, 7850K on the right. Overclocked.


----------



## eidairaman1 (Jan 22, 2014)

I've come to note that businesses will go with the cheapest parts available; plus, most companies and people don't even know who AMD is.

But I say this is really good news for them. Now they just need to make the 8-core desktop parts more efficient.


----------



## Aquinus (Jan 22, 2014)

eidairaman1 said:


> I've come to note that businesses will go with the cheapest parts available; plus, most companies and people don't even know who AMD is.
> 
> But I say this is really good news for them. Now they just need to make the 8-core desktop parts more efficient.



Yeah! They need something to compete with Intel's 8 core Atom SoC. 20-watt TDP for an 8-core SoC isn't too shabby. There is a slower variant that offers lower clocks and less power usage but still retains 8 cores as well. I kind of want one.



Breit said:


> True, but the performance of a 7970 on an APU is not going to happen any time soon, I guess...



As long as PCI-E is your bus and you have memory that is completely segregated from the CPU, you're going to have that issue. Remember how gimped Intel CPUs were when they used an MCH, and how the CPU needed to communicate with the MCH to get anything out of memory? As soon as the memory controller was moved next to the CPU cores, memory access speeds started flying and latency dropped like a rock. The issue is that no software can currently take advantage of having stream processors and CPU cores both working on the same data. Sharing data between different CPU cores is problematic enough, forget sharing it with an array of SIMD cores.


----------



## HumanSmoke (Jan 23, 2014)

NC37 said:


> I dunno. In multithreading AMD was beating Intel. Xeons are another story


Well, in this instance since the article concerns Opteron, Xeon would actually be the story worth considering as counterpoint.






NC37 said:


> but when it comes to price for the performance. That I'd be interested to see.


Undoubtedly, but then AMD are obviously going to make a concession on processor upgrade pricing in order to make the ageing C32/G34 platforms at least somewhat palatable.
[Chart source]


----------



## Thefumigator (Jan 23, 2014)

Assimilator said:


> And, sadly, the Xeons will still beat the ever living crap out of these.



Except in the price factor, maybe.


----------



## FordGT90Concept (Jan 23, 2014)

Aquinus said:


> Intel uses unused resources in the CPU to get extra multi-threaded performance. AMD added extra hardware for multi-threaded performance, as opposed to using just the spare resources available. The performance of a module vs the performance of a single core with HT has costs and benefits of its own. With an Intel CPU, that second thread doesn't have nearly as much processing power as the first thread does, whereas with AMD, that second "thread" (or "core", if you will) sees much more tangible gains than the HT thread does.


In the testing I did, disabling HTT really crippled my 920, and as far as software is concerned, performance from the logical cores is expected to match performance from the physical cores. There's really no discernible difference between them.

It may bog down faster than AMD's SMT implementation, but comparing Intel Xeon 6-core processors to AMD Opteron 6-core processors really doesn't show that to be the case either. Put bluntly, there's really no evidence to support that AMD's SMT is any better than Intel's SMT.


----------



## Thefumigator (Jan 23, 2014)

FordGT90Concept said:


> In the testing I did, disabling HTT really crippled my 920 and as far as software is concerned, performance from the logical cores is expected to match performance from the physical cores.  There's really no discernible difference between them.



HT has got better and better since day one. The 920 is a monster of a CPU (lovely, yes), but I really can't believe an HT core matches the performance of the physical core associated with it. 
I mean, if you make use of core 0 up to 100%, then core 1 should drop performance quite dramatically, while core 2 wouldn't be affected at all.


----------



## buildzoid (Jan 23, 2014)

Thefumigator said:


> HT has got better and better since day one. The 920 is a monster of a CPU (lovely, yes), but I really can't believe an HT core matches the performance of the physical core associated with it.
> I mean, if you make use of core 0 up to 100%, then core 1 should drop performance quite dramatically, while core 2 wouldn't be affected at all.


It doesn't. When I turn HT off on my 3960X, I lose at worst 40% of my multithreaded performance, but it boosts single-threaded performance a little (1% maybe), as it takes load off the data-management part of the CPU.


----------



## Aquinus (Jan 23, 2014)

FordGT90Concept said:


> In the testing I did, disabling HTT really crippled my 920 and as far as software is concerned, performance from the logical cores is expected to match performance from the physical cores.  There's really no discernible difference between them.
> 
> It may bog down faster than AMD's SMT implementation but comparing Intel Xeon 6-core processors to AMD Opteron 6-core processors really doesn't show that to be the case either.  Put bluntly, there's really no evidence to support AMD's SMT is any better than Intel's SMT.



A while back I did some testing with my i7 and started disabling cores, leaving HT on and turning it off, to see what the performance difference between 4c/4t and 2c/4t would be on my i7. In all honesty, the numbers don't agree with you. I can try to find it again, but generally speaking, hyper-threading didn't yield much more than a 30% improvement over a real core.

I'm curious, how did you test the performance of your CPU between disabling/enabling hyper threading?

Edit: Here, I found it.


----------



## FordGT90Concept (Jan 23, 2014)

Thefumigator said:


> HT has got better and better since day one. The 920 is a monster of a CPU (lovely, yes), but I really can't believe an HT core matches the performance of the physical core associated with it.
> I mean, if you make use of core 0 up to 100%, then core 1 should drop performance quite dramatically, while core 2 wouldn't be affected at all.


Because that's not how it works.  It functions a lot like virtualization where the physical core is never exposed to the operating system.  Instead, there are four physical cores handling eight virtual cores--each physical core is responsible for two virtual cores.  The physical cores are designed to work on each virtual core up to 50% of the time.  This is why a quad core with SMT behaves very much like a slower eight physical core processor without SMT when handling heavy multithreaded loads.


buildzoid said:


> It doesn't. When I turn HT off on my 3960X, I lose at worst 40% of my multithreaded performance, but it boosts single-threaded performance a little (1% maybe), as it takes load off the data-management part of the CPU.


Yeah, the tests were multithreaded (four or eight threads).  I didn't test anything single threaded.



Aquinus said:


> A while back I did some testing with my i7 and started disabling cores, leaving HT on and turning it off, to see how the performance difference between 4c/4t would be on my i7 and 2c/4t would be. In all honesty, the numbers don't agree with you. I can try and find it again, but generally speaking, hyper threading didn't yield much more than 30% improvement over a real core.
> 
> I'm curious, how did you test the performance of your CPU between disabling/enabling hyper threading?
> 
> ...


It looks like they got it fixed which is good (30% is about where it should be).  My 920 is three generations older than yours.

My test method was this custom application. Basically, it does var++ for 1 second across however many threads you tell it to, on whatever type of variable, and tells you what count it reached; it repeats that 10 times and gives you the total and the results for every thread. It shows the basic compute power of any given processor.
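The tool itself isn't posted, but the described method can be approximated with a sketch like this (processes instead of threads, so the measurement isn't serialized by an interpreter lock; absolute counts will vary by machine):

```python
import multiprocessing as mp
import time

def count_for(seconds: float) -> int:
    """Increment a counter as fast as possible for `seconds`; return the count."""
    deadline = time.perf_counter() + seconds
    n = 0
    while time.perf_counter() < deadline:
        n += 1
    return n

def benchmark(workers: int, seconds: float = 1.0) -> list:
    """Per-worker counts achieved by `workers` processes running concurrently."""
    with mp.Pool(workers) as pool:
        return pool.map(count_for, [seconds] * workers)

if __name__ == "__main__":
    # Watch how close the total comes to scaling linearly with worker count.
    for w in (1, 2, 4, 8):
        counts = benchmark(w, seconds=0.2)
        print(w, "workers:", sum(counts))
```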

Is it even possible to disable SMT on AMD processors?  Repeating the test you did with an AMD processor would tell us definitively what sort of difference SMT makes on them.


----------



## xenocide (Jan 23, 2014)

Aquinus said:


> I would like everyone to remember what the equivalent Xeon is at that price point. I'm willing to bet that the Opteron is more cost effective, considering a 10 Core Xeon starts at 1600 USD, I think everything needs to be put into perspective. I would rather take two 16c Opterons than a single 10c Xeon, but that's just me.



The biggest problem with that is that Opteron solutions tend to consume a lot more power. A Xeon (the benchmarks I see are mostly for E5-2660s) will consume about 95W, and a similar Opteron in terms of performance (the Opteron 6380) consumes 151W under full load. That's a staggering difference. I would say over about a year the cost difference is made up, but the additional performance of the Xeon is not. People need to stop looking exclusively at initial investment costs and start considering things like heat generation, power consumption, and performance over time at a given cost.


----------



## xorbe (Jan 23, 2014)

And 7-zip is likely memory bound, making it an ideal case for HT.


----------



## Thefumigator (Jan 23, 2014)

FordGT90Concept said:


> Because that's not how it works.  It functions a lot like virtualization where the physical core is never exposed to the operating system.  Instead, there are four physical cores handling eight virtual cores--each physical core is responsible for two virtual cores.  The physical cores are designed to work on each virtual core up to 50% of the time.  This is why a quad core with SMT behaves very much like a slower eight physical core processor without SMT when handling heavy multithreaded loads.


I disagree... I think you are wrong, or I am misunderstanding you. HT will enable virtual cores; we agree on that. Windows will see those as cores, and it won't care whether they are real or virtual: if a core is there, it will assign any task or process to it. Performance, on the other hand, will be impacted as soon as you use the virtual cores... do you mean that core 1 is not the "virtual core" of core 0?



FordGT90Concept said:


> Is it even possible to disable SMT on AMD processors?  Repeating the test you did with an AMD processor would tell us definitively what sort of difference SMT makes on them.


You can't disable half a "module", but you can still use the affinity option in Windows. On my FX-8320, after installing the Windows patch, processes began being assigned differently on the CPU.


----------



## xorbe (Jan 23, 2014)

Thefumigator said:


> You can't disable half a "module"



There was definitely an ASUS motherboard with a BIOS that would allow just that.


----------



## FX-GMC (Jan 23, 2014)

xorbe said:


> There was definitely an ASUS motherboard with a BIOS that would allow just that.



More information on that here: http://www.xtremesystems.org/forums/showthread.php?275873-AMD-FX-quot-Bulldozer-quot-Review-(4)-!exclusive!-Excuse-for-1-Threaded-Perf

Their results show that if you were running an FX 8-core as a quad-core, it is better to disable one core per module than to disable two whole modules.
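Under Linux you can get a similar effect per-process without BIOS support by pinning to one logical CPU per module. A sketch, assuming the common FX layout where logical CPUs 2k and 2k+1 share module k (verify against your own topology in /proc/cpuinfo before relying on it):

```python
import os

def one_core_per_module(n_modules: int) -> set:
    """Pick the even-numbered logical CPU from each 2-CPU module."""
    return {2 * k for k in range(n_modules)}

mask = one_core_per_module(4)       # {0, 2, 4, 6} on an 8-core FX
print(sorted(mask))
# os.sched_setaffinity(0, mask)     # Linux-only: pin the current process (0 = self)
```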


----------



## buildzoid (Jan 23, 2014)

xenocide said:


> The biggest problem with that is that the Opteron solutions tend to consume a lot more power.  A Xeon (the benchmarks I see are mostly for E5-2660's) will consume about 95W, and a similar Opteron (in terms of performance--the Opteron 6380) consumes 151W under full load.  That's a staggering difference.  I would say over about a year the difference is made up in terms of cost, but the additional performance of the Xeon is not.  People need to stop looking at exclusively initial investment costs and start considering things like heat generation, power consumption, and performance over time given a set cost.


You're wrong, and here is why:
A. The Xeon 2697 v2 pulls 130 W, not 95, so two will pull (260 W) almost as much as three of these Opterons (297 W).
B. The Opterons in this news pull 99 W, not 151.
C. The Cinebench R11.5 chart shows performance with perfect HT scaling, so if you're using the server for data management and tasks that hit the same part of the CPU over and over again, the Xeons will be 40% slower than what Cinebench shows.
D. The Xeon 2697 v2 costs $2,100 more than one of these new Opterons, a difference so big that the Xeon won't close the price gap any time soon, and definitely not in a year or two.
E. The Xeon 2660 costs $700 more than the Opterons in this news and pulls 95 W while being barely faster than the old Opterons, which the new ones will either match or beat, so again the price gap won't be closed in less than two years.
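To put rough numbers on point E: a back-of-the-envelope break-even sketch. The 24/7 full load and the $0.10/kWh electricity price are assumptions; real duty cycles and tariffs vary.

```python
def breakeven_years(price_gap_usd: float, watt_delta: float,
                    usd_per_kwh: float = 0.10) -> float:
    """Years of 24/7 operation before the extra power draw of the
    cheaper chip eats up its purchase-price advantage."""
    kwh_per_year = watt_delta / 1000 * 24 * 365
    return price_gap_usd / (kwh_per_year * usd_per_kwh)

# Xeon E5-2660 (~$700 more, 95 W) vs Opteron 6370P ($598, 99 W):
years = breakeven_years(price_gap_usd=700, watt_delta=4)
print(round(years, 1))  # ~199.8 years
```

Under those assumptions a 4 W advantage saves only a few dollars a year, which is why the upfront price gap dominates in this comparison.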


----------



## FordGT90Concept (Jan 23, 2014)

Thefumigator said:


> I disagree... I think you are wrong, or I am misunderstanding you. HT enables virtual cores, we agree on that. Windows will see those as cores; it won't care whether they are real or virtual, and if a core is free it will assign any task or process to it. Performance, on the other hand, will be impacted as soon as you use the virtual cores... do you mean that core 1 is not the "virtual core" of core 0?


Assuming virtual cores 0 and 1 are assigned to physical core 0, virtual cores 2 and 3 are assigned to physical core 1, and so on, if you run 4 heavy threads on even-numbered virtual cores and then shift them to odd-numbered virtual cores, the performance for both tests will be more or less equal.  The physical core itself prioritizes threads from each virtual core and tries to give each virtual core about equal processor time.  This is why, from the software perspective, the distinction between virtual and physical cores is moot.
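That pairing can be sketched as a simple mapping, assuming the usual enumeration where logical cores 2n and 2n+1 share physical core n:

```python
def physical_core(logical_core: int) -> int:
    """Map a logical (SMT) core index to its physical core,
    assuming logical cores 2n and 2n+1 share physical core n."""
    return logical_core // 2

# Four heavy threads pinned to even logical cores...
even = [physical_core(c) for c in (0, 2, 4, 6)]
# ...land on the same physical cores as four pinned to odd ones:
odd = [physical_core(c) for c in (1, 3, 5, 7)]
print(even, odd, even == odd)  # same physical cores either way
```

Which is exactly why shifting the threads from even to odd logical cores changes nothing: both placements occupy the same four physical cores.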



FX-GMC said:


> Their results show that if you were running an FX 8 core as a quad core, it is better to disable one core per module rather than disabling two whole modules.


Well, yeah...
Normal: 8 ALUs, 4 FPUs
Disable half modules: 4 ALUs, 4 FPUs
Disable two modules: 4 ALUs, 2 FPUs

I just wonder how much of a performance hit it takes in a generic benchmark between the first two scenarios (half of the ALUs disabled).  HTT loses about 30-35%.  Looking at the URL, WinRAR is the closest to what Aquinus posted, and 8 ALU/4 FPU scores about 47.6% higher than 4 ALU/4 FPU.  That's slightly better than what Aquinus got, but again, that's more of a memory benchmark than a compute power benchmark.

Edit: One of the users gave a range of 33-59%.  For HTT, it looks like anywhere from 2-33%: http://semiaccurate.com/2012/04/25/does-disabling-hyper-threading-increase-performance/

Another article largely mirrors these results:
http://www.extremetech.com/computin...e-effects-of-hyper-threading-software-updates

I guess the moral of the story is that an AMD module struggles to keep up with an Intel core, HTT enabled or not.  This is sad.


----------



## Aquinus (Jan 23, 2014)

FordGT90Concept said:


> I guess the moral of the story is that an AMD module struggles to keep up with an Intel core, HTT enabled or not. This is sad.



I wouldn't call it sad. Intel's current micro-architecture has evolved a bit since the Core and Core 2. AMD's design just isn't as mature. AMD CPUs keep up. They might not be better, but they're adequate. I wouldn't really call that sad.

What is sad is how AMD doesn't have more low-power CPUs. For example, Intel now has an 8c/8t Atom; it's an SoC with a 20-watt TDP. I don't mean to contradict you, but performance isn't what's sad about AMD CPUs lately. In all honesty, if AMD CPUs were a bit lighter on power, we probably wouldn't care as much about single-threaded performance being lacking.


----------



## cyneater (Jan 24, 2014)

Thefumigator said:


> Except in the price factor, maybe.



That's about the only thing AMD can win at now: the price factor.


----------



## FordGT90Concept (Jan 24, 2014)

Aquinus said:


> I wouldn't call it sad. Intel's current micro-architecture has evolved a bit since the Core and Core 2. AMD's design just isn't as mature. AMD CPUs keep up. They might not be better, but they're adequate. I wouldn't really call that sad.
> 
> What is sad is how AMD doesn't have more low-power CPUs. For example, Intel now has an 8c/8t Atom; it's an SoC with a 20-watt TDP. I don't mean to contradict you, but performance isn't what's sad about AMD CPUs lately. In all honesty, if AMD CPUs were a bit lighter on power, we probably wouldn't care as much about single-threaded performance being lacking.


But that's my point.  Except on price, there's nowhere AMD wins.  Intel has higher performance, less heat output, and lower power consumption.  When it comes to servers and HPC, where Opterons are found, upfront cost is not a selling point, because Xeons save money over time through lower power and cooling bills.  The only situation where AMD wins is if you only have X amount of money to spend right now and AMD is below that threshold while Intel is not.


----------



## Breit (Jan 24, 2014)

Just the right article posted yesterday: http://www.anandtech.com/show/7711/...f-kaveri-and-other-recent-amd-and-intel-chips



> It is no secret that AMD's Bulldozer family cores (Steamroller in Kaveri and Piledriver in Trinity) are no match for recent Intel cores in FP performance due to the shared FP unit in each module. As a comparison point, one core in Haswell has the same floating point performance per cycle as two modules (or four cores) in Steamroller.



That means an AMD CPU needs four times the core count to be equal, clock-for-clock, in FP performance to an Intel CPU! That makes this 16-core Opteron exactly as fast as an ordinary Intel quad-core, clock-for-clock, in FP performance! Didn't expect that...
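Working through the AnandTech claim: if one Haswell core matches two Piledriver/Steamroller modules (four "cores") per cycle, the Haswell-core-equivalent FP throughput falls out directly. A back-of-the-envelope sketch, valid clock-for-clock only and taking the 4:1 ratio from the quote at face value:

```python
AMD_CORES_PER_HASWELL_CORE = 4  # per the AnandTech comparison

def haswell_core_equiv(amd_cores: int) -> float:
    """Clock-for-clock FP throughput expressed in Haswell cores."""
    return amd_cores / AMD_CORES_PER_HASWELL_CORE

print(haswell_core_equiv(16))  # 16-core Opteron -> 4.0, i.e. a quad-core
print(haswell_core_equiv(12))  # 12-core Opteron -> 3.0
```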


----------



## Thefumigator (Jan 24, 2014)

FordGT90Concept said:


> I just wonder how much of a performance hit it takes in a generic benchmark between the first two scenarios (half of the ALUs disabled).
> I guess the moral of the story is that an AMD module struggles to keep up with an Intel core, HTT enabled or not.  This is sad.



I will test it out with my own benchmark software when I get home.


----------

