# AMD Dragged to Court over Core Count on "Bulldozer"



## btarunr (Nov 6, 2015)

This had to happen eventually. AMD has been dragged to court over misrepresentation of its CPU core count in its "Bulldozer" architecture. Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely advertising the core count in its latest CPUs, and contended that because of the way they're physically structured, AMD's 8-core "Bulldozer" chips really only have four cores. 

The lawsuit alleges that Bulldozer processors were designed by stripping away components from two cores and combining what was left to make a single "module." In doing so, however, the cores no longer work independently. Due to this, AMD Bulldozer cannot perform eight instructions simultaneously and independently as claimed, or the way a true 8-core CPU would. Dickey is suing for damages, including statutory and punitive damages, litigation expenses, pre- and post-judgment interest, as well as other injunctive and declaratory relief as is deemed reasonable.







----------



## john_ (Nov 6, 2015)

I think AMD created Bulldozer to be a HyperThreading alternative that could be marketed as a "double cores" chip rather than a "double threads" chip, and at the same time be immune to lawsuits like this one. Their marketing department even thought of adding GPU "cores" to the mix, and now they are talking about "compute cores" in AMD's APUs. Add the high frequencies to that, and Bulldozer looks more like a marketing chip than a performance chip.

I would say that under a dictatorship, AMD's marketing department would have been shot and everyone would be celebrating that decision. Even if there were rebels fighting that dictatorship, they would have also celebrated that decision.

Under a democracy the question is, do these people still work at AMD?


----------



## the54thvoid (Nov 6, 2015)

Hmm....

I should imagine AMD will win.  It's all semantics when they can show thread-heavy tasks on a 4-module, 8-core chip being as good (close enough) as a 4-core, 8-thread chip (i7-2600).

There's plenty of graphs to support performance.  The dick in court is probably pissed at its substandard single-threaded performance.


----------



## robert3892 (Nov 6, 2015)

If any company advertises an 8 core processor when in fact it has 4 cores then that is misrepresentation of the product. A core must work independently.


----------



## FordGT90Concept (Nov 6, 2015)

Pretty sure he's going to win.  I don't think there's any nomenclature that properly describes Bulldozer's design, and even if one existed, AMD wasn't using it.



the54thvoid said:


>


x264 HD Benchmark runs on the GPU, and AMD undeniably has a stronger GPU in the FX-8150 than Intel has in the i7-2600K.  The problem stems from floating point operations executed on the CPU.  If you heavily load the shared FPU from one core, the FPU performance of both cores in the module will effectively halve.


----------



## Joss (Nov 6, 2015)

This has nothing to do with justice or technology, it's a clever dick trying to make money.


----------



## RCoon (Nov 6, 2015)

So in 10 years' time, when the court battle is finally over, AMD FX purchasers will be able to apply for a $5 rebate on their FX purchase, assuming they can remember where they bought it and kept their proof of purchase.

Holding a fork in a world of soup comes to mind.


----------



## FordGT90Concept (Nov 6, 2015)

I don't think it is class action.  I think it is just him (representing himself) versus AMD.  AMD will probably just settle with him for $500 or something.


----------



## Aquinus (Nov 6, 2015)

The problem here is that Bulldozer does have 8 integer cores. We all know that. Tony is an idiot, because a floating point unit in a CPU has never been directly referred to as a core.


----------



## RCoon (Nov 6, 2015)

FordGT90Concept said:


> I don't think it is class action.  I think it is just him (representing himself) versus AMD.  AMD will probably just settle with him for $500 or something.



It is:

_"filed a class-action lawsuit on Oct. 26 in the U.S. District Court for the Northern District of California"_

SOURCE


----------



## the54thvoid (Nov 6, 2015)

FordGT90Concept said:


> Pretty sure he's going to win.  I don't think there's any nomenclature to properly describe Bulldozer's design and even if it had existed, AMD wasn't using it.
> 
> 
> x264 HD Benchmark runs on GPU and AMD undeniably has a stronger GPU in FX-8150 than Intel has in i7-2600K.  The problem stems from floating point operations executed on the CPU.  If you heavily load the FPUs in one core, the FPU performance of both cores will effectively half.



No GPU usage here....






or here?






etc

I'm not saying it has 8 cores but I am saying it matches an 8 threaded CPU.  Semantics.

And I'm an Intel/Nvidia guy.

I just think this case is all about a fucking douchebag trying to make money.  Now, if only I could sue Lisa Su (or whoever said it) for the "overclocker's dream" comment at Fiji's launch.  Unless by dream, they meant nightmare???


----------



## gaximodo (Nov 6, 2015)

the54thvoid said:


> No GPU usage here....
> 
> 
> 
> ...



Except the 8-thread CPU has only 4 cores and is marketed as 4 cores, with faster single-core performance. 

That reminds me of Intel's motorcycle vs. tricycle ad. Loved it.


----------



## Aquinus (Nov 6, 2015)

the54thvoid said:


> I just think this case is all about a fucking douchebag trying to make money.


Most lawsuits in the US of A seem to work that way.


----------



## FordGT90Concept (Nov 6, 2015)

Aquinus said:


> The problem here is that Bulldozer does have 8 integer cores. We all know that. Tony is an idiot because never has a floating point unit in a CPU ever been directly referred to as a core.


A "core" is a complete computing unit.  AMD proves it is not complete with their 6-"core" Bulldozer processors.  The two units packaged together are inseparable, or they would have sold 7-"core" Bulldozer processors by gating off only the one that was defective.



the54thvoid said:


> No GPU usage here....
> 
> 
> 
> ...


Because compression is mostly integer-based where Bulldozer performs more like an 8-core processor.  Even considering the widely different architectures and Bulldozer having a design ideal for it, it doesn't win by a very large margin.  The lawsuit is about the worst case scenario (saturated FPUs) and you're citing the best case scenario (saturated ALUs) where Bulldozer's non-traditional design shines.  The latter doesn't forgive the former.


----------



## Aquinus (Nov 6, 2015)

FordGT90Concept said:


> Because compression is mostly integer-based where Bulldozer performs more like an 8-core processor. Even considering the widely different architectures and Bulldozer having a design ideal for it, it doesn't win by a very large margin. The lawsuit is about the worst case scenario (saturated FPUs) and you're citing the best case scenario (saturated ALUs) where Bulldozer's non-traditional design shines. The latter doesn't forgive the former.


Since when was the FPU the determining factor for what constitutes a core? Does that mean that old x86s without FPUs were CPUs without cores? That's a bad argument.


----------



## the54thvoid (Nov 6, 2015)

FordGT90Concept said:


> A "core" is a complete computing unit.  AMD proves it is not complete in their 6-"core" Bulldozer processors.  The two units packaged together are inseparable or they would have sold 7-"core" Bulldozer processors having only gated off the one that was defective.
> 
> 
> Because compression is mostly integer-based where Bulldozer performs more like an 8-core processor.  Even considering the widely different architectures and Bulldozer having a design ideal for it, it doesn't win by a very large margin.  The lawsuit is about the worst case scenario (saturated FPUs) and you're citing the best case scenario (saturated ALUs) where Bulldozer's non-traditional design shines.  The latter doesn't forgive the former.



I appreciate your higher-level retort.  I'm simply trying to illustrate that Bulldozer isn't bad enough to drag through court because some guy didn't read some mother hubbard reviews.  



Spoiler



(It's too early to say mother fucking)


----------



## FourtyTwo (Nov 6, 2015)

the54thvoid said:


> No GPU usage here....
> 
> 
> 
> ...



Your reply just proves the challenger's point.
It matches a 4-core CPU with hyper-threading, not an 8-core.
Intel doesn't market the i7-2600K as having 8 physical cores, contrary to AMD's marketing of the FX-8150.

Overall, there's no doubt that AMD's marketing of Bulldozer is very misleading.


----------



## qubit (Nov 6, 2015)

Finally. It was such an effing con.


----------



## FordGT90Concept (Nov 6, 2015)

Aquinus said:


> Since when was the FPU the determining factor for what constitutes a core? Does that mean that old x86s without FPUs were CPUs without cores? That's a bad argument.


You have to go back over 20 years to find an x86 processor without an FPU (the 80386).  13 lifespans of computer hardware is hardly relevant today.


----------



## Aquinus (Nov 6, 2015)

FourtyTwo said:


> Your reply just proves the challenger's point.
> It matches a 4 core CPU with hyper-threading, not an 8 core.
> Intel doesn't market the i7-2600K as having 8 physical cores, contrary to AMD's marketing of the FX-8150.
> 
> Over-all there's no doubt that AMD's marketing of Bulldozer is very misleading.


You can still have crap for performance and still have real cores. Bulldozer's cruddy performance is due to caching and the pipeline. When you make the pipeline as long as NetBurst's, of course your single-threaded performance is going to suck, but that doesn't change the fact that there are two integer cores per module.


FordGT90Concept said:


> You have to go back over 20 years to find an x86 processor without FPUs (80386).  13 lifespans of computer hardware is hardly relevant today.


It is when most of what computers do is integer math and they can't operate without it. Nothing says you can't have 8 cores but still have shitty performance.

Computers can operate without a dedicated FPU; they can't without integer cores.


----------



## FordGT90Concept (Nov 6, 2015)

Aquinus said:


> Computers can operate without a dedicated FPU, they can't without integer cores.


Not today they can't.  You couldn't even open a JPEG image without an FPU.

Well, I suppose you technically could if you multiplied all of the floats by several million and divided to convert back.  Rounding errors would be aplenty and performance would be poor by comparison.  Everything that decodes it would also need special libraries for a non-IEEE754-compliant processor.
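The scaled-integer workaround described above is, in essence, fixed-point arithmetic. A minimal sketch of the idea, purely for illustration (the 10**6 scale factor is an arbitrary choice, not anything from the thread):

```python
SCALE = 10**6  # fixed-point scale factor: six decimal digits of precision

def to_fixed(x: float) -> int:
    """Encode a real number as a scaled integer."""
    return round(x * SCALE)

def fixed_mul(a: int, b: int) -> int:
    """Multiply two fixed-point values using only integer math."""
    return a * b // SCALE

# 1.5 * 2.25 = 3.375, computed without touching floating point hardware
product = fixed_mul(to_fixed(1.5), to_fixed(2.25))
print(product)          # 3375000, i.e. 3.375 at SCALE = 10**6
```

As the post says, the rounding error accumulates with every operation, which is exactly why a hardware FPU (or at least a proper soft-float library) wins for anything serious.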


FYI, IEEE 754 was defined in 1985, the same year the 80386 debuted.  Intel didn't yet have integrated FPU hardware to process IEEE 754; the 80486DX was Intel's first processor with an on-die IEEE 754-compliant FPU.  IEEE 754 is what made formats like JPEG practical.


----------



## gaximodo (Nov 6, 2015)

I never really paid attention to AMD's CPU lineup, and I had been under the impression that AMD's single-core performance is THAT bad and their 8-cores (FX8XXX) hardly keep up with Intel's 4-core i5s.

Now that I know they are really also 4 cores, I'm actually quite impressed that AMD could "keep up" with Intel.

Their marketing team should have been shot.


----------



## the54thvoid (Nov 6, 2015)

Back on track - he'll still lose.

His case can be argued away quite easily by the block diagram of the integer cluster of two cores per module.  Four modules = eight Bulldozer cores, irrespective of performance.  The design parameter and AMD's naming of it can't be used against them in a court of law.  Without wishing to digress, if the plaintiff won, it would fundamentally mean Nvidia would lose any similar lawsuit over the GTX 970's advertised units of 'x'.

To prove the point: if I can demonstrate that my '4 core' Bulldozer part is not as good as my '8 core' Bulldozer part, and that my 8-core Bulldozer part matches an 8-thread Intel part (and remember, the 8 cores in BD are sold as 8, not 16), then I can show the function is there.  The technical breakdown of workloads, as espoused eloquently above, is irrelevant in a courtroom full of people who will be utterly lost in the language.

_It doesn't matter how badly it performed in certain cases_; all AMD has to do to win is show it performs a heavily threaded workload (*due* to its 8 'cores') as well as an 8-thread Intel CPU.  AMD never sold it as hyper-threaded; they used 'cores' instead.  4 modules, each module has 2 cores, therefore it HAS 8 cores (as AMD defined them).  The case isn't about what it can and cannot do; it's about whether it has 8 cores.  And it technically does.  Same way a GTX 970 has 4 GB of VRAM.

This thread isn't about performance as such; it's about nomenclature, and it's pretty damn hard to prove there are not 8 integer 'cores' in a 4-module BD part.

From Wiki (forgive me, I'm a tech noob but I know logic, or should it be 'Law'gic)



> In terms of hardware complexity and functionality, this module is equal to a dual-core processor in its integer power, and to a single-core processor in its floating-point power



So it's there: each module has two integer-cluster cores.  Doesn't matter if FP is equal to a single core - that's a design limitation.  It has two cores per module...  It really is all in a name.

EDIT: I had comically and erroneously called myself a tech 'nob' but have since rephrased to the correct tech noob - sue me


----------



## FordGT90Concept (Nov 6, 2015)

I have no confidence in tech businesses being able to explain anything to the sheeple.  Case in point: Seagate losing the 1 GB = 1,000,000,000 bytes case.

AMD won't win because AMD's 8-core processors aren't the same as Intel, IBM, Qualcomm, etc. 8-core processors, and they never bothered to explain why to the same sheeple that will be hearing the case.

Seagate's case should have been open and shut ("Microsoft is lying to you because they divide by 2^30") and they still lost; AMD's case is far from open and shut--they don't stand a chance.  Just look at how many words you're using to try to explain it.  Now imagine being cut off by someone saying "objection" every three words.  Courts don't do technical; they do testimony by a technical expert.




the54thvoid said:


> _It doesn't matter how badly it performed in certain cases_, all AMD has to do to win is show it performs a heavily threaded workload (*due* to it's 8 'cores') as well as an 8 threaded Intel CPU.


The i7-3930K is a 6-core versus the FX-8150...
Spoiler alert: the 3930K won everything except a Photoshop benchmark (the reviewer says it is because the 8150 has higher clocks).


----------



## the54thvoid (Nov 6, 2015)

FordGT90Concept said:


> I have no confidence in tech businesses being able to explain anything to the sheeple.  Case in point: Seagate losing the 1 GB = 1,000,000,000 bytes case.  AMD won't win because AMD's 8-core processors aren't the same as Intel, IBM, Qualcomm, etc. 8-core processors.



Let's make a one pound/dollar PayPal bet on this.  If AMD doesn't win, I'll send you a dollar (or pound equivalent) and vice versa.  Though going by the poll, I'm in the minority and may lose.

As far as lies go - when the hell will smartphones stop being sold with 'x' GB of memory then?  The OS always takes up room that cannot be used by the consumer, so they should be sold by usable space.

Edited for wrong bet


----------



## john_ (Nov 6, 2015)

If there are 8 cores in there, AMD will win. I am pretty sure that when they decided to design and market Bulldozer they knew what they were doing, even if that design was a failure. No one says that an integer core needs to come with an FPU attached. Until the Intel 486 (or was it the 386DX?), the FPU was an external co-processor. The fact that today it is an internal part doesn't mean that it is part of the definition of a "core". Today all CPUs come with an IMC. That doesn't mean that a processor without an IMC is not a complete processor. And the same consumer who doesn't know that Bulldozer comes with half the FPUs doesn't know what an FPU is, and even if he does, doesn't know whether it needs to be included before we can talk about cores.

Anyway, one last thing: big.LITTLE. It is advertised as an eight-core design, but only four of the cores are used at once.


----------



## Aquinus (Nov 6, 2015)

FordGT90Concept said:


> Not today they can't.  You couldn't even open a JPEG image without an FPU.
> 
> Well, I suppose you technically could if you multiplied all of the floats by several million and divided to convert back.  Rounding errors would be aplenty and performance would be poor by comparison.  Everything that decodes it would also need special libraries for a non-IEEE754-compliant processor.
> 
> ...


You do realize that integer cores can do floating point math, and that having an FPU is not a prerequisite for doing floating point math? Yes, it's less efficient because it requires several more cycles due to the number of uOps involved, but that's how it used to be done before there was a dedicated FPU. The FPU simply made floating point math faster. That's my point.


----------



## john_ (Nov 6, 2015)

FourtyTwo said:


> It matches a 4 core CPU with hyper-threading, not an 8 core.


So? Performance is beside the point here. If that were the case, then anyone could sue Intel over the performance of a quad-core Atom compared to a quad-core Haswell. Not to mention ARM cores and how far behind they are in IPC.


----------



## john_ (Nov 6, 2015)

FordGT90Concept said:


> Not today they can't. You couldn't even open a JPEG image without an FPU.


I was opening JPEG files on my Atari STe a few millennia ago, and that didn't have an FPU. The Motorola 68000 didn't have an FPU. You just decode JPEG in software, while... making dinner.


----------



## FordGT90Concept (Nov 6, 2015)

Aquinus said:


> You do realize that integer cores can do floating point math and that having an FPU is not a prerequisite to do floating point math. Yes, it's less efficient because it requires several more cycles because of the number of uOps involved but, that's how it used to be done before there was a dedicated FPU. The FPU simply made floating point math faster. That's my point.


My point is that the FPU has been a staple of x86 since 1989...hardware we aren't even using today.  The AMD64 (2003) instruction set (which Bulldozer uses) requires an FPU for processing SSE instructions.  AMD took the cheap route by sharing FPU resources within each physical core ("core" defined as having one FPU and two distinct ALUs).

Ever since the Athlon X2, "cores" have been very clearly defined as complete processors on the same package.  Bulldozer does not fit AMD's own definition of an 8-core.  It's a 4-core with hybridized symmetrical multithreading.




john_ said:


> I was opening JPEG files on my Atari STe a few millennia ago, and that didn't have an FPU. The Motorola 68000 didn't have an FPU. You just decode JPEG in software, while... making dinner.


Sure it didn't have a floating point co-processor?

Edit: It did.


----------



## john_ (Nov 6, 2015)

FordGT90Concept said:


> Sure it didn't have a floating point co-processor?
> 
> Edit: It did.


I have that Mega STe in my basement. It still works great. Do you know what I was thinking of buying back then, which eBay would let me buy today if I decided to?

This: Motorola MC68881FN16 / MC68881FN16B

The Mega STe had an open slot for a math co-processor. The MC68000 doesn't have an integrated FPU.







Motorola 68000 - Wikipedia, the free encyclopedia


> These processors have* no* floating point unit and it is difficult to implement an FPU coprocessor (MC68881/2) with one because the EC series lacks necessary coprocessor instructions.



I could even render images. Of course, this little image took about 8 hours to complete. I don't remember if it was on the 1040 STe or the Mega STe that I bought later. With a math co-processor it would probably have taken only one or two hours.

That's the difference: the time it takes to complete something, not whether it can complete it.


----------



## moproblems99 (Nov 6, 2015)

If this guy wins any more than his money back and enough to cover his actual legal fees, then I am not going to do anything.  But I would be disappointed.


----------



## Aquinus (Nov 6, 2015)

FordGT90Concept said:


> My point is that FPU has been a staple of x86 since 1989...which we aren't even using today.  AMD64 (2003) instruction (which Bulldozer uses) set requires an FPU for processing SSE instructions.  AMD took the cheap route by sharing FPU resources for each physical core ("core" defined as having one FPU and two distinct ALUs).
> 
> Ever since Athlon X2, "cores" have been very clearly defined as being complete processors on the same package.  Bulldozer does not fit AMD's own definition of a 8-core.  It's a 4-core with hybridized symmetrical multithreading.
> 
> ...


When was the last time you tried using a CPU with only an FPU and no integer cores? It doesn't happen, because memory accesses require an integer ALU. My point is that a computer isn't a computer as we know it without integer cores, and that the FPU doesn't define the CPU. Yes, it's used more now because it's there, but there are requirements for the CPU to operate that need integer operations. A CPU doesn't need an FPU to operate, even if we use it a lot now, and there are plenty of microcontrollers that lack an FPU.

An FPU doesn't make a CPU; the integer cores and the ALU do, because a CPU can't function when it can't access memory, which uses integer addressing, since floating point numbers aren't very good at storing exact values, unlike integers.

Either way, this is getting a little tiresome. The simple fact is that a CPU needs integer cores to run *any kind of workload*, whereas an FPU is only required for *fast floating point operations*. That's all I'm saying, so saying the FPU is what "makes or breaks" a core is a little absurd in my opinion.
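The exactness point above is easy to demonstrate: a 64-bit double carries a 53-bit significand, so beyond 2**53 it can no longer distinguish consecutive integers - fatal for anything like addressing. A quick illustration in Python:

```python
# Doubles have a 53-bit significand; above 2**53, consecutive integers collide.
big = 2**53

assert big + 1 != big                # Python integers stay exact...
assert float(big + 1) == float(big)  # ...but as doubles they become the same value
```

Two distinct integer values, one floating point representation - which is why exact (integer) arithmetic is non-negotiable for memory addresses.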


----------



## Uplink10 (Nov 6, 2015)

FourtyTwo said:


> Over-all there's no doubt that AMD's marketing of Bulldozer is very misleading.


A lot of marketing is misleading nowadays, and I think the public (and the jury) have started to automatically accept misleading and exaggerated marketing as normal.

I myself would crucify every company that used even slightly misleading marketing. But marketing has a lot of freedom: you can say that a 4-core Intel CPU with HT actually has 8 cores, which would be true because there are 8 virtual cores available to the OS.

If it is possible to disable 7/5/3/1 core(s) in firmware then he should lose, but if it is not then he should win.



FordGT90Concept said:


> Seagate should have been open and shut (Microsoft is lying to you because they divide by 2^30) and they still lost;


Yes, Microsoft shows the numbers in 2^X and the units in 10^X, a big mistake which they did not correct in Windows 10. Tells you what a shitty OS it is - it can't even get the basics right.


----------



## FordGT90Concept (Nov 6, 2015)

@Aquinus: going to have to agree to disagree.  For the record, an FPU can be designed to manage memory too, but it would do so at a much slower rate than an ALU.  Both the ALU and the FPU are critical to all modern CPUs.



john_ said:


> MC68000 doesn't have an integrated FPU


Didn't say it did.  I said the Atari Mega STE had a Motorola 68881 or Motorola 68882 co-processor.

If there was no co-processor installed, the 68020 and 68030 would run the instruction through an FPU emulator (not unlike what I described when first referring to JPEG).


----------



## john_ (Nov 6, 2015)

FordGT90Concept said:


> Didn't say it did. I said Atari Mega STE had an Motorola 68881 or Motorola 68882 co-processor.
> 
> If there was no co-processor installed, 68020 and 68030 would run the instruction through an FPU emulator (not unlike what I described when first referring JPEG).


Yes, that's what you said. Please don't try to hide behind your finger. It's not that bad to say "maybe I was wrong on this".
The FPU in the Mega STe was an optional upgrade, as it says in the link that YOU posted ("optional FPU"). And forget the Mega STe if it confuses you. There was NO FPU in the Atari ST/Atari STe/Amiga 500.


----------



## FordGT90Concept (Nov 6, 2015)

How does this not describe the Atari ST?


FordGT90Concept said:


> Well, I suppose you technically could if you multiplied all of the floats by several million and divided to convert back.  Rounding errors would be aplenty and performance would be poor by comparison.  Everything that decodes it would also need special libraries for a non-IEEE754-compliant processor.


"special libraries" = FPU emulator

ALU can do everything FPU can do at a performance cost; FPU can do everything ALU can do at a performance cost.  Why does that matter, at all, when both have been central to CPU design for the past two decades?  AMD can't win that argument.


There's only one question here and it is this: are the two logical cores of the Bulldozer design separable?  If yes, they are "cores."  If not, the combined unit is a "core" with SMT.


----------



## Aquinus (Nov 6, 2015)

FordGT90Concept said:


> For the record, an FPU can be designed to manage memory too but it would do so at a much slower rate than an ALU.


Due to the nature of floating point numbers, the loss of precision is exactly why the FPU can't be used to manage memory: exact values are required for memory locations. Memory addressing *requires integers*. There is no getting around that. The FPU is an addition to the CPU, not the heart of it. Floating point math simply is not going to give you the exact values that memory addressing requires. There is a reason why CPUs started with integer cores and were later expanded to improve other kinds of math. This is a "how computers fundamentally work" thing and I have to call it out.


----------



## FordGT90Concept (Nov 6, 2015)

We're talking about 16-bit processors.  A 32-bit FPU (24-bit mantissa) can easily handle all memory addressing needs without rounding.  Even modern 64-bit processors have a 128-bit FPU (113-bit mantissa).  Motorola 68881 has 80-bit FPUs (64-bit mantissa).
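For what it's worth, the mantissa arithmetic here can be checked directly: single precision represents every integer up to 2**24 exactly, comfortably covering a 16-bit address space, and first fails at 2**24 + 1. A quick check in Python, round-tripping values through a 32-bit float with the standard `struct` module:

```python
import struct

def to_f32(x: int) -> float:
    """Round-trip an integer through IEEE 754 single precision."""
    return struct.unpack('f', struct.pack('f', float(x)))[0]

# Every 16-bit address survives the trip unchanged...
assert all(to_f32(a) == a for a in (0, 1, 0xFFFF, 2**16))

# ...and exactness only breaks past the 24-bit mantissa:
assert to_f32(2**24) == 2**24
assert to_f32(2**24 + 1) == 2**24  # rounded: 2**24 + 1 is not representable
```

So a 32-bit FPU does cover 16-bit addressing without rounding; the dispute upthread is really about whether that makes it a sane way to address memory.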

CPUs started with integers because the concept of non-integers is alien to binary.  IEEE 754 was required to put all hardware and software developers on the same page.


----------



## happita (Nov 6, 2015)

Every time I see FordGT, Aquinus, or some other known member who likes debating about technicalities, I think to myself "I'm so happy my brain isn't wired like theirs is" 

Then I grab my popcorn and kick back while enjoying the back & forths 

PS Love you guys


----------



## R-T-B (Nov 6, 2015)

FordGT90Concept said:


> You have to go back over 20 years to find an x86 processor without FPUs (80386).  13 lifespans of computer hardware is hardly relevant today.



You, however, don't have to look any farther than the modern era of ARM cores, and even some server architectures (SPARC comes to mind), to find modern "octacore" CPUs with only one FPU.

This is nonsense.  It's a core if it can do math at all.  Heck, earlier CPUs lacked a MULTIPLICATION instruction.  They were still considered cores.


----------



## GLD (Nov 6, 2015)

The person suing is a greedy jerk trying to fleece AMD. Poop on him. If he isn't happy with AMD, then stop being a cry baby and just switch to Intel.


----------



## GhostRyder (Nov 6, 2015)

It's a frivolous lawsuit. There are 8 cores there; they just share resources with each other, which is why they were not up to par with Intel in many ways.

When the chip is utilized to its max it does perform very well.  The problem is the "utilized to its max" part, as that requires some work on the coder's part to make the app very heavily threaded.


----------



## the54thvoid (Nov 6, 2015)

FordGT90Concept said:


> Spoiler alert: 3930K won everything except a Photoshop benchmark (reviewer says it is because 8150 has higher clocks).



I like you, but you're completely ignoring my point to prove a technical aspect of the architecture which isn't relevant to a case where a guy says the chip doesn't have 8 cores.  It's got 8 cores, albeit badly designed.

Also, a 3930K has 6 cores and 12 threads.  BD has no more than 8 of anything.  My whole point was to say that you cannot compare Intel's cores with Bulldozer's - they're different architectures.  What Intel call a core, AMD call a module.  What Intel call "threads", AMD call "cores".  Again, non-tech court talk.

But to prove a simple point using 'expert' evidence:

http://www.tomshardware.co.uk/cpu-performance-comparison,review-32592-12.html



> AMD’s Bulldozer FX-8170 exhibits the best performance thanks to its *eight integer cores*


What, 8 cores thrashing the 12 threads of a 3960X?  I know you can pick this apart by saying "blah" this, "blah" that, but it doesn't matter.  AMD can prove their product uses 8 integer cores when doing certain tasks; it is therefore not misleading to sell it as such.  

And FTR: BD 8-core = 1500, BD 6-core = 1009, BD 4-core = 751.  That's 100%, 67%, and 50% - roughly what you'd expect with a drop in cores, especially given the 4-core FX-4170 is at 4.1 GHz and the 8-core FX-8350 is at 4 GHz.  Same speeds, half the cores, half the score.


----------



## FordGT90Concept (Nov 6, 2015)

R-T-B said:


> You however don't have to look any farther than the modern era of arm cores and even some server archictecutres (SPARC comes to mind) to find modern "octacore" CPUs with only one FPU.


SPARC T5 has 16 cores with 8 threads per core (SMT).  Its design is more similar to HyperThreading than Bulldozer.



GhostRyder said:


> Its a frivolous lawsuit, there are 8 cores there they just share resources with each other which is why they were not up to par with Intel in many ways.


So why didn't AMD call it what it is?  A quad-core.  Cores, prior to Bulldozer, didn't share resources except low-level cache.  Even calling them "modules" is misleading, because "modules" implies modularity.  They have none.

Frankly, I'm shocked the lawsuit is happening now and not four years ago.



the54thvoid said:


> It's got 8 cores, albeit badly designed.


No, it has four cores with hybridized symmetrical multithreading.  Note how it all rams through one scheduler/dispatcher:




Each core should have its own scheduler/dispatcher (posted twice for a dual-core comparison):








You can lop off an Intel core and the other will still function perfectly.  You cannot do the same to Bulldozer without cutting off two of what AMD calls "cores" (because they are not cores).


"Core" was defined when two logical processors were combined on one package (CPU), as a means to distinguish two separate processors from two logical processors inside one package.  AMD tried to redefine "core" with Bulldozer, and they cannot redefine what has already been defined.


----------



## R-T-B (Nov 6, 2015)

happita said:


> Every time I see FordGT, Aquinus, or some other known member who likes debating about technicalities, I think to myself "I'm so happy my brain isn't wired like theirs is"
> 
> Then I grab my popcorn and kick back while enjoying the back & forths
> 
> PS Love you guys



It's ok, I'm happy my brain isn't wired like yours too.

Love all around!



> SPARC T5 has 16 cores with 8 threads per core (SMT). Its design is more similar to HyperThreading than Bulldozer.



And up until T3 they all shared 1 FPU.  They may still even if they reverted to their old ways, I heard rumblings about that at one point  Regardless, they certainly do share a single FPU on many arm chips in cellphones.  Heck, some ARM CPUs lack an FPU entirely.

Technology terminology is not restricted to the desktop space.

EDIT:  Just saw your diagram above.  Am I to understand that it's more than the FPU per core that Bulldozer is lacking?  If so, I recant.


----------



## FordGT90Concept (Nov 6, 2015)

SPARC is a database processor.  The only time it uses FPU is when it has to do floating point math inside of a table/function.  90% of their workload involves playing with memory so it absolutely makes sense that they went scant on FPU.  Bulldozer is a general processor.  It shouldn't nerf itself in the way that SPARC does because that's not the way people use it.


The whole thing is ridiculous because AMD calls each "integer cluster" a "core."  Who _really_ agrees with that logic?


----------



## DeOdView (Nov 6, 2015)

Oh, bloody hell...!  I'm filing a class action lawsuit against the devs for not developing software that fully utilizes all my 8 cores all these years.

Any idiot lawyers in the house?


----------



## cadaveca (Nov 6, 2015)

FordGT90Concept said:


> SPARC is a database processor.  The only time it uses FPU is when it has to do floating point math inside of a table/function.  90% of their workload involves playing with memory so it absolutely makes sense that they went scant on FPU.  Bulldozer is a general processor.  It shouldn't nerf itself in the way that SPARC does because that's not the way people use it.
> 
> 
> The whole thing is ridiculous because AMD calls each "integer cluster" a "core."  Who _really_ agrees with that logic?


Well, see, the thing here is that legal English and real English are not one and the same. So there is something to argue here, and that would be the legal definition of what a "core" is. That in and of itself is a useful thing for consumers, and makes the actual lawsuit details here unimportant. While we could argue this point for days and days, it'll be up to a single judge to define what a core should consist of, and that may not be a task that your everyday judge might want to take upon himself.


----------



## yogurt_21 (Nov 6, 2015)

cadaveca said:


> Well, see, the thing here is that legal English and real English are not one and the same. So there is something to argue here, and that would be the legal definition of what a "core" is. That in and of itself is a useful thing for consumers, and makes the actual lawsuit details here unimportant. While we could argue this point for days and days, it'll be up to a single judge to define what a core should consist of, and that may not be a task that your everyday judge might want to take upon himself.


Such a thing would be useful across the board, especially now that we have 8 "core" phone/mobile CPUs. How is a consumer supposed to rationalize that against what counts for a core on a computer?


----------



## FordGT90Concept (Nov 6, 2015)

The result of this lawsuit is going to be "core = integer cluster" on Bulldozer packaging, much like the Seagate lawsuit resulted in "1 GB = 1,000,000,000 bytes" on everything that uses the gigabyte measure, from hard drives to SSDs, to flash, to optical disks.

Intel and AMD Zen might do "core = logic processor" just to be safe.
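The decimal-vs-binary arithmetic behind that Seagate settlement is easy to show. A purely illustrative sketch (mine, not from the thread):

```python
# Drive packaging uses decimal (SI) gigabytes; most OSes report binary
# gibibytes, which is where the "missing" capacity comes from.
GB = 10**9    # 1 GB as advertised: 1,000,000,000 bytes
GiB = 2**30   # 1 GiB as commonly reported: 1,073,741,824 bytes

advertised = 1000 * GB                # a "1 TB" drive
print(f"{advertised / GiB:.2f} GiB")  # ~931.32 GiB -- the OS-reported size
```

The ~7% gap is exactly what the settlement forced onto packaging as a disclaimer.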


----------



## Patriot (Nov 6, 2015)

For all of you who have obviously forgotten... the FPU on Bulldozer is double-wide... it can split itself in half and do two FP operations at a time, or one double-wide one...   It just sucks at it.

Shit performance, not lying specs.
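Whether that shared FPU actually throttles paired cores is measurable. A rough, Linux-only sketch (my own, with an assumed topology: on FX chips logical CPUs (0,1), (2,3), ... are module pairs; verify with `lscpu` on your machine):

```python
# Rough contention test (Linux-only): time two float-heavy workers pinned to
# sibling "cores" of one module vs cores in different modules. A large gap on
# the same-module pair points at the shared FPU. The CPU pairing used here is
# an assumption, not detected from the hardware.
import os
import time
from multiprocessing import Process

def fp_worker(cpu, n=2_000_000):
    os.sched_setaffinity(0, {cpu})        # pin this process to one logical CPU
    x = 1.0001
    for _ in range(n):
        x = x * 1.0000001 + 0.5           # floating-point-bound loop

def timed_pair(cpus):
    procs = [Process(target=fp_worker, args=(c,)) for c in cpus]
    t0 = time.perf_counter()
    for p in procs:
        p.start()
    for p in procs:
        p.join()
    return time.perf_counter() - t0

if __name__ == "__main__":
    avail = sorted(os.sched_getaffinity(0))
    if len(avail) >= 4:
        print("same module :", timed_pair(avail[0:2]))          # assumed siblings
        print("cross module:", timed_pair([avail[0], avail[2]]))  # assumed separate FPUs
```

On a non-Bulldozer chip the two timings should be close; a big same-module penalty is what reviewers observed on FX parts.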


----------



## GorbazTheDragon (Nov 6, 2015)

Is BD a crappily designed architecture: Yes

Is BD an 8 core CPU: Yes


----------



## moproblems99 (Nov 6, 2015)

cadaveca said:


> Well, see, the thing here is that legal English and real English are not one and the same. So there is something to argue here, and that would be the legal definition of what a "core" is. That in and of itself is a useful thing for consumers, and makes the actual lawsuit details here unimportant. While we could argue this point for days and days, it'll be up to a single judge to define what a core should consist of, and that may not be a task that your everyday judge might want to take upon himself.



It depends upon what the meaning of the word 'is' is...


----------



## FordGT90Concept (Nov 6, 2015)

GorbazTheDragon said:


> Is BD an 8 core CPU: Yes


No, four cores with two "integer clusters" each.  AMD marketeers (deliberate misspelling) used the word "core" to describe only the "integer clusters" to make the layfolk believe AMD is selling 8-core processors for the price of an Intel quad-core.  It's false advertising, pure and simple.

Opteron 6386 SE is an example of an AMD 8-core (AMD claims it is 16-core).


Edit: If the block diagram isn't clear enough...count the cores:





source


Now for Intel.  Again, count the cores:




source


----------



## R-T-B (Nov 6, 2015)

FordGT90Concept said:


> The whole thing is ridiculous because AMD calls each "integer cluster" a "core."  Who _really_ agrees with that logic?



I do.  It's a generally accepted concept across a broad spectrum of CPU technology.  The most basic core is an integer cluster.  At least that's how I have always understood it.

Can we all agree however that this is a pretty silly thing to sue about?


----------



## Patriot (Nov 6, 2015)

They are selling an 8 core at the price of a 4 core... because it performs like it.
Also the 16c, 8-module Opteron performs better than the previous-gen 12-core... so... just because it performs shittily compared to Intel doesn't make it false advertising.

2 integer clusters + 1 double-wide FPU per module; an FPU that can do 2 FP ops at a time... or 1 per integer cluster...
It just didn't play out like they planned.


----------



## FordGT90Concept (Nov 6, 2015)

R-T-B said:


> I do.  It's a generally accepted concept across a broad spectrum of CPU technology.  The most basic core is an integer cluster.  At least that's how I have always understood it.
> 
> Can we all agree however that this is a pretty silly thing to sue about?







An integer cluster without an instruction decoder is what we call a calculator, not a logic processor.

No, I don't agree with that.  AMD should have been sued the day it released.


----------



## moproblems99 (Nov 6, 2015)

FordGT90Concept said:


> No, four cores with two "integer clusters" each. AMD marketeers (deliberate misspelling) used the word "core" to describe only the "integer clusters" to make the layfolk believe AMD is selling 8-core processors for the price of an Intel quad-core. It's false advertising, pure and simple.



Were you this mad when nvidia put 4GB of memory on the 970 but only 3.5GB were useful?


----------



## iO (Nov 6, 2015)

R-T-B said:


> Can we all agree however that this is a pretty silly thing to sue about?



Yes, especially four years after their release...


----------



## R-T-B (Nov 6, 2015)

FordGT90Concept said:


> An integer cluster without an instruction decoder is what we call a calculator, not a logic processor.



I'd say I'm old and you're just more "with it" than me, except most cellphones/tablets follow this model, and there are probably more of them than Desktop/laptop units now.  And sadly, that is what the cool kids use these days.

Not sure if I should feel hip now, or collapse in a pile of x86 oldness.


----------



## FordGT90Concept (Nov 6, 2015)

moproblems99 said:


> Were you this mad when nvidia put 4GB of memory on the 970 but only 3.5GB were useful?


Yes.



R-T-B said:


> I'd say I'm old and you're just more "with it" than me, except most cellphones/tablets follow this model, and there are probably more of them than Desktop/laptop units now.  And sadly, that is what the cool kids use these days.
> 
> Not sure if I should feel hip now, or collapse in a pile of x86 oldness.


ARM Cortex quad-core:




...four...distinct...cores.  You could chop three of them off and it'll still run as a uniprocessor.


----------



## moproblems99 (Nov 6, 2015)

FordGT90Concept said:


> Yes.



Fair enough.


----------



## R-T-B (Nov 6, 2015)

FordGT90Concept said:


> Yes.
> 
> 
> ARM Cortex quad-core:
> ...




You got me there.

My age IS showing.  ARM got with it, apparently.   NEON is a pretty competent FPU, too.

I'm more familiar with the old TI OMAPs which were A6's if I recall.  They had a single FPU in some cases, particularly the pandaboards I was taught wonderful linux-land on for android builds in a very hipster programming class.

Still, this is a new trend, putting FPUs on everything.  The term "core" meant something entirely different just 10 years ago, so cut me some slack.

That and there are still processors out there that follow the bulldozer model.

I'm not sure I'm ready to declare the old definition of what a "core" is dead just because everyone decided to pop something new on board.  Does that really change what a computer fundamentally does, which is math?


----------



## FordGT90Concept (Nov 6, 2015)

The definition of "core," in multiprocessor context, changed the day AMD Athlon 64 X2 launched (May 2005): two CPUs on one socket.


----------



## GorbazTheDragon (Nov 6, 2015)

The thing is, core count has always been one of the most useless metrics to go by when it comes to CPU performance... Otherwise everyone would still be running Core 2 Quads.

Anyone who buys a computer based on core count alone is ignorant and stupid.


----------



## HumanSmoke (Nov 6, 2015)

Aquinus said:


> Most lawsuits in the US of A seem to work that way.


All part of the updated Murican Declaration of Independence - "Life, Liberty and the pursuit of Litigation"
I bet AMD's PR department are pissed that they didn't go with the first draft of their advertising


----------



## R-T-B (Nov 6, 2015)

FordGT90Concept said:


> The definition of "core," in multiprocessor context, changed the day AMD Athlon 64 X2 launched (May 2005): two CPUs on one socket.



Technically, the PPC and a few other RISC chips beat them to it by a good margin.

Still, I will grant you that the world has changed.


----------



## john_ (Nov 6, 2015)

FordGT90Concept said:


> ARM Cortex quad-core:


How about octa core chips following big.LITTLE design? Are they real octa cores?


----------



## FordGT90Concept (Nov 6, 2015)

R-T-B said:


> Technically, the PPC and a few other risc chips beat them to it by a good margin.


Got a model?  The best I could come up with was PowerPC G5 970MP.  It debuted two months after X2.

Edit: Ah, IBM POWER4 released 2001.



john_ said:


> How about octa core chips following big.LITTLE design? Are they real octa cores?


Technically yes but they need to make it clear how many can run simultaneously.


----------



## HumanSmoke (Nov 6, 2015)

john_ said:


> How about octa core chips following big.LITTLE design? Are they real octa cores?


Well, they all reside on a single die and are defined less by core count than by per-core power parameters. If big.LITTLE is suspect, then you can also expect a flurry of lawsuits for every MCM ever made, including Pentium D, Core 2 Quad, AMD's G34 server chips, and some of IBM's POWER series.


----------



## 2wicked (Nov 6, 2015)

the54thvoid said:


> EDIT: I had comically and erroneously called myself a tech 'nob' but have since rephrased to the correct tech noob - sue me


If someone does sue for that tell them they are pulling a Dickey.


----------



## NC37 (Nov 6, 2015)

Northern Cali... yep, that's probably the only place in the nation he could sue and win on a case like this. For that matter, why the heck did he wait until Zen is almost here? This should have been a thing back when BD launched, not now with Zen coming.


----------



## lilhasselhoffer (Nov 6, 2015)

This is an interesting discussion and all, but for a moment I'm going to ask you to stop.

Now that I've asked the impossible, let me explain why.  You have to argue this with a judge (and depending upon the case structure and local rules a jury).  Said judge can be anything from technically savvy, to barely able to turn on their television set.  Now that we've framed the discussion, let's argue the point.

1) Define what a core is, to the lay person.
_a_ *:*  a basic, essential, or enduring part (as of an individual, a class, or an entity) <the staff had a _core_ of experts> <the _core_ of her beliefs>   -According to Merriam Webster online dictionary-

2) How is this definition applied to the case at hand?
A critical component to the processor is missing, thus preventing you from calling your CPU "8 cores" due to this fundamental component being removed.

3) How do you prove it?
-Fail-  AMD can pull out a wealth of nice pretty graphs that demonstrate their processors perform at the same level as a similar core count offering from their competitors.  These programs can be cited as utilizing the number of cores better than other programs, and the burden of proof falls to the accuser to prove that this is not the case.  The accuser can provide charts all day proving that the processor doesn't work as well on some programs, yet without the source code to prove it they're completely without a valid accusation.

4) What experts support your claims?
-Fail-  AMD can wheel out an assortment of experts in the computing field.  Each of them can attest that in a very specific type of loading their product performs better than their own 4 core offerings, thus invalidating the claim that 8 cores aren't performing better than 4.  If the accuser compares them to Intel, they need only state that they are a competitor with a different and non-comparable product.  It then rests on the accuser to prove they are comparable, which would take enough technical jargon to completely alienate a jury.  The accuser can therefore not really level an accusation here that isn't killed by its own complexity.

5) How are damages being calculated?
-Fail- This is fuzzy at best.  If the accuser says that an 8 core AMD CPU is comparable to an 8 core Intel one they've got to quantify the massive gulf in pricing.  That will be nigh impossible.  If they want to argue an 8 core processor is less efficient than a quad core then they've got to be able to compare like measurements to justify any losses.  AMD needs only wheel out the "we aren't Intel" argument to make this impossible (there is literally nothing that these can be directly compared to).

6) Does the judge have an unbiased opinion?
-Fail- This is a lawsuit in Northern California.  Let's be honest here, damaging AMD isn't in the best interest of the locals, and this is a largely frivolous suit.  It's being filed years after Bulldozer hit the market.  It is initiating class action without having much of a basis from which to stand (given all the technical material available stating that the "core" of Bulldozer was something entirely new).  But worst of all are the suit's associated lawyers, a new kind of vampire more than happy to sue over anything related to technology.  Seriously, read through their site; it's the most atrocious vulture group since the copyright trolls: http://www.edelson.com/in-the-news/



I'm all for making AMD aware that their crap with the Bulldozer "core" is unacceptable; we did that by not buying it.  I don't have a problem with lawsuits directed at reparations for a wrong done by someone; this is what the legal system is designed for.  What we've got here, though, is utter crap: somebody raising a suit too late to be relevant, not seeking reparations for consumer benefit, and worst of all hiring a law firm that banks on technology being too complex for our legal system to wade through as is.  Sorry, but the accuser is a vampiric douche who is using the legal system as a weapon.  If they got hit by a car tomorrow, I think the gene pool would be better off (along with the legal offices being swallowed up in a massive sinkhole).




Edit:
Let me be even more angry about this stupidity.  Here's what Anandtech said in 2009:
http://www.anandtech.com/show/2881/2

Two years before Bulldozer was released, this information was publicly available.  Ignorance of the law is not an excuse not to follow it; likewise, this kind of ignorance about what you are buying doesn't make the accuser deserving of reparations for their idiocy.  I wonder if AMD maintains PR release information (I say sarcastically)?  I wonder if this idiot ever read any of it, or whether, when it's introduced in court, his legal team will be able to dismiss it as... I can't even imagine how you'd argue your client isn't a moron at that point.


----------



## HumanSmoke (Nov 6, 2015)

NC37 said:


> Northern Cali... yep, that's probably the only place in the nation he could sue and win on a case like this. For that matter, why the heck did he wait until Zen is almost here? This should have been a thing back when BD launched, not now with Zen coming.


Because, with his hoped-for settlement, he can afford to upgrade. He will then buy Zen and will sue because he couldn't achieve enlightenment with it.


----------



## FordGT90Concept (Nov 6, 2015)

NC37 said:


> Northern Cali...yep, thats probably the only place in the nation he could sue and win on a case like this. For that matter, why the heck did he wait until Zen is almost here? This should have been a thing back when BD launched, not now with Zen coming.


Bulldozer chips are still for sale and he probably bought one recently or someone came to him with it.


----------



## boogerlad (Nov 6, 2015)

FordGT90Concept said:


> Pretty sure he's going to win.  I don't think there's any nomenclature to properly describe Bulldozer's design and even if it had existed, AMD wasn't using it.
> 
> 
> x264 HD Benchmark runs on GPU and AMD undeniably has a stronger GPU in FX-8150 than Intel has in i7-2600K.  The problem stems from floating point operations executed on the CPU.  If you heavily load the FPUs in one core, the FPU performance of both cores will effectively half.




You're hilarious. x264 only uses OpenCL for frame lookahead, and even then the boost is very small. There is no GPU in the FX-8150 anyway.


----------



## FordGT90Concept (Nov 6, 2015)

lilhasselhoffer said:


> 3) How do you prove it?


Block diagrams.  K8 versus Bulldozer.



lilhasselhoffer said:


> 4) What experts support your claims?


Any "expert" that can interpret the block diagrams before the court.  It likely doesn't require much explaining that Bulldozer's module is missing too many parts to constitute two discrete logic processors.



lilhasselhoffer said:


> 5) How are damages being calculated?


Dickey likely gave a number using his own formula.  The court will have to decide whether that formula is fair or not.



lilhasselhoffer said:


> 6) Does the judge have an unbiased opinion?


Judges aren't supposed to be biased.  If they have a bias, they're supposed to recuse themselves.



boogerlad said:


> You're hilarious. x264 only uses opencl for frame look ahead and even then, the boost is very small. There is no gpu in the fx-8150 anyways.


Tech ARP's x264 HD Benchmark, from what I was able to research, doesn't explain its processing methods.  All that is abundantly clear is that it benefits from more cores (the highest scores go to Xeons with many cores).  Judging by the benchmarks, it appears to be heavily ALU-oriented, which plays to Bulldozer's benefit.


----------



## moproblems99 (Nov 6, 2015)

FordGT90Concept said:


> Tech ARP x264 HD Benchmark, from what I was able to research, doesn't explain its process methods. All that is abundantly clear is that it benefits from more cores (highest scores go to Xeons with many cores). Judging by the benchmarks, it appears that it is heavily ALU oriented which plays to Bulldozer's benefit.



So doesn't that just corroborate that a Bulldozer has 8 cores?  The more cores the higher the scores?


----------



## FordGT90Concept (Nov 6, 2015)

No, because a) its single-threaded performance is very poor and b) the 6-core Phenom II X6 1055T beats the "8-core" FX-8150 in the very same test the FX-8150 supposedly excels at.  6 real cores are better than 4 cores masquerading as 8 (that sounds a lot like AMD crying "native quad core" when Core 2 Quad debuted and trounced the X4 processors).


----------



## m0nt3 (Nov 6, 2015)

By this same metric, shouldn't he take Intel to court over the 286/386 math co-processor, which was a floating-point unit...?


----------



## moproblems99 (Nov 6, 2015)

FordGT90Concept said:


> No because a) it chokes when you feed it floating points and b) a two-way 4-core K8-based Opteron workstation will beat the "8-core" bulldozer in almost every way.



Will it beat it in the Tech ARP x264 HD Benchmark that you stated above benefits from more cores?  I'm not trolling, or that's not my intent anyway.  Just trying to figure out your thought process.


----------



## FordGT90Concept (Nov 6, 2015)

I edited, and the answer is yes.  It only takes six K10 cores to beat eight Bulldozer "cores," even at significantly lower clocks (2.8 GHz versus 3.6 GHz).

I see it coming: "oh, but Bulldozer is technically only 4-core, so a 6-core should beat it!"  My point, exactly.




m0nt3 said:


> By this same metric, shouldn't he take Intel to court over the 286/386 math co-processor. Which was a floating point unit......


FPUs were spotty around the early 1990s simply because they were brand-new technology.  You could argue Bulldozer was brand-new technology too but, by that point, the definition of "core" had been well established for six years.  The use of the word "core" where it isn't appropriate is why this lawsuit has merit.


----------



## moproblems99 (Nov 6, 2015)

FordGT90Concept said:


> I edited, and the answer is yes.  It only takes six K10 cores to beat eight Bulldozer "cores," even at significantly lower clocks (2.8 GHz versus 3.6 GHz).
> 
> I see it coming: "oh, but Bulldozer is technically only 4-core, so a 6-core should beat it!"  My point, exactly.
> 
> ...



I see your point but, honestly, if someone buys things solely on advertising, they deserve what they get.


----------



## dorsetknob (Nov 6, 2015)

FordGT90Concept said:


> 6 real cores are better than 4 cores masquerading as 8


And that's why I went Xeon rather than i7.

Lawsuit in America = I farted, now I'm being sued in a class action because I polluted breathing air.


----------



## lilhasselhoffer (Nov 6, 2015)

FordGT90Concept said:


> ...
> Any "expert" that can interpret the block diagrams before the court.  It likely doesn't require much explaining that Bulldozer is missing a lot of parts in the module to constitute two discreet logic processors.
> 
> 
> ...



The counter arguments are simple.  Who defined that a CPU required certain things?  Your own earlier statements comparing various architectures prove that the point being made is invalid.  You can't justify that a component is necessary, unless you can prove it directly influences end results, which they can't reliably do if even one instance proves the contrary.  The plaintiff accuses AMD of removing a critical component, yet demonstrably it is not critical.  Kinda hard to have an argument when the basis for said argument is impossible to justify.

Dickey is full of crap here, based on claims.  This is a civil suit, and filed based upon a California law which doesn't have many parallels universally recognized throughout the country.  As others have stated, this guy is basically taking what may be a couple of hundred dollars of processor and suing AMD for it, magically lost time, legal bills, and everything else.  I'm sorry, but if this was actually about lost performance that is being claimed they'd have something more than that.  I understand that a judge will only consider the plaintiff's request, but there's a difference between bargaining like this is a used car lot and asking for fair reparations.  Whenever somebody asks for $5, and the cost of the original product was $1 they've got to either have an exceptional case or exceptional proof.  Their "proof" as yet is a bunch of technical data sheets and block diagrams.  https://www.pacermonitor.com/public/case/9674725/Dickey_v_Advanced_Micro_Devices,_Inc  Hell, the filing fee in this court is $400, which could have bought a new system with an Intel quad core.  This isn't about helping consumers, and the money proves it.

Really?  I understand wanting to believe that judges have no bias, but what sort of world do you live in?  The one I live in has people being named as judges.  These people have motivations, such as seeing the best thing done for their community, and delivering their own form of "justice."  To the former, suing an ailing company into the ground will have a negative impact on locals.  If this were MS, Samsung, or Seagate I'd be less concerned with impartiality.  To the latter, you have to weigh timing.  This person made no effort to get refunded, waited years after official marketing material was released, hired lawyers from Chicago to represent him in Oakland, and has yet to show any desire or interest in the public good.  I'm sorry, but with all that easily demonstrable, it's impossible for even a technological hermit not to have an underlying bias when dealing with this person.  Judges are human, above all other things.



One last point here.  Intel had Pentium 4, and the nutburst...ahem...netburst issues.  They got sued, so theoretically you can use that as a basis for the AMD suit.  Except, you can't.  The reason Intel lost that suit was they manipulated benchmarks to sell their product.  They LIED to customers: http://www.zacks.com/stock/news/153085/intel-settles-pentium-4-lawsuit-by-paying-15-to-customers  AMD didn't lie.  They may have been optimistic to think that changing the architecture around would allow performance to universally be better, but they released benchmarks which were confirmed by outside sources.  Yes, calling them octo-cores is sleazy, but it isn't a lie or marketing altering the truth.  AMD's already paid for Bulldozer being a turd with years of poor sales, this is an opportunist trying to make money because AMD is likely to settle and make this go away.  Zen is too big of a component of AMD's future to allow a pending lawsuit to tarnish the name.  The Chicago lawyers know that, and they're using it to get functionally free money.

Again, read through the lawyer's own page.  If you don't want to punch them in the face afterwards you're a far more tolerant person than I.


Edit:


FordGT90Concept said:


> I edited and the answer is yes.  It only takes six K10 cores to beat eight Bulldozer "cores" even at significantly lower clocks (2.8 GHz versus 3.6 GHz).
> 
> I see it coming: "oh, but Bulldozer is technically only 4-core so a 6-core should be it!"  My point, exactly.
> 
> ...



So let me get this straight.

On one hand the plaintiff is smart enough to know what components a core entails, based upon the CPU architecture.

On the other hand, the plaintiff is not responsible enough to seek out any information on what is advertised as a completely new architecture.  They are assumed to never have seen any information from 2009 to 2015 (look back to the Anandtech link I posted).


This person exists in such a narrow bubble of knowledge and ignorance that they can't possibly exist.  It'd be like saying a person has eaten hamburgers their entire life and, because of the name, assumed they were made out of pork.  They are now suing McDonald's because they were in fact part of a unique branch of Hinduism, in which killing pigs was acceptable but killing cows wasn't.

A statement that preposterous hurts my cognitive faculties.  It makes me want to drive a rusty spoon through my brain and scoop out my frontal lobe.  The US is full of stupid lawsuits, but that doesn't mean we need to go hunting for the few cases where they might be justified.


----------



## Pill Monster (Nov 6, 2015)

Bulldozer does have 8 cores, but only 4 L2 caches, so the cores are arranged in pairs (4 modules).

Which, if anyone is interested, is the reason why MS released a hotfix for the scheduler in W7.


Default core scheduling in Windows is 1,2,3,4,5,6,7,8, but BD/PD ideal scheduling is 1,3,5,7,2,4,6,8 due to the shared cache.  They are real cores btw; HT is purely logical.

Just some fyi.
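That hotfix amounts to reordering the scheduler's CPU preference list. A toy illustration (my own sketch, not how Windows actually implements it; it assumes logical CPUs (0,1), (2,3), ... are module pairs):

```python
# Toy version of module-aware scheduling: fill one "core" per module first,
# then the module siblings. With 0-indexed CPUs this reproduces the
# 1,3,5,7,2,4,6,8 ordering described above.
def module_aware_order(n_cpus, per_module=2):
    primaries = list(range(0, n_cpus, per_module))            # one per module
    siblings = [c for c in range(n_cpus) if c not in primaries]
    return primaries + siblings

print(module_aware_order(8))   # [0, 2, 4, 6, 1, 3, 5, 7]
```

Spreading the first four threads across modules keeps each one's L2 and FPU uncontended until all modules have work.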


----------



## truth teller (Nov 6, 2015)

> Bulldozer cannot perform eight instructions simultaneously and independently as claimed, or the way a true 8-core CPU would


It will be dismissed once AMD proves that even a "true dual core" processor is able to execute more than 8 instructions in parallel (instruction pipelining, out-of-order execution, etc.).  Still, they deserve the bad rep for advertising modules as real cores (they are closer to real cores than HT threads, but nonetheless still not full cores).
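The arithmetic behind that dismissal argument is simple. A toy model of superscalar issue width (my own illustration, not from the filing; real retire rates depend on dependencies and stalls):

```python
# Even a "true dual core" handles more than 8 instructions at once if each
# core is superscalar. Toy model: cycles needed to retire N independent
# instructions given a core count and a per-core issue width.
def cycles_to_retire(n_instructions, cores, issue_width):
    per_cycle = cores * issue_width
    return -(-n_instructions // per_cycle)   # ceiling division

# Two 4-wide cores retire 16 independent instructions in 2 cycles:
print(cycles_to_retire(16, cores=2, issue_width=4))   # 2
```

The point: "eight instructions simultaneously" is a property of issue width as much as core count, which is why the complaint's wording is attackable.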


----------



## HumanSmoke (Nov 6, 2015)

dorsetknob said:


> Lawsuit in America =  I Farted now i'm being sued in a class Action Because i polluted breathing Air


Unless you own up to it, you'll probably get taken to court by the EPA for violation of hazardous natural gas disposal.


----------



## lilhasselhoffer (Nov 6, 2015)

HumanSmoke said:


> Unless you own up to it, you'll probably get taken to court by the EPA for violation of hazardous natural gas disposal.



Good god.  Your jokes are surprisingly close to the truth:
http://www3.epa.gov/airquality/sulfurdioxide/

75 ppb is the official emissions level which cannot be exceeded.  This is why the EPA is a good idea, but so poorly implemented as to be a joke.

Edit:
Also @HumanSmoke, you should watch your jokes.  From Wikipedia: Since New Zealand produces large amounts of agricultural products, it is in the unique position of having high methane emissions from livestock compared to other greenhouse gas sources. The New Zealand government is a signatory to the Kyoto Protocol and therefore attempts are being made to reduce greenhouse emissions. To achieve this, an agricultural emissions research levy was proposed, which promptly became known as a "fart tax" or "flatulence tax". It encountered opposition from farmers, farming lobby groups and opposition politicians

https://en.wikipedia.org/wiki/Flatulence

I'd be crying, if I wasn't laughing so hard.


----------



## dorsetknob (Nov 6, 2015)

lilhasselhoffer said:


> 75 ppb is the official emissions level which cannot be exceeded.


I suspect every fart contains more than 75 parts per billion of methane,
and so for you Americans I fully expect the EPA to prosecute (on behalf of Obunnya) every American for air pollution.
Let's just call this a living tax to help clear the deficit.


----------



## HumanSmoke (Nov 6, 2015)

lilhasselhoffer said:


> Good god.  Your jokes are surprisingly close to the truth:
> http://www3.epa.gov/airquality/sulfurdioxide/
> 
> 75 ppb is the official emissions level which cannot be exceeded.  This is why the EPA is a good idea, but so poorly implemented as to be a joke.
> ...


Yep, this country has had love/hate relationship with its primary producers for years. 
Anyhow, I think a discussion on farts makes more sense than whether Mr. Dickey thinks a CPU core has to intrinsically execute one floating point operation per cycle. Can't say I've ever seen that as a prerequisite of a CPU core....which is why I haven't been taking this thread at all seriously.


----------



## Steevo (Nov 6, 2015)

So some jackass thinks he will win a technical lawsuit against the world's second largest (see what I did there?) x86-64 CPU producer for his own feels?


That's a bold move, Cotton, let's see how it plays out!


----------



## FordGT90Concept (Nov 6, 2015)

lilhasselhoffer said:


> The plaintiff accuses AMD of removing a critical component, yet demonstrably it is not critical.


It is critical to differentiate between uni-core and dual-core.  Bulldozer is the only processor I know of that shares compute resources among cores.  The rest generally only share memory.



lilhasselhoffer said:


> They LIED to customers: http://www.zacks.com/stock/news/153085/intel-settles-pentium-4-lawsuit-by-paying-15-to-customers  AMD didn't lie.


How are Bulldozer "8-core" processors not lying?  They're quad cores with SMT and an extra integer cluster.  Literally the only difference between Hyper-Threading and Bulldozer is the addition of an extra integer cluster.  We don't call any processor that features SMT by its thread count, so why does Bulldozer get a pass?

Picture John Doe walking into [insert computer store here] and telling the clerk he wants an 8-core processor.  The clerk hooks John Doe up with a Bulldozer.  He gets home and starts encoding videos on it.  He quickly discovers it is no faster than his old Phenom II X6 1055T and starts looking for the reason.  He stumbles upon threads like this, block diagrams of Bulldozer, reviews saying Bulldozer underperforms, benchmarks proving the poor performance, and--most importantly--he discovers the Intel Core i7-5960X, which thoroughly trounces his Bulldozer "8-core."  How does John Doe not feel that he was misled by the clerk, who was misled by AMD calling their processors "8-core?"



lilhasselhoffer said:


> They may have been optimistic to think that changing the architecture around would allow performance to universally be better, but they released benchmarks which were confirmed by outside sources.


AMD was thinking DirectCompute would negate the need for FPUs.   AMD had a sense of euphoria after buying out ATI, thinking that it would drastically change how computing is done.  They couldn't have been more wrong.



lilhasselhoffer said:


> On one hand the plaintiff is smart enough to know what components a core entails, based upon the CPU architecture.


Plaintiffs don't walk into lawsuits without doing their research.  There are plenty of failure analyses all over the internet explaining why Bulldozer is a steaming pile of shit.

I'm not going to discuss (rather, attack) the plaintiff.  Like I said, there is merit to the complaint and I'm shocked it wasn't done much sooner.




Pill Monster said:


> Bulldozer does have 8 cores but only 4 L2 cache chips so they are arranged in pairs (4 modules).


There are only 4 L2 caches because there are only 4 cores. The two threads running on the same core require access to all of the L2 because the required data can exist anywhere in it.



Pill Monster said:


> They are real cores btw, HT is purely logical.


HT has hardware just like Bulldozer.  The only major difference between HT and Bulldozer is that AMD added some hardware to the SMT implementation so that integer performance does not suffer.  It really shows their lack of knowledge of SMT; hence the horrible implementation.  Jim Keller, who knows a thing or three about SMT, came in to set AMD straight with Zen.  More cores with SMT is better than cores with SMT that have extra hardware attached.




Steevo said:


> So some jackass thinks he will win a technical lawsuit against the worlds second largest (see what I did there?) X86-64 CPU producer for his own feels?
> 
> 
> That's a bold move cotton, lets see how it plays out!


Again, I cite the hard drive lawsuit.  Seagate had the technical win (correct use of "GB") but still lost on the surface (Windows doesn't show what Seagate claims).  I think Dickey has the technical (>50% of the core is shared between two threads causing bottlenecks) and surface win (nothing suggests Bulldozer "8-core" is really an 8-core).

AMD will try to use "8 integer cores" as a defense.  It won't stick because outside of technical documents, "integer" is left out.


----------



## RealNeil (Nov 6, 2015)

dorsetknob said:


> Lawsuit in America =  I Farted now i'm being sued in a class Action Because i polluted breathing Air





HumanSmoke said:


> Unless you own up to it, you'll probably get taken to court by the EPA for violation of hazardous natural gas disposal.



Or they'll make you walk around with a tiny flame next to your ass to burn it off at once.


----------



## Steevo (Nov 6, 2015)

FordGT90Concept said:


> It is critical to differentiate between uni-core and dual-core.  Bulldozer is the only processor I know of that shares compute resources among cores.  The rest generally only share memory.
> 
> 
> How is Bulldozer "8-core" processors not lying?  They're quad cores, with SMT, and an extra integer cluster.  Literally the only difference between HyperThreading and Bulldozer is the addition of extra integer cluster.  We don't call any processors that feature SMT by the thread count so why does Bulldozer get a pass?
> ...




Windows sees 8 cores and launches 8 hardware-accessible threads, so by the same token that Seagate lost......


----------



## FordGT90Concept (Nov 6, 2015)

They're "logical processors," not "cores."  Windows, as far as I know, only uses "core" in relation to processor power management...and that setting is hidden via the system registry.
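For what it's worth, that distinction is easy to check from code: what the OS exposes is a count of logical processors, not cores. A minimal sketch in Python (`os.cpu_count()` reports the logical count on any host):

```python
import os

# os.cpu_count() returns LOGICAL processors -- on an FX-8350 it
# reports 8, and on a quad-core i7 with Hyper-Threading it also
# reports 8, even though the hardware underneath differs greatly.
logical = os.cpu_count()
print("logical processors:", logical)
```

Getting a physical core count needs platform-specific queries (e.g. parsing /proc/cpuinfo on Linux), and on Bulldozer even those report each integer core as a full core, which is exactly the point in dispute.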


----------



## the54thvoid (Nov 6, 2015)

FordGT90Concept said:


> They're "logical processors," not "cores."  Windows, as far as I know, only uses "core" in relation to processor power management...and that setting is hidden via the system registry.



In the end it doesn't matter what any of us here say.

AMD say this:

There are two independent integer cores on a single Bulldozer module

Plaintiff says:

Not

I'm outta here.


----------



## Uplink10 (Nov 6, 2015)

the54thvoid said:


> AMD say this:
> 
> There are two independent integer cores on a single Bulldozer module





FordGT90Concept said:


> There's only one question here and it is this: are the two logical cores of the Bulldozer design separable? If yes, they are "cores." If not, the combined unit is a "core" with SMT.





Uplink10 said:


> If it is possible to disable 7/5/3/1 core(s) in a firmware then he should lose but if it is not then he should win.


----------



## R-T-B (Nov 6, 2015)

> If it is possible to disable 7/5/3/1 core(s) in a firmware then he should lose but if it is not then he should win.



That's a good point.  I'll agree with that conclusion.


----------



## Pill Monster (Nov 6, 2015)

FordGT90Concept said:


> There's only 4 L2 caches because there is only 4 cores. The two threads running on the same core require access to all of the L2 because the required data can exist anywhere in there.
> 
> 
> HT has hardware just like Bulldozer.  The only major difference between HD and Bulldozer is AMD added some hardware to the SMT implementation so that integer performance does not suffer.  It really shows their lack of knowledge of SMT; hence the horrible implementation.  Jim Keller, whom knows a thing or three about SMT came in to set AMD straight with Zen.  More cores with SMT is better than cores with SMT that has extra hardware attached.


Nah man, there are 8 cores, 8 integer cores if you want to break it down, but 8 physical units which can execute 8 threads simultaneously.
HT isn't hardware multithreading. HT is based around the OS scheduler, which can schedule 2 threads to one core using spare cycles, something like that.. I don't know the exact science, but it's done in software anyway..


The main difference between BD and Thuban is that Thuban has 6 dedicated L2 cache banks, each with 1 fetch/decode unit; that's basically it. L2/L1 RAM is freaking expensive, plus it takes up room on the die.

Btw the 2-way shared L2 is one of Piledriver's biggest handicaps, if not the biggest.  Round-trip time between CPU and L3 is super slow, about double that of Thuban or Deneb.

Piledriver has around 27ns L3 latency; Phenom II is what, 8ns or something? I could check.....
Hence overclocking the L3 does nothing to improve performance as it did on Phenom II..

If you put an X6 and an 8320 head to head I imagine the X6 will rape the 8320, as you pointed out earlier, but only up to 6 threads. After that PD is gonna pull ahead.

I wouldn't be surprised if x264 wasn't running 8 threads...? Did they say how many?
But hey, that was 4-5 years ago; now with everything multithreaded......things have changed. The architecture was way ahead of its time......

Also correct me if I'm wrong, but wasn't BD originally designed as a server chip?




sorry about my spelling, spellcheck isn't working..


----------



## FordGT90Concept (Nov 7, 2015)

My processor (6700K) and the processor before it (920) can execute 8 threads simultaneously as well.  That's what SMT means, after all.

Thuban has 6 dedicated L2 caches because it is a legitimate 6-core processor.

The TechARP x264 benchmark comparing the FX-8150 and 1055T would have run 8 threads on the former and six threads on the latter.  The FX-8150 did not pull ahead.  The FX-8150 would likely do better single-threaded simply because of its 800 MHz clockspeed advantage.

Bulldozer was not ahead of its time considering the older Thuban architecture can best it in some scenarios and Intel bests it in most scenarios.

Bulldozer's design does look more like SPARC (huge ALU performance, little FPU performance) than a desktop CPU should.  Even so, Bulldozer isn't exactly competitive with comparative Xeons.


----------



## Xuper (Nov 7, 2015)

FordGT90Concept said:


> My processor (6700K) and the processor before it (920) can execute 8 threads simultaneously as well.  That's what SMT means, after all.
> 
> Thuban has 6 dedicated L2 caches because it is a legitimate 6-core processor.
> 
> ...



You try hard but you will lose forever. True =/= performance. You're using Intel's definition; that's why you say 4 cores.
From AnandTech:


AnandTech said:


> Architecturally Bulldozer is a significant departure from anything we've ever seen before. We'll go into greater detail later on in this piece, but the building block in AMD's latest architecture is the Bulldozer module. Each module features two integer cores and a shared floating point core. FP hardware is larger and used less frequently in desktop (and server workloads), so AMD decided to share it between every two cores rather than offer a 1:1 ratio between int/fp cores on Bulldozer. * AMD advertises Bulldozer based FX parts based on the number of integer cores*. Thus a two module Bulldozer CPU, has four integer cores (and 2 FP cores) and is thus sold as a quad-core CPU. A four module Bulldozer part with eight integer cores is called an eight-core CPU. There are obvious implications from a performance standpoint, but we'll get to those shortly.


----------



## Pill Monster (Nov 7, 2015)

FordGT90Concept said:


> My processor (6700K) and the processor before it (920) can execute 8 threads simultaneously as well.  That's what SMT means, after all.
> 
> Thuban has 6 dedicated L2 caches because it is a legitimate 6-core processor.
> 
> ...



HT is not SMT, your CPU multithreads only on 4 cores, the rest is what Windows sees.   That's why it's called Hyperthreading.  
Though I guess that's another debate in itself.....

Ahead of its time as in AMD gambled that code was going to be optimized for multicore, but it didn't happen.  Ironically, Vishera has improved with age..... never thought I'd say that about a CPU...

Not looking forward to Fallout 4 chugging along at 45fps though  lol





Imo it boils down to 2 points,

a) what is a "core" exactly?

and

b) 8 threads/ 8 cores?  isn't that the same thing?.......Does it matter?


----------



## eidairaman1 (Nov 7, 2015)

The CPU is 8 cores; it's just that some resources are shared between each pair of cores.


----------



## Fluffmeister (Nov 7, 2015)

I don't see what the fuss is all about, it's not like Fiji wasn't an "overclocker's dream".


----------



## Aquinus (Nov 7, 2015)

Jeez people, let's use some fucking block diagrams here and explain exactly what arguments exist for there being, and not being, real "cores".






One instruction/data fetch unit per two Integer cores and one floating point core (with two 128-bit FMAC units.)
I personally don't buy this one, mainly because how much can be fetched per cycle can vary depending on the CPU. It's possible that it's a limiting factor but I doubt it.

Decoder: the initial Bulldozer had only one uOp decoder per module.
There could be some argument here, as the Phenom II was able to decode 3 instructions per cycle per core, as opposed to Bulldozer, which could only do 4 *per module*. AMD revised this in Steamroller (as they should have, damn it!) and a reasonable performance gain came out of it. Obviously nothing to truly counteract the length of the pipeline.

Dedicated schedulers and an L1. Nothing "shared" about that.
Floating point unit... We've discussed this and I still think that it's laughable that this and only this can be a measure of how many "cores" a CPU has.
L2 cache. Let's remember that the Core 2 Duo had a shared L2. Not going to go further into that one.
In all seriousness, consider Zen coming up. A lot of hardware with the SMT integer core (which is bigger than your run-of-the-mill integer core in a Bulldozer module) is still shared with the FPU. The flaw with Bulldozer through Excavator is the length of the pipeline; IPC is a dead simple indicator of this. It's the very reason why Intel moved to a 14-stage architecture from the >30-stage Netburst, just as AMD is now doing with its current lineup.

I want to say this again, Bulldozer doesn't suck because of shared resources, it sucks because the pipeline is too damn long.

The problem is when a hazard is encountered, it is much harder for a longer pipeline to recover from the stall it generates and as a result, IPC suffers and higher clock speeds are required to overcome it, (just like the Pentium 4.)

Stop whining about the FPU and focus on the damn pipeline.
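To put rough numbers on that, here's a toy CPI model (all figures are illustrative, not measured Bulldozer data): a mispredicted branch flushes the work in flight, so the average penalty grows with pipeline depth.

```python
def effective_cpi(base_cpi, branch_frac, mispredict_rate, flush_penalty):
    """Average cycles per instruction once branch-flush cost is added.

    flush_penalty scales roughly with pipeline depth: the deeper the
    pipe, the more in-flight work a misprediction throws away.
    """
    return base_cpi + branch_frac * mispredict_rate * flush_penalty

# Illustrative numbers: ~20% of instructions are branches,
# 5% of those mispredicted.
print(round(effective_cpi(1.0, 0.20, 0.05, flush_penalty=14), 2))  # 1.14
print(round(effective_cpi(1.0, 0.20, 0.05, flush_penalty=20), 2))  # 1.2
```

Same workload, same predictor, and the longer pipe still loses IPC purely from recovery cost, which is the whole point.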


----------



## HumanSmoke (Nov 7, 2015)

behrouz said:


> You try hard but you will lose forever. True =/= performance. You're using Intel's definition; that's why you say 4 cores.
> From anandtech :.......*AMD advertises Bulldozer based FX parts based on the number of integer cores*


I don't agree with the whole lawsuit thing - as should be readily apparent from the tone of my previous posts - but your argument in this instance is a reach. While AMD added a ton of footnotes to the PPS slide deck, the retail packaging mentions nothing of shared resources. I just dug out (and ripped apart in the name of barely-interested fact finding) an old BD retail box (I don't use them but I've built a few systems with them), and nowhere is there any mention of integer cores or FPU contingency






You might also have a hard time picking up the same information from any ODM/OEM's advertising





Aquinus said:


> I want to say this again, Bulldozer doesn't suck because of shared resources, it sucks because the pipeline is too damn long.


That was brought up at the time. The branch misprediction penalty effectively scuppered the architecture.


----------



## FordGT90Concept (Nov 7, 2015)

behrouz said:


> From anandtech :


AMD says...so they can rope a dope...which is why AMD is getting sued.



Pill Monster said:


> HT is not SMT


Yes, it is.  Two threads on one core at the same time is the very definition of symmetrical multithreading.



Pill Monster said:


> Ahead of it's time as in AMD gambled code was going be optimized for multicores but it didn't happen.  Ironically  Vishera has improves with age..... never thougfht I'd say that about a CPU...


AMD knew it wouldn't be but went ahead with it anyway.  



Pill Monster said:


> a) what is a "core" exactly?


Your typical x86-64 core has an instruction decoder, some type of scheduler/dispatcher, ALUs, FPUs, L1 caches for instructions and data, and an L2 cache.  Under that may or may not lie an L3 cache, a memory controller, and some kind of interface to communicate with the rest of the system.  In short, the core can function in its entirety on its own (excluding the "under that" portion, anyway).



Pill Monster said:


> 8 threads/ 8 cores?  isn't that the same thing?.......Does it matter?


No, 8 simultaneous threads indicate 8 logical processors.  If this were a SPARC T5, that would mean there's only one underlying core processing them.  In the case of the Core i7-6700K, it means quad-core with Hyper-Threading enabled.  Both scenarios get you 8 logical processors but with widely different technology under them.  Bulldozer has 8 logical processors but, using the definition above, it only contains four cores with simultaneous multithreading.  The L1 instruction cache, instruction decoder, dispatcher, FPU, and L2 cache in each core handle all threads that pass through the core.

It matters because when AMD markets something as having "8 cores" when it largely has the guts of a 4-core, performance underwhelms.  Intel's i7-5960X is an example of an 8-core processor.  Compare its relative performance to an FX-8###.  They are leagues apart--especially in multithreading.  It's like comparing a 4790K (4 cores) to a 5960X (8 cores) in heavily multithreaded benchmarks: the 5960X runs away.  There's a huge difference between an actual 8-core processor and a 4-core with SMT.
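That distinction can be sketched as a toy model (the names and the resource split are illustrative, going off the block diagrams in this thread, not any official AMD definition):

```python
# Toy sketch of the dispute: a Bulldozer-style "module" exposes two
# logical processors but shares the front end, FPU, and L2 between
# them. Resource names below are illustrative only.
MODULE_SHARED   = {"fetch", "decode", "fpu", "l2_cache"}
PER_INT_CLUSTER = {"scheduler", "alus", "l1_data_cache"}   # duplicated per pair

def logical_processors(modules: int) -> int:
    return 2 * modules      # what Windows reports for an FX-8350

def marketed_cores(modules: int) -> int:
    return 2 * modules      # AMD counts each integer cluster as a core

def plaintiff_cores(modules: int) -> int:
    # The execution path shares fetch/decode/FPU/L2, so the plaintiff
    # counts each module as a single core with SMT.
    return modules

print(logical_processors(4), marketed_cores(4), plaintiff_cores(4))  # 8 8 4
```

Both sides agree on the 8 logical processors; the lawsuit is entirely about which of the last two counts gets to be called "cores" on the box.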



Aquinus said:


> ...


The suit is about misleading consumers.  Those of us in the know see through AMD's BS (divide core count by two) but those not in the know have to learn the hard way.

By the way, this slide better shows how much is shared:




Pretty much everything except the actual number crunching.



HumanSmoke said:


>


I actually have an FX-6300 box right next to me.  You're missing the sticker (I can see the residue), which is the only place that says what is in the box.


> FX 6300
> AMD FX 6-core.


OEMs are repeating AMD's lie.



Zen omits the nonsensical second "integer core" as well as fattening up the ALUs and FPUs:


----------



## HumanSmoke (Nov 7, 2015)

FordGT90Concept said:


> By the way, this slide better shows how much is shared....


This one might offer a better indication of workflow







FordGT90Concept said:


> I actually have an FX-6300 box right next to me.  You're missing the sticker (can see the residue) which is the only place that says what is in the box.


The sticker on mine was still attached to the top of the box - just mentions the SKU, cache, frequency, serial and part numbers, socket, and QR code. Nothing earth shattering in the way of architectural revelations.


----------



## Roph (Nov 7, 2015)

R-T-B said:


> You however don't have to look any farther than the modern era of arm cores and even some server archictecutres (SPARC comes to mind) to find modern "octacore" CPUs with only one FPU.
> 
> This is nonsense.  It's a core if it can math at all.  Heck earlier CPUs lacked a MULTIPLICATION FUNCTION.  They were still considered a core.



Pretty much this - I can run 8 processes doing integer-based tasks (even some FP too) completely independently of each other on an FX-8xxx. There's definitely more independence than with SMT. I sometimes play a game while streaming it, while also encoding movies in the background 

*FordGT90Concept*, you have a huge chip on your shoulder.


----------



## FordGT90Concept (Nov 7, 2015)

HumanSmoke said:


> This one might offer a better indication of workflow


I posted that one several pages back.  Yes, it's probably the best.

Related note: block diagrams for Intel processors seem scarce. 




HumanSmoke said:


> The sticker on mine was still attached to the top of the box - just mentions the SKU, cache, frequency, serial and part numbers, socket, and QR code. Nothing earth shattering in the way of architectural revelations.


Top has...
-QR code, fancy AMD logo that changes under light, and AMD logo
-Model number
-*Model description*
-Clockspeed, # MB Total Cache
-Serial # Barcode
-Serial # Text

---tear to open the box----

Back has...
-Black Edition
-Socket AM3+, Includes Heat Sink Fan
-Part #
-UPC barcode
-UPC printed


----------



## Pill Monster (Nov 7, 2015)

FordGT90Concept said:


> AMD says...so they can rope a dope...which is why AMD is getting sued.
> 
> 
> Yes, it is.  Two threads on one core at the same time is the very definition of symmetrical multithreading.


No, it isn't.  And nowhere on the page you linked to does it mention SMT. (Simultaneous, btw, not Symmetrical.)

Intel simply states that multiple threads can run on one core, and not simultaneously.












I'm with Aquinus insofar as this debate is getting a little repetitious.


Much ado about nothing imo...


----------



## R-T-B (Nov 7, 2015)

Pill Monster said:


> No, it isn't.  And nowhere on the page you linked to does it mention SMT. (Similtaneous btw, not Symmetrical.)
> 
> Intel simply state mutiple threads can run on one core, and not similtaneously.



During the same clock cycle, which might as well be simultaneously in practical terms.


----------



## lilhasselhoffer (Nov 7, 2015)

@FordGT90Concept, I'm going to make this clear.  Your argument is stupid, because to make it you have to find a person who is both a genius and an idiot.

If you are to argue that performance figures are what the plaintiff is using (which as per my earlier links, they are), then you've got to argue against some standard.  Intel is not a direct competitor, and thus isn't a standard.  If you're arguing Thuban as a comparison, then you've got to explain monetary discrepancy and an architecture change.  Neither of these things is grounds for a lie, or Netburst should have had two lawsuits filed against it.  As AMD published information well in advance of the release of Bulldozer, there is no reasonable assertion that they lied about the core count.  Heck, I could make my own CPU, wherein each processor is single bit and have a 20 core processor.  To argue that AMD lied about core count, when they previously clearly defined what a core was, is to acquiesce to being a moron.  I don't think the lawyers are that stupid, because the case would be immediately dismissed by the judge.

If you argue that AMD lied, then prove it.  They didn't release factually wrong benchmarks, they just cherry picked the best results.  That's been considered fair game for decades.

If your argument is that the removed components are necessary, you need to be an idiot savant.  You have to completely understand processor architecture, have future knowledge about how coding will use what you are developing, and you have to be so moronic as to not read the technical information put out by the company releasing the product (per the 2009 AnandTech article).  Find me that idiot savant, and I'll find you the person who can single-handedly design the successor to Zen.



FordGT90Concept said:


> ...
> Picture John Doe walking into [insert computer store here] and tells the clerk I want an 8-core processor.  The clerk hooks John Doe up with a Bulldozer.  He gets home and starts encoding videos on it.  He quickly discovers it is no faster than his old Phenom II X6 1055T and starts looking for the reason.  He stumbles upon threads like this, block diagrams of Bulldozer, reviews saying Bulldozer underperforms, benchmarks proving the poor performance, and--most importantly--he discovers Intel Core i7-5960X which thoroughly trounces his Bulldozer "8-core."  How does John Doe not feel that he was mislead by the clerk, whom was mislead by AMD calling their processors "8-core?"...



This type of idiot does not need to be protected by the legal system.  The clerk behind the counter is culpable for recommending that they buy a processor.  The consumer is responsible for not educating themselves on the purchase.  AMD has made the information required to make an informed decision publicly available for literally years, yet they decided not to inform themselves.  Our legal system does not exist to help those with retarded mental processes; it exists to mete out reparations from those who have done things which the law forbids, to mete out punishment for those who haven't done what the law requires, and most importantly to determine when one is guilty of either of these things.


What you're arguing is that you feel bad.  I agree, I feel that the marketing was atrocious and misleading.  At the same time all of the relevant data was widely available, and AMD published their data well in advance of the Bulldozer launch.  Your argument for culpability on AMD's part is an argument made via emotion.  Your staunch defense of said points, despite ample proof that AMD never lied, exemplifies this denial.  Saying that you know it'll be thrown out, despite wanting it to happen, is asking for a massive waste of resources to no real end.



Let me be fair though.  Looking at @HumanSmoke's pictures, I can't find a single mention of core count.  I'm now looking at the box for an Intel processor (4790K).  That box proudly states "4 Cores / 8-Way Multitask Processing."  You've spent the better part of a page arguing the core count crap, but haven't even tried to justify your point.  The core count isn't listed on AMD's processor box.  The core count on Intel's box is only 4 (despite HT).  Neither of which defines what a core is.  Neither of which promises a numeric performance level.  Most problematically, the Intel processor lists 4 cores despite having 8 logical cores with HT.  Neither of the companies has demonstrably lied on their packaging.  The only chance this suit has is if the judge decides to rule on the advertising material...Oh wait, they can't do that.  The FTC rules on fairness in advertising.

Sorry, but your entire argument is based upon the false premise that this is a fact and logic based argument.  It isn't.  This is some idiot trying to cash out because they think that everyone complaining about its performance on the internets just haven't decided to cash in yet.

That particular bit of anger comes courtesy of my distaste for the law firm handling this.  Seriously, if you do any research into them at all, you'd see that they are the next incarnation of copyright trolls.  All of their cases argue about high-end technologies, where no precedent is set, and their track record is...spotty.  Basically, like any slimy lawyers, they are willing to sue anybody and represent anyone willing to cough up cash.  These are the kind of leeches who give lawyers a bad name, whereas public defenders (also a type of lawyer) do so much good that it isn't funny.  This is why our legal system is a joke, and why it takes years just to get something to happen if you're wronged.

Should AMD be held accountable for misleading advertising?  Maybe.  Should this be in court?  Absolutely not.  Should this have been filed before this year?  If it was actually in the public interest, it should have been filed in 2011.  The argument that there exists any grounds for this, given the information presented by the plaintiff, is a joke.  You can argue technicalities all night, but the plaintiff must prove damages and lies (that's the point of innocent until proven guilty).  Every argument you make has a simple counter.  I think you're in the wrong here, because your heart is leading your head.


Edit:
Added quote and framed it.


----------



## HumanSmoke (Nov 7, 2015)

lilhasselhoffer said:


> Let me be fair though.  Looking at @HumanSmoke's pictures, I can't find a single mention of core count.  I'm now looking at the box for an Intel processor (4790k).  That box proudly states "4 Cores / 8-Way Multitask Processing."


Just to clarify, the packaging for all the processors is generic (as Ford mentioned), any relevant info is on the seal/sticker. The FX-8120 I have here states 8-Core (albeit the actual sticker is a bit munched up), and the info layout is the same as this.

All seems a little storm in a teacup to my way of thinking. If AMD had provided a white paper with every processor and emblazoned every package with a screed of info about shared resources and a sea of asterisks, I doubt it would have altered the buying habits of most people, any more than the GTX 970 kerfuffle seems to have deterred its uptake.


----------



## lilhasselhoffer (Nov 7, 2015)

HumanSmoke said:


> Just to clarify, the packaging for all the processors is generic (as Ford mentioned), any relevant info is on the seal/sticker. The FX-8120 I have here states 8-Core (albeit the actual sticker is a bit munched up), and the info layout is the same as this.
> 
> All seems a little storm in a teacup to my way of thinking. If AMD had provided a white paper with every processor and emblazoned every package with a screed of info about shared resources and a sea of asterisks, I doubt it would have altered the buying habits of most people, any more than the GTX 970 kerfuffle seems to have deterred its uptake.



Fair, but what I said isn't incorrect.  An SKU can have whatever information you'd like on it, and it isn't advertising.  It's a stock keeping unit.

Look at some of the barcodes out there, and tell me they're lying.  There's a wealth of them which shorten "assorted" to "ass," and the fun ensues.


Intel and AMD don't advertise their barcodes.  They label product with barcodes to identify it.  This suit is about advertising, not barcodes.


Edit:
For reference, the 4790 SKU and barcode have no mention of core count.  They only refer to frequency (max turbo of 4.0 GHz), cache (8MB), and socket.


----------



## Pill Monster (Nov 7, 2015)

Aquinus said:


> Jeez people, lets use some fucking block diagrams here and explain exactly what arguments exist for there being and not being real "cores".
> 
> 
> *I want to say this again, Bulldozer doesn't suck because of shared resources, it sucks because the pipeline is too damn long.*
> ...


Well, actually it's both, shared resources and pipeline.. depends on the workload.   

I'll leave it there .....had enough debates for today. 









OT: anyone know why spell check wouldn't be working in Firefox?


----------



## HumanSmoke (Nov 7, 2015)

lilhasselhoffer said:


> Fair, but what I said isn't incorrect.  An SKU can have whatever information you'd like on it, and it isn't advertising.  It's a stock keeping unit.
> Look at some of the barcodes out there, and tell me they're lying.  There's a wealth of them which shorten "assorted" to "ass," and the fun ensues.
> Intel and AMD don't advertise their barcodes.  They label product with barcodes to identify it.  This suit is about advertising, not barcodes.
> Edit:
> For reference, the 4790 SKU and barcode have no mention of core count.  They only refer to frequency (max turbo of 4.0 GHz), cache (8MB), and socket.


As I mentioned, I just added the information for clarification - not to contend any point.


----------



## Schmuckley (Nov 7, 2015)

I bet the guy wins, and rightfully so.
I don't like that it's kicking AMD when they're down, but..they brought it on themselves.
Victim of Corporate shenanigans.


----------



## Sempron Guy (Nov 7, 2015)

I'll just let the die shot do the talking. The FX-8000 series has 4 modules; each module contains 2 cores with shared resources. You can see the cores in the die shot, so "physically" they're there. On the other hand, you can't see Hyper-Threading in a die shot, so it doesn't count as a core. You may argue over the definition of what a core is. Me, on the other hand, I'd stick to what can be seen.


----------



## uuuaaaaaa (Nov 7, 2015)

The main problem lies in the definition of a core. I'll be surprised if AMD loses this case... AMD is in a quite fragile position atm, and what I am worried about is the bad publicity this generates...


----------



## Xuper (Nov 7, 2015)

http://www.cpu-monkey.com/en/cpu-intel_core_i7_2600k-6
http://www.cpu-monkey.com/en/cpu-amd_fx_8350-7
http://www.cpu-monkey.com/en/cpu-intel_core_i7_5820k-440

Cinebench R11.5, 64bit (Multi-Core)
Intel core i7 2600k = 6.83
AMD FX-8350        = 6.94
Intel core i7 5820   = 11.05

-------------------------------------
Cinebench R11.5, 64bit (Single-Core)
Intel core i7 2600k = 1.66
AMD FX-8350        = 1.11
Intel core i7 5820  = 1.73
-------------------------------------
Multi thread/Single thread
Intel core i7 2600k = 4.1145 (6.83/1.66)
AMD FX-8350        = 6.2522 (6.94/1.11)
Intel core i7 5820   = 6.387 (11.05/1.73)

Multithreading doesn't scale linearly, but my calculation shows that the AMD FX-8350 acts as an 8-core with very poor IPC. If AMD's IPC were 1.66, its Cinebench R11.5 64-bit (Multi-Core) score would be 10.38, almost 52% faster than the Core i7 2600K.
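The ratios above can be reproduced with a few lines of Python. This is just a sketch of the poster's arithmetic: the projection assumes the FX-8350 kept its measured scaling factor but had the 2600K's single-thread score.

```python
# Multi/single-thread scaling ratios from the Cinebench R11.5 scores
# quoted above: (multi-core score, single-core score) per CPU.
scores = {
    "i7-2600K": (6.83, 1.66),
    "FX-8350":  (6.94, 1.11),
    "i7-5820K": (11.05, 1.73),
}

for cpu, (multi, single) in scores.items():
    print(f"{cpu}: scaling factor = {multi / single:.4f}")

# If the FX-8350 had the 2600K's single-thread score (1.66) with the
# same scaling factor, its projected multi-core score would be:
projected = 1.66 * (6.94 / 1.11)
print(f"projected FX-8350 multi-core score: {projected:.2f}")  # ≈ 10.38
```

The projected 10.38 against the 2600K's 6.83 is where the "almost 52% faster" figure comes from.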

Edit: 
AMD lawsuit over false Bulldozer chip marketing is bogus


----------



## BiggieShady (Nov 7, 2015)

FordGT90Concept said:


> block diagrams for Intel processors seem scarce.


Behold


 
from the xbitlabs article http://www.xbitlabs.com/articles/cpu/display/sandy-bridge-microarchitecture_3.html
Also important for Intel architectures since Nehalem is the ring interconnect bus for the L3 cache


 
from the same article http://www.xbitlabs.com/articles/cpu/display/sandy-bridge-microarchitecture_4.html


----------



## Aquinus (Nov 7, 2015)

Pill Monster said:


> Well actually it's both, shared resouces and pipeline.. depends on the workload.
> 
> I'll leave it there .....had enough debates for today.
> 
> ...


Yeah, but that's not because the L2 is shared; that's because AMD sucks at making fast SRAM caches. The Core 2 chips had an L2 shared between two full cores and they didn't suck.


----------



## NC37 (Nov 7, 2015)

Well either way at the end of the day...AMD will still have the first consumer 8 core once Zen comes out. Unless Intel decides to jump on it too. They did finally bring in a 6 core. Can't expect Intel to sit quietly as AMD unleashes an 8 core 16 thread monster on them. 

Actually, I hope Intel does because right now I suspect AMD wouldn't price it competitively enough unless Intel had something for them to undercut.


----------



## FordGT90Concept (Nov 7, 2015)

Pill Monster said:


> Intel simply state mutiple threads can run on one core, and not similtaneously.


You underlined simultaneously on the screenshot.


@lilhasselhoffer: Most of that I covered already, but I want to be very clear about something.  In AMD's slides, they always say "8 integer cores" (which accurately describes the product), and everywhere that isn't an engineering slide, they omit that important word "integer."  FX-8### prominently displays "8-core" on the box, FX-6### prominently displays "6-core" on the box, and FX-4### prominently displays "4-core" on the box.  That's an outright lie.  It doesn't have 8 cores; it has 4 cores with "8 integer cores."  AMD is going to get nailed for false advertising.  The plaintiff can easily make the argument that everyone who bought it received half the processor they thought they were getting, and then the rest of the plaintiff's charges fall into place:


Spoiler: legalese/definitions



Consumer Legal Remedies Act: 


> (a) The following unfair methods of competition and unfair or deceptive acts or practices undertaken by any person in a transaction intended to result or which results in the sale or lease of goods or services to any consumer are unlawful:
> ◦(8) Disparaging the goods, services, or business of another by *false or misleading representation of fact*.


California’s Unfair Competition Law


> 17200.  As used in this chapter, unfair competition shall mean and
> include any unlawful, unfair or fraudulent business act or practice
> and unfair, deceptive, untrue or misleading advertising and any act
> prohibited by Chapter 1 (commencing with Section 17500) of Part 3 of
> Division 7 of the Business and Professions Code.


Fraud:


> 1903. Negligent Misrepresentation
> 
> Tony Dickey claims he was harmed because AMD negligently misrepresented an important fact. To establish this claim, Tony Dickey must prove all of the following:
> 
> ...


breach of express warrant:


> (1)The warranty of fitness for a particular purpose


negligent misrepresentation: 


> A judgment that may be rendered in a contract misrepresentation case involving false statements that induced one party to enter into a contract. In negligent representation, the defendant is judged not to have known that the statements made were false, but not to have had reasonable grounds for believing they were true.


unjust enrichment:


> The retention of a benefit conferred by another, that is not intended as a gift and is not legally justifiable, without offering compensation, in circumstances where compensation is reasonably expected.





As I said before, I can't see AMD winning.  Their exclusion of the word "integer" is misleading to the point of being fraud and they did so knowing it wasn't an accurate statement.



BiggieShady said:


> Behold
> View attachment 69049
> from the xbitlabs article http://www.xbitlabs.com/articles/cpu/display/sandy-bridge-microarchitecture_3.html
> Also important for intel architectures since nehalem is ring interconnect bus for l3 cache
> ...


I posted that earlier.  I was talking more about Haswell, Devil's Canyon, and Skylake.  I can only find diagrams of old architectures.




behrouz said:


> http://www.cpu-monkey.com/en/cpu-intel_core_i7_2600k-6
> http://www.cpu-monkey.com/en/cpu-amd_fx_8350-7
> http://www.cpu-monkey.com/en/cpu-intel_core_i7_5820k-440
> 
> ...


2600K (95W) is a quad core.  FX-8350 (125W) barely edges out (by 1.6%, hardly noteworthy) Intel's competing quad core in multithreading (it should be 70-90% faster if it were really an 8-core).  AMD's quad core falls way behind in single-threaded performance.  Even going off your theoretical 52%, that's a lot closer to Intel's Hyper-Threading Technology (a 30% boost in some benchmarks) than an actual 8-core processor (70-90%).  The AMD FX-8### is not an 8-core.  There's no empirical data to prove it is.  It is a quad-core with a more advanced version of SMT.

And that link repeatedly proves my point: At no point does FX-8### look like an actual 8-core processor in benchmarks.  It looks like a quad-core with SMT.




NC37 said:


> Well either way at the end of the day...AMD will still have the first consumer 8 core once Zen comes out. Unless Intel decides to jump on it too. They did finally bring in a 6 core. Can't expect Intel to sit quietly as AMD unleashes an 8 core 16 thread monster on them.
> 
> Actually, I hope Intel does because right now I suspect AMD wouldn't price it competitively enough unless Intel had something for them to undercut.


What is 5960X?  And if you believe AMD's BS (which clearly you don't), AMD put out the first 8-core consumer CPU in 2011 (FX-8150).


----------



## Xuper (Nov 7, 2015)

FordGT90Concept said:


> 2600K (95W) is a quad core.  FX-8350 (125W) barely edges out  (1.6%! hardly noteworthy) Intel's competitive quad core in multithreading (it should be 70-90% faster if it were really a 8 core).  AMD's quad core falls way behind in single threaded performance.  Even going off your theoretical 52%, that's a lot closer to Intel's Hyper-Threading Technology (boosts 30% in some benchmarks) than an actual 8-core processor (70-90%).  AMD FX-8### is not an 8-core.  There's no empirical data to prove it.  *It is a quad-core with a more advanced version of SMT.*
> 
> And that link repeatedly proves my point: At no point does FX-8### look like an actual 8-core processor in benchmarks.  It looks like a quad-core with SMT.



Nope, it's not 4 cores with SMT. I can't call it advanced SMT; try as you might, you cannot class it as SMT. SMT is a different story. There is no such term as an "advanced version of SMT" in the CPU world. It's you who defines a core based on the perspective of Intel's processors.


----------



## FordGT90Concept (Nov 7, 2015)

I prefer the term "hybridized simultaneous multithreading."  Instead of the two threads being funneled into one pipeline, they generally stay in their own pipelines.  The pipelines are inseparable, however, which makes the entire package a core.

A core, in the context of CPUs and GPUs, usually refers to a complete computation unit that exists more than once in multiprocessor designs--each individually programmable with discrete outputs.  The Bulldozer "module" fits that definition; the "integer core" does not.


----------



## Xuper (Nov 7, 2015)

You say that because you compare it to the Intel Core i7-5960X Haswell-E 8-core; you think that if it were an 8-core, it should at least match, or come close to, the i7-5960X. In other words, your reference point is Intel. Even a Phenom II 1070 is worse than a Core i3-4370.
Imagine Intel didn't exist. All your definitions are based on Intel. I bet if Bulldozer's performance were near the Intel 5960X, you wouldn't bring this flame war into this thread.



FordGT90Concept said:


> It doesn't have 8 cores; it has 4 cores with "8 integers cores."


You can have 8 INT + 4 256-bit FPUs, or 8 INT + 8 128-bit FPUs.


----------



## VulkanBros (Nov 7, 2015)

So does that mean what the OS reports is wrong? It clearly says Cores: 4, Logical processors: 8
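For what it's worth, a program normally only sees the *logical* processor count, which is the same number (8) for an FX-8350 as for a 4-core/8-thread i7. A minimal Python sketch of the standard query:

```python
# os.cpu_count() reports logical processors, not physical cores.
# An FX-8350 and a 4-core/8-thread i7 both report 8 here; telling
# cores apart from logical processors requires a platform-specific
# query (on Windows, e.g., the core/logical split Task Manager shows).
import os

logical = os.cpu_count()
print(f"logical processors: {logical}")
```

So Task Manager's "Cores: 4, Logical processors: 8" is Windows adding the physical breakdown on top of what ordinary software sees.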


----------



## rtwjunkie (Nov 7, 2015)

moproblems99 said:


> Were you this mad when nvidia put 4GB of memory on the 970 but only 3.5GB were useful?



Why would he be?  Look at the GPU in his System Specs.  But yes, as a general principle he took issue with that too.

I just do NOT understand why people assume and imply fanboyism just because they see someone making an argument against a product from a particular brand. It's crazy!


----------



## FordGT90Concept (Nov 7, 2015)

behrouz said:


> I bet if Bulldozer's Performance was near Intel 5960x you wouldn't bring this flame war into this thread.


If an 8 core Bulldozer behaved like an 8 core Bulldozer, I'd be laughing at Dickey.



VulkanBros said:


> So that means that what the OS reporting is wrong? It clearly says Cores: 4,  Logical processors: 8


Windows says what it sees and that is absolutely correct.  It's only AMD that is pulling everyone's leg.


Took some digging but finally found a Phenom (K10) block diagram and die shot:
http://www.tomshardware.com/reviews/spider-weaves-web,1728-2.html


----------



## Pill Monster (Nov 7, 2015)

FordGT90Concept said:


> You underlined simultaneously on the screenshot.
> 
> 
> .


Are you taking the piss or what?  It says it runs applications simultaneously. That's not related to SMT in any way, shape, or form.


----------



## Xuper (Nov 7, 2015)

I have a CPU that doesn't have an FPU. How many cores does it have? You've made the FPU your reference.



VulkanBros said:


> So that means that what the OS reporting is wrong? It clearly says Cores: 4,  Logical processors: 8



Why does Windows say "AMD FX(tm)-9590 Eight-Core Processor"? Windows 10 should say 4 modules, not 4 cores.


----------



## FordGT90Concept (Nov 7, 2015)

Pill Monster said:


> Are you taking the piss or what?  It says runs applications simultaneously. Not related to SMT in any way shape or form.


*sigh* Virtually every article written about SMT mentions Intel Hyper-Threading Technology by name.

Here are three scholarly articles:
http://meseec.ce.rit.edu/eecc722-fall2012/722-9-3-2012.pdf page 2
http://www.d.umn.edu/~salu0005/smt.pdf page 21 (lower right corner)
http://www.cs.washington.edu/research/smt/index.html

Arstechnica (cached): http://webcache.googleusercontent.c.../10/hyperthreading/+&cd=1&hl=en&ct=clnk&gl=us



behrouz said:


> I have Cpu that doesn't have FPU unit.how many core does it have ? you made FPU unit as reference.


What processor do you have?  If it was made after 1995, it most likely does have a dedicated FPU in each core (excluding Bulldozer's definition of "core," of course).




behrouz said:


> Why does windows say "AMD FX(tm)-9590 Eight-Core Processor" ? Windows 10 should say 4 Module not 4 Core.


What's the difference, besides AMD's marketing?  It is disingenuous on three fronts: calling integer clusters "cores," calling two integer clusters and an FPU a "module" when it is really a core, and calling the core a "module" when it is not modular (certainly no more modular than any other core out there).


----------



## Uplink10 (Nov 7, 2015)

VulkanBros said:


> So that means that what the OS reporting is wrong? It clearly says Cores: 4, Logical processors: 8


I am a bit skeptical about Windows. I have a 1 TB drive, and it shows I have 931.51 *giga*bytes when it is actually 931.51 *gibi*bytes.
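That discrepancy is decimal versus binary units rather than a reporting error: drive makers count 1 TB as 10^12 bytes, while Windows divides the byte count by 2^30 and labels the result "GB." A quick sketch:

```python
# A "1 TB" drive as sold is 10**12 bytes (decimal). Windows divides
# by 2**30 (one gibibyte) but labels the result "GB".
drive_bytes = 1_000_000_000_000

gb  = drive_bytes / 10**9    # decimal gigabytes: 1000.0
gib = drive_bytes / 2**30    # what Windows displays

print(f"{gb:.2f} GB (decimal)")
print(f"{gib:.2f} GiB (binary)")   # ≈ 931.32
```

The quoted 931.51 just means that particular drive holds slightly more than an even 10^12 bytes.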


----------



## Xuper (Nov 7, 2015)

LOL! He believes module = core. Good luck.

You don't get what I'm saying. You made the FPU your reference; that's why you said 4 cores with 8 integer cores. A core can be defined differently depending on the architecture. There is no defined standard, not even close to a commonly accepted one. Based on my CPU's architecture, I could define a 4-core CPU as containing 4 integer units with just one FPU capable of running 4 FP threads.
You're trying hard.


----------



## Pill Monster (Nov 7, 2015)

FordGT90Concept said:


> *sigh* Virtually every article written about SMT mentions Intel Hyper-Threading Technology by name.
> 
> Here's three scholarly articles:


A scholarly article on SMT mentioned Intel. So what?  Could you be any more vague?

And how is that related to you associating SMT with multitasking in Windows, as in your last post?


Have you got something specific to point out? Because it seems like a strawman escape from the battle.
Tbh I thought better of you.......


----------



## FordGT90Concept (Nov 7, 2015)

behrouz said:


> I can define CPU as 4 Core that contains 4 Int Unit with just one FPU unit that is capable of running 4 FPU Thread.


Negative.  Cores are complete compute units.  It would be classified as a single core with 4 threads per core by anyone that isn't AMD.

@Pill Monster: Since you clearly don't like scholarly articles, try Wikipedia on for size: https://en.wikipedia.org/wiki/Simultaneous_multithreading
All of the above explain SMT in detail.  Some describe Hyper-Threading in detail.


----------



## Xuper (Nov 7, 2015)

Nope, you're wrong. The UltraSPARC T1 has 8 cores with one FPU. You can't define a core by your logic. Like I said, there is no defined standard.

https://en.wikipedia.org/wiki/UltraSPARC_T1


----------



## FordGT90Concept (Nov 7, 2015)

The design of the UltraSPARC T1 is completely different from Bulldozer.  Namely, the FPU isn't directly attached to or associated with any core.  It's more akin to the FPU co-processors of circa 1990.  Every operation that isn't integer gets outsourced to the FPU via the processor crossbar.  There is no sharing of resources inside each core besides cache.

Block diagram of a core (note each core accepts 4 threads; this isn't SMT, as it only works on one thread at a time but rapidly switches between them):





Processor layout:





Do realize that SPARC processors are specifically engineered for databases.  It was discussed previously in this thread.

UltraSPARC T1 is a true 8 core, 32 thread processor.

Edit: JBUS...HA!


----------



## Xuper (Nov 7, 2015)

Whether you like it or not, we're talking about the definition of a core. Like I said, I can define a core myself based on my architecture. I can say that the design of Bulldozer is completely different from Intel's Haswell. There is no rule that a core must have a dedicated FPU, a shared FPU, or even execute one FP instruction per cycle. AMD never said Bulldozer has 8 FP cores!
*8 cores != 8 FPUs*
Period.


----------



## Aquinus (Nov 7, 2015)

behrouz said:


> Whether you like or not , We talk about define of Core.like I said i can define core myself base on my architecture.I can say that The design of Bulldozer is completely different from Intel Haswell.there is no Rule that Core should have a dedicated FPU or a Shared FPU or at least one FP instruction per cycle.AMD never said Bulldozer have 8 FP cores!
> *8 Core =! 8 FPU*
> Period.


An x86 CPU isn't really an x86 CPU without integer cores. If there is dedicated hardware driving an integer core, then I would call that a core. The fact that floating-point math can be done in software on an integer core (and is, on embedded processors that lack FPUs) is reason enough (for me) to say that the shared FPU is not a significant enough factor to deny a "core" designation. Add to that how well 4 threads versus 8 threads scale on an FX CPU compared to an i7, and it shows very clearly that they're real cores. HT will never give you near-linear scaling, where FX cores do (for purely parallel workloads).

If 4 FPUs isn't enough for you then, GPGPU probably could be your friend.


----------



## FordGT90Concept (Nov 7, 2015)

FPU isn't the only component shared.  The entire instruction decoder and associated L1 cache covers FPU and both integer clusters.  The only thing that makes Bulldozer unique is the fact it has two integer clusters instead of one big one.  The whole cohesive unit is still a core.


HT better utilizes existing hardware.  It doesn't add much hardware to accomplish that.  Bulldozer, on the other hand, added a lot of hardware to accelerate SMT.  This is why Bulldozer benefits more from heavy multithreaded load but you're still better off having an actual 8 core (or even a 6 core, as Phenom II X6 demonstrates).


----------



## bobjr94 (Nov 7, 2015)

By his logic, he is also going to need to sue companies like Microsoft; Windows says it has 8 cores, falsely reporting the core count. So does CPU-Z.


----------



## FordGT90Concept (Nov 7, 2015)

Windows 8 and newer say FX-8### and FX-9### have 4 cores, 8 logical processors.  Microsoft is not falsely reporting the core count; AMD is, on their retail packaging.


----------



## Aquinus (Nov 7, 2015)

FordGT90Concept said:


> FPU isn't the only component shared. The entire instruction decoder and associated L1 cache covers FPU and both integer clusters. The only thing that makes Bulldozer unique is the fact it has two integer clusters instead of one big one. The whole cohesive unit is still a core.


The only shared components are the fetch/decode units, L1 instruction cache, L2 cache, and FPU.

Come Piledriver, AMD went from one 4-way decoder to two 2-way decoders, each serving either one of the integer cores or the floating-point unit, which leaves the fetch unit, L1i, L2, and the FPU.

The Core 2 had a shared L2 cache and is considered to have two cores, so I consider the L2 argument moot, which leaves the fetch unit, the L1i, and the FPU.

As for the fetch unit, testing seems to indicate that it is not a bottleneck and that improving it won't yield many tangible benefits:


> Agner’s tests, however, may shed some light on the problem. According to his work, the fetch units on Bulldozer, Piledriver, and Steamroller, despite being theoretically capable of handling up to 32 bytes (16 bytes per core) tops out in real-world tests at 21 bytes per clock. This implies that doubling the decode units _couldn’t_ help much — not if the problem is farther up the line. Steamroller does implement some features, like a very small loop buffer, that help take pressure off the decode stages by storing very small previously decoded loops (up to 40 micro-instructions), but the fact that doubling up on decoder stages only modestly improved overall performance implies that significant bottlenecks still exist.


source

So that would leave the L1i and the FPU. The FPU is undoubtedly shared, I'm not denying that, and the L1i cache is shared because that makes sense when the fetch units are also shared. So that leaves just the L1i and the FPU as shared components that may make a difference.

What blows my mind is that people forget AMD went from the Phenom II being able to execute three integer operations per clock cycle to two on the current architecture, which could have serious implications for purely integer code. However, I think the source I provided earlier sums it up best:



> According to Agner, ” Two of the pipes have all the integer execution units while the other two pipes are used only for memory read instructions and address generation (not LEA), and on some models for simple register moves. This means that the processor can execute only two integer ALU instructions per clock cycle, where previous models can execute three. This is a serious bottleneck for pure integer code. The single-core throughput for integer code can actually be doubled by doing half of the instructions in vector registers, even if only one element of each vector is used.”
> 
> This has been the case since Bulldozer debuted — but issues here could explain why integer performance on Steamroller is so low compared to other cores. *This is where things become frustratingly opaque — each of the areas we’ve identified could be the principle bottleneck — or it’s possible that the bottleneck is a combination of multiple factors (long pipelines, low fetch, cache collisions and low integer performance)*.



I'm not disagreeing that Bulldozer's performance sucks; that's why I got my 3820. But I'm not convinced that it's the shared components so much as the *skimpy dedicated components* that could be impacting performance. Zen, having a beefier integer core, very well might make up for the shortcomings of the dedicated components in these CPUs.

That's my only point. There is nothing to stop the dedicated hardware from being the bottleneck, even more so if they chopped it down to fit two of any given component in.

With all that said, I still think the really long pipeline is probably the main issue.


----------



## FordGT90Concept (Nov 7, 2015)

The lawsuit is about false advertising...


			
				LegalNewsLine said:
			
		

> In claiming that its new Bulldozer CPU had “8-cores,” which means it can perform eight calculations simultaneously, *AMD allegedly tricked consumers into buying its Bulldozer processors by overstating the number of cores contained in the chips*. Dickey alleges the Bulldozer chips functionally have only four cores—not eight, as advertised.


...with lacking performance used as evidence of being damaged...


			
				LegalNewsLine said:
			
		

> The suit alleges AMD built the Bulldozer processors by stripping away components from two cores and combining what was left to make a single “module.” In doing so, however, the cores no longer work independently. As a result, Dickey argues that *AMD’s Bulldozer CPUs suffer from material performance degradation, and cannot perform eight instructions simultaneously and independently* as claimed. He alleges that average consumers in the market for computer CPUs lack the requisite technical expertise to understand the design of AMD's processors and trust the company to convey accurate specifications regarding its CPUs. Because *AMD did not convey accurate specifications*, Dickey argues that tens of thousands of consumers have been misled into buying Bulldozer CPUs that *cannot perform the way a true eight-core CPU would*.


...everyone and their dog knows Bulldozer performance was underwhelming.  The lawsuit explicitly targets AMD marketing FX-8### and FX-9### as having twice the number of cores when the guts simply aren't there to make two complete cores.

Note: I would dispute the underlined statements.  There are circumstances where it can work on 8 threads simultaneously.


----------



## Pill Monster (Nov 7, 2015)

Aquinus said:


> Yeah but that's not because the L2 is shared, that's because AMD sucks at making fast SRAM cache stores. The Core 2 chips had a shared L2 between two full cores and they didn't suck.


Well, admittedly I'm speculating somewhat there.  Though I will say the hotfix was released to avoid a performance hit from L2 sharing in lightly threaded workloads.


The OS default is sequential assignment, which is not ideal for PD because with up to 4 threads they run on 0,1,2,3, which are the first two modules.  The scheduling was updated to 0,2,4,6, then 1,3,5,7, so up to 4 threads would each have exclusive access to fetch/decode, L2, etc.
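The same idea can be sketched in user space. This is a minimal, hypothetical Python example using the Linux-only `os.sched_setaffinity` API to pin a process to logical CPUs 0, 2, 4, 6 so that up to four busy threads each get a module to themselves; the actual hotfix does the equivalent inside the Windows kernel scheduler.

```python
# Sketch of the module-aware scheduling idea: pin work to every second
# logical CPU (0, 2, 4, 6) so up to four threads each get their own
# "module". Linux-only API; illustrative, not the Windows hotfix itself.
import os

if hasattr(os, "sched_setaffinity"):
    ncpu = os.cpu_count() or 1
    even_cpus = set(range(0, ncpu, 2))   # first core of each module
    os.sched_setaffinity(0, even_cpus)   # 0 = the current process
    print(f"pinned to logical CPUs: {sorted(os.sched_getaffinity(0))}")
else:
    print("sched_setaffinity not available on this platform")
```

On an 8-thread FX chip this pins to {0, 2, 4, 6}, matching the updated scheduling order described above.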


Man, this spellcheck is pissing me off... anyway, if there's any increase in performance it's not noticeable to me either way... but I have noticed something in SuperPi: SuperPi on one core is much faster than on 8, like about 100x faster.
So there's some food for thought...

I know AMD can't match Intel for latency or bandwidth, but wtf, BD/PD cache access is 4 times slower than Phenom or Athlon??? I looked at my old Phenom times: 5 ns L3 access with a 2400 MHz IMC.
PD is around 30 ns at 2800 MHz, wtf 0_o lol


Does anyone have a fix for spellcjeck not wotkinng?










----------



## HumanSmoke (Nov 7, 2015)

Pill Monster said:


> Does anyone have a fix for spellcjeck not wotkinng?


Secondary school English classes?

j/k


----------



## Serpent of Darkness (Nov 8, 2015)

VulkanBros said:


> So that means that what the OS reporting is wrong? It clearly says Cores: 4,  Logical processors: 8



If you right-click on the graph, left click and highlight "graphs to show >," and select Logical Processors, it might show 16 little graphs of "FordGT90Concept please insert what you want to call it in here because you feel strongly that AMD must burn in a fire for their fraudulent misrepresentation of cores" to further prove your point, Vulkan.  If it doesn't show 16 little graphs, then on an ironic side, it would support Ford's point.  Maybe? Doubt it.

Now that I think about it, looking at your Performance tab, Vulkan, maybe AMD just markets its CPUs as having X amount of logical processors but calls them "core processors." Intel markets CPUs based on Y amount of cores, which are really 2 logical processors to 1 core. I think that's one part of where the disagreements between FordGT90Concept and others come from.

If we go back to the original topic of discussion, the lawsuit is probably dead on arrival. I don't believe the court is going to award damages. At best, it's a marketing blunder on AMD's end.


----------



## FordGT90Concept (Nov 8, 2015)

It'll show 8.  Logical Processors graphs always matches Logical Processors text.







Serpent of Darkness said:


> ...maybe AMD just markets it's CPUs as having X amount of Logical Processors...


And that's the problem.  Only AMD does that and only for the Bulldozer/Piledriver/Steamroller series of processors; hence, sued.


----------



## Blue-Knight (Nov 8, 2015)



----------



## Pill Monster (Nov 8, 2015)

FordGT90Concept said:


> Negative.  Cores are complete compute units.  It would be classified as a single core with 4 threads per core by anyone that isn't AMD.
> 
> @Pill Monster: Since you clearly don't like scholarly articles, try Wikipedia on for size: https://en.wikipedia.org/wiki/Simultaneous_multithreading
> All of the above explain SMT in detail.  Some describe Hyper-Threading in detail.


I love scholarly articles.

What I don't like is the strawman approach of referencing material to validate a point without specifying the actual content that supposedly validates it.
The strawman doesn't know if there is any supporting evidence in the article, but he's hoping there is and that the other guy will find it and be disproven.


I don't have time for this shit.


----------



## RealNeil (Nov 8, 2015)

FordGT90Concept said:


> It'll show 8.  Logical Processors graphs always matches Logical Processors text.
> 
> 
> 
> ...



So if you guys will... help me understand why this PC has eight cores (little green boxes) that all go to work when I run the CPU-Z multi-thread benchmark? Is it eight... or just a tricky four?

AMD FX-9590


----------



## Pill Monster (Nov 8, 2015)

2 mins on Google and look what I found - Intel whitepapers with a crystal clear explanation of HT, just for you.

"A single processor appears as 2 logical processors."  APPEARS.  How does SMT work when there is only one physical core?  Because it's using software, obviously.

See where it says the OS can SCHEDULE a thread.  Note SCHEDULE.  Not execute.  SCHEDULE.

If u don't get that then u never will.


----------



## lilunxm12 (Nov 8, 2015)

Pill Monster said:


> 2mins on Google and look what I found -  Intel whitepapers with a crystal clear explanation of HT, just for you.
> 
> "A single processor appear as 2 logical processors".  APPEAR.  How does SMT work when there is only one physical core?  Because it's  using software, obviously.
> 
> ...



Within the same document, 4 pages later (you can reach it easily by searching for "5%"):


			
				intel said:
			
		

> This implementation of Hyper-Threading Technology added less than 5% to the relative chip size and maximum power requirements, but can provide performance benefits much greater than that.


It doesn't explicitly say additional transistors, but most likely there is some extra hardware in the chip compared with an implementation without HT.
I don't think the additional hardware (if any) is for computing, though. It can't be an additional core at all.


----------



## FordGT90Concept (Nov 8, 2015)

RealNeil said:


> So if you guys will,....help me to understand why this PC has eight Cores (little green boxes) that all go to work when I run the CPU-Z  multi-thread Benchmark? Is it eight,....or just a tricky four?
> 
> AMD FX-9590
> 
> View attachment 69065


Because the FX-9590 has 8 logical processors and 4 cores (two threads per core).  Windows 8 and newer show the breakdown (sockets, cores, and logical processors) in Task Manager, where Windows 7 and older do not.



Pill Monster said:


> "A single processor appear as 2 logical processors".


That's the nature of SMT.  It tells the OS to send it more threads.  I've seen 2- (e.g. Pentium 4 w/ HT through Core i7-6700K), 4- (e.g. UltraSPARC T1), and 8-way (e.g. SPARC T5) SMT implementations.



Pill Monster said:


> How does SMT work when there is only one physical core?


The scheduler in each core shuffles between the two threads, filling gaps in core resources and thus better utilizing the hardware in the core.  As the name implies, the threads execute simultaneously like they would on a dual core, but the core lacks the hardware resources to get dual-core-like performance.  SMT (all versions) is designed to improve throughput.
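The gap-filling idea can be shown with a toy model (purely illustrative Python, nothing like a real pipeline): each thread has cycles where it has an instruction ready and cycles where it stalls, and an SMT core issues from whichever thread has work that cycle.

```python
# Toy model of SMT gap-filling. 1 = thread has an instruction ready
# to issue this cycle, 0 = thread is stalled. An SMT core can issue
# from either thread, so it fills one thread's stalls with the other.
t1 = [1, 0, 0, 1, 1, 0, 1, 0]
t2 = [0, 1, 1, 0, 0, 1, 0, 1]

single = sum(t1)                                  # one thread alone
smt = sum(1 for a, b in zip(t1, t2) if a or b)    # issue if either is ready

print(f"single-thread utilization: {single}/8")   # 4/8
print(f"SMT utilization: {smt}/8")                # 8/8 (ideal interleave)
```

Real gains are far smaller because the two threads' ready cycles overlap and contend for the same execution units; the model only shows why throughput can rise without adding a second full core.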

Bulldozer is unique in that AMD put more hardware resources into it to improve SMT performance, which is a good thing; AMD's mistake was calling it an "8-core" when it clearly is not.



Pill Monster said:


> Because it's  using software, obviously.


There is zero software involved beyond the usual when dealing with a multiprocessor.



Pill Monster said:


> See where it says OS can SCHEDULE a thread.  Note SCHEDULE.  Not excecute. SCHEDULE.


All threads need to be scheduled before they can execute.  For HTT to be of any use, a thread has to be scheduled for two logical processors of the same core at the same time.



lilunxm12 said:


> within the same document, 4 pages later, you can easily reach it by searching "5%" as keyword
> 
> it doesn't explicitly say additional transistors but most likely there're some extra stuff within the chip compared with implementation without HT.
> I don't think the addition hardware (if any) is for computing though. Can't be an additional core at all.


Yeah, Hyper-Threading's benefit, depending on the task, can be anywhere from -2.5% to 30%.  A 5% average sounds fair.

It does take some extra transistors (I don't think Intel ever said how many) to add SMT to a core, but the goal is to utilize far more transistors that would otherwise sit idle down the pipeline.


----------



## Roph (Nov 8, 2015)

The Bulldozer family groups pairs of cores into modules. Within these modules, some of the hardware is shared between the two cores. Performance problems can occur when both cores try to use this shared hardware heavily at the same time. This is similar, if less pronounced, to the per-thread performance penalty of running two threads on the same physical core using Intel's Hyperthreading.

In the early members of the family (Bulldozer and Piledriver), the instruction decoder (capable of decoding four instructions per cycle) is shared. It decodes instructions for one core each cycle, switching to the other core (if it's active) on the next cycle. In later members of the family (Steamroller and Excavator), a separate decoder is provided for each core, eliminating this bottleneck.

In all members of the family, the L1 I-cache and D-cache are shared. Since these caches are quite small (compared to Phenom II), this causes cache thrashing at a higher level when both cores are active than when only one is. The L1 caches are larger in Excavator than in previous members of the family, which contributes to its better efficiency.

The FPU is also shared in all members of the family. Most FPU instructions are multiplies or adds, so they use the FMAC pipelines, of which there are two per module. When both cores are running FPU-heavy code, effectively only one FMAC pipeline is available to each core. This is however no worse than in Phenom II, which had one multiplier and one adder in its FPU, in separate pipelines.
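The sharing described above can be condensed into a small sketch (my own simplification of this description, not AMD's specification): per-core decode and FMAC throughput depends on whether the sibling core in the module is active.

```python
# Rough per-core resource model for one Bulldozer/Piledriver module,
# following the description above: a 4-wide decoder alternates between
# the two cores each cycle, and the FPU's two FMAC pipelines are
# shared.  A simplification for illustration, not cycle-accurate.

def per_core_resources(sibling_active: bool) -> dict:
    decode_width = 4   # shared front-end decodes 4 instructions/cycle
    fmac_pipes = 2     # shared FPU with two FMAC pipelines
    share = 2 if sibling_active else 1
    return {
        "decode_per_cycle": decode_width / share,  # halved by alternation
        "fmac_pipes": fmac_pipes / share,          # halved by contention
    }

print(per_core_resources(sibling_active=False))  # full 4-wide decode, 2 FMACs
print(per_core_resources(sibling_active=True))   # everything effectively halved
```

Per the text, Steamroller and Excavator would keep `decode_per_cycle` at 4 even with the sibling active, since each core got its own decoder.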







This is a diagram for one module, which has 2 cores. It has 2 integer units and 1 FPU, and shares an L2 cache. Conceptually, running two threads doubles a module's integer throughput but halves each thread's floating-point throughput.

Since most server/rendering workloads are integer based, CMT scales well in multi-threading - AS LONG AS the threads are being run correctly on modules, and not split between multiple modules unnecessarily.

Windows 7 has an issue with how threads get scheduled on Bulldozer-based processors. W7 treats them like fully independent cores and will schedule threads wherever it likes. This can cause tasks that should otherwise share FPU resources to split across multiple modules, causing performance degradation.

This was changed in Windows 8/8.1/10, by treating the processor as a 4 core, 8 thread chip (instead of 8 core, 8 thread) in order to properly schedule threads. On a high level, this actually emulates SMT (Hyperthreading) and results in a decent performance boost in W8/W10 for AMD processors.

There is a patch (manual install) for Windows 7 that makes it schedule in the same manner, though it doesn't change the appearance of Task Manager. You still see all 8 cores.
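On Linux you can approximate the same module-aware policy by hand with CPU affinity. The sketch below is my own illustration, and it assumes the logical-CPU numbering mentioned earlier in the thread, where CPUs 2i and 2i+1 share module i; that pairing is worth verifying on real hardware (e.g. with `lscpu -e`) before pinning anything.

```python
import os

def module_cpus(module: int) -> set:
    """Logical CPUs belonging to one Bulldozer module, assuming the
    enumeration where CPUs 2i and 2i+1 share module i."""
    return {2 * module, 2 * module + 1}

def spread_across_modules(n_threads: int, n_modules: int = 4) -> list:
    """Pick one core from each module before doubling up, so FPU-heavy
    threads avoid sharing a module's FPU when they don't have to."""
    order = ([2 * m for m in range(n_modules)]        # first core of each module
             + [2 * m + 1 for m in range(n_modules)])  # then the siblings
    return order[:n_threads]

# Pin the current process to one core per module (Linux-only syscall):
# os.sched_setaffinity(0, set(spread_across_modules(4)))
print(spread_across_modules(4))  # [0, 2, 4, 6] -- one core per module
```

Only once more than four threads are requested does the ordering start handing out sibling cores, which mirrors what the Windows 8+ scheduler change described above does automatically.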

I don't know who pissed in your tea Ford. You also keep stating your opinion as fact.


----------



## lilunxm12 (Nov 8, 2015)

FordGT90Concept said:


> Because FX-9590 has 8 logical processors and 4 cores (two-threads per core).  Windows 8 and newer shows the break down (sockets, cores, and logical processors) in Task Manager where Windows 7 and older do not.
> 
> 
> That's the nature of SMT.  It tells the OS to send it more threads.  I've seen 2- (e.g. Pentium 4 w/ HT through Core i7-6700K), 4- (e.g. UltraSPARC T1), and 8-way (e.g. SPARC T5) SMT implementations.
> ...



In the document, Intel claimed a 5% additional die size. I personally interpret that as roughly 5% additional transistors.
The 5% figure there is for die size, not performance, but I do agree with your claim of a -2.5% to 30% performance change.


----------



## FordGT90Concept (Nov 8, 2015)

Roph said:


> This was changed in Windows 8/8.1/10, by treating the processor as a 4 core, 8 thread chip (instead of 8 core, 8 thread) in order to properly schedule threads. On a high level, this actually emulates SMT (Hyperthreading) and results in a decent performance boost in W8/W10 for AMD processors.


Got a source for that?  I get that Bulldozer has a different logical processor order (*0*, *1*, 2, 3, *4*, *5*, 6, 7) compared to HTT (*0*, 1, *2*, 3, *4*, 5, *6*, 7) but the underlined part doesn't make sense.

By the way, the fact that the order matters is proof they aren't cores. Legitimate cores have no preference; operating systems can park and unpark them as much as they want, and schedule whatever they want wherever they want.



Roph said:


> I don't know who pissed in your tea Ford.


I don't drink tea. 



lilunxm12 said:


> In the document, Intel claimed a 5% additional die size. I personally interpret that as roughly 5% additional transistors.
> The 5% figure there is for die size, not performance, but I do agree with your claim of a -2.5% to 30% performance change.


My bad; thanks for the clarification.


----------



## Pill Monster (Nov 8, 2015)

lilunxm12 said:


> within the same document, 4 pages later, you can easily reach it by searching "5%" as keyword
> 
> it doesn't explicitly say additional transistors but most likely there're some extra stuff within the chip compared with implementation without HT.
> I don't think the addition hardware (if any) is for computing though. Can't be an additional core at all.


By software I meant Windows assigning threads to non-existent cores, lol. The OS is oblivious to what's going on... as usual it believes what the BIOS says.


And yeah, from the whitepaper, OoO (out-of-order execution) has to do with it, and some extra transistors.... 
They even say (on page 8, I think) the primary difference between HT and true cores is that true cores never switch between threads; the HT cores halt execution and switch threads constantly.  


And of course *all *resources are shared, lol   










Nothing to see here .....nothing new at least.


----------



## lilunxm12 (Nov 8, 2015)

Pill Monster said:


> By software I meant Windows assigning threads to non-existent cores, lol. The OS is oblivious to what's going on... as usual it believes what the BIOS says.
> 
> 
> And yeah, from the whitepaper, OoO (out-of-order execution) has to do with it, and some extra transistors....
> ...



I'm not saying HT has extra compute resources that aren't being shared. I just disagree with your claim that HT is a purely software implementation. Intel did claim there's a 5% die-size increase, and I personally interpret that as roughly 5% additional transistors.


----------



## Pill Monster (Nov 8, 2015)

lilunxm12 said:


> I'm not saying HT has extra compute resources that aren't being shared. I just disagree with your claim that HT is a purely software implementation. Intel did claim there's a 5% die-size increase, and I personally interpret that as roughly 5% additional transistors.


I didn't "claim" HT was purely software so don't mince my words.     I said as far as SMT goes  cores it's software which it is. 

The cores don't exist except in Windows Task Manager. The kernel scheduler assigns threads to cores, which in turn get scheduled, not executed...   

You're detracting from the main debate anyway, which is SMT.  SMT is not hardware, as Intel said - it's a concept.


Nothing else to say, really.


----------



## FordGT90Concept (Nov 8, 2015)

Pill Monster said:


>


The part you circled is where Bulldozer is unique.  There's a large chunk of silicon that isn't shared.  That's also why it is not technically SMT--at least not entirely (I like to call it "hybridized").  But that's not what the lawsuit is about; the lawsuit is about AMD advertising "8-core" when the definition of "core" has to be stretched beyond the breaking point in order to describe a Bulldozer "integer core."



Pill Monster said:


> I said that as far as SMT goes, the cores are software, which they are.


HTT isn't software--no SMT implementation is.



Pill Monster said:


> The cores don't exist except in Windows Task Manager.


If SMT is enabled, they appear everywhere from BIOS to operating system.


----------



## lilunxm12 (Nov 8, 2015)

Pill Monster said:


> I didn't "claim" HT was purely software so don't mince my words.     I said as far as SMT goes  cores it's software which it is.
> 
> The cores don't exist except in Windows Task Manager. The kernel schedular assigns threads to cores which in turn get scheduled, not executed...
> 
> ...



Yes, the extra core doesn't exist. However, the extra thread isn't a software implementation. It's not some piece of code within the BIOS that makes the OS recognize the extra thread; the extra hardware on the CPU die does the job. 
In fact, the HT of Xeon Phi (which is a rare case) can't be disabled. By all means the extra thread is a hardware implementation rather than a software implementation.


----------



## Pill Monster (Nov 8, 2015)

Quick note before vacating this thread: I just remembered the reason SMT even came up was to illustrate how BD does have 8 real cores.  

In that Intel doc, the Xeon whitepaper, it states the HT threads cannot all execute simultaneously; it's there in the article, plain: it states the Xeon has a limit of 6 threads.


lilunxm12 said:


> Yes, the extra core doesn't exist. However, the extra thread isn't a software implementation. It's not some piece of code within the BIOS that makes the OS recognize the extra thread; the extra hardware on the CPU die does the job.
> In fact, the HT of Xeon Phi (which is a rare case) can't be disabled. By all means the extra thread is a hardware implementation rather than a software implementation.



I'm not stupid mate, troll harder.... you sound ridiculous. 







Anyway back to why SMT came up to begin with. It's to illustrate to u Ford if u even care, why BD does have 8 real cores.  


In that Intel whitepaper it says a 12-thread Xeon with HT is capable of sending up to 6 commands per clock cycle. Naturally, because it has only 6 cores to execute them on.   SMT on 6 cores, not 12 = 6 real cores.


An 8-core Vishera sends 8 commands per clock cycle because it executes 8 threads on 8 cores.   That's 8 threads simultaneously = SMT = real cores.   




Time to vacate the thread ......


----------



## qubit (Nov 8, 2015)

Pill Monster said:


> An 8-core Vishera sends 8 commands per clock cycle because it executes 8 threads on 8 cores. That's 8 threads simultaneously = SMT = real cores.


I get you're pissed off at people seeing this differently to yourself, but I don't think it's as clear cut as that and maybe you'll respond to my post.

Why? Take the case of a Bulldozer CPU with just one module enabled. Execute a thread on one integer unit and it gives a certain performance. Now execute a second thread on the other core and the performance of each thread is less than one thread on one core and there's no way that this performance hit can be alleviated by better support hardware either. It's even worse with the FPU as there's only one of them, giving truly crap performance. This only happens because of the siamesed nature of the cores. It doesn't happen on AMD's older multicore CPUs nor on Intel's, potential memory bus bottlenecks aside.

Note that a similar performance hit happens on Intel too when HT is engaged, and you don't see Intel calling that second virtual core a full CPU like AMD does. I remember seeing this graphically illustrated when I used to run the SETI@Home project on my single core Pentium 4 with HT more than a decade ago, before they went BOINC. Running two threads would cause each individual thread to have something like 75-85% of the performance of just one, but overall performance was higher as two were being worked on at once.
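The arithmetic behind that SETI@Home observation checks out: two threads each at 75-85% of solo speed still give 1.5-1.7x the overall throughput of one thread. A quick sanity check using the figures quoted above:

```python
# Sanity-check the HT numbers above: two threads each running at
# 75-85% of single-thread speed still beat one thread overall.

def combined_throughput(per_thread_fraction: float, threads: int = 2) -> float:
    """Overall throughput, in multiples of one unshared thread."""
    return threads * per_thread_fraction

for f in (0.75, 0.85):
    print(f"per-thread {f:.0%} -> overall {combined_throughput(f):.2f}x")
```

The same arithmetic is why per-thread slowdowns under SMT (or under Bulldozer's shared modules) can still be a net win for throughput-bound workloads.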

Oh yeah, AMD pulled a number on us all right by making such a hybrid processor and confusing the definition of what a core is.  I remember the real "wtf?!" feeling I had when I first looked at that Bulldozer architecture diagram, and it turns out I was right, given its lack of performance. AMD tried to save a few pennies with this strategy and it bit them in the ass with poor performance, poor sales and now a lawsuit. Someone over there was very stupid.


----------



## Pill Monster (Nov 8, 2015)

qubit said:


> I get you're pissed off at people seeing this differently to yourself, but I don't think it's as clear cut as that and maybe you'll respond to my post.
> 
> Why? Take the case of a Bulldozer CPU with just one module enabled. Execute a thread on one integer unit and it gives a certain performance. Now execute a second thread on the other core and the performance of each thread is less than one thread on one core and there's no way that this performance hit can be alleviated by better support hardware either. It's even worse with the FPU as there's only one of them, giving truly crap performance. This only happens because of the siamesed nature of the cores. It doesn't happen on AMD's older multicore CPUs nor on Intel's, potential memory bus bottlenecks aside.
> 
> ...




Oh please... just go away, will you.  I see things differently?  You mean me and 95% of the other people who posted in this abortion of a thread.  There's nothing to talk about: BD/PD has 8 cores, and the frivolous lawsuit will die




The topic was discussed at length when the chip was released.  Seems like a bad case of déjà vu... sorry if I don't feel like revisiting the past....


----------



## Devon68 (Nov 8, 2015)

Well, the guy can sue all he wants. If they said the FX 8xxx has 8 cores and software reads it as 8 cores, it's not false advertising. The fact that an Intel quad-core can beat it is not relevant in this case.

I don't think he will win in court. Either way I'm happy with my choice. I didn't have money for anything better at the time.


----------



## qubit (Nov 8, 2015)

Pill Monster said:


> Oh please... just go away, will you.  I see things differently?  You mean me and 95% of the other people who posted in this abortion of a thread.  There's nothing to talk about: BD/PD has 8 cores, and the frivolous lawsuit will die
> 
> 
> 
> ...


No, it has 4 _siamesed_ cores whose performance suffers when running two threads on one. I tried to sympathize with you, but you've got one hell of a shit attitude. I gave you a reasonable explanation of how I see it, so all you had to do was discuss it in a reasonable manner with me and show some respect. I'm not going anywhere.

Yeah, go on, just get off this thread like you were gonna.


----------



## the54thvoid (Nov 8, 2015)

Someone close this tired useless roundabout thread.  We'll find out in about 'god knows how many' years which way it goes.

Until then it's just an utterly futile piss take of a discussion with two distinct camps, smashing their heads against walls.

You say potato, I say potato core.


----------



## qubit (Nov 8, 2015)

the54thvoid said:


> Someone close this tired useless roundabout thread.


Thing is it can't be closed because it's a front page article.


----------



## FordGT90Concept (Nov 8, 2015)

Devon68 said:


> If they said that the FX 8xxx has 8 cores...










Devon68 said:


> ...and a software reads it as 8 cores it's not false advertising.







"Cores" is "kerner" in Danish.


----------



## qubit (Nov 8, 2015)

@FordGT90Concept That's one battered box, lol.  And good point with the task manager display.


----------



## blobster21 (Nov 8, 2015)




----------



## HCT3000 (Nov 8, 2015)

FordGT90Concept said:


> Pretty sure he's going to win.  I don't think there's any nomenclature to properly describe Bulldozer's design and even if it had existed, AMD wasn't using it.
> 
> 
> x264 HD Benchmark runs on GPU and AMD undeniably has a stronger GPU in FX-8150 than Intel has in i7-2600K.  The problem stems from floating point operations executed on the CPU.  If you heavily load the FPUs in one core, the FPU performance of both cores will effectively half.



FX chips have no iGPU.....


----------



## FordGT90Concept (Nov 8, 2015)

I know that.  Displays need some kind of GPU driving them and software can use DirectCompute on them to accelerate computing.


----------



## R-T-B (Nov 8, 2015)

the54thvoid said:


> You say potato, I say potato core.



No, this is a potato core:


----------



## the54thvoid (Nov 8, 2015)

R-T-B said:


> No, this is a potato core:
> 
> View attachment 69076



No, that's a Potato Chip!

Thank you, you've been a wonderful audience - I'm here all week.

181 posts to set up TPU joke of the year.  Thanks go to Ford and his tireless efforts at core definition.


----------



## xChoice (Nov 8, 2015)

*cough* *cough* I'm not reading the whole comment thread, and I doubt anyone will read mine.

Just in case no one mentioned it: what about these 8-core ARM processors some companies are advertising when there are in fact 4 CPU cores + 4 GPU "cores"? I mean, they are "cores", but hey, they are not the CPU itself.

As for the top-of-the-line FX processors, I thought they were 8 real, complete cores, not 4 complete cores + some leftovers that were good enough to run something like a hardware HT......


----------



## FordGT90Concept (Nov 8, 2015)

xChoice said:


> Just in case no one mentioned it: what about these 8-core ARM processors some companies are advertising when there are in fact 4 CPU cores + 4 GPU "cores"? I mean, they are "cores", but hey, they are not the CPU itself.


http://www.techpowerup.com/forums/t...count-on-bulldozer.217327/page-3#post-3367326



xChoice said:


> As for the top-of-the-line FX processors, I thought they were 8 real, complete cores, not 4 complete cores + some leftovers that were good enough to run something like a hardware HT......


That is the reason why this thread has 183 posts: it is subject to debate.


----------



## xChoice (Nov 8, 2015)

No no, I'm not talking about big.LITTLE; those are 8 cores, 4 that have more resources and work faster, and 4 that are more... little.

Anyway, if it's an 8-core chip, you should be able to power gate those 4 "true" cores and leave the other 4 online; if they are full, complete cores, they should work.... isn't it that way?


----------



## FordGT90Concept (Nov 8, 2015)

http://www.realworldtech.com/bulldozer/10/


> Llano is also the first announced CPU from AMD to use power gating. The power gates are implemented as a footer ring of NFETs around the periphery of a core and L2 cache, using the package plane as a virtual ground. While Bulldozer’s hierarchical and shared microarchitecture improves area efficiency and throughput, it does complicate power gating. Conceptually there are five major circuit regions of each Bulldozer module – the shared front-end, the two integer cores, the floating point cluster and the L2 cache. Unfortunately, if a single core is active all of these regions (perhaps save the other integer core) must be active. The benefit of power gating a lone integer core is not worth the complexity of the implementation problems, especially since the operating system scheduler should be power-aware. As a result, *Bulldozer’s granularity of power gating is at the module level*. Each Interlagos die incorporates at least 4 power gates, one for each Bulldozer module.


It can't drop to 7, 5, 3, or 1 logical processors like a real 8-core processor, if that is what you mean.

There are some articles that say it can power down unused components (like the FPU), but they were phrased in a theoretical way, so I don't know if it really does.
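Module-level gating means the set of reachable "active core" counts is coarser than on a true 8-core. A small illustration of the quoted RealWorldTech point (the two-cores-per-module pairing comes from the quote above):

```python
# With power gating at module granularity, cores only turn off in
# pairs: an 8-"core" Bulldozer can run 8, 6, 4, or 2 active cores,
# never 7, 5, 3, or 1.  A true 8-core could gate any single core.

def reachable_core_counts(modules: int, cores_per_module: int = 2) -> list:
    """Active-core counts reachable by gating whole modules off."""
    return [m * cores_per_module for m in range(modules, 0, -1)]

print(reachable_core_counts(4))  # [8, 6, 4, 2]
```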


----------



## xChoice (Nov 8, 2015)

*cough* *cough*

module = core, and the integer cores = execution units??

But those integer cores have an ALU, a control unit, registers, and L1 caches... so they are microprocessors in their own right...... aren't they?

<.< I'm starting to fuck up my brain


----------



## FordGT90Concept (Nov 8, 2015)

According to AMD, FX-8### and FX-9### have four "modules" and each module has two "integer cores" for a total of eight.  The plaintiff alleges that it only has four cores.

The "integer cores" do not understand x86 instructions.  They have to be decoded and dispatched to FPU/"integer core" first and in Bulldozer's design, that happens in a shared component.  The "integer cores" without the rest of the module, therefore, is quite useless.


----------



## xChoice (Nov 8, 2015)

Well, I agree with you on that: they are not x86 cores.

So in context, that is what we are buying: an x86 CPU that has n amount of x86 cores. As those are not x86 cores on their own... well, that's it for me.


----------



## Pill Monster (Nov 8, 2015)

qubit said:


> No, it has 4 _siamesed_ cores who's performance suffers when running two threads on one. I tried to sympathize with you, but you got one hell of a shit attitude. I gave you a reasonable explanation of how I see it, so all you had to do is discuss it in a reasonable manner with me and show some respect. I'm not going anywhere.
> 
> Yeah, go on, just get off this thread like you were gonna.


Bullshit, buddy.  Do you take me for an idiot?

First you gave me an underhanded insult insinuating I was pissed off due to people disagreeing with me, and made out I was the only one arguing the point.  Which is garbage, as you know.  I don't mind a good, healthy discussion at all. This isn't one.

I was pissed off because I was being trolled, right at the time I was over this debate.  And it's not the first time it's happened, either.
So don't act like you're the victim here..


----------



## Roph (Nov 8, 2015)

FordGT90Concept said:


> I know that.  Displays need some kind of GPU driving them and software can use DirectCompute on them to accelerate computing.



x264 does not use DirectCompute; it can (optionally) use OpenCL. You must specifically tell it to, though, or swap in an OpenCL build, and in many cases it results in slower encoding. Only a few parts of x264 are accelerated with OpenCL.

I encode without OpenCL "acceleration"; for example, when going from 1080p to 720p with OpenCL enabled I cap out at about 70 fps. CPU-only, about 130 fps.


----------



## qubit (Nov 8, 2015)

Pill Monster said:


> Bullshit, buddy.  Do you take me for an idiot?
> 
> *First you gave me an underhanded insult* insinuating I was pissed off due to people disagreeing with me, and made out I was the only one arguing the point.  Which is garbage, as you know.  I don't mind a good, healthy discussion at all. This isn't one.
> 
> ...


I was not insulting you and was just being nice to you as I could see you were pissed off. I then quite reasonably explained where I think AMD has gone wrong here and hoped to have a reasonable discussion with you about it.

Obviously you've misunderstood my intention and have given me one hell of a douchebag response instead of responding reasonably - and you're compounding it by continuing it. Yes, it's you making a personal attack against me not the other way round. You're the one getting all drama queen threatening to leave the thread, but then just can't let go, lol. Get a grip FFS. 

Whatever, I don't give a fuck. This discussion is over.


----------



## Captain_Tom (Nov 8, 2015)

FordGT90Concept said:


> A "core" is a complete computing unit.  AMD proves it is not complete in their 6-"core" Bulldozer processors.  The two units packaged together are inseparable or they would have sold 7-"core" Bulldozer processors having only gated off the one that was defective.
> 
> 
> Because compression is mostly integer-based where Bulldozer performs more like an 8-core processor.  Even considering the widely different architectures and Bulldozer having a design ideal for it, it doesn't win by a very large margin.  The lawsuit is about the worst case scenario (saturated FPUs) and you're citing the best case scenario (saturated ALUs) where Bulldozer's non-traditional design shines.  The latter doesn't forgive the former.




Hey genius, then why are there no 3-core Intel CPUs?  By your logic, Intel not making a 3-core i5/i7 proves that Intel doesn't have 4 cores in an i7.


----------



## VulkanBros (Nov 8, 2015)

Serpent of Darkness said:


> If you right-click on the graph, left click and highlight "graphs to show >," and select Logical Processors, it might show 16 little graphs of "FordGT90Concept please insert what you want to call it in here because you feel strongly that AMD must burn in a fire for their fraudulent misrepresentation of cores" to further prove your point, Vulkan.  If it doesn't show 16 little graphs, then on an ironic side, it would support Ford's point.  Maybe? Doubt it.
> 
> Now that I think about it, looking at your performance tab, Vulkan, maybe AMD just markets it's CPUs as having X amount of Logical Processors, but they call it "Core Processors.  Intel market CPUs based on y-amount of Cores which are really 2 Logical Processors to 1 Core.  I think that's one part of where the problem occurs between FordGT90Concept and others having their disagreements.
> 
> If we go back to the original Topic of Discussion, the Lawsuit is probably dead on arrival.  I don't believe the court is going to award damages.  At best, it's a marketing blunder on AMD's end.


----------



## FordGT90Concept (Nov 8, 2015)

Captain_Tom said:


> Hey genius, then why are there no 3-core Intel CPUs?  By your logic, Intel not making a 3-core i5/i7 proves that Intel doesn't have 4 cores in an i7.


Good point.  Intel could do that but they don't.  They're sold as dual-cores. AMD presumably sells them as six-cores.


----------



## Captain_Tom (Nov 8, 2015)

FordGT90Concept said:


> Good point.  Intel could do that but they don't.  They're sold as dual-cores. AMD presumably sells them as six-cores.



Wrong - something in Intel's architecture doesn't allow them to do that, because otherwise they clearly would.  Same with AMD's construction archs - how they built it allows them to save TDP, money, and die space by having the cores share some resources.


----------



## Aquinus (Nov 8, 2015)

No rebuttal to my point made earlier? People seem content spouting off the same thing they've been saying for pages but not following up with anything substantive...

http://www.techpowerup.com/forums/t...count-on-bulldozer.217327/page-6#post-3367789


----------



## FordGT90Concept (Nov 8, 2015)

Captain_Tom said:


> Wrong - something in Intel's architecture doesn't allow them to do that, because they clearly would.


Sure it does.  I distinctly remember being able to disable 1, 2, or 3 cores via the BIOS on the Core i7-920.  The reason why Intel doesn't sell tri-cores and why AMD didn't sell Phenom II X5 Thubans is that they a) need enough volume to warrant doing so and b) would have to generate another SKU for it (which creates a lot of hassle, from an additional UPC code to the creation of storefront pages at vendors like Newegg and Amazon).



Aquinus said:


> No rebuttle to my point made earlier? People seem to be content spouting off the same thing they've been saying for pages but not following up anything substantive...
> 
> http://www.techpowerup.com/forums/t...count-on-bulldozer.217327/page-6#post-3367789


Every point was already rebutted on its own, so I felt no need to repeat myself.  The majority was about performance, which is why I replied the way I did: 4-core-like performance is cited in the lawsuit for damages, and its awkward design can be at least partially faulted for that.


----------



## qubit (Nov 8, 2015)

Captain_Tom said:


> *Wrong - something in Intel's architecture doesn't allow them to do that, because they clearly would.*  Same with AMD's construction archs - how they built it allows them to save TDP, money, and die space by having the cores share some resources.


No, I don't think that's the case with Intel. In the BIOS I can set the core count to 1, 2, 3, or 4, with separate HT on/off. I've experimented with that before and the computer worked just fine; therefore, it would be trivial for Intel to fuse off a core if they wanted to. They already do that with two cores to make dual-core CPUs.
Why they don't sell tri-core CPUs, you'd have to ask them.

@FordGT90Concept Dammit, you beat me to it!


----------



## Frick (Nov 8, 2015)

And when Ford rebutts something it stays rebutted. Especially when he has qubit backing him up.

EDIT: In case it wasn't clear, I was being sarcastic.


----------



## Captain_Tom (Nov 9, 2015)

qubit said:


> No, I don't think that's the case with Intel. In the BIOS I can set the core count to 1,2,3 or 4 with separate HT on/off. I've experimented with that before and the computer worked just fine, therefore, it would be trivial for Intel to fuse off a core if they wanted to. They already do that with two cores to make dual core CPUs.
> Why they don't sell tri core CPUs you'd have to ask them.
> 
> @FordGT90Concept Dammit, you beat me to it!


No, you can set it to 1, 2, or 4.


----------



## qubit (Nov 9, 2015)

Captain_Tom said:


> No, you can set it to 1, 2, or 4.


I can set it to 3, I double checked just before posting. Note that I've got a 2700K, so perhaps this restriction applies to later CPUs?

I can even set it to 3 cores + HT if I want to. Weird but true.



Frick said:


> And when Ford rebutts something it stays rebutted. Especially when he has qubit backing him up.


I think I've just been Fricked.


----------



## Captain_Tom (Nov 9, 2015)

qubit said:


> I can set it to 3, I double checked just before posting. Note that I've got a 2700K, so perhaps this restriction applies to later CPUs?
> 
> I can even set it to 3 cores + HT if I want to. Weird but true.
> 
> ...



Are you referring to 3 active cores or limiting boost per core?


----------



## qubit (Nov 9, 2015)

Captain_Tom said:


> Are you referring to 3 active cores or limiting boost per core?


3 active cores. Task Manager actually shows three graphs. Perhaps we're at cross purposes? lol. 

Incidentally, when I first built my system, I of course posted about it on TPU. I then couldn't resist playing a little joke on everyone by saying how pleased I was with my new "5 core" CPU and actually posted a screenshot of TM running 5 threads.  The best bit was that it actually took a little while for people to catch on, lol.


----------



## Captain_Tom (Nov 9, 2015)

qubit said:


> 3 active cores. Task Manager actually shows three graphs. Perhaps we're at cross purposes? lol.
> 
> Incidentally, when I first built my system, I of course posted about it on TPU. I then couldn't resist playing a little joke on everyone by saying how pleased I was with my new "5 core" CPU and actually posted a screenshot of TM running 5 threads.  The best bit was that it actually took a little while for people to catch on, lol.



Well, I swear the BIOS in my PC doesn't let me do it - and it's one of the highest-end Z87 motherboards.  Maybe they did just disable it in later archs.

Still, though, I maintain my position that if they could, they would have made 3-core CPUs.


----------



## Xuper (Nov 9, 2015)

Did this thread break a new world record?


----------



## Dieinafire (Nov 9, 2015)

AMD is always up to shady business like this. They need to learn a big lesson and stop trying to cheat


----------



## Dent1 (Nov 9, 2015)

Dieinafire said:


> AMD is always up to shady business like this. They need to learn a big lesson and stop trying to cheat



And wasn't hyperthreading Intel's way of trying to trick their less technology-aware consumers into thinking they have a genuine 8-core?

Anyway, it's up to the judge to determine if AMD is a cheat. Not us.


----------



## Dieinafire (Nov 9, 2015)

Dent1 said:


> And wasn't hyperthreading Intel's way of trying to trick their less technology-aware consumers into thinking they have a genuine 8-core?
> 
> Anyway, it's up to the judge to determine if AMD is a cheat. Not us.


Intel never said it was 8 cores, while AMD did. BIG difference.


----------



## Dent1 (Nov 9, 2015)

Dieinafire said:


> Intel never said it was 8 cores, while AMD did. BIG difference.



Maybe Intel doesn't say "8-core" on the box, but it's clear that most non-enthusiasts, when they see 8 graphs in Task Manager, assume it means 8 cores.  Intel has done little in terms of trying to educate the non-enthusiast consumer about how hyperthreading works. Intel is happy for consumers to believe what they believe, which is fine.

Anyway, AMD might have a genuine 8-core; it's all up to interpretation. Look at this thread alone: you can make a good argument either way. None of us here can say definitively whether AMD is in breach of anything. Let's wait for the legal system to decide.


----------



## vega22 (Nov 9, 2015)

FordGT90Concept said:


> Pretty sure he's going to win.  I don't think there's any nomenclature to properly describe Bulldozer's design and even if it had existed, AMD wasn't using it.
> 
> 
> x264 HD Benchmark runs on GPU and AMD undeniably has a stronger GPU in FX-8150 than Intel has in i7-2600K.  The problem stems from floating point operations executed on the CPU.  If you heavily load the FPUs in one core, the FPU performance of both cores will effectively half.




you can overload HT too.

amd will win this if the judge is clued up, if not.....


----------



## Tsukiyomi91 (Nov 9, 2015)

no wonder he's a dick lol


----------



## Dent1 (Nov 9, 2015)

marsey99 said:


> you can overload HT too.
> 
> amd will win this if the judge is clued up, if not.....



In big cases there will be a panel of impartial technology lawyers who will explain the technologies behind the architecture to the judge. I would think AMD would bring their own engineers as part of their defence too.


----------



## john_ (Nov 9, 2015)

Wow.... 9 pages? Have the jury come to a decision yet?


----------



## Dieinafire (Nov 9, 2015)

john_ said:


> Wow.... 9 pages? Have the jury come to a decision yet?


Verdict is AMD screwed their customers by lying for years


----------



## john_ (Nov 9, 2015)

Dieinafire said:


> Verdict is AMD screwed their customers by lying for years


They screwed their financials for 5 years and brought the company very close to bankruptcy. They also screwed the hopes of AMD fans with processors that unfortunately are not competitive with Intel's, and with no money to support their platforms we haven't seen Steamroller FX chips, Excavator APUs, or Beema-based AM1 chips. They didn't even have money to create a totally new GPU to take full advantage of HBM1. All they could do was implement HBM memory on two Tonga cores that they glued together, and here it is, the new Fiji GPU, with only 32+32=64 ROPs, losing to GM200 when it should be beating it.
But no, they didn't screw their customers, because they always price their products based on the performance those products offer compared to Intel products. Anyone hoping to get a quad-core FX chip and beat an i5 that costs 3-5 times more, well, why pay for an FX chip? Go and buy a cheap quad-core Braswell tablet/miniPC/stick/whatever and destroy that Skylake i5. Right? Riiiiiiiggghhhttttt.........


----------



## Aquinus (Nov 9, 2015)

> AMD has, in a very real sense, been thoroughly punished for the CPU it brought to market in 2011 — and this lawsuit makes claims that don’t hold up to technical scrutiny.


http://www.extremetech.com/extreme/...lse-bulldozer-chip-marketing-is-without-merit



> It's an argument which appears to rest wholly on the presence of only four FPU cores on the eight-core chip, but one which Dickey may struggle to win: floating-point units have not always been integral to processor designs, with early processors being integer-only models which emulated floating-point mathematics internally and the first FPUs themselves being entirely separate chips used as a co-processor, so to argue that the core-count of a chip is tied directly to the number of FPU units present is an interesting tactic - doubly so when it is entirely possible for the processor in question to run eight integer-based threads simultaneously.


http://www.bit-tech.net/news/hardware/2015/11/09/amd-bulldozer-core-count-suit/1

I'm apparently not the only person who thinks the FPU claim for what constitutes a core is bogus.


----------



## FordGT90Concept (Nov 9, 2015)

Dent1 said:


> Maybe Intel dont say 8 core on the box but its clear that most non enthusiasts think when they see 8 graphs in the task manager they assume it means 8 cores.  Intel have done little in terms of trying to educate the non-enthusiasts consumer about how hyper threading works. Intel are happy for consumers to believe what they believe which is fine.


They always refer to # cores w/ HT or # cores w/o HT.  Intel never claims SMT is equal to more cores like AMD does with Bulldozer.



marsey99 said:


> you can overload HT too.


Not really.  The underlying core will report 100% use.  All threads are still simultaneously executed.



Aquinus said:


> http://www.extremetech.com/extreme/...lse-bulldozer-chip-marketing-is-without-merit
> 
> http://www.bit-tech.net/news/hardware/2015/11/09/amd-bulldozer-core-count-suit/1
> 
> I'm apparently not the only person who thinks the FPU claim for what constitutes a core is bogus.


As previously discussed: You have to go back 25 years to find those dinosaurs in the x86 market.  They're dead for a reason and Bulldozer suffered a similar fate.

The FPU uses SMT. I'd argue that any core that uses SMT excludes itself from equating threads to cores.


----------



## Pill Monster (Nov 9, 2015)

FordGT90Concept said:


> They always refer to # cores w/ HT or # cores w/o HT.  Intel never claims SMT is equal to more cores like AMD does with Bulldozer.
> 
> 
> Not really.  The underlying core will report 100% use.  All threads are still simultaneously executed.


No, they're scheduled, not executed. It states that in the whitepaper, as I said earlier.  A 12-"core" Xeon can send 6 commands per clock. If it were 12, the scaling would be a lot closer to Piledriver. 
The redeeming feature for Intel is AMD's slow cache.


----------



## FordGT90Concept (Nov 9, 2015)

A lot of every processor in existence takes more than one clock to complete a task.  While the FPU is crunching on something, SMT allows another thread to be processed through the ALU which takes far fewer clocks.  Another example is a thread having to wait because of a cache miss, the other thread keeps executing.  Like Bulldozer, after instructions are decoded, a lot of the processor is out-of-order and that is where SMT occurs.  The only thing different about Bulldozer is that there are two ALUs instead of one.  The rest of the processor mimics SMT.  I would never call a core with two ALUs a dual core.
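The shared-FPU contention described above can be made concrete with a toy throughput model. This is my own illustrative sketch, not anything from AMD's documentation: two FP-heavy threads contending for one shared FPU take twice as long as two threads with private FPUs, while integer work on the module's two ALUs is unaffected.

```python
import math

def cycles_to_finish(ops_per_thread, n_threads, n_units, ops_per_unit_per_cycle=1):
    """Idealized model: n_threads each need ops_per_thread operations of one type,
    issued to n_units execution units. Assumes perfect scheduling, no dependencies."""
    total_ops = ops_per_thread * n_threads
    throughput = min(n_threads, n_units) * ops_per_unit_per_cycle
    return math.ceil(total_ops / throughput)

# Two FP-heavy threads on one shared FPU (a Bulldozer-style module)...
shared = cycles_to_finish(1000, n_threads=2, n_units=1)   # 2000 cycles
# ...versus the same two threads each owning an FPU (a conventional dual core).
private = cycles_to_finish(1000, n_threads=2, n_units=2)  # 1000 cycles
# Integer work sees no penalty: the module has two ALUs for two threads.
integer = cycles_to_finish(1000, n_threads=2, n_units=2)  # 1000 cycles
print(shared, private, integer)
```

Under this model the effective per-thread FP rate halves when both threads load the shared unit, which matches the "FPU performance of both cores will effectively half" claim quoted earlier in the thread.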


----------



## Dent1 (Nov 9, 2015)

Dieinafire said:


> Verdict is AMD screwed their customers by lying for years



So you're a lawyer or a judge? Maybe you are a clairvoyant and can predict the outcome of the court case?



FordGT90Concept said:


> They always refer to # cores w/ HT or # cores w/o HT.




I agree Intel has done nothing wrong. Just saying Intel doesn't go out of their way to educate consumers on what Hyper-Threading is, and 90% of their customers will assume it's a core or functions the same as one.  I'm not saying Intel should be obliged morally or legally to explain either; just pointing out this fact.




FordGT90Concept said:


> Intel never claims SMT is equal to more cores like AMD does with Bulldozer.



To be fair, it could be equal to, equivalent to, or classified as more cores until the verdict of the court case says otherwise. At the moment AMD has done nothing wrong. It's like a guilty-until-proven-innocent witch hunt.


----------



## Aquinus (Nov 9, 2015)

FordGT90Concept said:


> A lot of every processor in existence takes more than one clock to complete a task.


Bullshit. There are a lot of instructions that not only execute in 1 ~~second~~ clock cycle, but can sometimes do several of the same instruction at once.

Before I grab part of this document, I will quote it:


> LNN means latency for NN-bit operation. TNN means throughput for NN-bit operation. The term throughput is used to mean number of instructions per cycle of this type that can be sustained. That implies that more throughput is better, which is consistent with how most people understand the term. Intel use that same term in the exact opposite meaning in their manuals.






Source: https://gmplib.org/~tege/x86-timing.pdf
Let's look at Sandy Bridge for a minute:
add, sub, and, or, xor, inc, dec, neg, and not *all execute in a single clock cycle and can process 3 of these uOps at once per core*. Haswell expanded that to 4 uOps per cycle from 3 on SB. Even AMD's K10 was the same way, but then you look at AMD's BD1 (which is what we're all huffy about) and you notice that *these same instructions can only do 2 uOps per clock cycle on Bulldozer*. Then there are cases like double shift left and right, which have a fraction of the performance on BD versus modern Intel CPUs.

People need to get their information right. Bulldozer is slow because dedicated components were skimped on. Its instructions usually take the same number of cycles as their Intel counterparts, but in many cases they have much less throughput, so uOps have to be issued over more cycles than they otherwise would, which increases latency and translates certain full instructions into a longer set of uOps. So you might have an instruction whose uOps an Intel CPU could execute in one clock cycle, but the AMD CPU might need two, because it doesn't have enough resources in a single core to do it all at once.

For what it's worth, it's not that Intel cores execute individual instructions "faster"; it's that they can do more of them in a single clock cycle. But AMD and Intel both have a lot of core x86 instructions that not only complete in one cycle, but can execute multiple of the same uOp in the same cycle, which is where pipelining comes into play for instructions that allow it.

It's also worth noting that there are x86 instructions that are not pipelined for various reasons. That's in this other document:
http://www.intel.com/content/www/us...-ia-32-architectures-optimization-manual.html
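The latency-versus-throughput distinction that the timing document defines can be shown with a toy cycle count. A sketch under idealized assumptions (not a real microbenchmark; the 3-vs-2 uOps-per-cycle figures are the Sandy Bridge and Bulldozer numbers discussed above): a chain of *dependent* 1-cycle adds is latency-bound and costs one cycle per op on either core, while *independent* adds are throughput-bound and separate the two designs.

```python
import math

def cycles(n_ops, latency, throughput, dependent):
    """Idealized cycle count for n_ops identical instructions.
    dependent=True: each op consumes the previous result, so issue is
    serialized by latency. dependent=False: ops are independent and
    limited only by per-cycle throughput (ops sustained per cycle)."""
    if dependent:
        return n_ops * latency
    return math.ceil(n_ops / throughput)

n = 1200  # 1-cycle integer adds
print(cycles(n, 1, 3, dependent=True))    # dependent chain: 1200 cycles on either core
print(cycles(n, 1, 3, dependent=False))   # independent, SB-like (3/cycle): 400 cycles
print(cycles(n, 1, 2, dependent=False))   # independent, BD-like (2/cycle): 600 cycles
```

Same per-instruction latency, 50% more cycles on the 2-wide core for throughput-bound work: that is the "gimped integer core" effect in miniature.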


----------



## FordGT90Concept (Nov 9, 2015)

Aquinus said:


> Bullshit. There are a lot of instructions that not only execute in 1 second, it can sometimes do several of the same instruction at once.


Obvious typo is obvious: 1 clock != 1 second.  4.0 GHz = 4 billion clocks per second.


It does appear I was backwards...assuming no cache misses; that's the point though, isn't it?  With two threads in the core, more usually gets done.  The difference between HTT and Bulldozer's implementation is that Bulldozer should theoretically (assuming all else was equal) be able to do more integer operations in the same time frame.  That still doesn't change the definition of a core.

I have no problem with Bulldozer's design.  I have a problem with AMD calling it 8-cores (except where appropriate in some Opterons).


I find it ironic K10 beats Bulldozer on pretty much every one.  The only advantage Bulldozer has over K10 is the higher clockspeeds.


----------



## Aquinus (Nov 9, 2015)

FordGT90Concept said:


> Obvious typo is obvious: 1 clock != 1 second. 4.0 GHz = 4 billion clocks per second.


My bad, I corrected it.


FordGT90Concept said:


> It does appear I was backwards...assuming no cache misses; that's the point though, isn't it? With two threads in the core, more usually gets done. The difference between HTT and Bulldozer's implementation is that Bulldozer should theoretically (assuming all else was equal) be able to do more integer operations in the same time frame. That still doesn't change the definition of a core.


The point is that overall poor performance is due to a slim core, not a shared module and the throughput of BD in my last post is a very clear indicator of that.


FordGT90Concept said:


> I find it ironic K10 beats Bulldozer on pretty much every one. The only advantage Bulldozer has over K10 is the higher clockspeeds.


Clock for clock, K10 was a significantly larger core, but it also had a lot more under the hood dedicated to one core. I'll give AMD credit that they were able to squeeze quite a bit of parallel throughput out of these CPUs, but that's never the kind of workload consumers really need to care about.

The simple fact is that BD has two real cores per module; the problem is that while uOps execute just as fast, instructions with certain combinations of uOps are going to impact AMD's BD core a lot more than one of Intel's. Even Intel has shown that they would rather beef up a core, and AMD's problem is that two lanky cores aren't going to provide the single-threaded throughput you want. If there are instructions that take fewer cycles to complete on Intel CPUs, that's a pretty telltale sign that it's the cores themselves. Add to that the fact that BD cores scale almost linearly on purely parallel workloads (excluding certain FP applications, but that really depends on the particular instructions being used).

Nothing here says to me that they're not 8 real cores. What people are pissed off about is that they're 8 gimped cores, even for integer operations, but that's not because of shared components. If it were a real implementation of hardware SMT like Hyper-Threading, we wouldn't see the kind of scaling we're seeing with modules, which is near linear for purely parallel workloads. What we're seeing is 8-core CPUs where every core is something like 80% of what it should be. It scales properly and runs properly, except that single-threaded performance is 20% short of where it should have been, and people were expecting Phenom II-like performance in single-threaded applications but BD performance in multi-threaded applications, which wasn't the result.

AMD made some choices and it resulted in focusing on more cores and less on individual core performance. As a result, people got irritated that their skinny cores couldn't bite off enough at once and wanted their fatter cores that were more efficient in single-threaded applications back (here comes Zen!)

Our disagreement isn't over whether Bulldozer blows; it's over how it blows, and I think blaming the FPU and shared components is a bit of a stretch given the amount of information indicating that even integer performance trails K10 per clock. They only try to make up for that with clock speeds, as you said. None of this has to do with whether it has 8 real cores or not; it has to do with how shitty the slimmed-down integer cores are. Mix that with the shared FPU and added latencies on FP instructions, and you have a recipe for lackluster performance. All of which can still happen even if there are 8 real cores.

Take Intel's 8-core Atom, the C2750 I think it is. Its performance trails Core series CPUs at the same clock speed with half as many cores but with SMT. Does that mean the Atom doesn't have real cores? NO! It means the Atom's core is lacking in performance despite the chip having 8 real cores, and doesn't use every clock cycle as efficiently as the i5s and i7s, just like Bulldozer.


----------



## eidairaman1 (Nov 9, 2015)

Good point. Yeah, I did a CPU-Z comparison of the 8350 vs Intel; the Intel parts lead AMD by 8% in single-thread performance. 16 vs 24. Overclocking the BD only pushes its multithread performance up.



Aquinus said:


> My bad, I corrected it.
> 
> The point is that overall poor performance is due to a slim core, not a shared module and the throughput of BD in my last post is a very clear indicator of that.
> 
> ...


----------



## Aquinus (Nov 9, 2015)

eidairaman1 said:


> Good Point, Yea I did a CPU z comparison of the 8350 vs Intel, the Intel parts lead AMD by 8% in Single Thread Performance. 16vs24. the OCing of the BD only pushes its multithread performance up.


Then on the multi-threaded score it only starts to catch up when the Intel CPU starts relying on Hyper-Threading, which goes back to "if they're not real cores, why do they scale like they are?"


----------



## FordGT90Concept (Nov 9, 2015)

@Aquinus: again, performance is peripheral.  AMD says right on the box "8-core" but everything under the hood says otherwise from Microsoft calling it "Cores: 4, Logical Processors 8" to Phenom II X6 beating it in the multithreaded tests where Bulldozer is supposed to excel, to the decoders being shared, to the FPU clearly using SMT, to the die shot looking a whole lot more like a monolithic core than a dual core from any other architecture (excepting Piledriver and Steamroller, of course).  It should have been marketed as a "4 core" or maybe a "4+ core" to indicate it isn't traditional, not an "8 core."  Had AMD called it what it really is, this lawsuit never would have happened.  AMD is going to lose because it is patently obvious they stretched the meaning of "core" beyond the breaking limit.  The only way the plaintiff loses is if he does a terrible job.


I think we can all agree there isn't much more to be said on this topic until there is a verdict.  I'll take my leave until then.


----------



## Dent1 (Nov 9, 2015)

FordGT90Concept said:


> @Aquinus: again, performance is peripheral.  AMD says right on the box "8-core" but everything under the hood says otherwise from Microsoft calling it "Cores: 4, Logical Processors 8" to Phenom II X6 beating it in the multithreaded tests where Bulldozer is supposed to excel, to the decoders being shared, to the FPU clearly using SMT, to the die shot looking a whole lot more like a monolithic core than a dual core from any other architecture (excepting Piledriver and Steamroller, of course).  It should have been marketed as a "4 core" or maybe a "4+ core" to indicate it isn't traditional, not an "8 core."  Had AMD called it what it really is, this lawsuit never would have happened.  AMD is going to lose because it is patently obvious they stretched the meaning of "core" beyond the breaking limit.  The only way the plaintiff loses is if he does a terrible job.
> 
> 
> I think we can all agree there isn't much more to be said on this topic until there is a verdict.  I'll take my leave until then.



When you're a multinational, lawsuits happen every day, from intellectual property disputes to workplace hazards, pension breaches, sexual harassment, etc. This is a part of business.

If AMD called it a 4-core with physical hyper threading this lawsuit would have been avoided, but somebody could still turn around and sue saying AMD has an "unfair" performance monopoly by deliberately under-spec'ing their processors to outperform the competition.  Not saying they would win, but a lawsuit could still be filed.


----------



## eidairaman1 (Nov 10, 2015)

Aquinus said:


> Then on multi-threaded score it only starts to catch up when the Intel CPU starts relying on hyper threading which goes back to "if they're not real cores, why do they scale like they are?"



Yeah, an 8350 in multithread only leads the 6700 by 4% when the CPU is OC'd to 5.0GHz (104% vs 100%). I'd figure Hyper-Threading is like how DDR operates.


----------



## xenocide (Nov 10, 2015)

I would say if they can drive home the inability to disable one "core" without at minimum disabling a whole "module", then they will be able to win.  AMD definitely moved the goalposts on what defines a "core".  I remember the "Real Men Use Real Cores" campaign they ran against Intel in the Athlon X2 days, and going by their own definition Bulldozer didn't have 8 "cores".


----------



## Pill Monster (Nov 10, 2015)

FordGT90Concept said:


> A lot of every processor in existence takes more than one clock to complete a task.  While the FPU is crunching on something, SMT allows another thread to be processed through the ALU which takes far fewer clocks.  Another example is a thread having to wait because of a cache miss, the other thread keeps executing.  Like Bulldozer, after instructions are decoded, a lot of the processor is out-of-order and that is where SMT occurs.  The only thing different about Bulldozer is that there are two ALUs instead of one.  The rest of the processor mimics SMT.  I would never call a core with two ALUs a dual core.






> 12 "core" Xeon can send 6 commands per clock _*cycle*_.



There you go, corrected...



Btw on a different note here's a couple of benchmarks if anyone is curious....

Check out the latency on bottom screenshot....


Phenom II @ 4.0 (single-threaded)
[screenshot]

Phenom II @ 4.4 (single-threaded)
[screenshot]

Vishera @ 3.5 stock (single-threaded)
[screenshot]

Vishera @ 5.0 (single-threaded)
[screenshot]

Vishera @ 4.7 (multi-threaded)
[screenshot]

(No bandwidth test, sorry; the PC kept locking up with the full test.)


----------



## lotsofstupid (Jan 1, 2016)

Yes, so an Intel SX CPU wasn't really a chip, so we should sue Intel for lying decades ago? The SX had no math co-processor and you had to buy it as an add-on chip; the Intel DX had the chip built in. Same thing here. make -j4 has no issue working on Linux and is twice as fast as make -j2.


----------



## qubit (Jan 2, 2016)

lotsofstupid said:


> *Yes, so an Intel SX CPU wasn't really a chip*, so we should sue Intel for lying decades ago? The SX had no math co-processor and you had to buy it as an add-on chip; the Intel DX had the chip built in. Same thing here. make -j4 has no issue working on Linux and is twice as fast as make -j2.


Whut?! That makes no sense.

This isn't at all like with AMD. When Intel disabled the FPU to make a 486SX it didn't then market the chip as FPU capable, hence there's nothing to sue them over. In fact they actually marketed an FPU coprocessor to go with it to restore the missing function.

However, Bulldozer doesn't have 8 discrete cores, but rather 4 siamesed ones that share resources and have lower performance as a result. Completely different scenario so no wonder they're getting sued.


----------



## newtekie1 (Jan 2, 2016)

qubit said:


> Whut?! That makes no sense.
> 
> This isn't at all like with AMD. When Intel disabled the FPU to make a 486SX it didn't then market the chip as FPU capable, hence there's nothing to sue them over. In fact they actually marketed an FPU coprocessor to go with it to restore the missing function.



His point is that an FPU isn't required to be considered a full core.  If it were, then CPUs that didn't have FPUs would be considered 0-core processors.  You wouldn't call a 486SX a 0-core processor, would you?

AFAIK, the 486 line didn't offer a separate FPU; there was no way to restore the lost function of the 486SX.  It was the 386 line that had a separate FPU available.  They made something called the i487SX, but it was really a full-blown 486DX: when installed, it would disable the 486SX completely and take over all CPU operations.



qubit said:


> However, Bulldozer doesn't have 8 discrete cores, but rather 4 siamesed ones that share resources and have lower performance as a result. Completely different scenario so no wonder they're getting sued.



If sharing resources results in one core, then all of Intel's current desktop processors are single-core processors...



FordGT90Concept said:


> Microsoft calling it "Cores: 4, Logical Processors 8"



What a different company calls it doesn't really matter here.  Remember when Microsoft used to call single core Pentium 4 processors with hyperthreading "two processors"?  I do.  So what Microsoft says doesn't really matter.

Also, there is other software, software far more geared toward dealing with processor specs, that says they are 8 cores.  CPU-Z says 8 cores and 8 threads.  Microsoft's software can't even read the clock speed properly half the time, so I'd say we should listen to CPU-Z over what Windows says.
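The CPU-Z-versus-Windows disagreement is easy to reproduce in software. A minimal sketch (the /proc/cpuinfo parsing is a Linux-only assumption; on other platforms the function just returns None, and os.cpu_count() reports logical processors as the OS schedules them):

```python
import os

def logical_cpus():
    # Logical processors as the OS exposes them (HT threads and BD "cores" alike).
    return os.cpu_count()

def physical_cores_linux():
    """Count unique (physical id, core id) pairs from /proc/cpuinfo.
    Returns None when that information isn't available (non-Linux, or
    a /proc/cpuinfo format without those fields)."""
    pairs, phys, core = set(), None, None
    try:
        with open("/proc/cpuinfo") as f:
            for line in f:
                if line.startswith("physical id"):
                    phys = line.split(":")[1].strip()
                elif line.startswith("core id"):
                    core = line.split(":")[1].strip()
                elif not line.strip():  # blank line ends one processor entry
                    if phys is not None and core is not None:
                        pairs.add((phys, core))
                    phys = core = None
        if phys is not None and core is not None:  # flush a trailing entry
            pairs.add((phys, core))
    except OSError:
        return None
    return len(pairs) or None

print(logical_cpus(), physical_cores_linux())
```

Which number a given tool prints for an FX-8350 depends entirely on whether the OS chooses to advertise each Bulldozer integer core as a distinct "core id", which is exactly the disagreement in this thread.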


----------



## FordGT90Concept (Jan 2, 2016)

Do realize that when FPUs were co-processors, the concept of a "core" didn't exist.  Skip forward a few years and co-processors were relegated to history.  Skip forward about a decade and you have AMD producing two processors on one die.  Intel quickly follows with multi-chip modules on a die to get two processors.  Skip forward about half a decade from there and you finally reach AMD sharing the FPU with two ALUs.  You have to dredge up 20 years of history to reach that conclusion.  In computing, that's how many lifetimes?  10?  It doesn't work that way.  AMD is trying to redefine the definition of "core" to mislead consumers into thinking they're getting more than they got.  This lawsuit is completely justified and should have happened years ago.

...but we've already been over all that, haven't we?


----------



## newtekie1 (Jan 2, 2016)

FordGT90Concept said:


> Do realize that when FPUs were co-processors, the concept of a "core" didn't exist.  Skip forward a few years and co-processors were relegated to history.  Skip forward about a decade and you have AMD producing two processors on one die.  Intel quickly follows with multi-chip modules on a die to get two processors.  Skip forward about half a decade from there and you finally reach AMD sharing the FPU with two ALUs.  You have to dredge up 20 years of history to reach that conclusion.  In computing, that's how many lifetimes?  10?  It doesn't work that way.  AMD is trying to redefine the definition of "core" to mislead consumers into thinking they're getting more than they got.  This lawsuit is completely justified and should have happened years ago.
> 
> 
> ...but we've already been over all that, haven't we?



Except Intel redefined what a core was just the same.  Sharing resources isn't justification for calling something one core.  When Intel decided that two of their cores would share a single L2 cache, we didn't all say they weren't two cores.  The story of the L2 cache is very similar to the FPU's: it started as something completely separate from the processor (just like the FPU), it was eventually integrated into the processor (just like the FPU), and, just like the FPU, each core had its own L2 before Intel decided to have two cores share a single L2 cache.


----------



## FordGT90Concept (Jan 2, 2016)

newtekie1 said:


> What a different company calls it doesn't really matter here.  Remember when Microsoft used to call single core Pentium 4 processors with hyperthreading "two processors"?  I do.  So what Microsoft says doesn't really matter.


That was Windows XP, and XP only has two states: uniprocessor (one thread at a time) and multiprocessor (two or more threads at a time).  Multiprocessor could mean two physical sockets with one core each, one socket with two cores, or one physical + one logical processor.  It was updated to better handle the three variations.

Bulldozer did the same thing with Vista.  Vista (and I believe 7 too) called it eight cores because it was incapable of distinguishing them, but that apparently caused problems, because updates were released to fix core-parking issues.  Come Windows 8 and newer, Microsoft updated the operating system to definitively account for sockets, cores, and logical processors, which is where we see 4 cores and 8 logical processors.



newtekie1 said:


> Also, there is other software, ones that are far more geared towards dealing with processor specs, that say they are 8-cores.  CPU-Z says 8-Cores and 8-Threads.  Microsoft's software can't even read the clock speed properly half the time, so I'd say we should listen to CPU-Z over what Windows says.


CPU-Z doesn't need to schedule threads.  Windows does.  Microsoft did what they did deliberately so the scheduler best utilizes the processor's resources.



newtekie1 said:


> Except Intel redefined what a Core was just the same.  Sharing resources isn't justification for calling something one core.  When Intel decided that 2 of their cores would share a single L2 cache, we all didn't say the cores weren't two cores.  The story of the L2 cache is very similar to the FPU. It was something that started as being completely separate from the processor(just like the FPU), something that was eventually integrated into the processor(just like the FPU), and just like the FPU, each core had its own L2 before Intel decided to have 2 cores share a single L2 cache.


Caches have always been tiered.  The closer the tier is to the ALUs and FPUs, the faster it is.  Caches completely lack logic, and there are numerous advantages, and virtually no disadvantages, to sharing caches (the scheduler will allot the cache evenly when the load is even).

There are only a handful of shared FPUs in the computing world outside of Bulldozer (and derivatives), and all of them are set up in a way that resembles a co-processor.  That is, it has its own scheduler and all of the cores can queue work to it: effectively its own core.  They don't market it as having an extra core, though, because that would be misleading.


----------



## newtekie1 (Jan 2, 2016)

FordGT90Concept said:


> That was Windows XP and XP only has two states: uniprocessor (one thread at a time) and multiprocessor (two or more threads at a time). Multiprocessor could mean two physical sockets with one core each, one socket with two cores, or one physical + one logic processor. It was updated to better handle the three variations.
> 
> Bulldozer did the same thing with Vista. Vista (I believe 7 too) called it eight-cores because it was incapable of distinguishing them but that apparently caused problems because updates were released to fix core parking issues. Come Windows 8 and newer, Microsoft updated the operating system to definitively account for sockets, cores, and logic processors which is where we see 4 cores and 8 logic processors.



How Microsoft labels it in their OS to make their OS work better doesn't matter.  The purpose of their software isn't to give CPU specs, and like I said, even in their current OSes they can't even get the CPU clock speeds right a lot of the time.



FordGT90Concept said:


> CPU-Z doesn't need to schedules threads. Windows does. Microsoft did what they did deliberately so the scheduler best utilizes the processor resources.



Yes, but what CPU-Z does do is give CPU specs.  In fact, that is all it does, and the program designed to tell you what specs your CPU has says 8 cores.  Microsoft's software is one of the few that actually says 4 cores; almost everything else says 8 cores.  Every program that gives CPU specs says 8 cores.  Linux says 8 cores (well, CPUs actually).



FordGT90Concept said:


> Caches have always been tiered. The closer the tier is to the ALUs and FPUs, the faster it is. Caches completely lack logic and there's numerous advantages, and virtually no disadvantages, to sharing caches.



The disadvantage of sharing cache is that it is slower than not sharing cache.  If each core had its own 4MB of cache, it would be faster than two cores sharing 4MB of cache.

Either way, the L2 cache's story is very similar to the FPU's.  It was separate, it was then integrated, and it was then shared between two cores.  That doesn't make the two cores count as one.

If we are going to let Intel get away with sharing resources and still calling them separate cores, then we have to allow AMD.


----------



## FordGT90Concept (Jan 2, 2016)

newtekie1 said:


> How Microsoft labels it in their OS to make their OS work better doesn't matter.  The purpose of their software isn't to give CPU specs, and like I said, even in their current OSes they can't even get the CPU clock speeds right a lot of the time.


The purpose of their software is to describe what is present.

Microsoft doesn't deal with clock speeds, they deal with processor states.  The clock speed data they do provide is only as a convenience.  That said, it appears accurate to me in Windows 10.  It's pretty obvious Microsoft put a lot of effort into understanding the processor in more recent versions of Windows (probably because of their work with ARM).



newtekie1 said:


> The disadvantage to sharing cache is that it is slower than not sharing cache.  If each core had 4MB of cache it would be faster than sharing 4MB of cache.


Yes, but it would also cost a lot more, as well as consuming more power and producing more heat.



newtekie1 said:


> Either way, L2 cache's story is very similar to the FPU.  It was separate, it then was integrated, it then was shared between two cores.  That doesn't make the two cores count as one.


Let's use the test of disabling cores.  Where L2 is shared, can you disable half of the cores above it and still have the processor function perfectly normally?  With L2 (and L3, and L4, and so on), the definitive answer is "yes."  Does Bulldozer pass the same test?  The definitive answer is "no."  The former constitutes legitimate cores while the latter does not.


----------



## Aquinus (Jan 2, 2016)

newtekie1 said:


> Sharing resources isn't justification for calling something one core.


I think this is where it all ends. When push comes to shove, FPU operations can be done on an integer core at the expense of extra cycles. There are many older ARM cores that lack a real FPU and do exactly this; it's just not commonly done anymore because of performance.


FordGT90Concept said:


> That is, it has it's own scheduler and all of the cores can queue work to it--effectively its own core.


Sorry to burst your bubble but, Bulldozer *does* have a scheduler for each FPU per module as well as a scheduler for every integer core. The only components they share, excluding cache, are the *fetch and decode units*, something that any core will require. On the original BD chip, that was actually a significant problem because it became a bottleneck in the CPU, which is why an extra decode unit was later added to each module in Steamroller, which leaves what, a single shared fetch unit?

People shouldn't be bitching about whether they're "real" cores or not. They should be questioning why the integer cores suck in the first place. It's not because of shared components, it's because each core is actually gimped. I posted this earlier but maybe people have short memories. Explain to me why BD can only process practically half as many instructions per clock as Haswell. That alone will contribute to cruddy performance; you don't even have to look further than the integer cores to figure that one out.

People are blaming one thing, when they should be blaming another. Most operations in a CPU are going to be integer operations. While floating point math is used often, it's not used as often as the integer ALU in most circumstances which is why AMD shared it in the first place. What AMD screwed up is gimping the integer cores.

For those with a short memory or the inability to go back a page or two:


Aquinus said:


> Bullshit. There are a lot of instructions that not only execute in a single cycle, it can sometimes do several of the same instruction at once.
> 
> Before I grab part of this document, I will quote it:
> 
> ...



Simply put, Bulldozer didn't suck because of a shared FPU; it sucked because they gimped the integer cores worse than on K10 (per clock).


----------



## de.das.dude (Jan 2, 2016)

this is just stupid :/
I have an 8-core processor and it's definitely better at multi-threaded stuff than others. Plus it costs a fraction of what Intel has to offer.


----------



## qubit (Jan 2, 2016)

Sorry NT, I'm gonna write my responses in your quote in a different colour, because it's a bit easier that way than all the fiddly cutting and pasting of tags. 



newtekie1 said:


> His point is that an FPU isn't required to be considered a full core.  If it was then CPUs that didn't have FPUs would be considered 0-Core processors.  You wouldn't call a 486SX a 0-Core processor, would you?
> 
> AFAIK, the 486 line didn't offer a separate FPU; there was no way to restore the lost function of the 486SX.  It was the 386 line that had a separate FPU available.  They made something called the i487SX, but it was really a full-blown 486DX; when installed, it would disable the 486SX completely and take over all CPU operations.
> 
> ...


----------



## newtekie1 (Jan 2, 2016)

FordGT90Concept said:


> The purpose of their software is to describe what is present.
> 
> Microsoft doesn't deal with clock speeds, they deal with processor states. The clock speed data they do provide is only as a convenience. That said, it appears accurate to me in Windows 10. It's pretty obvious Microsoft put a lot of effort into understanding the processor in more recent versions of Windows (probably because of their work with ARM).



Yet they list the processor speed (oftentimes wrong) right next to where they list the cores and logical threads; they don't list processor states.  In fact, by your same logic, they aren't actually dealing with cores and threads either, they are dealing with the schedulers on the processor.



FordGT90Concept said:


> Yes, but it would also cost a lot more, as well as consuming more power and producing more heat.



Just like including more FPUs.



FordGT90Concept said:


> Let's use the test of disabling cores. Where L2 is shared, can you disable half of the cores above it and still have the processor function perfectly normally? With L2 (and L3, L4, and so on), the definitive answer is "yes." Does Bulldozer pass the same test? The definitive answer is "no." The former constitutes legitimate cores; the latter does not.



I don't think there is really anything technically stopping an integer core on a Bulldozer CPU from being disabled.  Each integer core has its own scheduler, so technically you just have to disable that scheduler and the integer core will be effectively disabled as well. 



Aquinus said:


> Simply put, Bulldozer didn't suck because of a shared FPU; it sucked because they gimped the integer cores worse than on K10 (per clock).



Exactly.  Their integer cores were weak on Bulldozer.  On top of this, the integer core can in fact do all the work that the FPU does (just much slower).  The fact is the CPU could have no FPUs at all and still technically function.  It would just be slow as hell whenever work would have been accelerated by the FPU.



qubit said:


> No one is arguing that the removal/disabling of the FPU is a reason to sue a company.



Actually, I'm pretty sure that is exactly what most are arguing and what this entire lawsuit is based on.  The fact that the CPU only has 4 FPUs is the basis for claiming it is a 4-core CPU, not an 8-core.



qubit said:


> It only applies if the company inflates the capabilities of the crippled chip, which Intel didn't do, but AMD did.



The entire claim is that AMD called the processor an 8-core. They didn't inflate the capabilities of the chip.  It is capable of doing 8 things at the exact same time, something a 4-core CPU can never do (even with HyperThreading), so it is an 8-core CPU.



qubit said:


> AMD made parts of the core shared instead in a siamesed way, closer to a form of one core + hyperthreading than two actual cores and called them modules, yet still marketed them all as separate cores. Hence the claims of 8 core processors when they weren't really.



Yeah, but that is the slope we allowed them to slide down.  Intel started sharing resources between cores, so AMD started doing it.  Each did it to varying degrees.  Intel did it with L2 and L3; AMD took it a step further and did it with FPUs. Did they go too far? That is up to the court.  But since an x86 CPU can technically function completely without any FPU, calling a processing core that doesn't have its own FPU an x86 core is legitimate, IMO.  But that is left up to the court at this point, so we'll just have to wait and see.


----------



## Aquinus (Jan 2, 2016)

newtekie1 said:


> The entire claim is that AMD called the processor an 8-core. They didn't inflate the capabilities of the chip. It is capable of doing 8 things at the exact same time, something a 4-core CPU can never do (even with HyperThreading), so it is an 8-core CPU.


Not just that but, I'm willing to bet that given purely parallel workloads, the CPU will scale almost linearly with the number of threads being utilized. Even doing floating point calculations, there are a lot of integer calculations that come before and after to handle things like memory addressing and stepping through sets of data. There really is no such thing as a "pure floating point workload," because there are control structures and memory operations that require integers and utilize the ALU.

All in all, I think we can say that Bulldozer sucked because of the length of the pipeline and its reduced ability to execute certain uOps in parallel. The long pipeline introduces stalls and increases the amount of work lost when a stall occurs. Not being able to process as many uOps per cycle could very well mean that certain x86 instructions require more clock cycles to complete on BD than on K10 or on an Intel CPU. None of which has to do with the FPU.

I won't deny that the CPU's FP performance is worse than it would be with 8 dedicated FPUs, but the question is whether the CPU would suck less if it had them, and I don't think that's the case. There are a lot of things wrong with the architecture, and the shared FPU isn't even among the biggest issues in my opinion. AMD went with slimmed-down cores in order to add more of them, which was a fatal mistake.


----------



## FordGT90Concept (Jan 2, 2016)

de.das.dude said:


> this is just stupid :/
> have an 8 core processor and its definitely better at multi threaded stuff than others. Plus it cost a fraction of what intel has to offer.


Benchmarks disagree.  Quad-core Intel processors handily beat "8-core" AMD at virtually everything multithreaded.  You put an 8-core Intel up against AMD's "8-core," AMD  gets slaughtered.  You don't have an "8-core" processor, you have a quad-core processor accepting 8 threads.  Hell, you can even put an older AMD 6-core up against these AMD "8-core" processors and the 6-core will give the "8-core" a severe lashing.



newtekie1 said:


> I don't think there is really anything technically stopping an integer core on a Bulldozer CPU from being disabled.  Each integer core has its own scheduler, so technically you just have to disable that scheduler and the integer core will be effectively disabled as well.


"Think."  AMD would have done it if they could.



newtekie1 said:


> On top of this, the integer core can in fact do all the work that the FPU does (just much slower).


You're talking about software emulation.  That's a moot argument because all x86 processors on the planet can emulate floating points through software.



newtekie1 said:


> Actually, I'm pretty sure that is exactly what most are arguing and what this entire lawsuit is based on.  The fact that the CPU only has 4 FPUs is the basis for claiming it is a 4-core CPU, not an 8-core.


The lawsuit argues that there are not 8 discrete cores, only 4.  The fact that there are four FPUs is a technicality.  The judge has to ask "did AMD mislead" by branding their FX-8### processors as "8-core" processors.  They never say "integer cores" on the packaging; they leave out that keyword "integer." I don't see how a judge could rule in AMD's favor.  AMD definitely did mislead the public.



newtekie1 said:


> It is capable of doing 8 things at the exact same time, something a 4-core CPU can never do (even with HyperThreading), so it is an 8-core CPU.


Hyperthreading does enable a quad core processor to do 8 integer operations at the same time if the conditions are met.  This is how Intel quad-cores can best AMD's "8-core" processors at their own game.



Aquinus said:


> AMD went with slimmed down cores in order to add more of them which was a fatal mistake.


Which plays into the narrative that AMD intended to mislead the public.


----------



## Aquinus (Jan 2, 2016)

FordGT90Concept said:


> Benchmarks disagree. Quad-core Intel processors handily beat "8-core" AMD at virtually everything multithreaded. You put an 8-core Intel up against AMD's "8-core," AMD gets slaughtered. You don't have an "8-core" processor, you have a quad-core processor accepting 8 threads. Hell, you can even put an older AMD 6-core up against these AMD "8-core" processors and the 6-core will give the "8-core" a severe lashing.


Once again, that's a result of single-threaded performance being garbage. It's not how much it outperforms Intel, it's how much improvement each added core contributes to multi-threaded performance. If scaling is near linear, I would argue that shared components aren't contributing to bad performance. Read #243. This also has arguably very little to do with a shared FPU and you've yet to provide any information that counters my argument other than repeating yourself.


FordGT90Concept said:


> The lawsuit argues that there are not 8 discreet cores, only 4. The fact there is four FPUs is a technicality. The judge has to ask "did AMD mislead" by branding their FX-8### processors at "8-core" processors. They never say "integer cores" on the packaging. They leave out that keyword "integer." I don't see how a judge could rule in AMD's favor. AMD definitely did mislead the public.


You don't have to say "integer"; an x86 CPU can't run without integer cores. Period. End of story. It's not how x86 CPUs work. I know you made the argument that you can do all of this with an FPU, but then it wouldn't be an x86 CPU anymore, now would it?


FordGT90Concept said:


> Hyperthreading does enable a quad core processor to do 8 integer operations at the same time if the conditions are met. This is how Intel quad-cores can best AMD's "8-core" processors at their own game.


*WRONG!* It enables Intel CPUs to use the unused resources in the CPU. You're still limited by the hardware within a single core, which means that if one instruction stream is eating up the resources, hyper-threading gets you nothing. It uses the core's resources more efficiently, nothing more, nothing less. If the "conditions are right," that just means the first thread wasn't already using those resources.


----------



## FordGT90Concept (Jan 2, 2016)

Everything said has been said before.  I'll leave this here:


----------



## newtekie1 (Jan 2, 2016)

FordGT90Concept said:


> "Think." AMD would have done it if they could.



Why do you say that? Intel hasn't done it and we know they can.  And AMD stopped doing that with the first runs of Phenom II.  They changed their strategy on odd numbers of cores in the late Phenom II days: even when there were six-core processors with a single bad core, they disabled two full cores and sold them as X4 processors. When Thuban came out in 2010, AMD stopped the odd core counts.



FordGT90Concept said:


> You're talking about software emulation. That's a moot argument because all x86 processors on the planet can emulate floating points through software.



No, it is not software emulation.  Long before CPUs had FPUs, x86 processors did that work themselves.  It is still hardware doing the work; it is not software emulation, it is just very slow at doing it.  That is the idea of a general CPU core.  It can do anything, it just isn't fast at doing it because it is built for general use.  The integer cores on Bulldozer are general CPU cores that can do any type of work you ask of them.



FordGT90Concept said:


> The lawsuit argues that there are not 8 discrete cores, only 4. The fact that there are four FPUs is a technicality. The judge has to ask "did AMD mislead" by branding their FX-8### processors as "8-core" processors. They never say "integer cores" on the packaging; they leave out that keyword "integer." I don't see how a judge could rule in AMD's favor. AMD definitely did mislead the public.



The integer core is a general x86 core.  What they labeled as the integer core was considered an entire x86 processor at one point in time.



FordGT90Concept said:


> Hyperthreading does enable a quad core processor to do 8 integer operations at the same time if the conditions are met. This is how Intel quad-cores can best AMD's "8-core" processors at their own game.



No, they can't. They can only work on 4 integer operations at one time.  What hyperthreading does is set the processor up to switch extremely quickly between operations, giving the user the appearance of the integer core doing two things at the same time.  HT keeps two operations waiting for the integer core, so when it finishes one operation it doesn't have to wait to execute the next.  Clock cycles that would normally be wasted are actually used.  The integer core can never work on two things at the exact same time.  If it could actually work on two integer tasks at once, integer throughput on a desktop i7 would be basically double that of an i5.  That never happens, and the difference is never even very big, because HT is just using idle cycles to do extra work, not actually doing two things at once.

And Intel beats AMD with fewer cores because Intel's processors are way more powerful than AMD's.  So even with 8 cores AMD can't top Intel.  AMD's 6-core Phenoms couldn't beat Intel's 4 cores either; Intel's cores are just a lot faster.


----------



## Aquinus (Jan 2, 2016)

That's because the dispatcher pulls it apart into two different pipelines between the integer cores, whereas in Thuban the two integer units are parts *of the same pipeline*. So sure, for a moment it's one cohesive unit, but there comes a point where added hardware (the dispatcher, something K10 did not have) pulls them apart and enables them to operate independently, hence why it's called a module. The question is, what does that module house? This is something that Intel's SMT does not do.  If you have conjoined twins joined at the hip with some shared organs, does that make them one child? I would argue that it doesn't.

My argument really boils down to two pipelines = two cores. 1 decode unit that can do 4 uOps or 2 decode units that can do 2 uOps each doesn't make a difference to me, it's doing the same thing.

Intel's HT is simply filling the gaps in the pipeline, when the first thread isn't utilizing the entire thing, in order to run a second thread; that's it. It's also why scaling with HT is task-dependent. Scaling on FX CPUs, once again, tends to be almost linear, which isn't indicative of SMT-like behavior on a single core.


----------



## FordGT90Concept (Jan 2, 2016)

newtekie1 said:


> What hyperthreading does is set the processor up to switch extremely quickly between operations, giving the user the appearance of the integer core doing two things at the same time.


I see four ALUs (two INT, two FLOP):




It should be able to do one basic ALU operation per clock, per thread.  More complex operations will cause one thread to be blocked so it would fall to one ALU operation per clock across two threads.

And I should stress that in that image the whole thing is a "core," not just the integer portion.  You can't have a discrete x86 "core" without an x86 decoder.

Note that SMT increases latency which is why the performance gains are not very good.


Edit:


Aquinus said:


> It's also why scaling is task dependent with HT however, scaling on FX CPUs, once again, tends to be almost linear which isn't indicative of SMT-like behavior on a single core.


http://www.overclock.net/t/1469255/fx-8350-trying-to-get-best-performance-per-core


Ultracarpet said:

> Also I goofed around with disabling cores and such to see if there was a performance difference, because apparently there was with Bulldozer... there wasn't with Piledriver. I got the exact same Cinebench scores if I ran a 4-thread bench with all 8 cores enabled compared to a 4-thread bench with only 4 cores enabled.


Trying to find better benchmarks but I'm not coming up with them.


----------



## Aquinus (Jan 2, 2016)

FordGT90Concept said:


> It should be able to do one basic ALU operation per clock, per thread. More complex operations will cause one thread to be blocked.


That's not how hyper-threading works. Hyper-threading utilizes unused parts of the pipeline to run that second thread; it doesn't do any parallel execution on a single stage. What it can do is execute multiple *of the same kind of uOp on the ALU at any given time*; that is to say, if you have 3 of the same uOps *per instruction* in a row, on data that's not dependent on the results, the CPU can execute them in parallel to some extent. You're conflating instruction-level parallelism (parallel uOps) and thread-level parallelism (parallel instructions). Two very different things. Simply put, the only time uOps can be executed in parallel on the ALU is when they're the same uOp. You can't add and sub at the same time.

With that said, most modern super-scalar CPUs already handle instruction-level parallelism internally when instructions are decoded.

Side note: back in the day, on older x86 processors, there were a lot fewer bells and whistles, and the core of an x86 EU was the part you said isn't a core.


----------



## FordGT90Concept (Jan 2, 2016)

"Pipelines" diverge and converge.  Look at the diagrams to compare.  Core, Phenom II, and Bulldozer all start as one pipeline.


AMD says adding the extra "integer cluster" adds 12% to the die space.  Intel has said that adding Hyper-Threading Technology adds 5% to its die space.  The former begets more performance (in theory) because there are more dedicated transistors. How does 12% constitute a complete core when it lacks the capability to prefetch and decode x86 instructions?  It is a component of what AMD calls a "module," not a core unto itself.


----------



## Aquinus (Jan 3, 2016)

FordGT90Concept said:


> "Pipelines" diverge and converge.  Look at the diagrams to compare.  Core, Phenom II, and Bulldozer all start as one pipeline.
> 
> 
> AMD says adding the extra "integer cluster" adds 12% to the die space.  Intel has said that adding Hyper-Threading Technology adds 5% to its die space.  The former begets more performance (in theory) because there's more dedicated transistors. How does 12% constitute a complete core when it is lacking the capability to prefetch and decode x86 instructions?  It is a component of the core (AMD calls "module") and not a core unto itself.


Control logic isn't technically part of the execution unit or processing core. You could have a unified decoder for the entire CPU, but that doesn't mean performance is going to be good, which is why it's not typically done in multi-core setups. There are parts of the CPU required for operation that aren't technically part of the processing core, or execution unit if you will. As a result, control logic is not a prerequisite for calling something a core or an EU.


----------



## FordGT90Concept (Jan 3, 2016)

When AMD debuted the FX-60, did it not have two prefetchers, two decoders, and two identical sets of execution units?  When Intel debuted the Pentium Extreme Edition 840, did it not have two identical sets of those basic components of a processor?  If the answer is yes to both of those questions, you see why a Bulldozer module represents a single core, with a minor exception.

When subtracting CPU resources such as shared caches, HyperTransport, DMI, and so on, the number of transistors scales linearly as more cores are added:
One Core = ~100%
Two Core = ~200%
Four Core = ~400%
Six Core = ~600%
Eight Core = ~800% and so on

This is the way cores are understood by the public and generally considered by the industry.

AMD on the other hand:
Bulldozer One "Core" = 88%
Bulldozer One "Module" = 100% (marketed as "2-core")
Bulldozer Two "Core" = 176%
Bulldozer Two "Module" = 200% (marketed as "4-core")
Bulldozer Three "Core" = 264%
Bulldozer Three "Module" = 300% (marketed as "6-core")
Bulldozer Four "Core" = 352%
Bulldozer Four "Module" = 400% (marketed as "8-core")

They're cooking the books to make their processors look more attractive on computer spec sheets ("Why buy an Intel quad-core when you can buy an AMD 8-core for substantially less? More is better, right?").  That doesn't make it true or accurate.  Compare the two lists above.  How is that not misleading?  I'd argue it goes beyond misleading: it is false advertising.


----------



## Aquinus (Jan 3, 2016)

FordGT90Concept said:


> When AMD debuted the FX-60, did they not have two prefetchers, two decoders, and two identical sets of execution units?  When Intel debuted the Pentium Extreme Edition 840, did it not have two identical sets of those basic components of a processor?  If the answer is yes to both of those questions, you see why Bulldozer represents a single core with a minor exception.
> 
> When subtracting shared hardware resources (such as various levels of cache) and technologies such as DMI and HyperTransport, the number of transistors scales linearly as more cores are added:
> One Core = ~100%
> ...


You make it sound like CPU architectures aren't allowed to change. Isn't that a little closed-minded? Just because the trend in the past was to duplicate circuits when you needed more doesn't mean that's how it's going to work going forward. That's the exact reason why EEs make terrible software engineers.


----------



## FordGT90Concept (Jan 3, 2016)

The problem isn't AMD's architecture.  The problem is the word (which is already very well defined and understood as described in the last post) they used to account for its hardware resources.

AMD does not claim "8 integer execution units," "8 integer clusters," nor "8 integer cores" (all are true), they claim "8-core" (false).


----------



## Aquinus (Jan 3, 2016)

An execution unit is a core. Your problem is that you seem to have this obsession with CPU control logic being part of it. It is not. Control logic is merely translation to drive the core, nothing more, nothing less. A CPU has cores, but a core is not a CPU. Just as control logic is part of the CPU, not the cores. You can circle diagrams all day long until you're blue in the fingers, but it won't change reality. I went to school for this stuff, I eat, breathe, and dream about this stuff, and I can tell you that you're barking up the wrong tree.


----------



## FordGT90Concept (Jan 3, 2016)

CPU control logic is part of a discrete core (and every discrete core has its own control logic).  Execution units are useless without it.  A defining feature of CPU cores is that they are complete: they house everything they need to take instructions and output results.

Your definition of a "core" applies more to GPUs than CPUs (NVIDIA calls them CUDA "cores" where AMD calls them stream processors).  Then again, CPUs are not highly parallel by nature (because logic) where GPUs are.


----------



## Aquinus (Jan 3, 2016)

FordGT90Concept said:


> CPU control logic is part of a discrete core (and every discrete core has its own control logic).  Execution units are useless without it.  A defining feature of CPU cores is that they are complete: they house everything they need to take instructions and output results.
> 
> Your definition of a "core" applies more to GPUs than CPUs (NVIDIA calls them CUDA "cores" where AMD calls them stream processors).  Then again, CPUs are not highly parallel by nature (because logic) where GPUs are.


That's why you have a degree in computer science and work in the industry, right? Want to cite some sources there, big guy? No offense, but you're making stuff up at this point. A core doesn't need to control itself, that's the CPU's job. If it does, it does it as a feedback loop where the core provides data back to the control logic in order to react accordingly to things like changes to the status register in a core, right? Come on, man. I learned this in Hardware 101, this isn't even the hard shit.


----------



## FordGT90Concept (Jan 3, 2016)

Every damn x86 processor on the planet.*

* Except Bulldozer and derivatives.


----------



## Aquinus (Jan 3, 2016)

FordGT90Concept said:


> Every damn x86 processor on the planet.*
> 
> * Except Bulldozer and derivatives.


Good job citing sources, brah. You get a gold star. Nothing you've provided actually says anything about where the CPU itself ends and the core begins. Maybe you can explain yourself instead of repeating yourself incessantly like a broken record.

Go to bed, Ford. You're drunk.


----------



## FordGT90Concept (Jan 3, 2016)

Like I need to cite sources for something that is EVERYWHERE on the internet.  Here's one example:
http://searchdatacenter.techtarget.com/definition/multi-core-processor


> A dual core set-up is somewhat comparable to having multiple, separate processors installed in the same computer, but because the *two processors* are actually plugged into the same socket, the connection between them is faster.


Not execution units.  Not integer cores.  Not integer clusters.  "Processors!"

How about another?
http://techterms.com/definition/multi-core


> Multi-core technology refers to CPUs that contain two or more processing cores. These cores operate as *separate processors* within a single chip.


"Processors!"

Or another?
https://www.techopedia.com/definition/5305/multicore


> This technology is most commonly used in multicore processors, where *two or more processor chips* or cores run concurrently as a single system.
> 
> The concept of multicore technology is mainly centered on the possibility of parallel computing, which can significantly boost computer speed and efficiency by including two or more central processing units (CPUs) in a single chip.


"Processors!" and "CPUs!"

Not enough yet?  Have another!
http://www.pcmag.com/encyclopedia/term/55926/multicore


> A computer chip that contains *two or more CPU processing units*.


...okay, that one is just worded badly but..."CPUs!"

Here's a scholarly paper and they have a block diagram showing dual core with separate fetch/decode for each discreet core on page 27:
https://www.cs.cmu.edu/~fp/courses/15213-s07/lectures/27-multicore.pdf
"Core" is consistently used to describe what could be considered a separate processor (having up to a private or shared L2).


You'll have to be satisfied with that because I'm not quoting more.

Generally speaking, each core starts with prefetch (or "front end") and the core ends just before the L2 cache if the L2 cache is shared or just below the L2 cache if the L2 cache is private.  This unit is effectively a stand-alone processor only missing some communication parts (e.g. RAM and chipset).


----------



## Aquinus (Jan 3, 2016)

FordGT90Concept said:


> Like I need to cite sources for something that is EVERYWHERE on the internet.  Here's one example:
> http://searchdatacenter.techtarget.com/definition/multi-core-processor
> 
> Not execution units.  Not integer cores.  Not integer clusters.  "Processors!"
> ...


A processor is not a core. A processor contains cores. There is a big difference between a multi-core processor and a multi-processor system.

Once again, your sources say nothing about where control logic lives, which is in the CPU, not the core.

Von Neumann disagrees with you:


> 2.3 Second: The logical control of the device, that is the proper sequencing of its operations can be most efficiently carried out by a central control organ. *If the device is to be elastic, that is as nearly as possible all purpose, then a distinction must be made between the specific instructions given for and defining a particular problem, and the general control organs which see to it that these instructions—no matter what they are—are carried out.* The former must be stored in some way— in existing devices this is done as indicated in 1.2—the latter are represented by definite operating parts of the device. *By the central control we mean this latter function only, and the organs which perform it form the second specific part: CC.*


https://web.archive.org/web/20130314123032/http://qss.stanford.edu/~godfrey/vonNeumann/vnedvac.pdf

In other words, control logic is a separate entity from what carries out the operations themselves... but you know, I clearly don't know anything on the matter.


----------



## FordGT90Concept (Jan 3, 2016)

Von Neumann died in 1957.  Hardly relevant.

I'll play your game though: a dual-core processor has two "devices" by Von Neumann's definition.  There are two "central controls" and at least one "organ" under each "central control" (usually integer and floating point execution units).


----------



## Aquinus (Jan 3, 2016)

FordGT90Concept said:


> Von Neumann died in 1957.  Hardly relevant.


Von Neumann is not relevant in CPU design? That's a joke, right? Most modern CPUs exist because of him. Death doesn't make theory irrelevant, though your brain-dead assertions might.


----------



## FordGT90Concept (Jan 3, 2016)

See edit.


----------



## Aquinus (Jan 3, 2016)

Now you're just making shit up. Simply put, the paper says the two are distinct entities, not the same. They might be on the same die, but they're not the same thing.


----------



## FordGT90Concept (Jan 3, 2016)

I never said they weren't "distinct entities."  Control logic sits above execution units, but the point you're missing is that the two together comprise a core in multicore processors.  There may be an overarching control logic (especially for power saving) that encompasses all of the cores on a multi-core processor, but that's also common to modern x86 processors.  It's beyond the scope of this discussion because the debate is about what is or isn't a core and how many of them Bulldozer actually has.


I tried finding some more recent news on the class action suit and turned this up:
http://wccftech.com/amd-class-action-lawsuit-bulldozer-processor-core-count/


> AMD has just officially replied to the allegations and stated the following: “We believe our marketing accurately reflects the capabilities of the “Bulldozer” architecture which, when implemented in an 8 core AMD FX processor is capable of running 8 instructions concurrently.”


That's a pretty weak defense because Hyper-Threading Technology does the same.  Additionally, the threads do not run concurrently through prefetch (true of Bulldozer, Piledriver, Steamroller, & Excavator) and decoding (true of Bulldozer & Piledriver) so, as with all cases of SMT, that statement is only true _some of the time_.

https://pacermonitor.com/public/case/9674725/Dickey_v_Advanced_Micro_Devices,_Inc
It appears AMD motioned to dismiss the case.  The hearing is scheduled for 2/26/2016.


----------



## Aquinus (Jan 3, 2016)

FordGT90Concept said:


> http://wccftech.com/amd-class-action-lawsuit-bulldozer-processor-core-count/


Since when is WCCFTech reliable, and since when does disagreeing with you make them wrong? 


FordGT90Concept said:


> That's a pretty weak defense because Hyper-Threading Technology does the same. Additionally, the threads do not run concurrently through prefetch (true of Bulldozer, Piledriver, Steamroller, & Excavator) and decoding (true of Bulldozer & Piledriver) so, as with all cases of SMT, that statement is only true _some of the time_.


I think someone hasn't been reading my posts because that isn't how hyper-threading works:


Aquinus said:


> That's not how hyper-threading works, hyper-threading utilizes unused parts of the pipeline to run that second thread, *it doesn't do any parallel execution on a single stage*. What it can do is execute multiple *of the same kind of uOP on the ALU at any given time* that is to say, if you have 3 of the same uOps *per instruction* in a row on data that's not dependent on the results, the CPU can execute them in parallel to some extent. You're conflating instruction-level parallelism (parallel uOps,) and thread-level parallelism (parallel instructions). Two very different things. Simply put, the only time uOps can be executed in parallel on the ALU is when they're the same uOp. You can't add and sub at the same time.
> 
> With that said, most modern super-scalar CPUs already handle instruction-level parallelism internally when instructions are decoded.


That's not parallel execution, that's fitting two parallel tasks in the same serial pipeline by filling the gaps, hence why improvements tend to be minor and dependent on the workload. In other words, when one thread isn't using particular resources, another will but, two threads can never use the same resources in a core at the same time. Once again, you seem to be intent on conflating instruction-level parallelism and thread-level parallelism. Saying the same false thing over and over again doesn't make you right. 

It's not a weak defense because FX CPUs do actually execute concurrently as opposed to simultaneously and I'm sure come the hearing AMD will bring in some engineers to explain exactly why that's the case. Hyper-threading does not because the dedicated hardware to do it simply isn't there.

Now, there are SMT systems that are a little more complex and do have extra dedicated hardware like some of the latest SPARC CPUs in order to run 8 threads per core but, it's a very different animal than Intel's HT or AMD's FX modules.

Simply put, back to the initial argument, control logic is part of the CPU, not the core. A core (or execution unit,) alone without the CPU doesn't mean diddly squat because there wouldn't be anything to drive it. There is absolutely no requirement that says that control logic has to be dedicated for every core. This is true for x86, this is true for GPUs, this is true for SPARC. It's true for just about every microprocessor in the world but, just because there are several cases where it is dedicated, you seem to think you can derive the truth from observation which is simply a joke.


----------



## FordGT90Concept (Jan 3, 2016)

Aquinus said:


> Since when it WCCFTech reliable and since when does disagreeing with you make them wrong?


It was a quote directly from AMD.



Aquinus said:


> That's not parallel execution...


AMD said "running 8 instructions concurrently" and that's exactly what HTT does too.  The instructions are prefetched and decoded while the execution units execute what they can when they can.

Simultaneous is synonymous with concurrent.

SMT is a grayscale, not black and white.  On the black end, you have technologies like HTT where very few extra transistors are needed to make it work; on the white end (but not including it) you have a second discrete processor.  I'd argue that Bulldozer is as close to white as currently exists while HTT is very close to black.  SPARC sits in between the two.  SPARC's SMT design is actually very similar to HTT, but where HTT assumes cache misses will be rare (thanks to huge caches), SPARC assumes they'll be common.  SPARC fills in the gaps from cache misses by working on other threads.


In your definition of "core," does it only include the integer units or does it also include the floating point units?  Additionally, what do you call the unit of hardware which encompasses prefetch, decoder(s), execution unit(s), and may or may not include shared L2 cache?  Now when you take that unit of hardware and place four of them together as discrete hardware units on one die, what do you call them?  And I'm talking all x86, not just Bulldozer and derivatives.

I would answer: a CPU (or processor) contains one or more cores, each consisting of one prefetcher and one or more decoders and execution units.


----------



## RejZoR (Jan 3, 2016)

AMD says cores because 99% of people don't know what "integer" cores or "virtual" cores even means.

If the number of cores is the issue, then why is no one making a fuss about shaders on graphics cards? AMD has tons more of them compared to NVIDIA and yet no one makes any fuss about it. They aren't of the same performance either. They just are. AMD can call it a 2000-core CPU if they design it in such a way. In the end, in either case you need benchmarks to assess performance, because even a quad-core to quad-core comparison NEVER yields the same results, especially not between different companies.

So, why all this fuss?


----------



## Aquinus (Jan 3, 2016)

FordGT90Concept said:


> AMD said "running 8 instructions concurrently" and that's exactly what HTT does too.


Did you not read my post? Intel HT does not execute concurrently, it executes serially by inserting workloads from concurrent processes into a single pipeline. *THAT IS NOT PARALLEL EXECUTION*.


Aquinus said:


> That's not how hyper-threading works, hyper-threading utilizes unused parts of the pipeline to run that second thread, *it doesn't do any parallel execution on a single stage*. What it can do is execute multiple *of the same kind of uOP on the ALU at any given time* that is to say, if you have 3 of the same uOps *per instruction* in a row on data that's not dependent on the results, the CPU can execute them in parallel to some extent. You're conflating instruction-level parallelism (parallel uOps,) and thread-level parallelism (parallel instructions). Two very different things. Simply put, the only time uOps can be executed in parallel on the ALU is when they're the same uOp. You can't add and sub at the same time.
> 
> With that said, most modern super-scalar CPUs already handle instruction-level parallelism internally when instructions are decoded.





Aquinus said:


> That's not parallel execution, that's fitting two parallel tasks in the same serial pipeline by filling the gaps, hence why improvements tend to be minor and dependent on the workload. In other words, when one thread isn't using particular resources, another will but, two threads can never use the same resources in a core at the same time. Once again, you seem to be intent on conflating instruction-level parallelism and thread-level parallelism. Saying the same false thing over and over again doesn't make you right.


----------



## FordGT90Concept (Jan 3, 2016)

RejZoR said:


> If the number of cores is the issue, then why is no one making a fuss about shaders on graphics cards? AMD has tons more of them compared to NVIDIA and yet no one makes any fuss about it. They aren't of the same performance either. They just are. AMD can call it a 2000-core CPU if they design it in such a way. In the end, in either case you need benchmarks to assess performance.


Because the architectures are wildly different.  I tried to compare block diagrams of GCN and Maxwell a while ago to find analogues and I wasn't getting anywhere.  The parallelism of GPUs and the fact that GPUs serve as co-processors grants them a lot of flexibility CPUs can't afford.  Xeon Phi demonstrates this well.

Bulldozer underperformed Thuban (Phenom II) in most cases.


----------



## Aquinus (Jan 3, 2016)

FordGT90Concept said:


> Bulldozer underperformed Thuban (Phenom II) in most cases.


Which is a result of what again? The core?! Maybe you forgot one of my earlier posts that showed how AMD's FX cores are gimped compared to prior uArchs.


Aquinus said:


> Bullshit. There are a lot of instructions that not only execute in a single cycle, it can sometimes do several of the same instruction at once.
> 
> Before I grab part of this document, I will quote it:
> 
> ...


----------



## FordGT90Concept (Jan 3, 2016)

facepalm.jpg

Using that logic, Bulldozer should be almost twice as fast because there are two "cores" per core.  But no! You need a separate thread to access those!  Even when you compare 8 threads (making it a fairer comparison), Thuban is still competitive, likely because the decoder got overwhelmed.  This is why they added a second decoder in Steamroller and Excavator.  Thuban can't keep pace with Steamroller and Excavator, but it isn't clear whether that is because of the process advantage or because the decoder really made that big of a difference.  Even so, it's moot because that's not what the lawsuit is about.  It is about the definition of a core.  I searched high and low for anything calling an "execution unit" a core and I'm not finding anything that isn't related directly to Bulldozer and derivatives.


Meh, fuck it.  You're never going to convince me and I'm never going to convince you.  The court will decide if the case should be heard 2/26/2016.


----------



## Aquinus (Jan 3, 2016)

FordGT90Concept said:


> Meh, fuck it. 2/26/2016.


You're right, we're arguing in circles. However, I would like to offer you an honorary degree in using the Google.


----------



## vega22 (Jan 3, 2016)

don't stop!

this thread has given me hours of entertainment 

and some insight tbh


----------



## RejZoR (Jan 3, 2016)

But again, what matters in the end is performance. AMD opted for such a core design. Call them half cores or not true cores all you want, they are cores presented to the system and there are 8 of them. If they don't perform as expected, what the fuck are 5 trillion review sites for, then? Clueless people will get screwed (or shall we say they screw themselves) for not asking the right people or checking reviews. Technically speaking, if a CPU had just 1 core and companies advertised it as such, no one would buy it, even if that single core literally destroyed all the multi-core CPUs on the market. Without looking at reviews, you can't possibly tell how well it performs. So, how different is going to the other extreme, 8 cores that supposedly aren't "real" cores?

Intel's HT really can't be called a core, because it can't be called one on any level, even though I've seen really weird namings of i7 CPUs with HT on the very popular German webpage Computer Universe. AMD can't just call it a quad core with 6.5 threads; it would confuse the fuck out of users. So they opted for calling cores the way they are presented to the system.

Also, look at the task manager...




 

It's not exactly a tightly kept secret that required rocket scientists to figure it out. 1 processor, 4 cores, 8 logical units. The difference is, those are actually cores, even though of a different design than the one used by Intel. HT, on the other hand, doesn't have any kind of core appearance. It's just side logic that tricks the OS into thinking there's another core and gives the CPU the ability to stack more computation on the same physical core. It's confusing to casual users, but I wouldn't call it cheating on AMD's end...
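The counts in that Task Manager screenshot are easy to model. A minimal sketch (the function and its numbers are illustrative, not a real OS query API):

```python
# Toy model of how an OS-reported CPU topology breaks down.
# All names and numbers are illustrative, not from a real query API.

def topology(sockets: int, cores_per_socket: int, threads_per_core: int) -> dict:
    """Return the counts an OS like Windows would display."""
    cores = sockets * cores_per_socket
    logical = cores * threads_per_core
    return {"processors": sockets, "cores": cores, "logical": logical}

# The screenshot above: 1 processor, 4 cores, 8 logical processors.
print(topology(1, 4, 2))  # FX-8350 as newer Windows reports it
print(topology(1, 8, 1))  # the same chip as Windows 7 saw it: 8 "CPUs"
```

The same silicon reads as 8 cores or 4c/8t depending purely on which `threads_per_core` the OS assumes, which is the whole dispute in miniature.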


----------



## Frick (Jan 3, 2016)

RejZoR said:


> Intel's HT really can't be called a core, because it can't be called one on any level, even though I've seen really weird namings of i7 CPUs with HT on the very popular German webpage Computer Universe. AMD can't just call it a quad core with 6.5 threads; it would confuse the fuck out of users. So they opted for calling cores the way they are presented to the system.



Has nothing to do with the topic, but stores sold the first generation i3/i5/i7 CPU's as CPU's with three, five and seven cores.


----------



## RejZoR (Jan 3, 2016)

It has to do with the topic, because what people consider a 4-core, 8-thread Intel CPU cannot be applied to AMD CPUs. If it says 8 cores, it actually has that many cores. Whether they are really as effective as Intel's cores, number vs number, that's debatable. And that's why reviews exist. In the end, it doesn't matter if the number of cores is the same or how effective they are per core or in a multi-core arrangement. You have to see benchmarks in either case.


----------



## FordGT90Concept (Jan 3, 2016)

Microsoft would call them cores if they fit the definition of a core.


----------



## Aquinus (Jan 3, 2016)

L2 is part of the core, huh Ford? I'm pretty sure that Core 2 Duos, having a shared L2, were still individual cores. Might want to work on that diagram a bit instead of posting it incessantly. Just like control logic is part of the core too, huh? Let's stick with facts and less home-made bullshit.


----------



## RejZoR (Jan 3, 2016)

The reason why they prefer split dedicated caches is to avoid cache thrashing. L3 is so far out it's almost like RAM, so it's not as important anymore.


----------



## FordGT90Concept (Jan 3, 2016)

Aquinus said:


> L2 is part of the core, huh Ford? I'm pretty sure that Core 2 Duos, having a shared L2, were still individual cores. Might want to work on that diagram a bit instead of posting it incessantly. Just like control logic is part of the core too, huh? Let's stick with facts and less home-made bullshit.


A core doesn't share any resources with another core.  If an L2 cache is shared between two or more cores, none of the cores can claim it as theirs.

In the case of Bulldozer, the L2 cache is shared between the FPU and the two integer clusters.  It is not shared with another core so, as the diagram shows, it is correct.  One bulldozer core (containing two integer clusters) includes the L2 cache.


In the case of Core 2 Duo, the L2 cache is shared between two cores so the L2 cache is not part of either core. The two discrete cores (purple background) packaged together with the L2 cache is a module (green square):





Core 2 Quad was created by combining two dual-core modules producing a multi-chip module (MCM) quad-core CPU:








RejZoR said:


> The reason why they prefer split dedicated caches is to avoid cache thrashing. L3 is so far out it's almost like RAM, so it's not as important anymore.


L3 was added because of the massive performance drop between L2 and RAM.  Some processors are getting an L4 cache because of the massive performance drop between L3 and RAM.


----------



## Aquinus (Jan 3, 2016)

RejZoR said:


> The reason why they prefer split dedicated caches is to avoid cache thrashing. L3 is so far out it's almost like RAM, so it's not as important anymore.


Yessir. The hit rates on CPU cache nowadays are nutty high, north of 85-90% in a lot of cases, which probably explains why faster memory doesn't do a whole lot of good.
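Those hit rates are also why each added cache level pays off. A back-of-the-envelope average memory access time (AMAT) sketch, using made-up round-number latencies rather than measurements:

```python
# Average memory access time (AMAT) for a multi-level cache hierarchy:
# AMAT = L1_lat + L1_miss * (L2_lat + L2_miss * (L3_lat + L3_miss * RAM_lat))
# The latency cycle counts and hit rates below are illustrative, not measured.

def amat(levels):
    """levels: list of (hit_latency_cycles, hit_rate); last level must always hit."""
    total = 0.0
    reach_prob = 1.0  # probability an access reaches this level
    for latency, hit_rate in levels:
        total += reach_prob * latency
        reach_prob *= (1.0 - hit_rate)
    return total

hierarchy = [(4, 0.90), (12, 0.85), (40, 0.80), (200, 1.0)]  # L1, L2, L3, RAM
print(round(amat(hierarchy), 2))  # ~6.4 cycles on these numbers
```

With hit rates that high, most accesses never get past L1/L2, so the effective latency stays near the fast end even though RAM is dozens of times slower, which is also why faster main memory moves the needle so little.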


FordGT90Concept said:


> Some processors are getting an L4 cache because of the massive performance drop between L3 and RAM.


You mean the eDRAM cache? That's strictly for the iGPU if I recall correctly because the only chips that sport it are ones with Iris Pro.


----------



## ThE_MaD_ShOt (Jan 3, 2016)

I think that depends on what version of Windows you are using. Under Win 7, FX-8s show up as 8 CPUs, and in Win 10 they show up as 4 CPUs with 8 threads. I think this was done to help with the performance of AMD processors, but I'm not totally sure on that.


----------



## Aquinus (Jan 4, 2016)

ThE_MaD_ShOt said:


> I think that depends on what version of Windows you are using. Under Win 7, FX-8s show up as 8 CPUs, and in Win 10 they show up as 4 CPUs with 8 threads. I think this was done to help with the performance of AMD processors, but I'm not totally sure on that.


There is a minor performance hit when using the second core in the module. Probably, as @FordGT90Concept described, the decoder was getting overwhelmed, which is why they added a second one in Steamroller.


----------



## FordGT90Concept (Jan 4, 2016)

Aquinus said:


> You mean the eDRAM cache? That's strictly for the iGPU if I recall correctly because the only chips that sport it are ones with Iris Pro.


The eDRAM can be used by Iris Pro and the CPU:
http://www.anandtech.com/show/6993/intel-iris-pro-5200-graphics-review-core-i74950hq-tested/3


AnandTech said:


> Unlike previous eDRAM implementations in game consoles, Crystalwell is true 4th level cache in the memory hierarchy. It acts as a victim buffer to the L3 cache, meaning anything evicted from L3 cache immediately goes into the L4 cache. Both CPU and GPU requests are cached. The cache can dynamically allocate its partitioning between CPU and GPU use. If you don’t use the GPU at all (e.g. discrete GPU installed), Crystalwell will still work on caching CPU requests. That’s right, Haswell CPUs equipped with Crystalwell effectively have a 128MB L4 cache.


It does not act as a frame buffer for the Iris Pro.  Intel hinted that a separate 16-32 MiB ESRAM could be used exclusively for Iris Pro's frame buffer in the future.  Skylake-H will likely be getting the same Crystalwell L4 cache as Broadwell.  We could see the same Crystalwell cache spring up on even more chips in the future (Kaby Lake, maybe even Cannonlake).




Aquinus said:


> There is a minor performance hit when using the second core in the module. Probably, as @FordGT90Concept described, the decoder was getting overwhelmed, which is why they added a second one in Steamroller.


Even in Excavator, the prefetcher and FPU are still shared.  There's going to be a performance hit from them too.  A legitimate dual-core doesn't share those things, as demonstrated by the Core 2 Duo and Phenom II block diagrams.


I did some more digging on Core 2 Duo and it appears that neither core can be disabled.  Conroe-L (single-core) appears to be a different chip altogether.  This makes Core 2 Duo a true module: it has _two of everything_ except L2 and control, which is what makes the cores inseparable.  Bulldozer is not a module because it doesn't have two of everything--it has one of some things.  This is why the FX-8350 should be considered a quad-core.  What previously defined a module (complete but inseparable cores) is absent here (it would need two prefetchers at minimum).


----------



## RejZoR (Jan 4, 2016)

I was hoping Skylake would get L4 by default (current i7 6700k for example), but after I've seen it's basically just a smaller i7 5000 series, I just didn't bother and opted for more cores instead on 5820K.


----------



## cdawall (Jan 4, 2016)

Scaling would show that an FX 8 core has more than 4 cores. Math would say it is physically impossible to say differently.


----------



## eidairaman1 (Jan 5, 2016)

FordGT90Concept said:


> That was Windows XP and XP only has two states: uniprocessor (one thread at a time) and multiprocessor (two or more threads at a time).  Multiprocessor could mean two physical sockets with one core each, one socket with two cores, or one physical + one logic processor.  It was updated to better handle the three variations.
> 
> Bulldozer did the same thing with Vista.  Vista (I believe 7 too) called it eight-cores because it was incapable of distinguishing them but that apparently caused problems because updates were released to fix core parking issues.  Come Windows 8 and newer, Microsoft updated the operating system to definitively account for sockets, cores, and logic processors which is where we see 4 cores and 8 logic processors.
> 
> ...


Still on 7 myself. 
Yes I got all updates plus core unparker tool. The FX8350 does more than I could imagine.


----------



## MalakiLab (Sep 25, 2016)

FordGT90Concept said:


> In the case of Core 2 Duo, the L2 cache is shared between two cores so the L2 cache is not part of either core. The two discreet cores (purple background) packaged together with the L2 cache is a module (green square):



Let me show you the Intel Silvermont C2000, an eight-core architecture.
All new Atoms have modules, with 2 cores in each, sharing the same L2 cache. Are they liars too? 


 


The graphic you made is also completely wrong. It shows how you don't understand OoO, PRF, branch prediction, or resource monitoring. http://www.anandtech.com/show/3922/intels-sandy-bridge-architecture-exposed/3

In short, you don't understand how their microarchitecture works. 95% of the time, the module will work just the same as 2 cores, because both can share the resources at the SAME TIME. In most circumstances, it will use both integer cores, and each one will have a 128-bit FMAC with 128-bit integer execution. So they can simultaneously execute most instructions independently, without having to wait for their turn like with hyperthreading. Totally different microarchitecture. Where things begin to degrade is when both floating point pipelines have to come together for a single integer core, to execute a single 256-bit AVX instruction or two symmetrical SSE instructions. Then the entire FPU is taken and leaves no resources for the other integer core. In theory the dispatch controller should give that integer core some instructions not needing any FPU interaction, by looking into the instruction fetch buffer, and so keep it busy while the other completes its cycles needing all of the FPU. On paper it looks awesome, but it's a very, very complex operation, and sadly it didn't bring much success. Luckily, those instructions are not used very often. Still, it's a major problem AMD tried to improve in Piledriver, Steamroller and finally Excavator. It was their way to deal with new instructions too, and to stay in the competition.

It's a good technology, but a little too audacious for today's market. Instead of focusing on having better IPC, they mostly developed ways to better dispatch the instructions. That's why they decided to come back to more traditional microarchitectures and be more competitive IPC-wise. It doesn't change the fact that a module behaves like 2 cores and is in fact 2 cores in a single module. Even Intel agrees, and is using modules for its Atoms. Maybe we should drag them into court too, no?
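The 2×128-bit FMAC sharing described above can be reduced to a toy throughput model (the pipe counts follow the post; everything else is an illustrative assumption, not hardware data):

```python
# Toy model of a shared Bulldozer-style module FPU as described above:
# one module = two integer cores sharing two 128-bit FMAC pipes.
# A 128-bit op occupies one pipe; a 256-bit AVX op fuses both pipes.
# Purely illustrative -- not timing data from real hardware.

def fpu_ops_per_cycle(module_count: int, op_width_bits: int) -> int:
    """Peak FP ops issued per cycle across all modules for one op width."""
    pipes_per_module = 2                 # two 128-bit FMACs per module
    pipes_needed = op_width_bits // 128  # 1 for SSE-width, 2 for AVX-width
    return module_count * (pipes_per_module // pipes_needed)

# FX-8350: 4 modules, 8 integer cores.
print(fpu_ops_per_cycle(4, 128))  # 8 -> 128-bit work: behaves like 8 cores
print(fpu_ops_per_cycle(4, 256))  # 4 -> 256-bit AVX: behaves like 4 cores
```

The model captures both sides of the argument: with 128-bit ops each integer core gets its own FMAC pipe, while 256-bit AVX halves the count because the two pipes must fuse.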


----------



## cdawall (Sep 25, 2016)

8 months


----------



## FordGT90Concept (Sep 25, 2016)

MalakiLab said:


> Let me show you the Intel Silvermont C2000, an eight-core architecture.
> All new Atoms have modules, with 2 cores in each, sharing the same L2 cache. Are they liars too?


That's an octo-core so Intel is not lying.  The compute cores aren't broken up at all--nothing is shared except L2 cache. 

A "core" only requires data + instruction cache.  Additional caches are added for boosting performance (decreasing the gaps in latency between core and system RAM).




up to 32k = L1
up to 256k = L2
up to 4M = L3
up to 64M = L4 eDRAM in 4950HQ, system RAM otherwise.

As I specified above, if a quad-core processor has 4 L2 caches, then those L2 caches are part of the cores because they are not a shared resource.  If the resource is shared (as is the case with Silvermont), then it doesn't belong to a core--it's part of the CPU package (like L3, QPI, HyperTransport, the memory controller, etc. usually are).




MalakiLab said:


> Then the entire FPU is taken and leave no resources to the other integer core.


This blocking situation is never encountered on Silvermont or Core 2 Duo.  If a blocking situation is possible, I'd argue (and have argued) the whole of it is a multithreaded core, not multi-core.

A core can take an instruction and execute the whole of it without sharing any parts with any other processor.  Bulldozer and sons, when executing a floating point unit task, do not fit that definition.  Silvermont will happily execute eight 256-bit AVX instructions simultaneously across all cores, unlike an FX-8350.  It'll do that with ANY instruction because none of the execution hardware is shared.


----------



## Aquinus (Sep 25, 2016)

FordGT90Concept said:


> A core can take an instruction and execute the whole of it without sharing any parts with any other processor. Bulldozer and sons, when executing a floating point unit task, do not fit that definition. Silvermont will happily execute eight 256-bit AVX instructions simultaneously across all cores, unlike an FX-8350. It'll do that with ANY instruction because none of the execution hardware is shared.


...but the FPU isn't what did Bulldozer in, it was the reduction in the number of uOps per clock that could be accomplished by either the FPU or the integer cores. Fewer uOps per cycle means that if the bandwidth resources aren't available, full instructions could take more clock cycles to complete which could further harm performance by essentially stalling the pipeline due to these limited resources on each integer core. The net result is relatively garbage performance.

If you look at Intel, all they've been doing is beefing up their cores when it comes to uOp bandwidth.


----------



## BiggieShady (Sep 25, 2016)

Aquinus said:


> If you look at Intel, all they've been doing is beefing up their cores when it comes to uOp bandwidth.


Additionally, let's not forget how late AMD is in introducing a uOp cache with Zen now, almost 6 years after Intel's Sandy Bridge ... I don't know by how much, but the absence of a uOp cache in Bulldozer should also contribute to a lower total net uOps/cycle.


----------



## FordGT90Concept (Sep 25, 2016)

Aquinus said:


> ...but the FPU isn't what did Bulldozer in, it was the reduction in the number of uOps per clock that could be accomplished by either the FPU or the integer cores. Fewer uOps per cycle means that if the bandwidth resources aren't available, full instructions could take more clock cycles to complete which could further harm performance by essentially stalling the pipeline due to these limited resources on each integer core. The net result is relatively garbage performance.
> 
> If you look at Intel, all they've been doing is beefing up their cores when it comes to uOp bandwidth.


That's irrelevant.  What is relevant is that if the FX-8350 had 8 FPUs (one to go with each integer core, like a traditional core), its multithreaded FPU performance would be better because there would no longer be any chance of blocking.  The lawsuit is about AMD calling it an "8 core" processor when it is an "8 _integer_ core" processor.  AMD does not make that distinction on the box or in marketing material.  It has misled the public by selling 4 multithreaded cores as 8.  It would be akin to Intel calling the i7-6700 an "8 core" processor.  It doesn't matter that AMD shored up the simultaneous multithreading in Bulldozer and sons with extra hardware for a performance boost.  It's still a quad-core when you throw heavy FPU loads at it, and they sold it as an eight-core.


----------



## Frick (Sep 25, 2016)

FordGT90Concept said:


> That's irrelevant.  What is relevant is that if the FX-8350 had 8 FPUs (one to go with each integer core, like a traditional core), its multithreaded FPU performance would be better because there would no longer be any chance of blocking.  The lawsuit is about AMD calling it an "8 core" processor when it is an "8 _integer_ core" processor.  AMD does not make that distinction on the box or in marketing material.  It has misled the public by selling 4 multithreaded cores as 8.  It would be akin to Intel calling the i7-6700 an "8 core" processor.  It doesn't matter that AMD shored up the simultaneous multithreading in Bulldozer and sons with extra hardware for a performance boost.  It's still a quad-core when you throw heavy FPU loads at it, and they sold it as an eight-core.



Is there a universal definition of an x86 core though? They could have handled it better, but I wouldn't say they were lying.


----------



## FordGT90Concept (Sep 25, 2016)

AMD pretty much established it with Athlon 64 X2 and Intel followed suit with Pentium D: two processors, one die.  The _only_ anomaly is Bulldozer and sons.

The only other modern exception, which I believe @Aquinus pointed out earlier, was SPARC processors for databases.  In that case, the FPU is practically a separate core (8:1 ratio) unto itself, because databases usually don't have to deal with floating-point operations.  If the cores encountered floating-point work, they'd farm it out to the floating-point core and wait for a response.


----------



## newtekie1 (Sep 25, 2016)

Frick said:


> Is there a universal definition of an x86 core though? They could have handled it better, but I wouldn't say they were lying.



Simply, if it can execute all the instructions in the x86, or in this case x86_64, instruction set, then it is an x86_64 core.  You don't need an FPU to execute any of the instructions in the basic x86_64 instruction set; it just helps performance greatly for some of them.



FordGT90Concept said:


> Intel followed suit with Pentium D: two processors, one die.



Yeah, the Pentium D wasn't two processors on one die...oh, and the Core 2 Quad wasn't 4 processors on 1 die either.


----------



## Aquinus (Sep 25, 2016)

I think that when push comes to shove, the core count isn't really what people are pissed off about. This is all about the lackluster performance of these CPUs, and I think the core count is just a facade for that. No one ever said 8 cores had to be fast.


----------



## FordGT90Concept (Sep 26, 2016)

newtekie1 said:


> Simply, if it can execute all the instructions in the x86, or in this case x86_64 instruction set, then it is an x86_64 core.  You don't need an FPU to execute any of the instruction in the basic x86_64 instruction set, it just helps performance greatly for some of them.


Indeed, FPU instructions generally fall under x87, which stems from the 8087 co-processor for the 8086.  Thing is, x87 has been a standard feature for about two decades now.  AMD tried to deprecate it to force developers to use the GPU for FPU tasks.  It failed.



newtekie1 said:


> Yeah, the Pentium D wasn't two processor on one die...oh and the Core 2 Quad wasn't 4 processors on 1 die either.


Die meaning CPU socket.  Yes, they were MCM'd but that's a technical detail that doesn't matter in terms of core count.  Pentium D was sold as a dual core and it had two cores in two modules.  Core 2 Duo was sold as a dual core and it had two cores in one module.  Core 2 Quad was sold as a quad core and it has four cores in two modules.  The cores (L1 instruction to L1 data) did not share any components in any of those processors.



Aquinus said:


> I think the when push comes to push, the core count isn't really what people are pissed off about. This is all about the lackluster performance of these CPUs and I think that this is just a facade for that. No one ever said 8 cores had to be fast.


It does matter.  When you go to Best Buy and a guy comes up to you and says this AMD has 8 cores for $200 and this Intel has 4 cores for $300, most consumers will go with AMD, not realizing that AMD only has 4 _complete_ cores.  AMD deliberately misled the public to get more sales.  The people that filed this lawsuit know, in hindsight, that they should have gone with Intel's quad-core for $100 more.


----------



## Aquinus (Sep 26, 2016)

FordGT90Concept said:


> It does matter. When you go to Best Buy and a guy comes up to you and says this AMD has 8 cores for $200 and this Intel has 4 cores for $300, most consumers will go with AMD, not realizing that AMD only has 4 _complete_ cores. AMD deliberately misled the public to get more sales. The people that filed this lawsuit know, in hindsight, that they should have gone with Intel's quad-core for $100 more.


That's a hollow argument and you know it. If someone is going into Best Buy to buy a PC, they're probably not going to need the most powerful CPU on the market. Most people want a cheap computer that works. We've also had this discussion before and we're not going to agree: Bulldozer is more like an 8c CPU than a 4c one, because it can execute that many instructions in parallel (if you merely exclude the FPU for a minute), and notice that parallel integer workloads scale almost linearly, which screams real cores. Only with FPU workloads does it scale more like hyper-threading would.

As I said before, no one ever said 8 CPU cores needed to be fast, but there are more than enough components to call it an 8c CPU. Does it have shared resources that can hinder performance? Sure. But even in those cases, workloads still scale linearly, unlike with just about every form of SMT.
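For what it's worth, that scaling claim is easy to probe yourself. Here's a rough Python sketch (the workload and worker counts are mine, purely illustrative, not any standard benchmark): time the same pure-integer job on 1 process versus 4. On truly independent integer cores the 4-worker wall time should stay close to the 1-worker time; SMT sharing a single pipeline would show a much bigger slowdown.

```python
import time
from multiprocessing import Pool

def int_work(n: int) -> int:
    """Pure integer workload: multiply/add/mask only, no floating point."""
    acc = 0
    for i in range(n):
        acc = (acc * 31 + i) & 0xFFFFFFFF
    return acc

def timed(workers: int, n: int = 500_000) -> float:
    """Wall time to run the same job on `workers` processes at once."""
    start = time.perf_counter()
    with Pool(workers) as pool:
        pool.map(int_work, [n] * workers)
    return time.perf_counter() - start

# e.g. compare timed(1) against timed(4): near-equal wall times suggest
# the integer pipelines really do run independently; a ~4x slowdown
# would look like time-slicing on fewer real cores.
```

This only exercises the integer units, which is exactly the part of the module that isn't shared; repeat it with a float-heavy loop and the shared FPU shows up in the numbers.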


----------



## FordGT90Concept (Sep 26, 2016)

FX-8350 = 4c/8t
I will not exclude FPU (or any other instructions) because cores must be complete.

Silvermont is not fast even compared to an FX-8350 but it has 8 cores (yes, 8 simultaneous AVX instructions) unlike FX-8350.


----------



## Aquinus (Sep 26, 2016)

FordGT90Concept said:


> FX-8350 = 4c/8t
> I will not exclude FPU (or any other instructions) because cores must be complete.
> 
> Silvermont is not fast even compared to an FX-8350 but it has 8 cores (yes, 8 simultaneous AVX instructions) unlike FX-8350.


The funny thing, though, is that a computer can run just fine without even having an FPU paired with x86 integer cores. Show me a single CPU that has x87 and can run independently from an x86 core and I would be willing to entertain your position, but the reality is that whether the FPU is on the same die or not isn't the point, because it's still essentially a co-processor. Floating point math was an afterthought, not the driving force for x86, as most things computers do are integer operations.


----------



## Melvis (Sep 26, 2016)

How did I miss this thread? lol 

Im guessing they lost the court battle?


----------



## FordGT90Concept (Sep 26, 2016)

Aquinus said:


> The funny thing, though, is that a computer can run just fine without even having an FPU paired with x86 integer cores. Show me a single CPU that has x87 and can run independently from an x86 core and I would be willing to entertain your position, but the reality is that whether the FPU is on the same die or not isn't the point, because it's still essentially a co-processor. Floating point math was an afterthought, not the driving force for x86, as most things computers do are integer operations.


It's been an integral part of processors for the last two decades, so no, it isn't a co-processor.  It was an afterthought two decades ago (or rather, too complicated to implement in hardware until process tech matured, making it affordable to integrate) but it has been inseparable since.  Hell, even the internet browser you're looking at now would slow to a crawl if it had to do the CSS work on the ALU.  Simply visualizing the text characters you're looking at uses floating point math.

FPU can only be reasonably separated in specialized processors (like database appliances or network routers).  In general-purpose processors (which x86 epitomizes), it should not be separated.  The fact they're putting it back in Zen proves how stupid that idea was.




Melvis said:


> Im guessing they lost the court battle?


Last action was a time extension on August 23:
"ORDER GRANTING JOINT STIPULATION TO FURTHER EXTEND TIME FOR PARTIES TO ENGAGE IN ADR PROCESS"

TL;DR: they're trying to settle.  It will only go to court if they fail to settle.


----------



## MalakiLab (Sep 26, 2016)

FordGT90Concept said:


> FX-8350 = 4c/8t
> I will not exclude FPU (or any other instructions) because cores must be complete.
> 
> Silvermont is not fast even compared to an FX-8350 but it has 8 cores (yes, 8 simultaneous AVX instructions) unlike FX-8350.



TOTALLY wrong. Silvermont doesn't have AVX or any other 256-bit instructions. Even the latest, Goldmont, doesn't have AVX, FMA, or any advanced instructions beyond SSE, SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2.

Goldmont's FPU is a LOT weaker than the Piledriver-based architecture's. By a HUGE margin. Unless they decide to assemble two of their FPUs together and share them, they will NEVER achieve the AVX and FMA instructions with OoO execution at that TDP. The AMD A10-5750M totally blasts past any Intel Atom in every possible benchmark.

FX-8350 = 8c/8t, until it needs to execute an AVX or FMA instruction; that's where the processor needs to take two threads to achieve it. You simply don't understand the architecture at all.

As you are probably on Windows, name me one piece of software compiled to use AVX instructions. Even now, few games are compiled to take advantage of AVX.


----------



## FordGT90Concept (Sep 26, 2016)

MalakiLab said:


> TOTALLY wrong. Silvermont doesn't have AVX or any other 256-bit instructions. Even the latest, Goldmont, doesn't have AVX, FMA, or any advanced instructions beyond SSE, SSE2, SSE3, SSSE3, SSE4, SSE4.1, SSE4.2.


Right you are but it still has native support to handle floats in each core where Bulldozer does not.



MalakiLab said:


> Goldmont's FPU is a LOT weaker than the Piledriver-based architecture's.


Such is the nature of a 20-watt-and-under design.  It is weaker across the board.  Jaguar would be a better comparison, and AMD's low-power designs never had a Bulldozer-like arrangement...likely because it could only power down a core (as in a module) at a time.



MalakiLab said:


> As you are probably on Windows, name me one piece of software compiled to use AVX instructions. Even now, few games are compiled to take advantage of AVX.


Prime95


----------



## cdawall (Sep 26, 2016)

FordGT90Concept said:


> Prime95



I am curious now: what program that a standard user would actually use, uses AVX? Quite honestly, Prime95 could use whatever it wants and it won't affect anyone.


----------



## FordGT90Concept (Sep 26, 2016)

That's the case with all CPU instructions.  It's only a problem if software tries to use an instruction that isn't supported (e.g. some software compiled prior to 2000 can't run on new CPUs because of obsolete instructions).  It's not something developers tend to advertise.
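As an aside, this is why runtime feature detection exists: software checks whether the CPU advertises an extension before using it. A minimal, Linux-only sketch (the parsing helpers here are my own, not any standard API; real programs query the CPUID instruction directly):

```python
def cpu_flags(cpuinfo_text: str) -> set:
    """Extract the feature-flag set from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

def supports(cpuinfo_text: str, flag: str) -> bool:
    """True if the CPU advertises the given feature flag (e.g. 'avx')."""
    return flag in cpu_flags(cpuinfo_text)

# On Linux you would feed it the real file:
#   supports(open("/proc/cpuinfo").read(), "avx")
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 avx\n"
print(supports(sample, "avx"))   # True
print(supports(sample, "avx2"))  # False
```

A program compiled with an AVX code path typically carries a fallback path and picks one at startup based on exactly this kind of check, which is why unsupported instructions rarely bite ordinary users.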


----------



## cdawall (Sep 26, 2016)

FordGT90Concept said:


> That's the case with all CPU instructions.  It's only a problem if software tries to use an instruction that isn't supported.  It's not something developers tend to advertise.



It is supported, and it functions fully as well. If we are going to complain about the performance being lower, well, AMD x87 performance has trailed Intel's for decades. Remember there are also instruction sets AMD performs better in, as well as certain applications they do better in than Intel. Heavily multithreaded applications using non-biased software work reasonably well on AMD, to the point where their server share was climbing pretty quickly.


----------



## newtekie1 (Sep 26, 2016)

FordGT90Concept said:


> Indeed, FPU instructions are generally under x87, which stems from the 8087 co-processor for the 8086. Thing is, x87 has been a standard feature for about two decades now. AMD tried to deprecate it to force developers to use the GPU for FPU tasks. It failed.



That can all be true. But the absolute minimum for an x86_64 core is that it can execute all the instructions in the x86_64 instruction set, and each module in an FX series processor has two cores capable of doing that.

A non-shared L2 was standard for a very long time too. But that changed. AMD tried to change to a shared FPU. It hurt performance, but in the strictest technical term, it did not make each module 1c/2t. Each module can in fact work on two instructions at the exact same time. That is not possible with 1c/2t.



FordGT90Concept said:


> Die meaning CPU socket. Yes, they were MCM'd but that's a technical detail that doesn't matter in terms of core count. Pentium D was sold as a dual core and it had two cores in two modules. Core 2 Duo was sold as a dual core and it had two cores in one module. Core 2 Quad was sold as a quad core and it has four cores in two modules. The cores (L1 instruction to L1 data) did not share any components in any of those processors.



Um, no. Die != package or socket.


----------



## FordGT90Concept (Sep 26, 2016)

cdawall said:


> It is supported, and it functions fully as well. If we are going to complain about the performance being lower, well, AMD x87 performance has trailed Intel's for decades. Remember there are also instruction sets AMD performs better in, as well as certain applications they do better in than Intel. Heavily multithreaded applications using non-biased software work reasonably well on AMD, to the point where their server share was climbing pretty quickly.


It's not about performance, it is about labeling their processors "8-core" when they only have 4 cores accepting two threads each.  This is exemplified by the transistor count (falls in line with a quad core, not an octo core), the limitations of power gating (FX-8350 can only be brought down to 2 integer cores, it cannot go down to one like a real core can), and by the fact that AMD has never sold a single-core Bulldozer (a Core 2 Duo with one core disabled was sold as a Celeron, for example).  Each 2 "integer cores" + 1 FPU represents a complete core by all modern definitions of the word "core."  The "integer cores" alone do not.




newtekie1 said:


> A non-shared L2 was standard for a very long time too. But that changed. AMD tried to change to a shared FPU. It hurt performance, but in the strictest technical term, it did not make each module 1c/2t. Each module can in fact work on two instructions at the exact same time. That is not possible with 1c/2t.


As I said before, L2 cache sharing doesn't make or break a core.  Case in point: all cores on a socket share a memory controller for the system RAM.  Because of that, do we call everything under it a single core?  Nope.  It could happen that two cores share an L2 as a module, then two modules share an L3.  That doesn't change what a core is.  Separating the FPU does, because that's what, 50-60% of a CPU core in terms of transistors?  When you look at it from that perspective, AMD has cheated consumers out of tens of millions of transistors.


Even if AMD made it possible to power gate an integer core, I'd still argue it isn't a core because there still isn't a discrete FPU for each.  If it had two or more FPUs and the cores could use one to all of them (like Core 2 Duo's L2), then yeah, I could take your argument. That is not the case though: they're Siamese twins where disabling one would kill the other.  The whole, therefore, is a core, not the parts.  This arrangement is unique to Bulldozer and its descendants.  Even SPARC can do fine without the FPU core.


----------



## MalakiLab (Sep 26, 2016)

FordGT90Concept said:


> Right you are but it still has native support to handle floats in each core where Bulldozer does not.


For SSE, SSE2, SSE3, SSSE3, SSE4/4.1/4.2/4a, and F16C, it does. For FMA3/4, if you use the m32, m64, and m128 operands, it won't lock up the FPU. Same for AVX/XOP: if you use 32-bit or 64-bit mode, you'll be just fine. Though, as soon as an instruction uses the VEX coding scheme with more operands than the half of the FPU reserved for one core, it will borrow the other half reserved for the other core and use both for the set number of cycles. Meanwhile, the other core can execute instructions not needing the FPU; on the rare occasion there are no such instructions in the buffer, the second core's cycles go to waste. There's a major difference between not supporting floats in each core independently and simultaneously, and not being able to do some part of some instructions alone. They simply could have rejected the implementation of such instructions, like Intel did for the Atom, and you wouldn't have any argument left for your bias. Instead, they tried to innovate: combine two FPUs into one, with the possibility of using them together to achieve the AVX instructions. It doesn't change the fact it behaves like two cores for every task you'll throw at it, just not Prime95. Is that enough to say a module is not two cores? Of course not.




FordGT90Concept said:


> Such is the nature of a 20-watt-and-under design. It is weaker across the board. AMD Jaguar would be a better comparison.


Well, AMD A6 and A4 could have been also a good comparison.


----------



## cdawall (Sep 26, 2016)

FordGT90Concept said:


> It's not about performance, it is about labeling their processors "8-core" when they only have 4 cores accepting two threads each.



Good argument, until you get into the fact, mind you, fact, that each "core" can independently process an x86/x86-64 instruction, two per module, unlike a single-core, dual-thread arrangement.

You can also disable the cores individually, unlike an additional thread. Each of those cores is also a physical presence with a clock generator controlling it, and can be clocked independently of the core next to it in the same module.

Mind finding me a "thread" that can do that?


----------



## FordGT90Concept (Sep 26, 2016)

MalakiLab said:


> Is that enough to say a module is not two cores? Of course not.


Up until Bulldozer, every "core" was effectively a complete "processor."  It could be powered individually and it operated independently (barring memory, but that's always been an issue with multi-processor logic).  FX-8350 only has four objects that meet that definition, not eight.  Phenom II X6?  Six.  Core 2 Duo?  Two.  Core i7-920? Four.  Core 2 Quad? Four.



cdawall said:


> You can also disable the cores individually, unlike an additional thread.


Source?  The operating system can stop sending it work, but I mean power off as in cold...no current.


----------



## newtekie1 (Sep 26, 2016)

You are taking things that do not define what an x86_64 core is and trying to argue that they do. Sorry, I have to move on. I will say this one more time and that is it: the definition of an x86_64 core does not require an FPU, and it doesn't require that each core can be disabled independently of the others. The power-gating system does not define a core. The ONLY thing that defines an x86_64 core is the ability to process x86_64 instructions. Period.


----------



## FordGT90Concept (Sep 26, 2016)

The box doesn't say "x86_64 core," it says "core."






You're making a technical distinction that simply doesn't exist to the public, nor does AMD make that distinction clear on their packaging.


----------



## newtekie1 (Sep 26, 2016)

In an x86_64 processor, that is exactly what core means. The FX series of processors are x86_64 processors. So the word core means x86_64 core.

I mean, are you expecting them to make it clear these aren't ARM cores? They don't say that on the box, so why shouldn't we expect them to be IA-64 cores? Is that the logic you are really going with?


----------



## FordGT90Concept (Sep 26, 2016)

newtekie1 said:


> In an x86_64 processor, that is exactly what core means. The FX series of processors are x86_64 processors. So the word core means x86_64 core.


It is not.  Even in AMD's technical slides, they called it "integer core" not just "core."  Here's the best definition for "core" I could scrounge up:
http://www.webopedia.com/TERM/M/multi_core_technology.html


> In consumer technologies, multi-core is usually the term used to describe *two or more CPUs working together on the same chip*. Also called multicore technology, it is a type of architecture where a *single physical processor contains the core logic of  two or more processors*. These processors are packaged into a single integrated circuit (IC). These single integrated circuits are called a die. Multi-core can also refer to multiple dies packaged together. Multi-core enables the system to perform more tasks with a greater overall system performance.  Multi-core technology can be used in desktops, mobile PCs, servers and workstations. Contrast with dual-core, a single chip containing two separate processors (execution cores) in the same IC.


Bulldozer does not represent the norm (complete processors), it represents an exception (Siamesed processors).  AMD can't take a well-understood word and redefine it to mean something else.  If they had put "8-Integer Core" on the box, this lawsuit wouldn't have happened.


----------



## Toothless (Sep 26, 2016)

Sooo does this affect the 63xx series? Because I want my $30.


----------



## FordGT90Concept (Sep 26, 2016)

If the class action suit moves forward, probably, but FX 6xxx will likely get less than FX 8xxx.  How much depends on how much it settles for and how many eligible products were sold.


----------



## MalakiLab (Sep 26, 2016)

FordGT90Concept said:


> Up until Bulldozer, every "core" was effectively a complete "processor." It could be powered individually and it operated independently (barring memory, but that's always been an issue with multi-processor logic). FX-8350 only has four objects that meet that definition, not eight. Phenom II X6? Six. Core 2 Duo? Two. Core i7-920? Four. Core 2 Quad? Four.



Where did you get that? Go check your sources, because we can disable one core per module. It's been possible since Bulldozer came out. Some do it because you can gain a 5% performance boost per core, which can be explained by the dispatching overhead. Just look it up: Gigabyte and ASRock motherboards can disable one core per module. And it's very real, the processor's TDP and current are halved. For most people it's irrelevant, as the second core per module gains you 50% more performance, compared to the 5% more per core when you disable cores 2/4/6/8.

You also have frequency scaling. I can scale each core clock speed independently. 




I will even leave the linfo link so you can see the machine running in real time. If those were only one core per module, without being independent, I wouldn't be able to change the clock frequency individually for all 6 cores.

http://malakilab.org/linfo/


----------



## cdawall (Sep 26, 2016)

FordGT90Concept said:


> It is not.  Even in AMD's technical slides, they called it "integer core" not just "core."  Here's the best definition for "core" I could scrounge up:
> http://www.webopedia.com/TERM/M/multi_core_technology.html
> 
> Bulldozer does not represent the norm (complete processors), it represents an exception (Siamesed processors).  AMD can't take a well understood word and redefine it to mean something else.  If they had put "8-Integer Core" on the box, this lawsuit wouldn't have happened.



There is no definition of a CPU core. That is where you will fail to make any argument. AMD could sell shit in a box and call it a core, and if it is by their own definition a core, they are not incorrect.


----------



## MalakiLab (Sep 26, 2016)

FordGT90Concept said:


> It is not.  Even in AMD's technical slides, they called it "integer core" not just "core."  Here's the best definition for "core" I could scrounge up:
> http://www.webopedia.com/TERM/M/multi_core_technology.html
> 
> Bulldozer does not represent the norm (complete processors), it represents an exception (Siamesed processors).  AMD can't take a well understood word and redefine it to mean something else.  If they had put "8-Integer Core" on the box, this lawsuit wouldn't have happened.



You don't understand what is defined here. Multicore is a processor with multiple cores. A module is not a processor. You're mixing the terms and twisting them to suit your rhetoric. Hell, you don't even understand what a die is. Take, for example, the Opteron 6272: it's one processor, it has 2 dies, each die has 4 modules, each module has 2 cores. A multicore processor is a processor having two or more cores. Even if what you claim were right, and one module were one core, it would still be 4 cores in an FX-8xxx, and 2 cores in an FX-4xxx, and would match the definition you gave of a multicore processor.


----------



## FordGT90Concept (Sep 26, 2016)

MalakiLab said:


> View attachment 79270


Why does it say "cpu cores : 3"



cdawall said:


> There is no definition of a CPU core. That is were you will fail to make any argument. AMD could sell shit in a box and call it a core and if it is by their own definition a core they are not incorrect.


Seagate got sued and lost because Seagate's definition of "GB" (10^9) varied from Microsoft's definition of "GB" (2^30).
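For anyone wondering what that gap actually amounts to, it's plain arithmetic (the 500 GB drive size below is just an example I picked):

```python
# Drive makers count a gigabyte as 10**9 bytes; Windows divides by 2**30.
GB_DECIMAL = 10**9   # Seagate's "GB"
GB_BINARY = 2**30    # Microsoft's "GB" (properly a gibibyte, GiB)

drive_bytes = 500 * GB_DECIMAL       # a "500 GB" drive
reported = drive_bytes / GB_BINARY   # what Windows reports

print(f"Advertised: 500 GB, reported: {reported:.1f} GB")  # ~465.7
```

About 7% of the advertised capacity "vanishes" purely from the two definitions of the same unit, which is what that suit turned on.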


----------



## cdawall (Sep 26, 2016)

FordGT90Concept said:


> Seagate got sued and lost because Seagate's definition of "GB" (10^9) varied from Microsoft's definition of "GB" (2^30).



All Windows versions up to Windows 10 have labeled them per core, not per module/thread.


----------



## FordGT90Concept (Sep 26, 2016)

Modules are purely a hardware construct.  They are invisible to software.

Windows 8.1 and newer see "sockets," "cores," and "logical processors."  Is Windows wrong?


VulkanBros said:


>



That's why I asked.  Linux may be in agreement with Windows that FX-6350 is a tri-core.


----------



## cdawall (Sep 26, 2016)

FordGT90Concept said:


> Modules are purely a hardware construct.  It is invisible to software.
> 
> Windows 8.1 and newer see "sockets," "cores," and "logical processors."  Is Windows wrong?
> 
> ...



Knoppix lists the 8320 and 4350 as 8 and 4 core units. They are physically there, so how are they only logical cores?

And I apologize: Windows 8.1 and 10 are the only ones that list them as logical, which is merely a function of the task scheduler.


----------



## Roph (Sep 26, 2016)

I think you're wasting your breath. I remember back when this thread was started, he was talking about the iGPU found on the FX chips.

News to me 

It's a frivolous lawsuit.

The GTX 970 situation isn't comparable; NVIDIA outright lied about things such as ROP/TMU count. The 3.5 + 0.5 GB memory split is up in the air.

AMD sold you 8 cores, and you got 8 cores you can assign work to. When I encode 8 music tracks at once on my "4 core" FX-8320, I get double the encoding throughput compared to encoding 4 tracks at once. Shocker.


----------



## FordGT90Concept (Sep 26, 2016)

cdawall said:


> Knoppix lists the 8320 and 4350 as 8 and 4 core units. They are physically there, so how are they only logical cores?


How many cores do you see?




I count four. Top left, top right, bottom left, bottom right.

Here's an actual 8-core Xeon 7500 series:


----------



## cdawall (Sep 26, 2016)

Doesn't matter? It has 8 calculation units that can work independently until they need the FPU.


----------



## newtekie1 (Sep 26, 2016)

FordGT90Concept said:


> Windows 8.1 and newer see "sockets," "cores," and "logical processors." Is Windows wrong?



Windows can't even agree with itself.  A lot of the Bulldozer APUs get listed as 4c/4t in Task Manager.



FordGT90Concept said:


> They aren't physically there. Die shot of Opteron 6386 SE. Where are the 16 cores? I can only see eight.



Probably because that die only has 8 cores.  The 6386, and all 16-core 6300-series Opterons, are two of those dies side by side.

Like this:





You can read more about where this die shot comes from here: http://www.theregister.co.uk/2012/11/05/amd_opteron_6300_server_chip/

So, yeah, that die you say you can see 8 cores in: it is an 8-core Bulldozer die, not a 16-core one.  So you just admitted that it is 8 cores, not 4.  Ooooooops.


----------



## FordGT90Concept (Sep 26, 2016)

I edited because I realized the mistake (the picture actually comes from earlier in this thread).  Here's the two side by side, equivalent hardware circled:





I see 8 in your picture, not 16.

As to my mistake, I do see 8 components, two in each white box, but there's no clear line separating them; hence, I see four cores.

Here's the post: https://www.techpowerup.com/forums/...count-on-bulldozer.217327/page-3#post-3367260


----------



## cdawall (Sep 26, 2016)

Wait I have one how many cores does this have?


----------



## FordGT90Concept (Sep 26, 2016)

I'm thinking quad-core (center to the right, L2 above them).  Most of that die is GPU.  How many GPU cores? I'm not going to guess because it's such a mess.


----------



## newtekie1 (Sep 26, 2016)

FordGT90Concept said:


> Here's the two side by side, equivalent hardware circled:



Actually, did you notice how in the areas you circled, there is an almost identical mirror of hardware in certain places?  You cannot tell me you didn't notice these two insanely identical areas inside the areas you circled.


----------



## MalakiLab (Sep 26, 2016)

FordGT90Concept said:


> Why does it say "cpu cores : 3"
> 
> 
> Seagate got sued and lost because Seagate's definition of "GB" (10^9) varied from Microsoft's definition of "GB" (2^30).



No, modules are seen by the kernel; in fact, that's why Windows needed a patch too. If you knew anything about computer engineering and how addressing works, you would understand. It's the same for Atom modules: they are seen at first as one virtual core containing two physical cores, or, for Intel, two threads. The addressing works like that: to send an instruction you need the address of the processor, which is the physical ID; then the module is seen as a core, and the core is seen as a thread. That's how they adapted modules in Linux and Windows: by adapting the threading to the modules. It gets technical from there, and you would have to know how a processor actually works, and how modules work. Like I said, the Intel Atoms address their cores the exact same way, by using the modules to dispatch to the cores, just as if they were threads. Talk to a Windows programmer and he will tell you there is no "agreement" of Windows seeing the module as a core. For addressing, it functions the exact same way as threading. There are three addresses passed to direct an instruction to the core/thread you want; the format is Addr1->Addr2->Addr3. First is the logical processor, which is the socket; then there's the virtual processor, which can be a module or a core; then you have the physical processor, which can be a hardware thread or a core. It's also why, whether it's AMD, Intel, SPARC, ARM, or MIPS, you can't have a module with cores that themselves have hardware threads: we would have to rethink the addressing entirely to include a 4th address, hence why Intel processors with cores in modules don't have hyperthreading. That's how they made modules work, because the kernel doesn't make a distinction between a core and a thread; it just dispatches the load to the first address, with the second address and the third address. And yes, the kernel is software, and it's the thing controlling the hardware.
Instead of ending up in a hardware thread, the instruction ends up in a hardware core; the logical core (hyperthreading) dispatches threads the EXACT same way a module dispatches to its cores, but instead of ending up in a hardware thread, it ends up in a physical core.

So YES, modules are totally a hardware construct, but so are the cores. And no, they are not invisible to the kernel; it's just the same thing to it: address 1/2/3, that's it. The kernel is not asking the processor: "Hey, are you a core or a module?"

You remind me of the people when hyperthreading came out, saying: "look, look, I have 8 processors", and we responded: "no, those aren't processors, they're logical cores, and two of them go to the same physical core, but Windows sees them as cores". And they argued eternally because they didn't know anything about what they were talking about.

It's what we are trying to explain to you: it works the EXACT same way as hyperthreading, but at the end, instead of having 8 "logical cores" alternating between two entries on the same physical core, the work goes to a real physical core. Instead of one thread filling holes in a physical core by dispatching two threads at the same time, it is a module dispatching two threads to two physical cores at the same time within the module. It behaves like hyperthreading, but it is not hyperthreading.
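You can see the kernel-reported topology for yourself. Here's a small Python sketch (the parser is my own, not a kernel API) that reads /proc/cpuinfo-style text and counts unique (physical id, core id) pairs against `processor` entries; which of those numbers you call "cores" on an FX chip is exactly what this thread is arguing about.

```python
def topology(cpuinfo_text: str) -> tuple:
    """Return (physical_cores, logical_processors) counted from
    /proc/cpuinfo-style text using (physical id, core id) pairs."""
    cores = set()
    logical = 0
    phys = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1          # one entry per logical processor
        elif key == "physical id":
            phys = value          # socket the entry belongs to
        elif key == "core id":
            cores.add((phys, value))  # unique core within that socket
    return len(cores), logical

# Two logical processors reported against one core (SMT-style report):
sample = (
    "processor : 0\nphysical id : 0\ncore id : 0\n\n"
    "processor : 1\nphysical id : 0\ncore id : 0\n"
)
print(topology(sample))  # (1, 2)
```

On a real box you would pass it `open("/proc/cpuinfo").read()`; the point is that the kernel just publishes addresses and ids, and the core-versus-thread interpretation is layered on top.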

If you want it explained by someone being in engineering :


----------



## FordGT90Concept (Sep 26, 2016)

newtekie1 said:


> Actually, did you notice how in the areas you circled, there is an almost identical mirror of hardware in certain places?  You cannot tell me you didn't notice these two insanely identical areas inside the areas you circled.
> View attachment 79302


It's not that cut and dried.  Here are the major parts I found that were mirrored.  As I said before, Siamese twins.




Edit: And now I see another problem.  The line on the second-from-the-top big box needs to be shifted to the left a little bit.



MalakiLab said:


> No, modules are seen by the kernel; in fact, that's why Windows needed a patch too. ...
> 
> So YES, modules are totally a hardware construct, but so are the cores. And no, they are not invisible to the kernel; it's just the same thing to it: address 1/2/3, that's it. The kernel is not asking the processor: "Hey, are you a core or a module?"


Got links for that?



MalakiLab said:


> It's what we are trying to explain to you: it works the EXACT same way as hyperthreading, but at the end, instead of having 8 "logical cores" alternating between two entries on the same physical core, the work goes to a *real physical core "integer cluster."* Instead of one thread filling holes in the physical core by dispatching two threads at the same time, a module dispatches two threads between two *physical cores "integer clusters"* at the same time. It behaves like hyperthreading, but it is not hyperthreading.


FTFY

I called it "hybridized simultaneous multithreading."  It's not Hyper-Threading Technology, but it also isn't a dual core.  It's something in between.  At bare minimum, they should have called it "8-core*" with fine print saying it has 8 integer clusters and explaining what that means in layman's terms.



MalakiLab said:


> If you want it explained by someone being in engineering :


Nothing new in there.


----------



## Aquinus (Sep 27, 2016)

Yet, somehow we completely ignore the part where integer ALUs are a fundamental building block of any microprocessor whereas the FPU has always been an add-on.

Tell me, by that definition does the Intel 8080 or 8088 not contain a CPU core because it lacks a FPU?


----------



## MalakiLab (Sep 27, 2016)

FordGT90Concept said:


> It's not that cut and dried.  Here's major parts I found that were mirrored.  As I said before, Siamese twins.
> 
> 
> 
> ...



What do you do for a living? I am a software engineer. My specialty is multicore and multiprocessor systems. Don't try to correct me: it is two physical cores. If you don't compile with extreme vectorization, a module behaves like 2 cores. It's 2 physical cores, each assigned its own designated FMAC and 128-bit integer unit. You can't pretend you know anything about instructions and how a processor behaves; you have no idea. The pictures you are posting are ridiculous; it's normal for a die not to be perfectly symmetric.

You come here and you pretend you know more than people who studied way more than you did. That's what I do for a living: I study how programs compile on multicore processors, how the cores behave, how the threads behave, how the kernel handles the threads, how GCC handles the threads, and then how the compiled program handles its threads. Then I optimize the sources of the package and usually make recommendations to the kernel and GCC dev teams on how the CFLAGS behave on a precise architecture and microarchitecture.

I can tell that,
CFLAGS="-march=bdver2 -mmmx -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -maes -mpclmul -mpopcnt -mabm -mlwp -mfma -mfma4 -mxop -mbmi -mtbm -mavx -msse4.2 -msse4.1 -mlzcnt -mf16c -mprfchw -mfxsr -mxsave --param l1-cache-size=16 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=bdver2 -fstack-protector-strong -O2 -pipe"
will do a good job at handling all cores and producing quality binaries.

But in fact,
CFLAGS="-march=bdver2 -mmmx -msse -msse2 -msse3 -mssse3 -msse4a -mcx16 -msahf -maes -mpclmul -mpopcnt -mabm -mlwp -mfma -mfma4 -mxop -mbmi -mtbm -mavx -msse4.2 -msse4.1 -mlzcnt -mf16c -mprfchw -mfxsr -mxsave --param l1-cache-size=16 --param l1-cache-line-size=64 --param l2-cache-size=2048 -mtune=bdver2 -fstack-protector-strong -O3 -pipe" is too much vectorization for -mavx: the FP scheduler can't keep up, and the FPU becomes a burden that outweighs the CPU cycles you gain by vectorizing with it instead of with SSE. I program in assembly; I invoke the instructions myself and see how the processor behaves when I thread it to the max, to find the breaking point.

You don't even know how the scheduler works. Even AMD's engineers took a long time to understand how to handle such a complex task. They got even closer with the Excavator architecture, but it would take too much time to get it to perfection; meanwhile Intel is improving its IPC to the max, and they can't compete. People want more power.

What they learned with Bulldozer is not wasted, because they are introducing an ARM co-processor into their processors, which will have to be shared.

Just go check https://en.wikipedia.org/wiki/ARM_big.LITTLE

The big.LITTLE architecture is recognized as octa-core, even though each Cortex-A7 is paired with an A15 in a virtual core module, and in that configuration they share the same VFPv4 floating-point unit, with the Cortex-A7 mainly used for low-power THUMB-2 instructions. It is not an uncommon practice, and no engineer will ever tell you those two cores inside the module are just one. Not a single one.





If you are interested in seeing a real operating system taking full advantage of the AMD cores, I began doing some videos of distributed compilation, in which I afterwards study the details of how the processor performed. 25 years of experience in engineering.










For me the debate finishes here. If you are really interested in knowing how wrong you are, go study the x86 instruction set, more precisely the x87 subset, and how they are implemented in modern processors. When you are ready, in a few years, we will have the same conversation.


----------



## FordGT90Concept (Sep 27, 2016)

Aquinus said:


> Yet, somehow we completely ignore the part where integer ALUs are a fundamental building block of any microprocessor whereas the FPU has always been an add-on.
> 
> Tell me, by that definition does the Intel 8080 or 8088 not contain a CPU core because it lacks a FPU?


"Cores" didn't exist in the 1980s.  You're taking a word defined circa 2006 and retroactively applying it.  When the word was defined, it effectively meant two processors, and that's what it has meant through today.


----------



## cdawall (Sep 27, 2016)

FordGT90Concept said:


> "Cores" didn't exist in the 1980s.  You're taking a word defined circa 2006 and retroactively applying it.  When the word was defined, it effectively meant two processors, and that's what it has meant through today.



The word still has no definition.


----------



## Aquinus (Sep 27, 2016)

FordGT90Concept said:


> "Cores" didn't exist in the 1980s.  You're taking a word defined circa 2006 and retroactively applying it.  When the word was defined, it effectively meant two processors, and that's what it has meant through today.


It was the simplest definition of a compute core. They didn't have multi-core CPUs, but that was a CPU with a single core capable of serial computation. Even modern microcontrollers sometimes lack an FPU because it may not be necessary for the task. CPUs complex enough to need to execute more than one instruction in parallel are bound to have an FPU, because they're already complex enough to be general processors rather than mere microcontrollers, but that doesn't change anything. The simple fact is that a computer does not need an FPU to operate, but it needs ALUs. Sure, it's harder to solve certain problems without an FPU, but such difficulties don't make it not a core.

The 8088 was also small enough that a system could be wired to support more than one 8080 on the same system. Software merely didn't support it.
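Aquinus's point that a processor can compute perfectly well without an FPU is the reason fixed-point arithmetic exists: real-number math done with nothing but integer operations, still the standard approach on FPU-less microcontrollers. A minimal illustrative sketch (the Q16.16 format choice is arbitrary, not anything from this thread):

```python
# Fixed-point arithmetic: real-number math using only integer operations.
# Q16.16 format: 16 integer bits, 16 fractional bits.

FRAC_BITS = 16
ONE = 1 << FRAC_BITS  # the value 1.0 in Q16.16

def to_fixed(x: float) -> int:
    """Convert a float to Q16.16 (only done at the program boundary)."""
    return int(round(x * ONE))

def to_float(q: int) -> float:
    """Convert a Q16.16 value back to a float for display."""
    return q / ONE

def fx_mul(a: int, b: int) -> int:
    """Multiply two Q16.16 numbers using pure integer ops (no FPU)."""
    return (a * b) >> FRAC_BITS

def fx_div(a: int, b: int) -> int:
    """Divide two Q16.16 numbers using pure integer ops (no FPU)."""
    return (a << FRAC_BITS) // b

# 1.5 * 2.25 computed without any floating-point multiply:
product = fx_mul(to_fixed(1.5), to_fixed(2.25))
print(to_float(product))  # 3.375
```

The multiply and divide above compile down to integer multiply, shift, and divide instructions only, which is exactly why "no FPU" never meant "no core."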


----------



## FordGT90Concept (Sep 27, 2016)

MalakiLab said:


> What do you do for a living? I am a software engineer. My specialty is multicore and multiprocessor systems. Don't try to correct me: it is two physical cores. If you don't compile with extreme vectorization, a module behaves like 2 cores. It's 2 physical cores, each assigned its own designated FMAC and 128-bit integer unit. You can't pretend you know anything about instructions and how a processor behaves; you have no idea. The pictures you are posting are ridiculous; it's normal for a die not to be perfectly symmetric.
> 
> You come here and you pretend you know more than people who studied way more than you did. That's what I do for a living: I study how programs compile on multicore processors, how the cores behave, how the threads behave, how the kernel handles the threads, how GCC handles the threads, and then how the compiled program handles its threads. Then I optimize the sources of the package and usually make recommendations to the kernel and GCC dev teams on how the CFLAGS behave on a precise architecture and microarchitecture.
> 
> ...


Nice lack of links.



MalakiLab said:


> What they learned with Bulldozer is not wasted, because they are introducing an ARM co-processor into their processors, which will have to be shared.
> 
> Just go check https://en.wikipedia.org/wiki/ARM_big.LITTLE
> 
> ...


I'm not finding any big.LITTLE chip that separated the FPU from its respective cores.  If what you say is true about the FPU sharing, that's effectively turning one of the cores into a co-processor.  That doesn't change the fact it has 8 distinct cores that are capable of functioning independently of each other.

Edit: Also, die shot:




I see 8 distinct processors.



cdawall said:


> The word still has no definition.


If this goes to court, it will be _legally_ defined.



Aquinus said:


> It was the simplest definition of a compute core. They didn't have multi-core CPUs, but that was a CPU with a single core capable of serial computation. Even modern microcontrollers sometimes lack an FPU because it may not be necessary for the task. CPUs complex enough to need to execute more than one instruction in parallel are bound to have an FPU, because they're already complex enough to be general processors rather than mere microcontrollers, but that doesn't change anything. The simple fact is that a computer does not need an FPU to operate, but it needs ALUs. Sure, it's harder to solve certain problems without an FPU, but such difficulties don't make it not a core.
> 
> The 8088 was also small enough that a system could be wired to support more than one 8080 on the same system. Software merely didn't support it.


Been over all that already.  FPUs have been intrinsic to x86 design since the mid 1990s.  A core has been established as a complete processor (as in: take a uniprocessor, take away the memory controller, slap it on a die with another uniprocessor, add back in the memory controller and a bus to communicate with the rest of the system) since circa 2006.  You're using excuses from lifetimes ago in technology to justify what AMD did.  AMD should know better: if you're going to redefine things, you had better make it damn clear on the box.  AMD is going to lose this and they're going to lose hard.  The case against AMD is stronger than the case against Seagate, which Seagate lost.


----------



## cdawall (Sep 27, 2016)

FordGT90Concept said:


> If this goes to court, it will be _legally_ defined.



Good on them, because up until this point there isn't, nor has there ever been, a definition of a core.


----------



## MalakiLab (Sep 27, 2016)

FordGT90Concept said:


> Nice lack of links.
> 
> What they learned with Bulldozer is not wasted, because they are introducing an ARM co-processor into their processors, which will have to be shared.
> 
> ...



The Cell Broadband Engine? You will CLEARLY find it there: 1 PPE core, 8 SPE cores.

https://en.wikibooks.org/wiki/Microprocessor_Design/Multi-Core_Systems

You can't tell me IBM is wrong. The PPE doesn't have any FPU; it hands floating-point work off to the external SPEs, which do the floating-point calculations outside the PPE. Yet the PPE and the SPEs are both called cores, because that's what they are: cores. Even if you can't do anything alone with an SPE, it's still called a core, seen as a core, and behaves like a core. The processor is considered a 9-core part.

You can't even dare say you know better than IBM what does or doesn't constitute a core.

Or take NVIDIA. They claim they have 2560 CUDA cores in their GTX 1080, when ALL they have in a "core" is an FPU, and there are 50 cores in a module with only a scheduler and a dispatcher, and all it can do is floating-point operations in parallel.









No one, no company, can define with precision what constitutes a core. If AMD decides it's 2 Piledriver cores fused together with one shared FPU, it is. Even if they had no FPU at all, it would still be 2 cores in one module.


----------



## FordGT90Concept (Sep 27, 2016)

PPE supports VMX (page 20), aka AltiVec, which is capable of single-precision floats.  A primitive FPU, but an FPU nonetheless. Only the PPE fits the definition of a "core."  SPEs could be considered co-processors in the sense that the 8087 was a co-processor.

I'd argue GPUs don't have cores because they can't function without a CPU to give them orders.  They are always incomplete.  They're more like SPEs (co-processors) than PPEs (processors).  That said, AMD, Intel, NVIDIA, ARM, etc. play fast and loose with what they call most of the components of their GPUs.  One can't reasonably draw a box around a broad component and call that something universal.  Case in point: compare the above to GCN:




Not much agreement on what to call anything.  That said, NVIDIA chose to mimic GCN with Pascal to improve Pascal's virtual reality performance.

The nature of threads has dictated CPU layout; GPU layout has changed drastically over the years.


----------



## Aquinus (Sep 27, 2016)

FordGT90Concept said:


> Been over all that already. FPUs have been intrinsic to x86 design since the mid 1990s. A core has been established as a complete processor (as in: take a uniprocessor, take away the memory controller, slap it on a die with another uniprocessor, add back in the memory controller and a bus to communicate with the rest of the system) since circa 2006. You're using excuses from lifetimes ago in technology to justify what AMD did. AMD should know better: if you're going to redefine things, you had better make it damn clear on the box. AMD is going to lose this and they're going to lose hard. The case against AMD is stronger than the case against Seagate, which Seagate lost.


Last time I checked, x87 and other floating-point instructions are still extensions to x86. Archaic or not, that doesn't change the fact that any machine using x86 cannot operate without integer ALUs, whereas it can be done without FPUs; it's just not done anymore because it doesn't make sense given *general purpose workloads*. But that doesn't change how the CPU works or what's required for it to operate. x87 is still a co-processor; the only difference now is that it's on the same die as the integer ALUs and the rest of the CPU logic, but that doesn't mean it's required to call something a compute core. I've used microcontrollers that have absolutely no ability to do floating-point math, but they still have a compute core that can access memory, execute instructions, and do everything a CPU would do *except floating point math*.

So, you can make the same case until you're blue in the face, but the simple fact is that x86 is nothing without integer ALUs and can operate without an FPU just fine; that all general-purpose CPUs now come with FPUs is beside the point. Do AMD and NVIDIA GPUs only have one core because they only have one UVD (or similar) core for video playback? Every GPU nowadays has one, so why not? It's not mandatory for the operation of a GPU, but most people get a lot of use out of video acceleration. The FPU is no different. It makes your life easier, but it is in no way required for the operation of x86. Whether it's connected by an external bus or on the same die, it's still a co-processor.

Simply put, there is no such thing as an x86 CPU capable of only doing floating-point math while lacking integer ALUs and compute. That alone should tell anyone that the FPU is not the deciding factor for what should be considered a compute core. Once again: x87, SSE, MMX, and all the other floating-point math extensions are exactly that, extensions.
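The "they're extensions" point is visible in how Linux reports CPU features: each floating-point facility (x87 "fpu", SSE, AVX, ...) shows up as its own flag in /proc/cpuinfo rather than as part of some monolithic baseline. A small sketch parsing a flags line in that format (the sample string here is illustrative, not a real Bulldozer dump):

```python
# Each FP facility is reported as a separate feature flag, reflecting
# its history as an optional extension to the base integer ISA.
# Sample line in the style of /proc/cpuinfo (illustrative values).
sample = "flags\t\t: fpu vme de pse tsc msr mmx fxsr sse sse2 avx"

def parse_flags(cpuinfo_line: str) -> set:
    """Extract the set of feature flags from a /proc/cpuinfo 'flags' line."""
    _, _, value = cpuinfo_line.partition(":")
    return set(value.split())

flags = parse_flags(sample)
print("fpu" in flags, "avx" in flags)  # True True
print("avx512f" in flags)              # False
```

On a real Linux box the same parser can be pointed at any line of /proc/cpuinfo starting with `flags`.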


----------



## FordGT90Concept (Sep 27, 2016)

Aquinus said:


> ...that doesn't mean that it's required to call it a compute core.


It's the industry standard and has been for decades.



Aquinus said:


> I've used microcontrollers that have absolutely no ability to do floating point math, but they still have a compute core that can access memory, execute instructions, and do everything a CPU would do *except floating point math*.


We're talking about general processors here, not microcontrollers.


The only reason x86 doesn't include the x87 standard is that they both go back to distinct processors decades ago.  Most architectures designed after the creation of the IEEE 754 standard have FPU features standard; one example is IA-64.  The _only_ reason x87 remains an extension is backwards compatibility.  In practice, it is not separate.  Every processor built on it in the last decade, almost two, has an FPU--even Bulldozer.


This is all very irrelevant anyway because a core is a core, not an integer cluster.  AMD, at best, is going to settle, which means they don't admit guilt to misleading the public.  At worst, it will go to court, AMD will lose, and they'll likely have to pay out hundreds of millions or billions for making consumers think they got twice what they got.


----------



## cdawall (Sep 27, 2016)

FordGT90Concept said:


> This is all very irrelevant anyway because a core is a core, not an integer cluster. AMD, at best, is going to settle, which means they don't admit guilt to misleading the public. At worst, it will go to court, AMD will lose, and they'll likely have to pay out hundreds of millions or billions for making consumers think they got twice what they got.



You keep saying AMD will lose. How? They are not incorrect: there is no standard for what a core is, and the second core of the module sure as hell isn't a thread. So does that mean AMD can countersue Microsoft for misleading the public on what a thread is? Where does this nonsense end? It can act independently; it can do all of the work a core can, minus floating-point math, which has never been a written requirement of a core.


----------



## FordGT90Concept (Sep 27, 2016)

The public believes "core" fits the Athlon X2 and Intel model, which is discrete processors in one socket.  Bulldozer's "cores" are not discrete.  That's all the judge has to look at and decide.  It's not unlike how NVIDIA sold the GTX 970 with "4 GiB of VRAM" but didn't notify the public that the last 0.5 GiB underperforms the rest by a huge margin.  Excluding important information, like your "cores" sharing FPUs or your memory gimping itself, opens doors for the public to seek damages.

Seagate did not countersue Microsoft for not correctly labeling hard drive capacity (using math for GiB but showing a GB label).  AMD could certainly try to sue Microsoft, but where Seagate had a strong case against Microsoft (and still does), AMD really doesn't.  What AMD wants to call a "core," no one else does.  Microsoft would have to make an exception for Bulldozer, and how could Microsoft adequately explain to the public what is weird about Bulldozer in two words?  They can't.  AMD really brought this on itself by not making it clear to the public that the product is different, and it will have to pay the price for it.

There is no "minus" for a core.  It either is a complete processor or it isn't.


----------



## cdawall (Sep 27, 2016)

It is a complete processor; each core inside a module can function independently of the other. They are physically present, therefore they are not "logical cores," they are "physical cores."


----------



## FordGT90Concept (Sep 27, 2016)

There's a lot of hardware there that indicates it isn't two physical processors:





Pretty much everything is shared except the integer clusters.  We're talking about 20% of a CPU that isn't shared.  20% a processor does not make.  One core: two integer clusters and two threads.


----------



## Aquinus (Sep 27, 2016)

FordGT90Concept said:


> We're talking about 20% of a CPU that isn't shared. 20% a processor does not make.


That "20% addition" gives you a full core's worth of performance in most cases. The only time that changes is when you're exclusively using the FPU on both "integer clusters" at the same time, which is an unrealistic use case. Once again, AMD gimped overall performance per clock, but if you consider how applications scale to pure parallel workloads, it's pretty close to linear speedup for every thread added, which feels a lot like a real core. Most forms of SMT don't have those kinds of performance characteristics; hyper-threading certainly doesn't.


FordGT90Concept said:


> There is no "minus" for a core. It either is a complete processor or it isn't.


This line of reasoning disturbs me. Why does this have to be dealt with as an absolute? Our definition of a core should reflect the CPU and the architecture. CPU technology is far less monolithic than it used to be, and it's only going to continue moving in that direction. Either that or we should just admit that the term "core" as Ford knows it is obsolete.


----------



## FordGT90Concept (Sep 27, 2016)

Let's look at it from a different perspective: failure.  If an instruction fetcher fails in a Bulldozer chip, you lose two integer clusters and one FPU.  If an instruction decoder fails in a quad-core Deneb or Zen, you lose one FPU and one integer cluster.  Is it really separate when a single point of failure (a component that is shared) can disable both?


----------



## FR@NK (Sep 28, 2016)

AMD can claim there are two cores per module but thread scaling seems to disagree:

This is a FX-8320 piledriver 8c/8t.






Now compare an Ivy bridge 4c/8t.






It looks like, based on what I've seen, each module was designed as one core with extra hardware for multithreading under most workloads. Somewhere along the way they decided to market each module as two cores. I think this was a mistake; Bulldozer would have looked like a much faster chip if the FX-8150 had been marketed as a 4c/8t chip. Instead we got an 8c/8t chip that marginally hurt performance when 2 threads were scheduled to the same module, compared to spreading them out between modules before doubling them up.


----------



## FordGT90Concept (Sep 28, 2016)

It does have a much straighter line because of that extra hardware but yeah, each module definitely ain't no dual core.  Absolutely nothing suggests it is except AMD's marketing material.

I take it Dhrystone sees HTT and limits itself to 4 cores?


----------



## Roph (Sep 28, 2016)

I'd like to see such a graph generated on a Phenom 1, as you scale up and starve it of its pathetic cache. Is that not a quad core?


----------



## FordGT90Concept (Sep 28, 2016)

Since it only has four cores and no simultaneous multithreading, it would look like both of those do up to 4 threads and then stay flat beyond that.  It would look exactly like Dhrystone on Ivy--any quad-core without some kind of in-core multithreading would.


----------



## cdawall (Sep 28, 2016)

Aquinus said:


> This line of reasoning disturbs me. Why does this have to be dealt with as an absolute? Our definition of a core should reflect the CPU and the architecture. CPU technology is far less monolithic than it used to be, and it's only going to continue moving in that direction. Either that or we should just admit that the term "core" as Ford knows it is obsolete.



This is the same issue I keep seeing: why do some people assume monolithic dies are the only things that can be described as a core? These cores have the ability to work independently, something you cannot do with HT. They are and always will be physically there; they aren't "logical" cores.


----------



## MalakiLab (Sep 28, 2016)

FR@NK said:


> AMD can claim there are two cores per module but thread scaling seems to disagree:
> 
> This is a FX-8320 piledriver 8c/8t.
> 
> ...



It is called Amdahl's Law. It depends on a lot of things: first, on the hardware being able to handle the whole workload, as dispatching takes more and more resources. The more processors, or the more cores you throw into a processor, the more overhead there is in managing the flow.

Second is the parallelism of the workload you feed into the processor. Not everything can be parallelized infinitely. Code nowadays is not well designed to run on so many cores, and on other 8-core and 12-core Intel processors you'll begin to see the same exact behaviour.




If you want to read some good text, written by Intel, you can purchase this one : https://www.computer.org/csdl/mags/so/2011/01/mso2011010023-abs.html

Without even talking about hyperthreading, beyond 4 cores the current way we compile and program doesn't take full advantage of parallelization. The graphic is also from Intel, on how they see cores behave; it's part of the article above.
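Amdahl's Law itself is a one-line formula, and the flattening curves described above fall out of it directly. A minimal sketch (the 95%-parallel fraction is an arbitrary example value, not a measurement from this thread):

```python
def amdahl_speedup(parallel_fraction: float, n_cores: int) -> float:
    """Amdahl's Law: S(n) = 1 / ((1 - p) + p / n),
    where p is the fraction of the work that can run in parallel."""
    p = parallel_fraction
    return 1.0 / ((1.0 - p) + p / n_cores)

# Even a 95%-parallel workload tops out well below linear scaling:
for n in (1, 2, 4, 8, 16):
    print(n, round(amdahl_speedup(0.95, n), 2))
# 8 cores give ~5.9x, not 8x; the serial 5% caps speedup at 20x
# no matter how many cores you add.
```

This is why adding cores past a certain point shows diminishing returns on any CPU, shared FPU or not.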




It's funny, though, seeing you try to analyse data and interpret it to suit your point of view.


----------



## MalakiLab (Sep 28, 2016)

FordGT90Concept said:


> It does have a much straighter line because of that extra hardware but yeah, each module definitely ain't no dual core.  Absolutely nothing suggests it is except AMD's marketing material.
> 
> I take it Dhrystone sees HTT and limits itself to 4 cores?



Wrong. Dhrystone calculates MIPS. Hyperthreading doesn't boost MIPS; it squeezes instructions in so they are handled more efficiently, and can speed things up by making sure no core has execution holes and everything is well coordinated to maximize usage of all the processor's components. The only way to boost instructions per second is adding more cores. And it might not be linear after 4 cores.


----------



## FordGT90Concept (Sep 28, 2016)

cdawall said:


> These have the ability to independently work, something you cannot do with HT.


There is hardware both clusters rely on (not independent).  AMD's implementation of SMT is only about 25% faster than Intel's but at significantly higher cost in terms of design and die space.  An extra physical core should always represent a near-100% increase in performance no matter the application, because it is truly independent (both cores show this going up to 4).  What AMD did with Bulldozer is enable a single core to handle two threads more efficiently when two threads are in the core.  There's absolutely nothing wrong with that.  In fact, AMD's implementation is quite a bit better than Intel's, but that in no way means it has two distinct cores.  Nothing suggests it does except the box.  The hardware isn't there, the relative performance isn't there, and newer operating systems show half of what AMD claims.


----------



## MalakiLab (Sep 28, 2016)

FordGT90Concept said:


> There is hardware both clusters rely on (not independent).  AMD's implementation of SMT is only about 25% faster than Intel's but at significantly higher cost in terms of design and die space.  An extra physical core should always represent a near-100% increase in performance no matter the application, because it is truly independent (both cores show this going up to 4).  What AMD did with Bulldozer is enable a single core to handle two threads more efficiently when two threads are in the core.  There's absolutely nothing wrong with that.  In fact, AMD's implementation is quite a bit better than Intel's, but that in no way means it has two distinct cores.  Nothing suggests it does except the box.  The hardware isn't there, the relative performance isn't there, and newer operating systems show half of what AMD claims.



You're wrong. If you had worked with Xeons in datacenters, mid-size company clusters, or supercomputers, you'd know that. Software and the workload have a huge effect on how multicore and multiprocessor systems behave.

I'll show you yet another example, a program I personally worked on: http://compression.ca/pbzip2/

Nothing in real life has an infinitely linear curve. There's always a limit to how far an application can be parallelized. Every processor, every piece of software, and every instruction shows that kind of behaviour.

It is a law, it is calculated, it is calculable. http://research.cs.wisc.edu/multifacet/amdahl/

Get your science right.

EDIT: You have remained blind to all the proof presented to you so far. Don't say there's no proof; everything converges as proof. If you had an ounce of honesty, I could prove you wrong. I ran the test myself some time ago, when I worked on the ondemand governor, which makes the core clock fluctuate depending on the workload. I tested, to be 100% sure, that if one core is at 1600MHz and the other is at 3900MHz, the virtual machine bound to one of the module's cores won't get affected by the other running at lower speed. Both the integer units and the FPU are slower on the 1600MHz core, and both are faster on the 3900MHz core. Both work independently. Until I throw in an AVX instruction; then the entire FPU clocks at the speed of the core asking for the unification, until the instruction is done. You can even try it yourself with QEMU/KVM, pinning core affinity for the VMs, one on the first module core, the other on the second. Very easy to replicate. When I don't use AVX, I have 2 cores in a module 100% of the time. But you continue putting your head in the sand, playing blind.

EDIT 2: I am also surprised at how inaccurate your points of view are. It's actually the opposite: AMD has a semi-SMT, because it can't run another thread on a core in a module. That's one of the bottlenecks of the cores in the module, along with the fact that it doesn't order instructions well. Hyperthreading is a much, much better SMT implementation and takes a lot more space on the die. The complete opposite of what you claim. You also seem to claim it's transparent to the OS, when it's not at all: 95% of thread management is done by the kernel and the threads library, all software. The processor only routes threads to the right core/module and then to the right thread. The processor doesn't decide which core takes which thread; the kernel does. What the processor decides is what it will do with the thread. Intel's SMT is also easier to feed because it doesn't have 2 cores to supply; the hyperthreaded one doesn't have to be on time and constantly supplied the way an AMD Bulldozer module does.





Microsoft is very, very bad at handling threads, unlike Linux, because Linux was used with SPARC and other servers that had something like 8 chips with 16 cores and 128 threads each. Before hyperthreading, the Windows kernel never had a proper threading library. It's normal for a SPARC to have so many threads, as it's RISC, not CISC. In Linux they just had to modify a few points of the kernel to make it recognize the module as a core with multiple threads. It doesn't change anything, except that it addresses threads like it should. If big SMT were detrimental to a design, you can be sure a SPARC or Alpha processor would bottleneck like there's no tomorrow. But it doesn't. Pretty much everything composing your logic is the opposite of what is established in computer engineering.
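How Linux actually enumerates this topology can be seen in sysfs: each logical CPU directory reports its package, core, and sibling set as compact list strings. A sketch that parses that standard sysfs list format (the paths in the comments are the usual sysfs locations; the sample values below are illustrative, not a real Bulldozer dump):

```python
def parse_cpu_list(s: str) -> list:
    """Parse a sysfs CPU list like '0-1,4,6-7' into [0, 1, 4, 6, 7].

    This is the format used by files such as
    /sys/devices/system/cpu/cpu0/topology/thread_siblings_list,
    which is how the kernel publishes which logical CPUs share
    hardware (a module's two cores, or an HT pair).
    """
    cpus = []
    for part in s.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

# Illustrative values: with module-aware scheduling, cpu0's sibling
# list on an 8-core FX chip would typically read "0-1" (module 0).
print(parse_cpu_list("0-1"))        # [0, 1]
print(parse_cpu_list("0-1,4,6-7"))  # [0, 1, 4, 6, 7]
```

Reading those files on a real FX system shows directly how the kernel groups Bulldozer's cores into sibling pairs, which is the topology question this whole argument turns on.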


----------



## cdawall (Sep 29, 2016)

FordGT90Concept said:


> There is hardware both clusters rely on (not independent).  AMD's implementation of SMT is only about 25% faster than Intel's but at significantly higher cost in terms of design and die space.  An extra physical core should always represent a near-100% increase in performance no matter the application, because it is truly independent (both cores show this going up to 4).  What AMD did with Bulldozer is enable a single core to handle two threads more efficiently when two threads are in the core.  There's absolutely nothing wrong with that.  In fact, AMD's implementation is quite a bit better than Intel's, but that in no way means it has two distinct cores.  Nothing suggests it does except the box.  The hardware isn't there, the relative performance isn't there, and newer operating systems show half of what AMD claims.



That's great and all until you end up being incorrect.


----------



## FordGT90Concept (Sep 29, 2016)

Meh. A whole lot of jargon that simply doesn't matter when it comes to what consumers understand.  AMD is going to lose.


----------



## Aquinus (Sep 29, 2016)

FordGT90Concept said:


> Meh. A whole lot of jargon that simply doesn't matter when it comes to what consumers understand.  AMD is going to lose.


The ignorance of the consumer makes AMD wrong? Interesting. Is that seriously what you've reduced your argument to? That makes absolutely no sense. It's like saying the consumer was ignorant so AMD will be punished. That's laughable at best.


----------



## FordGT90Concept (Sep 29, 2016)

It made Seagate wrong when they were class-action sued.


----------



## FR@NK (Sep 29, 2016)

Aquinus said:


> The ignorance of the consumer makes AMD wrong?



Who do you think is sitting in the jury box? Ignorant consumers!



FordGT90Concept said:


> It made Seagate wrong when they were class-action sued.



I remember feeling very cheated when my new 60GB drive was only showing 55.87GB. I imagine it would be much worse when you realize you are missing half of your cores.


----------



## Aquinus (Sep 30, 2016)

FordGT90Concept said:


> It made Seagate wrong when they were class-action sued.


The high failure rate class action? How was the consumer wrong and ignorant when their drives actually did have high failure rates? The funny thing is that failure rate can be quantified, just as performance can be. You'll find very quickly that performance only really drops off when you're doing all floating-point math, which isn't a realistic load for a normal application; there are usually plenty of integer operations mixed in with the floating-point ops. So even with a shared FPU, multi-core performance realistically won't suffer as much.

When people complain about Bulldozer, what is the #1 complaint? I'll give you a hint: it's not multi-core performance. The biggest complaint is single-threaded performance. Even without another task trying to use the FPU, performance still sucks, and that isn't because BD doesn't have "real cores." The fact that the FPU is shared is beside the point, but you seem incredibly intent on making it the upfront issue. The simple fact is that BD's performance blows because the number of uOps BD can execute at any given time was seriously reduced compared to K10. Since dispatch width per core is significantly smaller, instructions that might previously have taken 3 or 4 clock cycles can now take 5 or 6, even for integer operations, because AMD slimmed down the core. They didn't just share parts like the FPU and dispatch/decode hardware; they chose that over beefing up each core, which would have taken more die space and reduced how many cores fit in a given area. AMD's mistake was that multi-core performance didn't make up for the loss in single-threaded performance. Pair that with poor cache hit rates and pipeline stalls from a very long pipeline and you have a recipe for disaster.

People need to stop reducing this problem to something as simple as "it doesn't have real cores." The problems with Bulldozer are much greater and more numerous than a shared FPU, but that's all everyone seems focused on. Honestly, if you need so much floating-point bandwidth that a single SIMD unit is too slow, you should be using something optimized for massively parallel SIMD operations, like a GPU.

Let's say for a minute Bulldozer didn't have the second integer core, okay? Would you still be pissed off that performance is crap because the FPU has half the floating-point capability of both K10 and SB-and-later Intel CPUs? The FPU on K10 and SB+ can literally do twice as much because it's twice as wide as Bulldozer's.

So if you want to get pissed off about something, get pissed off about that: a second integer core doesn't change the fact that the FPU is already seriously under-powered, even if it weren't shared, and that will continue to plague AMD if they don't change it in Zen.
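The dispatch-width point above can be illustrated with a toy throughput model. This is a deliberately simplified sketch (real cores overlap work, stall, and reorder); it shows only that a narrower front end needs proportionally more cycles for the same instruction stream. The numbers are illustrative, not measured:

```python
from math import ceil

def issue_cycles(uops: int, dispatch_width: int) -> int:
    """Best-case cycles to issue `uops` micro-ops on a core that can
    dispatch `dispatch_width` micro-ops per clock, ignoring stalls,
    dependencies and cache misses."""
    return ceil(uops / dispatch_width)

# Hypothetical hot loop of 1200 micro-ops: halving the dispatch width
# doubles the minimum cycle count for the same work.
work = 1200
for width in (2, 4):
    print(f"width {width}: {issue_cycles(work, width)} cycles minimum")
```

At width 4 the loop needs at least 300 cycles; at width 2, at least 600, which is the "reduced dispatch width hurts even integer IPC" argument in miniature.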



FR@NK said:


> I remember feeling very cheated when my new 60GB drive was only showing 55.87GB.


You mean how you can still buy a 1TB drive and find that 92.7GB is "missing" because people don't realize that HDD manufacturers state SI-prefixed bytes, not binary-prefixed bytes?
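The arithmetic behind both "missing capacity" figures in this exchange is just the SI-vs-binary prefix gap. A small sketch using the drive sizes mentioned above:

```python
GIB = 2**30  # binary gibibyte, which most OSes label "GB"

def reported_gib(si_bytes: int) -> float:
    """What the OS shows for a drive advertised in SI (decimal) bytes."""
    return si_bytes / GIB

# "60 GB" drive from the Seagate era: 60 * 10^9 bytes.
print(f"{reported_gib(60 * 10**9):.2f} GiB")  # ~55.88, the 55.87 above

# "1 TB" drive: advertised as 10^12 bytes, counted by the OS in GiB.
shown = reported_gib(10**12)
missing = 1024 - shown  # shortfall versus a true binary terabyte (1 TiB)
print(f"shows {shown:.1f} GiB, {missing:.1f} GiB 'missing'")
```

Nothing is actually missing; the box and the OS are simply counting in different units, which is exactly what the Seagate settlement forced onto the fine print.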


----------



## FordGT90Concept (Sep 30, 2016)

http://www.bit-tech.net/news/bits/2007/10/26/seagate_lawsuit_concludes_settlement_announced/1


----------



## FR@NK (Sep 30, 2016)

Aquinus said:


> The high failure rate class action?



I see you aren't familiar with the Seagate lawsuit. It really changed things: anything technical now has to carry fine print explaining itself instead of assuming the consumer understood.

The class wanted a 7% refund on the drives they bought, which, as you can see below, nearly matches the difference when referring to gigabytes. Also notice how the difference increases as hard drives get larger and use larger prefixes.








Aquinus said:


> because people don't realize that HDD manufactures state SI prefixed bytes



Again: who do you think is sitting in the jury box? Ignorant consumers! This is why I'm not surprised AMD is getting sued over core counts.


----------



## Aquinus (Sep 30, 2016)

FordGT90Concept said:


> http://www.bit-tech.net/news/bits/2007/10/26/seagate_lawsuit_concludes_settlement_announced/1


That required Seagate to explain the difference between GB and GiB, not to adopt something consistent with the OS (unless you see drives being advertised as something like 1TiB.) The difference is that there is nothing wrong with stating that a core is merely registers and combinational logic. If AMD has to do anything, it will be to add fine print on the back of the box explaining integer cores and their relationship to the FPU.

The argument falls apart when you consider what would have happened if AMD had doubled the width of the single FPU per module (not added a second one) and the impact that would have had on floating-point performance; I'm willing to bet it would instantly make up the difference. That still doesn't fix the integer cores, though, which is where a lot of performance is lost. Once again, the class action makes it sound like Bulldozer sucks because it has a shared FPU when it's really because it has gimped FPUs. Sharing it was smart; slimming it down was not. A similarly clocked Intel quad core has double the floating-point performance of an "8 core" BD chip at the same clock. It also happens to be the case (as I said before) that the FPU per module is half the width of the FPU on K10 and on SB through at least Haswell. If BD had FPUs twice as wide, they would still be shared, but given the clocks BD runs at, you'd make up some of that difference; floating-point performance would line up more with a 6-core Intel CPU instead of landing somewhere between a dual-core and a quad-core Intel chip at the same clock.

Simply put, you could still have an FPU on every core, but if each were half as wide as the module's FPU is now, you'd still be stuck with the same crappy performance because your ability to dispatch hasn't improved. For any streaming SIMD task on floating-point data, the wider FPU at a given clock speed will always be faster than a narrower one: half the width means twice as many cycles for the same work, and fewer cycles to complete a task means better IPC. So despite having twice as many FPUs, the reduced width of each unit harms overall throughput.

tl;dr: doubling the width of the already-shared FPU would have the same performance characteristics as doubling the number of FPUs at the current width, which is reason enough to reject the "it's not 8 cores" claim on the basis of the FPU alone. Simply put, caveat emptor.
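The width argument reduces to simple cycle counting. A toy model (one SIMD op per cycle, no stalls; an idealization, not a claim about real Bulldozer or Haswell pipelines) showing that halving the unit width doubles the cycle count for the same stream of doubles:

```python
from math import ceil

def fp_cycles(elements: int, simd_bits: int, element_bits: int = 64) -> int:
    """Best-case cycles to stream `elements` FP values through one SIMD
    unit of width `simd_bits`, assuming one op per cycle and no stalls."""
    lanes = simd_bits // element_bits
    return ceil(elements / lanes)

n = 4096  # doubles to process in a hypothetical streaming kernel
narrow = fp_cycles(n, 128)  # 128-bit unit: 2 doubles per cycle
wide = fp_cycles(n, 256)    # 256-bit unit: 4 doubles per cycle
assert narrow == 2 * wide   # half the width, twice the cycles
```

Under this model it makes no difference whether you get 256 bits per module from one wide unit or two narrow ones, which is the tl;dr's point.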


----------



## FordGT90Concept (Sep 30, 2016)

FR@NK said:


> I see you arent familiar with the seagate lawsuit. It really changed how anything technical had to have fine print explaining instead of assuming the consumer understood.


Yup, even DVD+/-Rs have "1 GB = 1,000,000,000 bytes" on the packaging.  I've noticed that newer hard drives don't even put the capacity on the label; all it has is the model number, from which you can usually work out the capacity (e.g. ST1000 = 1 TB, ST3000 = 3 TB).



Aquinus said:


> That required Seagate to explain the difference between GB and GiB, not to adopt something that would be consistent with the OS (unless you see it being advertised as something like 1TiB.)


AMD needs to explain what is and isn't a core, because what they provide doesn't fit the mold of what people expect.


----------



## Aquinus (Sep 30, 2016)

FordGT90Concept said:


> AMD needs to explain what is a core and what is not a core because what they provide doesn't fit the mold of what people expect.





			
BitTech said:

> We asked Bernard Seite, technical advisor, AMD, whether we really should regard the two execution units within a Bulldozer Module as cores and were told, ‘_If you take the overall group of applications that are running on x86, 90 per cent is integer… We look at how efficient Hyper-Threading [is]. Sometimes you have negative impact, but most of the time, you have something which is in between zero and 40. The Bulldozer Module will never be negative [in its performance gains] – you have two threads, and the two threads are not going to clash._’
> ...
> The only time two threads within a Bulldozer Module could clash, we were told, was if each required 256-bit floating point precision, for example if both threads used the new 256-bit AVX capabilities of the CPU. This is because the floating point unit – as previously alluded to – is shared and comprises two 128-bit fp units which can be ganged to produce a single 256-bit unit. However, it’s very unlikely that we’ll see many 256-bit fp threads any time soon as the standard is new and will take time to adopt. Seite also pointed out that ideally the OS (or the compiler, potentially) should be aware of the capabilities of the Module and assign the second 256-bit thread to another Module, preferably one not running any hardcore fp work.


http://www.bit-tech.net/hardware/cpus/2011/10/12/amd-fx-8150-review/2

Except it sounds exactly like people are going to expect. Once again, people keep equating bad performance to "not really being cores."

I'm pretty sure you need to read #374 again and not just the beginning about Seagate.


----------



## FordGT90Concept (Sep 30, 2016)

Compare two of Intel's "cores" to two of AMD's "cores"...


> The only time two threads within a Bulldozer Module could clash...


...is an impossibility in the former.  Hyper-Threading Technology does not add "cores."  The comparison is therefore invalid.

AMD should have called:
-core -> integer cluster
-module -> core
...lawsuit never would have happened because the description matches the product.


----------



## Aquinus (Sep 30, 2016)

FordGT90Concept said:


> Compare two of Intel's "cores" to two of AMD's "cores"...
> 
> ...is an impossibility in the former.  Hyper-Threading Technology does not add "cores."  The comparison is therefore invalid.
> 
> ...


So your entire argument is based on not being able to dispatch two 256-bit AVX FP ops at the same time? I'm sure you're running 256-bit AVX all the time. 

Once again, you're not addressing the real problem which is the width of the FPU, not the fact that it's shared. When I said re-read my post, I did mean the entire thing, including the second half of it.


Aquinus said:


> The argument falls apart when you consider what would happen if AMD had doubled the width of the single FPU (not add a second one,) per module and it's impact it would have had on floating point performance and I'm willing to bet that you would instantly make up the difference but, that still doesn't fix the integer cores which is where a lot of performance is lost. Once again, the class action makes it sound like bulldozer sucks because it has a shared FPU when it's really because it has gimped FPUs. Sharing it was smart, slimming it out was not. A similarly clocked Intel quad core will have double the floating point performance than an "8 core" BD chip at the same clock. It also happens to be the case (as I said before,) that the FPU per module is half of the width of the FPU on K10 and SB through at least Haswell. If BD had FPUs that were twice as wide, it would still be shared but, if you consider the clocks that BD runs at, you make up some of that difference and floating point performance would line up more with a 6c Intel CPU if that were the case instead of somewhere between a dual-core and quad-core Intel chip at the same clock.
> 
> Simply put, you could still have a FPU on every core but, if they make the FPU half as wide than it is now per every module, you're still stuck with the same crappy performance because your ability to dispatch hasn't been improved. When using any streaming SIMD task with floating point data, the wider FPU at any given clock speed will always be faster than a narrower one because half the width means twice as many cycles to do the same thing and fewer cycles to complete a task means better IPC. So despite having twice as many FPUs, the reduced width of each unit harms overall throughput.
> 
> tl;dr: Increasing the width of the already shared FPU by double would have the same performance characteristics as doubling the number of FPUs with the current width which is reason alone to reject the "it's not 8 cores," claim based strictly on the FPU itself. Simply put, caveat emptor.


----------



## FordGT90Concept (Sep 30, 2016)

And "you're not addressing the real problem" of AMD overstating the capabilities of their processors.


----------



## Aquinus (Oct 1, 2016)

FordGT90Concept said:


> And "you're not addressing the real problem" of AMD overstating the capabilities of their processors.





Aquinus said:


> Simply put, you could still have a FPU on every core but, if they make the FPU half as wide as it is now per every module, you're still stuck with the same crappy performance because your ability to dispatch hasn't been improved.


I'm at least making an effort unlike your witty remarks which aren't proving anything. The only point they're making is that you disagree with me.


----------



## FordGT90Concept (Oct 1, 2016)

There's already been enough proof provided in this thread that to do so again would simply be an exercise in repetition.  AMD redefined "core" so it stands out on the shelf next to Intel (8 "cores" for $800 versus 4 cores for $1000).  Everything about it stinks of misleading the public.  The details really don't matter in the eyes of consumer protection law.


----------



## eidairaman1 (Oct 1, 2016)

No point in beating a dead horse. The CPUs have 8 physical cores in 4 modules; they share resources between each pair of cores, not much different from any CPU sharing cache.


----------



## 64K (Oct 1, 2016)

Eventually someone is going to tire in this debate and be rolled away in a wheelchair.


----------



## Aquinus (Oct 1, 2016)

eidairaman1 said:


> No point of beating a dead horse. The cpus have 8 physical cores in 4 modules, they share resources between 2 cores, not much different than any cpu sharing the cache.


...and people don't seem to get that the width of the FPU is 100% responsible for the poor performance, not the fact that there is only one of them. Bulldozer and Haswell both have the same number of FPUs, and on a purely floating-point benchmark Haswell will be twice as fast because its FPU is twice as wide. So at any given clock speed, you'll see half the performance. Half the width, half the performance. That's not too hard to understand.


----------



## FordGT90Concept (Oct 1, 2016)

eidairaman1 said:


> No point of beating a dead horse. The cpus have 8 physical cores in 4 modules, they share resources between 2 cores, not much different than any cpu sharing the cache.


Except that sharing caches is normal.  L3, for example, is often accessible by all of the cores in the CPU.  The only cache that usually isn't shared is L1.  Each Bulldozer "module" has one L1 for instructions and two L1s for data, whereas a single "core" has one L1 for instructions and one L1 for data.  The lack of a second L1 instruction cache is another indicator that the "module" is an extended "core," not a dual core.



Aquinus said:


> ...and people don't seem to get that the problem is that the width of the FPU is 100% responsible for the poor performance, not the fact that there is only one of them. Bulldozer and Haswell both have the same number of FPUs and on a purely floating point benchmark, Haswell will be twice as fast because the width of the FPU is twice as big. So at any given clock speed, you'll see half the performance. Half the width, half the performance. That's not too hard to understand.


...evidence that AMD cheated consumers...


----------



## Aquinus (Oct 1, 2016)

FordGT90Concept said:


> ...evidence that AMD cheated consumers...


...but not having 8 cores like the lawsuit is about. I'm not disagreeing that AMD screwed consumers. I'm disagreeing with the statement that it doesn't have 8 cores.


----------



## cdawall (Oct 1, 2016)

FordGT90Concept said:


> Compare two of Intel's "cores" to two of AMD's "cores"...
> 
> ...is an impossibility in the former.  Hyper-Threading Technology does not add "cores."  The comparison is therefore invalid.
> 
> ...



A module isn't a core; it has two integer clusters. Mind finding me a core with two?

Seriously, your argument is that AMD used an undefined word differently. A module isn't a core; the only thing you can argue is the lack of an FPU per core, but guess what: it's still a core at that point.


----------



## Aquinus (Oct 1, 2016)

cdawall said:


> Seriously your argument is AMD used an undefined word differently. A module isn't a core, the only thing you can argue is the lack of an FPU per core, but guess what it's still a core at that point.


...but Ford has a definition of a core and this doesn't agree with it. Are you telling me that doesn't make him right?! What a shocker. 

Edit: I don't want to turn this into a rant but, as a software engineer, there are a lot of cases where I opt for integers (fixed point) over floating-point values for reasons of performance, accuracy, and precision. When you write software that translates into your earnings or projected earnings, you don't want round-off error or any "lost" data; you want every penny. You want everything to add up to exactly what it's supposed to, not just for earnings but so you can confirm your data against an audit if people think you're full of shit. We can't be like Wells Fargo and screw people out of their fractions of a cent like in Office Space, because people are scummy. In reality, you need control of that if you're going to be an ethical institution that isn't willing to lie about progress.

I just wanted to get that out there because, from my perspective, floating point is a different animal from integer math altogether, and I treat it completely differently. For that reason, I can't consider the FPU directly part of the core. It's important, but it's a special case to me.
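The fixed-point-for-money point can be shown in a few lines. This is a generic illustration, not tied to any particular accounting system: binary floating point cannot represent most decimal fractions exactly, while integer cents always sum exactly:

```python
# Classic binary-float round-off: 0.1 has no exact base-2 representation,
# so decimal-looking sums drift.
assert 0.1 + 0.2 != 0.3
print(f"{0.1 + 0.2:.17f}")  # 0.30000000000000004

# Fixed point: keep money as integer cents and every sum is exact,
# so the books reconcile to the penny under audit.
prices_cents = [1999, 2999, 499]  # $19.99, $29.99, $4.99 (made-up items)
total = sum(prices_cents)          # pure integer math, no drift
assert total == 5497
print(f"total: ${total // 100}.{total % 100:02d}")  # total: $54.97
```

Languages with a decimal type (like the decimal128 mentioned below in the thread) solve the same problem at a runtime cost; integer cents solve it with ordinary ALU ops.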


----------



## FordGT90Concept (Oct 2, 2016)

That's what decimal128 is for.   It still incurs a performance penalty though.


When AMD said "dual core" with the FX-62, they didn't mean two integer clusters; they meant two complete processors in one package, each with its own instruction decoder, instruction cache, floating-point cluster, integer cluster, and data cache.  Intel followed suit with Pentium D.  If AMD meant a "core" was only the integer cluster, then why did those chips have two of everything needed to process every kind of x86 task (especially instruction decoders and floating point)?  That hardware has represented the standard feature set of x86 since the 90s.

Logic would suggest that two FPUs per core would be better than two integer clusters per core because of the performance penalties FPUs incur.  AMD instead did the opposite: they took the worst-performing part of the processor and shared it between two threads without bolstering it.  It was gimped from the day of conception.


"Core" is very well defined and certainly no court would accept your argument that it is "undefined."


----------



## cdawall (Oct 2, 2016)

It's so defined you yourself had to find basically an urban dictionary of computers to do so.


----------



## Prima.Vera (Oct 2, 2016)

I see a lot of bla-bla-bla going on in the last 16 pages. However, everybody seems to miss the point of the article. Let's review it:
_"Tony Dickey, representing himself in the U.S. District Court for the Northern District of California, accused AMD of falsely advertising the core count in its latest CPUs, and contended that because of the way they're physically structured, AMD's 8-core "Bulldozer" chips really only have four cores._
_The lawsuit alleges that Bulldozer processors were designed by stripping away components from two cores and combining what was left to make a single "module." In doing so, however, the cores no longer work independently. Due to this, AMD Bulldozer cannot perform eight instructions simultaneously and independently as claimed, or the way a true 8-core CPU would."_
So he is suing AMD over this statement. That's all. Personally, I don't have CPU design experience, so I won't take a side. However, if he wins the case, then he was right, end of discussion. If not, the same.


----------



## FordGT90Concept (Oct 2, 2016)

cdawall said:


> It's so defined you yourself had to find basically an urban dictionary of computers to do so.


Does the equivalent literature exist to say otherwise?  Has anyone, other than AMD, published a document that defined what a "core" is?


----------



## Prima.Vera (Oct 2, 2016)

OK, one simple question and I'm gone. If AMD indeed has 8 cores, can you or can you not clock each individual core at a different frequency?


----------



## BiggieShady (Oct 2, 2016)

FordGT90Concept said:


> Does the equivalent literature exist to say otherwise?


So the equivalent literature doesn't say otherwise, which makes an urban dictionary the best source for the definition, which makes the term well defined? 
Not getting your logic.


----------



## Aquinus (Oct 2, 2016)

> Now, it’s absolutely true that the Bulldozer family of products has had much lower single-thread performance than either previous AMD CPUs (in many cases) or Intel chips (in virtually all cases). But this lawsuit doesn’t appear to argue that AMD mismarketed its CPUs because single-threaded performance was weaker than expected, but because _multi-threaded_ scaling was critically harmed by the decision to share various aspects of the underlying architecture. Weak single-threaded performance and high power consumption created a situation in which BD could neither hit its target clock frequencies nor its IPC targets. Critically, these issues do not disappear when the CPU is run in one-thread per module mode.
> 
> Dickey’s lawsuit is wrong on other areas of fact as well. Bulldozer does share a single FPU block per work unit, but consumer workloads are rarely FPU-heavy. Each CPU module _does_ contain the eight integer pipelines you’d expect in a typical dual-core conventional chip (4 ALU + 4 AGU per module). Dickey refers to Bulldozer as being unable to “perform eight calculations simultaneously,” but this is imprecise, inexact language that does not reflect the complexity of how a CPU executes code. Bulldozer is absolutely capable of executing eight threads simultaneously, and executing eight threads on an eight-core FX-8150 is faster than running that same chip in a four-thread, four-module mode. Bulldozer can decode 16 instructions per clock (not eight) and it can keep far more than eight instructions in flight simultaneously.
> 
> This lawsuit essentially asks a court to define what a core is and how companies should count them. As annoying as it is to see vendors occasionally abuse core counts in the name of dubious marketing strategies, asking a courtroom to make declarations about relative performance between companies is a cure far worse than the disease. From big iron enterprise markets to mobile devices, companies deploy vastly different architectures to solve different types of problems. An eight-core, Cortex-A7-based, mobile SoC is a very different beast from an eight-core big.Little Cortex-A57 / Cortex-A53 configuration. That chip is very different from an Oracle M7 or the SPARC T5. The T5 doesn’t pack the per-core performance of Intel’s 18-core Xeons, or IBM’s Power8.


http://www.extremetech.com/extreme/...lse-bulldozer-chip-marketing-is-without-merit


----------



## cdawall (Oct 2, 2016)

Prima.Vera said:


> OK, one simple question and I'm gone. If AMD has indeed 8 cores, can you or can you not clock each individual core with a different frequency?



You can. Each core is independent for clocks.


----------



## Aquinus (Oct 2, 2016)

cdawall said:


> You can each core is independent for clocks


Can you really control the multiplier on each integer core independently of the other core in the module in the BIOS? If true, I would actually find that really interesting.


----------



## FordGT90Concept (Oct 2, 2016)

@MalakiLab claims it is possible to change the clock speeds on the integer clusters, which raises the question: what speed are the FPU, instruction decoder, and so on running at?  Also note in the picture how Linux calls the FX-6350 a tri-core.



BiggieShady said:


> So the equivalent literature doesn't say otherwise which makes urban dictionary the best source for the definition which makes the term well defined.
> Not getting your logic.


There's a plethora of examples with everything packed into one unit collectively called a "core."  When an entire industry does something the same way across a wide variety of hardware, that becomes the definition of what the thing is.  Case in point: the Oxford English Dictionary now recognizes the words 'Merica and YOLO.  The definition of "core" was established with K8 X2, K10, Conroe, Penryn, Nehalem, Westmere, and Sandy Bridge (I'm leaving a lot out) before AMD redefined it with Bulldozer in late 2011.  Its launch goes against five years of established precedent.  None of those processors shared anything required to process any type of data.
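On the "Linux calls the FX-6350 a tri-core" aside: Linux derives core counts from sibling masks in sysfs, and kernels that treat a Bulldozer module's two integer clusters as siblings will report half as many "cores" as AMD's box does. A sketch that parses `thread_siblings_list`-style strings; the FX-6350 grouping shown is a hypothetical illustration of that behavior, not captured output:

```python
def physical_core_count(sibling_lists: list[str]) -> int:
    """Count physical cores from strings in the format of Linux's
    /sys/devices/system/cpu/cpu*/topology/thread_siblings_list.
    Logical CPUs sharing a core report the same sibling set."""
    def expand(spec: str) -> frozenset[int]:
        cpus = set()
        for part in spec.split(","):
            if "-" in part:
                lo, hi = map(int, part.split("-"))
                cpus.update(range(lo, hi + 1))
            else:
                cpus.add(int(part))
        return frozenset(cpus)
    return len({expand(s) for s in sibling_lists})

# Hypothetical FX-6350: if the kernel pairs the two integer clusters of
# each module as siblings, 6 logical CPUs collapse to 3 "cores" -- the
# "tri-core" reading mentioned above.
fx6350 = ["0-1", "0-1", "2-3", "2-3", "4-5", "4-5"]
assert physical_core_count(fx6350) == 3
```

A chip whose logical CPUs each report only themselves as siblings would count one core per logical CPU, which is how a conventional multi-core reads.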


----------



## BiggieShady (Oct 2, 2016)

FordGT90Concept said:


> When an entire industry does something *the same *across a wide variety of hardware


But the whole point here is that the industry has many *different* examples of CPU cores, and you claim they are the *same*, not different. That's pure fantasy.
If we looked only at the x86 instruction set with extensions and today's market shares, then Intel's core truly would be the de facto definition of what a core is, but let's not do that, for science's sake.


----------



## FordGT90Concept (Oct 2, 2016)

No, the industry doesn't.  What constitutes a core has been fairly consistent over the last decade, excluding Bulldozer.  Even ARM tends to put the same components in each core as x86.  The only hardware remotely similar to Bulldozer is SPARC, which is designed explicitly for databases; there, 8 instruction decoders serve 8 cores.  Even though the FPU is separate in SPARC, it behaves like an internal coprocessor: none of the cores claims ownership of it or shares hardware directly with it.  AMD shares far too much hardware for an FX-8350 to be considered an octo-core.

I'm not talking about sales, just processor designs that exist.


----------



## cdawall (Oct 2, 2016)

Aquinus said:


> Can you really control the multiplier on each integer core independently of the other core in the module in the BIOS? If true, I would actually find that really interesting.



Depending on the BIOS and board, you can clock per core; inside the OS you can set clocks and voltages per core independently. I do it with my pair of 12-core Optys.



FordGT90Concept said:


> There's a plethora of samples that have everything packed into one unit collectively called a "core."  When an entire industry does something the same across a wide variety of hardware, it becomes the definition of what that something is.  Case in point: Oxford English Dictionary now recognizes the words 'Merica and YOLO.  The definition of "core" was established with K8 X2, K10, Conroe, Penryn, Nehalem, Westmere, and Sandy Bridge (I'm leaving a lot out) before AMD redefined it with Bulldozer in late 2011.  Its launch goes against the five years of established precedent.  None of those processors shared anything required to process any type of data.




If it was established, why can't you find a definition?


----------



## Aquinus (Oct 2, 2016)

FordGT90Concept said:


> Even though the FPU is separate in SPARC, it behaves like an internal coprocessor.


Hint: it acts like an internal co-processor when it's dedicated per core as well. There is a very fine line where the FPU starts and ends; it isn't fully coupled into the integer core like you claim. Yes, the result generated by the FPU flows back to the integer core, but that's usually so the AGU can figure out where to put it in memory after the calculation is complete.

An FPU cannot function as a processor of any kind by itself. Integer math is a requirement for any modern machine, personal or server. Even GPUs, which are designed for massively parallel floating-point computation, must be able to do integer math, because floating point means nothing without it. Is it really so hard to comprehend that a CPU can exist without an FPU, but not without integer logic?

Also, IBM's POWER7 has *four* DP FPUs per core and can do SMT with up to 4 threads per core. The dedicated FPUs didn't make it a core, but the pairs of ALUs and AGUs did. How is that any different from the reverse case? If I recall correctly, multi-core POWER CPUs have shared instruction-decode logic that feeds queues for each core. So not only does POWER7 have dedicated FPUs within a single "core," it has shared dispatch logic for all of the cores. By your logic, the POWER7 is a one-core CPU because it shares resources between all of its cores, yet it could also be four times as many cores because of the number of FPUs.

Either way, even if BD had more FPUs or a beefier FPU, I think people would still have cried foul over the terrible integer performance, starting with single-threaded applications running alone. AMD hoped that more cores would offset the degradation in IPC, but they were wrong. Haswell's integer core has twice as many ALUs as BD's and one more AGU. That alone should tell you something.

The simple fact is that AMD told the public Bulldozer would have one 256-bit FMA FPU per module. There was no deception. The problem is that most people don't know what the hell that means. People also probably don't know that their Intel CPU likely has dual-dispatch 256-bit FPUs per integer core. Different CPUs with different goals. That's it.


----------



## FordGT90Concept (Oct 2, 2016)

cdawall said:


> If it was established why can't you find a definition?


I already gave one from Webopedia.  You can look at most architectures and see that they match Webopedia's definition.  Example: UltraSPARC T2 (UltraSPARC T1 had the FPU connected to the crossbar):








Aquinus said:


> Hint: It acts like an internal co-processor when it's dedicated per core as well.


The FPU is like x87 where it is connected to a system bus (crossbar in UltraSPARC T1). It's a discrete processor that handles its own instructions with its own caches. It shares nothing with any core. In Bulldozer, one instruction decoder handles three components (FPU + two integer clusters). No processor exists before or since with that kind of layout.



Aquinus said:


> Is it really so hard to comprehend that a CPU can exist without a FPU but a CPU can't exist without integer logic?


I never said it couldn't, but in recent history, every time it was done, it was considered an error in hindsight. Examples: UltraSPARC T1 had one FPU for 8 cores; UltraSPARC T2 moved the FPU into the 8 cores so there's a total of 8. Bulldozer and sons had one FPU per two integer clusters; Zen is moving to one FPU per core. Gimping the FPU is a great way to lose processor sales to the competition. So technically it can be done, but in application, it's foolish.



Aquinus said:


> Also, IBM's POWER7 has *four* DP FPUs per core and can do SMT with up to 4 threads per core. The dedicated FPUs didn't make it a core, but the singular pairs of ALUs and AGUs did. How is the reverse case any different? If I recall correctly, multi-core POWER CPUs have shared instruction decode logic that gets put onto queues for each core. So not only does it have dedicated FPUs, it has shared logic for all of the cores. By your logic, the POWER7 is a one-core CPU because it shares resources between all of the cores.


Oh look, it's all packed into each core like expected:




Seriously, stop thinking so hard.  It is very simple.


----------



## cdawall (Oct 2, 2016)

FordGT90Concept said:


> I already gave one from Webopedia. You can look at most architectures and see it matches Webopedia's definition. Example UltraSPARC T2 (UltraSPARC T1 had the FPU connected to the crossbar):



So where in that image does it say every core has to be set up in this exact configuration to qualify as a core? That isn't even an x86-64 CPU, so design on that end alone would allow differences. 



FordGT90Concept said:


> The FPU is like x87 where it is connected to a system bus (crossbar in UltraSPARC T1). It's a discrete processor that handles its own instructions with its own caches. It shares nothing with any core. In Bulldozer, one instruction decoder handles three components (FPU + two integer clusters). No processor exists before or since with that kind of layout.



It took 3 generations of CPUs for Intel to implement HT again after the fiasco that was Netburst. Remember that before playing the "never existed" card.


----------



## FordGT90Concept (Oct 2, 2016)

cdawall said:


> So where in that image does it say every core has to be setup in this exact configuration to qualify as a core? That isn't even an x86-64 CPU so design on that end alone would allow differences.


Each core is fully autonomous. That is the defining feature of a core. Nothing is shared. Bulldozer shares a lot; UltraSPARC T1's cores share nothing (the FPU has to be reached outside the core, making it a coprocessor).



cdawall said:


> It took 3 generations of CPUs for Intel to implement HT again after the fiasco that was Netburst. Remember that before playing the "never existed" card.


They're separate lineages:
Long pipelines: Pentium 4 (USA) -> Core i#
Short pipelines: Pentium M (Israel) -> Core/Core 2 (I think it lives on today as Atom)

HTT was never technically gone--they just weren't launching new processors of its design because Netburst was a clusterfuck that took years to clean up.  That said, I really don't get your line of thought with this comment.


----------



## Aquinus (Oct 2, 2016)

FordGT90Concept said:


> Seriously, stop thinking so hard. It is very simple.


Take your own advice. A core is something that can (by itself,) *execute* instructions independently.


FordGT90Concept said:


> Oh look, it's all packed into each core like expected:
> 
> 
> 
> ...


You do realize that each one of those POWER7 cores has the same integer hardware as Bulldozer's integer core, and even has shared dispatch hardware not shown on that diagram, which is only describing the memory hierarchy?


----------



## FordGT90Concept (Oct 2, 2016)

Aquinus said:


> Take your own advice. A core is something that can (by itself,) *execute* instructions independently.


Except that the integer cluster gets instructions decoded by separate hardware that it does not possess.  It is dependent on the hardware around it--completely useless without it.



Aquinus said:


> You do realize that each one of those POWER7 cores has the same integer hardware as Bulldozer's integer core and even has shared dispatch hardware not shown on that diagram which is only describing the memory hierarchy.


I can't find anything to support this claim. All I could find is POWER8, which does have "predecode", but look further down the pipeline and each core still has a dedicated decoder:




It almost appears that it has at least two ALUs and two FPUs. And why not? With 8 threads in the core, it can certainly keep them busy. I've got no problem with multiple integer clusters and floating-point clusters inside a core. The point is, each one does not constitute a core--the whole of it does. Instruction to result, it never leaves the core. The same should be said of Bulldozer's "module."


----------



## BiggieShady (Oct 2, 2016)

FordGT90Concept said:


> I can't find any thing to support this claim.





Looks to me like the instruction dispatcher is shared between 4 fixed-point units, and it's all inside the core boundary ... and since it's already shared, isn't what really matters how wide it is -- how many instructions per clock it can dispatch? How is this different from having a single double-wide dispatcher outside core boundaries, shared between two cores?
The answer is: it doesn't matter. This POWER7 core could be split into 2 weaker cores that would be less superscalar on their own; each would need more cycles for wider instructions, but it would be truly two independent, weaker cores.


----------



## FordGT90Concept (Oct 2, 2016)

Like POWER8, it appears to be a complete processor with lots of extra hardware to increase throughput.  "Core boundary" is right.

I do see the similarities between that and Bulldozer, yet IBM calls it what it is: a core. AMD does not. Like I said, all data points to AMD lying to make the processors look better next to Intel.

To be very clear: I have no issue with Bulldozer's design.  I have an issue with AMD doubling the "core" count.


----------



## BiggieShady (Oct 2, 2016)

FordGT90Concept said:


> To be very clear: I have no issue with Bulldozer's design. I have an issue with AMD doubling the "core" count.


It is clear: you have an issue with code made of pure AVX 256-bit instructions not scaling beyond 4 threads, but you are completely fine with bad cache hits and a gimped uop scheduler. IMO it should be the other way round.


----------



## FordGT90Concept (Oct 2, 2016)

Look at the FX-8350 from the perspective of being a quad-core.  AVX 256-bit becomes a non-issue.

Single-threaded performance is peripheral to the lawsuit. Yeah, it isn't the best, but there's really nothing misleading about that part. AMD has struggled in that department since Intel prioritized it.



BiggieShady said:


> Looks to me that instruction dispatcher is shared between 4 fixed point units, and it's all inside core boundary ... and since it's already shared isn't that what really matter how wide it is - how many instructions per clock can it dispatch ... how is this different than having a single double wide dispatcher out of core boundaries shared between two cores?


Because the whole of it is one core--not a component inside.  If IBM called those two "Fixed Point Units" "cores," I'd be as up in arms over that as I am over Bulldozer.  But they didn't because sense.  If only AMD had sense.


----------



## Aquinus (Oct 2, 2016)

FordGT90Concept said:


> AVX 256-bit becomes a non-issue.


AVX 256-bit is already a non-issue because hardly any software relies on quad precision floating point math.


FordGT90Concept said:


> If IBM called those two "Fixed Point Units" "cores," I'd be as up in arms over that as I am over Bulldozer.


The other name for those "fixed point units" is ALUs. Remember when I said POWER7 has the same integer hardware as a single BD core? That's two ALUs and two AGUs.


----------



## FordGT90Concept (Oct 2, 2016)

Aquinus said:


> The other name for those "fixed point units" are ALUs. Remember when I said POWER7 has the same integer hardware as a single BD core? That's two ALUs and two AGUs.


Yet, nothing is shared with a neighboring "core."

Zen is going to have 4 ALUs and 2 AGUs. Does that redefine what a core is?  Nope, it just increases the amount of parallelism the processor is capable of.  Adding a second integer cluster does the same damn thing (not a "core").


----------



## Aquinus (Oct 2, 2016)

FordGT90Concept said:


> Yet, nothing is shared with a neighboring "core."
> 
> Zen is going to have 4 ALUs and 2 AGUs. Does that redefine what a core is?  Nope, it just increases the amount of parallelism the processor is capable of.  Adding a second integer cluster does the same damn thing (not a "core").


I see the same gimped FMA FPU though. Weren't you complaining about FP throughput?


----------



## BiggieShady (Oct 2, 2016)

What would you say if Zen was presented as a 2 cores per module cpu like this?


----------



## cdawall (Oct 2, 2016)

FordGT90Concept said:


> Each core is fully autonomous.  That is the defining feature of a core.  Nothing is shared.  Bulldozer shares a lot, UltraSPARC T1 shares nothing (has to leave the core to reach it making it a coprocessor).



So by that logic, a core that shares an L2 is not a core.



FordGT90Concept said:


> They're separate lineages:
> Long pipelines: Pentium 4 (USA) -> Core i#
> Short pipelines: Pentium M (Israel) -> Core/Core 2 (I think it lives on today as Atom)
> 
> HTT was never technically gone--they just weren't launching new processors of its design because Netburst was a clusterfuck that took years to clean up.  That said, I really don't get your line of thought with this comment.



Simple: HT showed a performance degradation in a lot of scenarios back when it first came out. Software and hardware evolved and now SMT is the status quo. So the idea that sharing resources and an FPU is the devil and "isn't a real core" might be an issue right now, but this shit will come back. These chips were meant for an HPC cluster and performed better than Intel's offerings at the time, and they did so for a reason. As you said yourself, size-wise the modules look more like a traditional core than what the cores do, yet in a massively multithreaded, non-biased environment you were seeing scaling near 100% per core. Something Intel wasn't able to emulate until Haswell was released.


----------



## FordGT90Concept (Oct 3, 2016)

Aquinus said:


> I see the same gimped FMA FPU though. Weren't you complaining about FP throughput?


There is one per core.  It is not gimped because it is not shared.  8 cores =  8 FPUs.  In Bulldozer, not only were there 4 FPUs, but each one was only adequate for one core.



BiggieShady said:


> What would you say if Zen was presented as a 2 cores per module cpu like this?
> View attachment 79610


If they called the combined object a "module" and not a "core," throw Zen into the lawsuit.



cdawall said:


> So by that logic sharing an L2 is not a core.


L2 has always been optional.  The same goes with L3 and L4 (eDRAM).  They only exist to speed up memory latency.  They are not critical to the function of a core.  That said, L1 -> system memory would be painfully slow.



cdawall said:


> Simple: HT showed a performance degradation in a lot of scenarios back when it first came out. Software and hardware evolved and now SMT is the status quo. So the idea that sharing resources and an FPU is the devil and "isn't a real core" might be an issue right now, but this shit will come back. These chips were meant for an HPC cluster and performed better than Intel's offerings at the time, and they did so for a reason. As you said yourself, size-wise the modules look more like a traditional core than what the cores do, yet in a massively multithreaded, non-biased environment you were seeing scaling near 100% per core. Something Intel wasn't able to emulate until Haswell was released.


Pentium 4 didn't originally come with HTT.  Intel saw all of the cache misses with Pentium 4 and thought a solution to minimize performance loss when that happens is to give it a second thread to work on while the first thread was retrieving data.  This was when most software was coded for a single processor.  It was also something added in hindsight--not a very good implementation.  When they went to design Nehalem, they started designing the architecture from the perspective of having HTT.  That's why its implementation was much better.

Remember that Bulldozer was AMD's first attempt at simultaneous multithreading. The first try was pretty bad (Bulldozer) and they improved it with each iteration, but they couldn't fundamentally fix the blocking problems and poor single-threaded performance. Zen throws out Bulldozer's ideas and replaces them with HTT-like simultaneous multithreading. I'm not expecting AMD's Zen SMT performance to match HTT because Intel has a lot of practice. At least it is a step in the right direction.

8 Intel cores is going to beat 8 Bulldozer "cores."  Intel is going to charge you a lot more for the privilege though.

Diagrams above showed a 75% gain at best, 25% at worst, not "near 100%" (that would be a real dual core, not a hybrid like Bulldozer is). AMD sacrificed single-threaded performance to get there, though, where Intel did not for its 0-50% HTT gain.


----------



## cdawall (Oct 3, 2016)

FordGT90Concept said:


> L2 has always been optional.  The same goes with L3 and L4 (eDRAM).  They only exist to speed up memory latency.  They are not critical to the function of a core.  That said, L1 -> system memory would be painfully slow.



The FPU is optional as well. Hence the lack of its existence, obviously.




FordGT90Concept said:


> Pentium 4 didn't originally come with HTT.  Intel saw all of the cache misses with Pentium 4 and thought a solution to minimize performance loss when that happens is to give it a second thread to work on while the first thread was retrieving data.  This was when most software was coded for a single processor.  It was also something added in hindsight--not a very good implementation.  When they went to design Nehalem, they started designing the architecture from the perspective of having HTT.  That's why its implementation was much better.



Bad argument, my point stands: Intel released a hunk of shit, took something that worked in theory, and applied it to a later CPU. There is no reason why we won't see the module ideology expand and continue. The design was ahead of its time and not targeted at peasant workloads. It is and always will be an HPC chip.



FordGT90Concept said:


> Remember that Bulldozer was AMD's first attempt at simultaneous multithreading. The first try was pretty bad (Bulldozer) and they improved it with each iteration, but they couldn't fundamentally fix the blocking problems and poor single-threaded performance. Zen throws out Bulldozer's ideas and replaces them with HTT-like simultaneous multithreading. I'm not expecting AMD's Zen SMT performance to match HTT because Intel has a lot of practice. At least it is a step in the right direction.



Technically Bulldozer could handle 2 threads per core, or 4 per module, on top of the whole two-core idea, so where in the Windows Task Manager did that fall?



FordGT90Concept said:


> 8 Intel cores is going to beat 8 Bulldozer "cores."  Intel is going to charge you a lot more for the privilege though.



Which generation? Massively multithreaded environments outside of windows tell a tale...



FordGT90Concept said:


> Diagrams above showed 75% gain at best, 25% at worst, not "near 100%" (that would be a real dual core, not a hybrid like Bulldozer is).  AMD sacrificed single-threaded performance for that though where Intel did not for 0-50% gain.



Cool, I can make diagrams where it shows nearly 100% scaling, depending hugely on the OS it sits inside of. Even using your numbers, what scaling does HT show? It sure isn't 75%. More proof that these are "real" cores.


----------



## Frick (Oct 3, 2016)

One day I'll read this thread and dole out thanks whenever I learn something. Should be good.


----------



## Prima.Vera (Oct 3, 2016)

I found a lot of good info here:
http://www.bit-tech.net/hardware/cpus/2011/10/12/amd-fx-8150-review/2


----------



## FordGT90Concept (Oct 3, 2016)

cdawall said:


> The FPU is optional as well. Hence the lack of its existence, obviously.


In theory, not in practice.



cdawall said:


> Bad argument, my point stands, Intel released a hunk of shit. Took something that worked in theory and applied it to a later CPU. There is no reason why we wont see the module ideology expand and continue. The design was ahead of it's time and not targeted at peasant workloads. It is and always will be an HPC chip.


They are wide cores.  This lawsuit will likely force AMD to call them cores too.



cdawall said:


> Technically bulldozer could handle 2 threads per core or 4 per module on top of the whole two core idea, so where in the Windows task manager did that fall?


It would still be a 4-threaded core. A lot of enterprise RISC processors already handle 8 threads per core (many FPUs and ALUs in each), so that isn't exactly new.



cdawall said:


> Which generation? Massively multithreaded environments outside of windows tell a tale...


Sandybridge/Ivybridge which were out about the same time as Bulldozer.



cdawall said:


> Cool I can make diagrams where it shows nearly 100% scaling depending hugely on OS it sits inside of. Even using your numbers what scaling does HT show? It sure isn't 75%. Another proof that these are "real" cores.


Go ahead and run your benchmarks then. I'm waiting. Here's the post, by the way. Spoiler: it will never reach the 95%+ that an actual dual core would.



Prima.Vera said:


> Still haven't got my answer, if you can oc each of the 8 cores independently?





FordGT90Concept said:


> @MalakiLab claims it is possible to change the clockspeeds on the integer clusters, which raises the question: what speed are the FPU, instruction decoder, and so on running at?  Also note in the picture how Linux calls the FX-6350 a tri-core.


----------



## Prima.Vera (Oct 3, 2016)

> FordGT90Concept said: ↑
> @MalakiLab claims it is possible to change the clockspeeds on the integer clusters which begs the question what speed is the FPU, instruction decoder, and so on running at? Also note in the picture how Linux calls the FX-6350 a tri-core.



Thank you. Looks like AMD might be right after all...


----------



## FordGT90Concept (Oct 3, 2016)

Power management circuits can be added pretty much anywhere in a processor to shut parts of it off.  It only proves that Bulldozer has those circuits.


----------



## Aquinus (Oct 3, 2016)

FordGT90Concept said:


> In theory, not in practice.


Theory is having only an FPU and no integer core. Every x86 CPU from its inception to date has had an integer pipeline. Every single one. Not every one has had an integrated FPU, though. In modern times the benefit of having an FPU is enough to include it all the time, but there is absolutely nothing to suggest that the FPU is required for the definition of a core: it was done without one before, and it can be done again. Once again, as the guy from AMD said in an interview, 90% of the work CPUs handle is integer in nature (and my work as a software engineer aligns with this statement). It only makes sense to beef out a CPU to accommodate that kind of workload if die space is at a premium.


FordGT90Concept said:


> They are wide cores. This lawsuit will likely force AMD to call them cores too.


2 ALUs and 2 AGUs make them skinny cores, just as the single-issue 256-bit FMA FPU (which can be split into dual-issue 128-bit) is a skinny FPU. They're also independent ALUs and AGUs which can receive their own instructions, which feels a whole lot like a core. They have their own registers, their own control lines, and even their own instruction cache. Even the way they scale feels, smells, and tastes like cores and not SMT. They're also not wide cores if you're comparing the integer pipeline against Haswell's 4 ALUs and 3 AGUs, or the FPU against Intel's double-wide FPU that can quad-issue 128-bit ops and dual-issue 256-bit AVX.


FordGT90Concept said:


> It would still be a 4-threaded core. A lot of enterprise RISC processors already handle 8-threads per core (many FPUs and ALUs in each) so that isn't exactly new.


So now we're letting Microsoft define a core? Are you ever going to make up your mind or are you going to keep changing it to suit your argument?


FordGT90Concept said:


> Go ahead and run your benchmarks then. I'm waiting. Here's the post, by the way. Spoiler: it will never reach 95%+ that an actual dual core would.


Spoiler: most multi-threaded workloads that aren't purely parallel in nature will never see 100% speedup indefinitely. More cores means more overhead.
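The point being argued here is essentially Amdahl's law: any serial fraction of the work caps the achievable speedup no matter how many cores you add. A minimal sketch (the 95% parallel fraction is an arbitrary illustration, not a measured figure):

```python
def amdahl_speedup(parallel_fraction, n):
    """Upper bound on speedup for n cores when only part of the work parallelizes."""
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / n)

# With 95% of the work parallel, 8 cores give well under 8x:
for n in (2, 4, 8):
    print(n, round(amdahl_speedup(0.95, n), 2))  # 1.9, 3.48, 5.93
```

Even a small serial join or contested resource is enough to keep a "real" 8-core well short of 8x, which is why per-core scaling numbers alone can't settle the core-count question.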


----------



## FordGT90Concept (Oct 3, 2016)

Aquinus said:


> Once again, as the guy from AMD said in an interview, 90% of the work CPUs handle is integer in nature (and my work as a software engineer aligns with this statement.)


He also said blocking was possible.  Cores never block other cores ergo not a dual core.



Aquinus said:


> 2 ALUs and 2 AGUs makes them skinny cores just as the single issue 256-bit FMA FPU (which can be split into dual issue 128-bit,) is a skinny FPU. They're also independent ALUs and AGUs which can receive their own instructions which feels a whole lot like a core. They have their own registers, its own control lines, and even its own instruction cache. Even the way that they scale feels, smells, and tastes like cores and not SMT. They're also not wide cores if you're comparing the integer pipeline against Haswell's 4 ALUs and 3 AGUs or the FPU against Intel's double wide FPU that can quad-issue 128-bit ops and dual issue 256-bit AVX.


Except that those "cores" don't understand x86 instructions. They understand opcodes given to them by the instruction decoder and fetcher. On the other hand, a real core (even the POWER7 and POWER8 behemoths) has the hardware to carry an instruction through to a result without leaving the core. So either AMD's definition is wrong, or Intel, IBM, ARM Holdings, and Sun are wrong. Considering IBM produces chips that are nearly identical to Bulldozer, with four integer clusters, and doesn't call that a quad-core, I'd say AMD is definitively wrong.



Aquinus said:


> So now we're letting Microsoft define a core? Are you ever going to make up your mind or are you going to keep changing it to suit your argument?


All modern operating systems call FX-8350 a quad-core with 8 logical processors, not just Windows.  When *nix has to work on POWER7 and Bulldozer, are they really going to use AMD's marketing terms to describe what is actually there?  I'd hope not.
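For what it's worth, the way Linux arrives at that count can be reproduced: it treats unique (physical id, core id) pairs in /proc/cpuinfo as physical cores and each `processor` stanza as a logical CPU. A sketch with a synthetic cpuinfo fragment (the 8-logical/4-core layout below is an illustrative assumption mirroring how the kernel reported FX-8350, not captured output):

```python
# Synthetic /proc/cpuinfo fragment: 8 logical processors, 2 per core id.
sample = "".join(
    f"processor\t: {i}\nphysical id\t: 0\ncore id\t\t: {i // 2}\n\n"
    for i in range(8)
)

def count_topology(cpuinfo_text):
    """Return (physical cores, logical processors) from Linux-style cpuinfo text."""
    logical, cores = 0, set()
    phys = None
    for line in cpuinfo_text.splitlines():
        key, _, val = line.partition(":")
        key, val = key.strip(), val.strip()
        if key == "processor":
            logical += 1          # each stanza is one logical CPU
        elif key == "physical id":
            phys = val            # socket; pairs with the core id below
        elif key == "core id":
            cores.add((phys, val))
    return len(cores), logical

print(count_topology(sample))  # (4, 8)
```

So the "quad-core with 8 logical processors" reading is just what falls out of the topology the firmware advertises.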



Aquinus said:


> Spoiler: Most multi-threaded workloads that aren't purely parallel in nature will never have 100% speed up indefinitely. More cores means more overhead.


Asynchronous multithreading is always capable of loading systems to 100% so long as it can spawn enough threads and those threads are sufficiently heavy. Overhead is only encountered at the start in the main thread and at the end of the worker thread (well under 1% of compute time).
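The fan-out pattern described here can be sketched as below: workers get disjoint chunks, share no state, and the only serial work is the split and the final sum. (Caveat for this sketch: CPython's GIL serializes CPU-bound threads, so truly loading every core this way needs processes or a GIL-free runtime; the structure is the same either way.)

```python
from concurrent.futures import ThreadPoolExecutor

def count_hits(chunk):
    # Stand-in for a "sufficiently heavy" independent task: no shared
    # state, so workers never block one another.
    return sum(1 for x in chunk if x % 3 == 0)

def parallel_count(data, workers=8):
    # Fan out one task per chunk; overhead is only the split here
    # and the sum at the join below.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_hits, chunks))
```

With work this independent, scaling is limited only by the hardware underneath, which is exactly what makes it a useful probe of how many "real" cores there are.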


----------



## cdawall (Oct 3, 2016)

FordGT90Concept said:


> Considering IBM produces chips that are nearly identical to Bulldozer with four integer clusters and they don't call that a quad-core, I'd say AMD is definitively wrong.



Not to nitpick, but isn't this the exact opposite of what you said earlier? I thought AMD was the only CPU to ever attempt this...



FordGT90Concept said:


> They are wide cores. This lawsuit will likely force AMD to call them cores too.



Doubtful. AMD can create words to describe things just as well as the next guy. If AMD can't call what they consider a module a module, I guess Intel will have to ditch HyperThreading in favor of SMT. That is literally what you are saying needs to happen.



FordGT90Concept said:


> It would still be a 4-threaded core. A lot of enterprise RISC processors already handle 8-threads per core (many FPUs and ALUs in each) so that isn't exactly new.



Difference is those only have ONE integer unit and ONE FPU, not TWO and ONE.



FordGT90Concept said:


> Go ahead and run your benchmarks then. I'm waiting. Here's the post, by the way. Spoiler: it will never reach 95%+ that an actual dual core would.



I was very specific about the workloads that would show near-100% scaling, and I would wager you cannot prove me wrong. But after reading your argument, you find one useless benchmark (not a real-world scenario) that only uses the FPU for calculations and claim I am incorrect. As has been said a multitude of times, the FPU isn't used for the majority of calculations. The real issue with AMD isn't the configuration of the modules, it is the shit design of the internal cores themselves. The module works excellently, and if the cores were stronger, the pure idea of this lawsuit wouldn't even exist. That, my friend, is actually the basic design of Zen, mind you.


----------



## FordGT90Concept (Oct 3, 2016)

cdawall said:


> Not to nit pick, but isn't this the exact opposite of what you said earlier? I though AMD was the only CPU to ever attempt this...


Didn't know about POWER7/8 previously.



cdawall said:


> Doubtful. AMD can create words to describe things just as well as the next guy. If AMD can't call what they consider a module a module, I guess Intel will have to ditch HyperThreading in favor for SMT. That is literally what you are saying needs to happen.


You can't call donkeys elephants and sell them as elephants without getting sued.  AMD did as much, and got sued.



cdawall said:


> Difference is those only have ONE integer and ONE FPU, not TWO and ONE.


Sure doesn't look like it in the diagram. It actually looks like there are two (one is just math, the other is math + load/store) and each one is two wide. The only difference is that IBM didn't draw a box around it and say "herp, derp, dis is a 'core'".



cdawall said:


> I was very specific with the workloads that would show near 100% scaling, I would wager you cannot prove me wrong.


I don't have FX-8350 to test.  I've written a lot of programs that get near 100% scaling.  Random Password Generator would actually be a pretty good test for this.

10 million attempts
uncheck special characters
check require special characters (creates an unsolvable situation)
*minimum characters 32* Edit: Added this one because it can massively impact time if it randomly does a lot of short ones
5.9142104
Disabled even-numbered cores in Task Manager (it still spawns 8 threads)
13.2191610

123% faster

I'm gonna add a thread limiter to make it easier to test...


----------



## 64K (Oct 3, 2016)

FordGT90Concept said:


> You can't call donkeys elephants and sell them as elephants without getting sued.  AMD did as much, and got sued.



That's true and this madness needs to end right now. I'm fed up with Door To Door Donkey Salesmen trying to swindle me out of my hard earned money.

When this trial ends I wager that there will be a legal definition of a core if nothing else. It will be interesting to watch AMD backpedal at that time.


----------



## cdawall (Oct 3, 2016)

FordGT90Concept said:


> Didn't know about POWER7/8 previously.



Obviously...




FordGT90Concept said:


> You can't call donkeys elephants and sell them as elephants without getting sued.  AMD did as much, and got sued.



Thing is, they didn't call a donkey an elephant; they stepped away from what you believe is the status quo and produced something that was scalable in a way no manufacturer had done before.



FordGT90Concept said:


> Sure doesn't look like it in the diagram.  It actually looks like there are two (one is just math, the other is math + load/store) and each one is two-wide.  The only difference is that IBM didn't draw a box around it and say "herp, derp, dis is a 'core'"



You do know why IBM doesn't have to draw boxes around things and explain what cores are, correct? The thing about enterprise-level equipment is that function matters, not the nonsense this lawsuit is about. I'll say it again: this is literally an argument over a definition that doesn't exist.



FordGT90Concept said:


> I don't have FX-8350 to test.  I've written a lot of programs that get near 100% scaling.  Random Password Generator would actually be a pretty good test for this.



I will give it a shot. I am slightly curious how much of a difference task schedulers make in the situation as well.



64K said:


> That's true and this madness needs to end right now. I'm fed up with Door To Door Donkey Salesmen trying to swindle me out of my hard earned money.
> 
> When this trial ends I wager that there will be a legal definition of a core if nothing else. It will be interesting to watch AMD backpedal at that time.



Thing is, all AMD has to do is stand strong instead of backpedaling. If they strong-arm the lawsuit they will win; if they backpedal, it will be assumed to be due to guilt.


----------



## FordGT90Concept (Oct 3, 2016)

cdawall said:


> You do know why IBM doesn't have to draw boxes around things and explain what cores are correct? Thing about enterprise level equipment is function matters not the nonsense this lawsuit is about. I say it again this is literally an argument of a definition that doesn't exist.


No, it's because IBM knew they couldn't get away with selling the chip as a 16 or 32 "core" processor when it clearly only has 8 cores.  You know, like FX-8350 clearly only have 4 cores.



cdawall said:


> Thing is all AMD has to do is stand strong instead of backpedaling. If they strong arm the lawsuit they will win, if they back pedal it will be assumed due to known guilt.


You don't think Seagate tried to do the same when sued over HDD capacity?  There is no path for AMD to win here.

This is debugging data...I'll upload updated program shortly...
1: 24.6987115
2: 13.1477996
3: 9.1914374
4: 7.4688438
5: 6.8086950
6: 6.2363480
7: 5.8927118
8: 5.7746498
...the application is working correctly.  Big jumps between 1-4 where there's an actual core to do the work.  Small jumps between 5-8 where HTT is kicking in.  Beyond that, performance is expected to fall because the threads are fighting each other for time.

...once W1z lets me edit it that is...


1.1.4, 6700K, final...
8: 5.6961283
7: 5.7390397
6: 6.2014922
5: 6.7108575
4: 7.1342991
3: 8.8729954
2: 12.6389990
1: 24.1833987
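Turning those timings into speedup and per-thread efficiency makes the core-vs-HTT break at 4 threads explicit (numbers copied from the 6700K run above):

```python
# FordGT90Concept's final 1.1.4 timings on a 6700K: threads -> seconds.
timings = {1: 24.1833987, 2: 12.6389990, 3: 8.8729954, 4: 7.1342991,
           5: 6.7108575, 6: 6.2014922, 7: 5.7390397, 8: 5.6961283}

def speedup(n):
    # speedup relative to the single-threaded run
    return timings[1] / timings[n]

for n in sorted(timings):
    print(f"{n} threads: {speedup(n):.2f}x, efficiency {speedup(n) / n:.0%}")
```

Efficiency stays high through 4 threads (the physical cores) and then collapses once HTT is the only thing adding throughput, which is the shape the post describes.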


----------



## cdawall (Oct 3, 2016)

How do you edit minimum number of characters?


----------



## FordGT90Concept (Oct 3, 2016)

Click on the row and it should become editable.

I'll attach 1.1.4 here until I can edit the other thread.  Edit: Download here: https://www.techpowerup.com/forums/threads/random-password-generator.164777/page-2


Edit: I'm ready for FX-8350 data.

It does have up to 20% overhead per core.


----------



## Aquinus (Oct 4, 2016)

FordGT90Concept said:


> He also said blocking was possible. Cores never block other cores ergo not a dual core.


DMA is blocking and memory writes are usually write-back to cache, so does a shared L2 negate the possibility of being a core?


FordGT90Concept said:


> Except that those "cores" don't understand x86 instructions. They understand opcodes given to them by the instruction decoder and fetcher. On the other hand, a real core (even the POWER7 and POWER8 behemoths) has the hardware to interpret instruction to a result without leaving the core. So either AMD's definition is wrong or Intel, IBM, ARM Holdings, and Sun are wrong. Considering IBM produces chips that are nearly identical to Bulldozer with four integer clusters and they don't call that a quad-core, I'd say AMD is definitively wrong.


POWER7 is only a behemoth in the sense that it has a strangely large number of discrete FPUs but, the smallest constant unit is the fixed point unit or combo of ALUs and AGUs. IBM produces CPUs that actually have a pretty large amount of floating point hardware given the fact that it's a general purpose CPU.


FordGT90Concept said:


> All modern operating systems call FX-8350 a quad-core with 8 logical processors, not just Windows. When *nix has to work on POWER7 and Bulldozer, are they really going to use AMD's marketing terms to describe what is actually there? I'd hope not.


You say that like it's because of the definition of a core and not for the sake of how processes are scheduled in the kernel.


FordGT90Concept said:


> Asynchronous multithreading is always capable of loading systems to 100% so long as it can spawn enough threads and those threads are sufficiently heavy. Overhead is only encountered at the start in the main thread and at the end of the worker thread (well under 1% of compute time).


That depends on how the application is architected. Most applications don't have 100% independent threads and even if they do, they usually require getting joined by a control thread that completes the calculation or whatever is going on. That one thread is going to wait for all the other dispatched ones to complete. Purely functional workloads are going to benefit the most from multiple cores because they have properties that allow for good memory locality (data will primarily reside in cache.) I've been writing multithreaded applications for several years now and I can tell you that in most cases, these workloads aren't purely async. More often than not, there are contested resources that limit overall throughput. Applications that can be made to be purely functional are prime examples of things that should be run on the GPU because of the lack of data dependencies on calculated values.
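A minimal sketch of that fork/join shape (the function names are illustrative, not from any real application): workers are dispatched, and the control thread blocks until all of them complete before combining the results.

```python
# Fork/join: dispatch workers, then have the control thread wait on
# all of them before finishing the calculation.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Stand-in for a heavier per-thread workload.
    return sum(chunk)

data = list(range(1_000_000))
chunks = [data[i::8] for i in range(8)]  # split the work across 8 workers

with ThreadPoolExecutor(max_workers=8) as pool:
    # map() yields results in order; the with-block joins every worker
    # before continuing -- this is the control thread waiting.
    total = sum(pool.map(partial_sum, chunks))

print(total)  # same answer as the sequential sum
```

Note that in CPython the GIL keeps this particular pure-Python workload from scaling across cores; the point here is the structure (dispatch, then a blocking join), not the speedup.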

As a developer, if I have a thread that is not limited in most situations and will almost always give me a speed-up of over 50% versus another thread, I consider it a core. It's tangible bandwidth that can be had and, to me, that's all that matters. HTT only helps in select cases; more often than not, I can't get a speed-up beyond one or two threads over 4 on a quad-core Intel setup using hyperthreading, where I can with Bulldozer integer cores.

I'll agree with you that the integer core isn't what we've traditionally recognized as a core but, it has far too many dedicated resources to call it SMT. So while it might not be a traditional core, it's a lot more like a traditional core than like SMT.


----------



## FordGT90Concept (Oct 4, 2016)

Aquinus said:


> DMA is blocking and memory writes are usually write-back to cache, so does a shared L2 negate the possibility of being a core?


No because that can happen with any pool of memory with multiple threads accessing it.



Aquinus said:


> POWER7 is only a behemoth in the sense that it has a strangely large number of discrete FPUs but, the smallest constant unit is the fixed point unit or combo of ALUs and AGUs. IBM produces CPUs that actually have a pretty large amount of floating point hardware given the fact that it's a general purpose CPU.


It has a lot of hardware on both accounts.  Unlike UltraSPARC T1, it was designed to do well at everything...so long as it could be broken into a lot of threads.



Aquinus said:


> You say that like it's because of the definition of a core and not for the sake of how processes are scheduled in the kernel.


They go hand-in-hand.  Because a "module" really represents a "core," operating systems need to issue heavy threads to each core before scheduling a second heavy thread to the same core.  Windows XP (or was it 7?) got a patch to fix that on Bulldozer because the order of the cores reported to the operating system differed from HTT's.  The OS needs to treat the two technologies similarly to maximize performance.



Aquinus said:


> Most applications don't have 100% independent threads and even if they do, they usually require getting joined by a control thread that completes the calculation or whatever is going on.


Virtually every application I multithread does so asynchronously.  The only interrupt is updating on progress (worker thread invokes main thread with data which the main thread grabs and carries out).  That is probably why there is up to a 20% hit.  I suppose I could reduce the number of notifications but...meh.  10 million 32 character passwords generated in <6 seconds is good enough. 
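That notification pattern can be sketched as a worker posting batched progress updates to the main thread through a queue; all names here are illustrative, not from the actual application. Batching the updates is what keeps the overhead down; notifying on every item is where a hit like the ~20% one comes from.

```python
# A worker thread posts periodic progress updates to the main thread
# through a queue, ending with a sentinel when the work is done.
import queue
import threading

progress = queue.Queue()
TOTAL = 1_000_000

def worker():
    done = 0
    for _ in range(TOTAL):
        done += 1
        if done % 100_000 == 0:      # batch updates instead of per-item
            progress.put(done)
    progress.put(None)               # sentinel: work finished

t = threading.Thread(target=worker)
t.start()

updates = []
while (msg := progress.get()) is not None:
    updates.append(msg)              # main thread consumes progress
t.join()

print(updates)
```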

SMT is a concept, not a hardware design.  SMT is what Bulldozer does (two threads in one core).


----------



## Aquinus (Oct 4, 2016)

FordGT90Concept said:


> No because that can happen with any pool of memory with multiple threads accessing it.


That doesn't mean that latency is going to be consistent between cores without shared cache. Common L2 makes it advantageous to call two integer cores as a *pair* of logical cores because they share a local cache. Context switching between those two cores will result in better cache hit rates because data is likely to already reside in L2 if it was used on the other integer core. That improves performance because accessing memory is always slower than hitting cache. It improves latency because you're preserving memory locality, not because you don't understand that an integer core is that thing that does most of the heavy lifting in a general purpose CPU.


FordGT90Concept said:


> It has a lot of hardware on both accounts. Unlike UltraSPARC T1, it was designed to do well at everything...so long as it could be broken into a lot of threads.


Except it doesn't do everything well. It has a huge emphasis on floating point performance, where POWER7 might be able to keep up with Haswell, but when it comes to integer performance it gets smacked down just like AMD does in floating point performance. Intel is successful because it beefs the hell out of its cores. It doesn't mean that what AMD provided is not a core; it just means that its ability to compute per clock cycle is less than Intel's due to the differences inside the cores themselves, not because AMD hasn't produced something that can operate independently. If I really need to dig it out, I can pull up the table of dispatch rates per clock cycle for the most common x86 instructions on several x86-based CPUs. Haswell straight up dominates everything because it can do the most per clock cycle, which makes sense because the FPU is gigantic and Intel has just been adding ALUs and AGUs.


FordGT90Concept said:


> They go hand-in-hand. Because a "module" really represents a "core" operating systems need to issue heavy threads to each core before scheduling a second heavy thread to the same cores. Windows XP (or was it 7?) got a patch to fix that on Bulldozer because the order of the cores reported to the operating system differed from HTT's. The OS needs to treat the two technologies similarly to maximize performance.


As I stated earlier, it's due to memory locality. The Core 2 Quad, being two C2D dies on one chip, could have had improved performance by using this tactic as well because context switching on cores with a shared cache is faster than where there isn't. This isn't because they're not real cores, it's for scheduling purposes but, I'm sure you work with kernels and think about process scheduling all the time and would know this so, I'm just preaching to the choir.


FordGT90Concept said:


> Virtually every application I multithread does so asynchronously. The only interrupt is updating on progress (worker thread invokes main thread with data). That is probably why there is up to a 20% hit. I suppose I could reduce the number of notifications but...meh. 10 million 32 character passwords generated in <6 seconds is good enough.


Sounds like a real world situation to me and by no means a theoretical one. 


FordGT90Concept said:


> SMT is a concept, not a hardware design. SMT is what Bulldozer does (two threads in one core).


SMT is most definitely a hardware design and to say otherwise is insane. Are you telling me that Intel didn't make any changes to their CPUs to support hyper-threading? That is a boatload of garbage. Bulldozer is two cores with shared hardware, most definitely not SMT. SMT means making the core wider to allow for instruction-level parallelism, which a second thread can take advantage of during operations that don't utilize the entire core or eat up an entire stage of the pipeline. Bulldozer has dedicated hardware and registers. SMT implementations most definitely don't have a dedicated set of registers, ALUs, and AGUs. They utilize the extra hardware already in the core to squeeze in more throughput, which is why hyperthreading gets you anywhere between 0 and 40% of the performance of a full core.


----------



## FordGT90Concept (Oct 4, 2016)

Aquinus said:


> That doesn't mean that latency is going to be consistent between cores without shared cache. Common L2 makes it advantageous to call two integer cores as a *pair* of logical cores because they share a local cache. Context switching between those two cores will result in better cache hit rates because data is likely to already reside in L2 if it was used on the other integer core. That improves performance because accessing memory is always slower than hitting cache. It improves latency because you're preserving memory locality, not because you don't understand that an integer core is that thing that does most of the heavy lifting in a general purpose CPU.


All caches exist to improve performance.  The more cache levels, with progressively longer response times, the better overall performance will be.  L2 can be made part of a core's design but it doesn't have to be; a core generally only needs L1 caches.  An example of cores that have dedicated L1 and L2 is much of the Core i# family.  Here's Sandy Bridge:





Again, the distinguishing feature of a core is that nothing is shared.  L2 can't be considered part of Core 2 Duo's core because it is shared with the neighboring core.



Aquinus said:


> Except it doesn't do everything well. It has a huge emphasis on floating point performance, where POWER7 might be able to keep up with Haswell, but when it comes to integer performance it gets smacked down just like AMD does in floating point performance. Intel is successful because it beefs the hell out of its cores. It doesn't mean that what AMD provided is not a core; it just means that its ability to compute per clock cycle is less than Intel's due to the differences inside the cores themselves, not because AMD hasn't produced something that can operate independently. If I really need to dig it out, I can pull up the table of dispatch rates per clock cycle for the most common x86 instructions on several x86-based CPUs. Haswell straight up dominates everything because it can do the most per clock cycle, which makes sense because the FPU is gigantic and Intel has just been adding ALUs and AGUs.


Which AMD did too, but stupidly required a separate thread to access them.



Aquinus said:


> Sounds like a real world situation to me and by no means a theoretical one.


Well, you made me test it (everything else the same):
8: 5.6961283 sec -> 5.2848560 sec
1: 24.1833987 -> 24.1773771 sec

It stands to reason that 7 threads wouldn't see that difference, because the worker thread only competes with the main thread on whichever core the main thread lies.  8 is still faster than 7, so it's just a boost from cutting out the UI updates.




Aquinus said:


> SMT is most definitely a hardware design and to say otherwise is insane. Are you telling me that Intel didn't make any changes to their CPUs to support hyper-threading? That is a boatload of garbage. Bulldozer is two cores with shared hardware, most definitely not SMT. SMT means making the core wider to allow for instruction-level parallelism, which a second thread can take advantage of during operations that don't utilize the entire core or eat up an entire stage of the pipeline. Bulldozer has dedicated hardware and registers. SMT implementations most definitely don't have a dedicated set of registers, ALUs, and AGUs. They utilize the extra hardware already in the core to squeeze in more throughput, which is why hyperthreading gets you anywhere between 0 and 40% of the performance of a full core.


SMT isn't defined by one implementation.  It does describe HTT and Bulldozer well.  Bulldozer takes away from single-threaded performance to boost multi-threaded performance where Intel does the opposite.  At the end of the day, they are different means to the same end (more throughput without adding additional cores).


----------



## cdawall (Oct 4, 2016)

FordGT90Concept said:


> SMT is a concept, not a hardware design. SMT is what Bulldozer does (two threads in one core).



But it has two integer clusters that not only behave like cores but look like cores, merely lacking an FPU, which isn't used by 90% of instructions?

And again it can process two threads per core or 4 per module.


----------



## Aquinus (Oct 4, 2016)

FordGT90Concept said:


> All caches, exist to improve performance. The more caches, with progressively longer response times, the better overall performance will be. L2 can be made part of the a core's design but it doesn't have to be. A core generally only needs L1 caches. An example of cores that have dedicated L1 and L2 is much of the Core I# family. Here's Sandy Bridge:


That's not the point. There are benefits to scheduling processes on cores with a shared cache. It doesn't really matter if you consider it to be part of the core or not. Where it is and how it operates is all that matters, and what matters is that calling two cores logical pairs has the benefit of using local cache, improving hit rates, which improves overall performance. Your pretty picture doesn't really add anything to the discussion, it just shows that you know how to use Google.


FordGT90Concept said:


> Which AMD did too, but stupidly required a separate thread to access them.


What are you talking about? AMD did the exact opposite by sharing an FPU and doubling the number of dedicated integer cores. IBM put an emphasis on pseudo-GPGPU-like floating point parallelism on the CPU where AMD put an emphasis on independent integer operation. You're comparing these two like they're the same but they're almost as different as a CPU versus a GPU.


FordGT90Concept said:


> Well, you made me test it (everything else the same):
> 8: 5.6961283 sec -> 5.2848560 sec
> 1: 24.1833987 -> 24.1773771 sec


Feels pretty theoretical to me. I'm sure that's serving some purpose in the real world that's earning someone money.


FordGT90Concept said:


> SMT isn't defined by one implementation. It does describe HTT and Bulldozer well. *Bulldozer takes away from single-threaded performance to boost multi-threaded performance where Intel does the opposite*. At the end of the day, they are different means to the same end (more throughput without adding additional cores).


SMT is defined by the implementation, just as a discrete computational core is. This isn't software we're talking about. The bold part is exactly what happened, but that doesn't mean they're not cores.


----------



## FordGT90Concept (Oct 4, 2016)

cdawall said:


> but it has two integer clusters that not only behave, but look like cores that merely lack an FPU which isn't used for 90% of instruction sets?


They do not behave like nor look like cores and FPU is a major exclusion.



cdawall said:


> And again it can process two threads per core or 4 per module.


It cannot.



Aquinus said:


> That's not the point. There are benefits to scheduling processes on cores with a shared cache. It doesn't really matter if you consider it to be part of the core or not. Where it is and how it operates is all that matters, and what matters is that calling two cores logical pairs has the benefit of using local cache, improving hit rates, which improves overall performance. Your pretty picture doesn't really add anything to the discussion, it just shows that you know how to use Google.


You're talking code that would have to be written for specific processors.  That's not something that generally happens in the x86 world.  I doubt even Intel's compiler (which is generally considered the best) exploits the shared L2 of Core 2 Duo in the way you are claiming.



Aquinus said:


> What are you talking about? AMD did the exact opposite by sharing an FPU and doubling the number of dedicated integer cores. IBM put an emphasis on pseudo-GPGPU-like floating point parallelism on the CPU where AMD put an emphasis on independent integer operation. You're comparing these two like they're the same but they're almost as different as a CPU versus a GPU.


POWER7 pretty clearly has at least two integer clusters.  The only difference between Bulldozer and POWER7 is that POWER7 has a "Unified Issue Queue" where Bulldozer had three separate schedulers (two integer and one floating point).  That said, each unit could have its own scheduler (I'm not finding anything that details the inner workings of the units).


> There are a total of 12 execution units within each core: two fixed-point units, two load-store units, four double-precision floating-point unit (FPU) pipelines, one vector, one branch execution unit (BRU), one condition register logic unit (CRU), and one decimal floating-point unit pipeline. The two load-store pipes can also execute simple fixed-point operations. The four FPU pipelines can each execute double-precision multiply-add operations, accounting for 8 flops/cycle per core.


http://www.ece.cmu.edu/~ece740/f13/lib/exe/fetch.php?media=kalla_power7.pdf
It has quite the mix of hardware accelerating pretty much every conceivable task.
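The 8 flops/cycle figure follows directly from the FMA pipes; a quick arithmetic check (the clock speed below is a placeholder for illustration, not from the quote):

```python
# A fused multiply-add counts as two floating-point operations, so
# four FMA-capable FPU pipelines per core give 4 * 2 = 8 flops per
# cycle per core, matching the quoted figure.
fpu_pipelines = 4
flops_per_fma = 2                    # one multiply + one add
flops_per_cycle = fpu_pipelines * flops_per_fma
print(flops_per_cycle)               # 8

# For illustration only (placeholder clock): an 8-core part at 4 GHz
# would peak around 8 * 8 * 4e9 = 256e9 double-precision flops/s.
peak_gflops = 8 * flops_per_cycle * 4.0e9 / 1e9
print(peak_gflops)                   # 256.0
```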



Aquinus said:


> SMT is defined by the implementation, just as a discrete computational core is. This isn't software we're talking about. The bold part is exactly what happened, but that doesn't mean they're not cores.


I've yet to see any evidence that proves the module isn't a core and much to the contrary.


----------



## BiggieShady (Oct 4, 2016)

FordGT90Concept said:


> I've yet to see any evidence that proves the module isn't a core and much to the contrary.


While you are at it, you should also seek evidence on whether a big rock and a small rock are both rocks, when clearly you can fit several small rocks inside a big rock (almost went with the car analogy because rocks have no inner workings, but hey, they are silicon and also monolithic, albeit not by design).
Nobody expects a small car to do the same as a big car, but when it comes to cores people suddenly act like they are dealing with SI units; people valuing CPUs by core count might as well pay for CPUs by the kilogram ... btw, I'm gladly trading one kilogram of Celerons for the same mass of i7s.
Maybe a good automotive analogy would be an 8-cylinder engine using one spark plug per pair of cylinders, firing twice as often 

Anyway, operating systems always deal with pairs of logical processors to accommodate all possible (existing and not-yet-existing) physical organizations of execution units in a modern superscalar CPU, where both thread data dependency and pure thread parallelism are exploited for optimal gains in all scenarios. This setup is too generic and way too flexible for the OS's logical processors to be used as an argument in this case. AMD's half-a-module core is a core, albeit less potent and less scalable; it's not hyper-threading, it's more scalable than that. Only in terms of scaling could you argue an AMD core is less of a core than the norm (hence my market-share tangent: Intel is the norm). So the underdog in the duopoly not putting enough asterisks and fine print on the marketing material = slap on the wrist (symbolic restitution and an obligatory asterisk with fine print for the future*)

* may scale differently with different types of workload


----------



## Aquinus (Oct 4, 2016)

FordGT90Concept said:


> You're talking code that would have to be written for specific processors. That's not something that generally happens in the x86 world. I doubt even Intel's compiler (which is generally considered the best) exploits the shared L2 of Core 2 Duo in the way you are claiming.


You have absolutely no idea what you're talking about, Ford. The application doesn't need to know anything about cache because it's used automatically. When a memory access occurs, cache is usually hit first because latency to check it is relatively fast. A thread moving from one core to another core on the same L2 is likely to have better hit rates at lower latencies because it's using write-back data from when it was executing on the other core. There is no code that has to be written to do this, it just happens because when the memory address is looked up, is in a cached range, and exists, it will use it. I find it laughable you think the compiler is responsible for this. It's not like software is recompiled to handle different cache configurations.


FordGT90Concept said:


> They do not behave like nor look like cores and FPU is a major exclusion.


Actually for general purpose computation, it's not a major execution unit because the core can run without it. Just because you think it's necessary doesn't mean everyone agrees with you. The FPU also has never been treated as a core, always as an addition to it and additions can be removed.


FordGT90Concept said:


> It cannot.


I'm pretty sure he meant a thread per core and two threads per module, and yes, it can. Just because the speed-up isn't perfect doesn't mean it isn't there, but the speed-up is a hell of a lot better than just about every SMT implementation's.


FordGT90Concept said:


> POWER7 pretty clearly has at least two integer clusters.


Two fixed point units and two load store units is another way of saying two ALUs and two AGUs.


FordGT90Concept said:


> http://www.ece.cmu.edu/~ece740/f13/lib/exe/fetch.php?media=kalla_power7.pdf
> It has quite the mix of hardware accelerating pretty much every conceivable task.


Interesting, other than being able to partition cores into virtual CPUs for the purpose of dispatching along with an actual SMT implementation, it sounds exactly like x86. You do realize this is exactly how just about every implementation of a super scalar architecture begins but, I'm sure you'll Google that in no time.


FordGT90Concept said:


> I've yet to see any evidence that proves the module isn't a core and much to the contrary.


That's because you're a) drawing hard lines on something that's a bit arm-wavy and a bit vague, and b) using Google to help you make that case. That doesn't mean that you understand what you're reading even if you think you do. Just because you read it on the internet doesn't instantly make you an expert on the subject; it means you know how to use Google.


----------



## cdawall (Oct 4, 2016)

Aquinus said:


> I'm pretty sure he meant a thread per core and two threads per module, and yes, it can. Just because the speed-up isn't perfect doesn't mean it isn't there, but the speed-up is a hell of a lot better than just about every SMT implementation's.



Nope 2 and 4 is what the scheduler can handle. With many many more in the queue.


----------



## FordGT90Concept (Oct 4, 2016)

Aquinus said:


> You have absolutely no idea what you're talking about, Ford. The application doesn't need to know anything about cache because it's used automatically. When a memory access occurs, cache is usually hit first because latency to check it is relatively fast. A thread moving from one core to another core on the same L2 is likely to have better hit rates at lower latencies because it's using write-back data from when it was executing on the other core. There is no code that has to be written to do this, it just happens because when the memory address is looked up, is in a cached range, and exists, it will use it. I find it laughable you think the compiler is responsible for this. It's not like software is recompiled to handle different cache configurations.


Oh, you're talking about context switching.  Most of the data is going to be in L3, which virtually all desktop processors have now.  This is why the Core i# series has a small L1, small L2, and big L3.  Only the L3 is shared across all of the cores.  That said, some architectures let cores make requests of other cores' L2 for this very purpose.



Aquinus said:


> Interesting, other than being able to partition cores into virtual CPUs for the purpose of dispatching along with an actual SMT implementation...


Except that IBM consistently calls the whole monolithic block a "core" accepting 8 threads.  You know, like sane people do. 



Aquinus said:


> That's because you're, a: drawing hard lines on something that's a bit arm wavy and a bit vague and b: using Google to help you make that case.


a) Only AMD is "wavy and a bit vague" because they see profit in lying to the public.
b) I haven't used Google in a long time.



cdawall said:


> Nope 2 and 4 is what the scheduler can handle. With many many more in the queue.


I don't know whether that claim is true or not, but I do know that if there are more than two threads in the core (as in "module"), those threads will have to be in a wait state.  Bulldozer can't execute more than two at a time.


----------



## BiggieShady (Oct 4, 2016)

FordGT90Concept said:


> I don't know whether that claim is true or not, but I do know that if there are more than two threads in the core (as in "module"), those threads will have to be in a wait state. Bulldozer can't execute more than two at a time.


The confusion comes from the fact that it can dispatch 16 instructions per clock, which means nothing core-count-wise for a superscalar processor ... other than the whole superscaling aspect additionally muddying the definition of a core ... add to that the use of the word "thread" for both hardware threads and software threads


----------



## Aquinus (Oct 4, 2016)

FordGT90Concept said:


> Oh, you're talking about context switching. Most of the data is going to be in L3, which virtually all desktop processors have now. This is why the Core i# series has a small L1, small L2, and big L3. Only the L3 is shared across all of the cores. That said, some architectures let cores make requests of other cores' L2 for this very purpose.


Or maybe a smaller L2 is faster, takes up less room, and has better latency characteristics than a large one. When L2 is large, you want hit rates to be high because going down to L3 or memory is going to be extra costly given the added initial latency of accessing a larger SRAM array. Switching contexts to a core with a common cache improves performance more than you would think, because the further away the data sits, the more time it's going to take to get it into that context. It's the same reason the kernel is aware of "processors," "cores," and "logical cores": generally speaking, switching between processors is more costly than switching between cores within a processor, which is more costly than switching between logical cores. It's just exploiting how a kernel scheduler works.


FordGT90Concept said:


> Except that IBM consistently calls the whole monolithic block a "core" accepting 8 threads. You know, like sane people do.


That's because any less integer hardware and it couldn't do much of anything at all by itself. 


FordGT90Concept said:


> a) Only AMD is "wavy and a bit vague" because they see profit in lying to the public.


Slimming out their cores to get more of them isn't misleading the public. The public in general simply doesn't understand what more cores means and it doesn't always mean better performance. That's not AMD's fault. Maybe Intel should be sued for Netburst being shit despite having high clocks. "But it runs at 3.6Ghz!" Give me a freaking break and tell people to stop being so damn lazy and learn about what they're using.


FordGT90Concept said:


> b) I haven't used Google in a long time.


I'm sure you have all of those images stored on your hard drive, ready to go at a moment's notice. 


FordGT90Concept said:


> I don't know whether that claim is true or not, but I do know that if there are more than two threads in the core (as in "module"), those threads will have to be in a wait state. Bulldozer can't execute more than two at a time.


Being a superscalar CPU, it can execute several instructions at once, but it depends on which instructions they are and how they're ordered, and that's per integer core. They have their own dedicated hardware that can do multiple instructions at once depending on which part of the pipeline is going to be utilized. Two different mechanisms to handle superscalar instruction-level parallelism, to me, says core. Each integer core having its own L1-d cache also seems to indicate a core, since a core cares about its own data and not the other cores' in the calculation at hand.


----------



## FordGT90Concept (Oct 4, 2016)

Still waiting on FX-8### data.  Chart really doesn't prove anything without it.


----------



## cdawall (Oct 5, 2016)

FordGT90Concept said:


> Still waiting on FX-8### data.  Chart really doesn't prove anything without it.



I just set my FX9370 back up at home, haven't had time to test it yet.


----------



## BiggieShady (Oct 5, 2016)

I'm also interested in benchmark numbers and scaling @cdawall, although I'm not sure about the effect of @FordGT90Concept's application being .NET based. Instructions are in Common Intermediate Language, executed on a stack-based "virtual machine" process running on a register-based CPU. We must assume the .NET runtime is well optimized for the Bulldozer arch (or maybe someone knows ).

Sadly, there are not many compiler flags usable for Bulldozer on Windows, even when building directly to machine code with the MS compiler. All we have are generic optimizations, the more generic /favor:AMD64, and some more generic /arch:[IA32|SSE|SSE2|AVX|AVX2]

Linux folks like Bulldozer a bit more because they have GCC with the "magical" *-march=bdver1* compiler option, and AMD's own continuation of the Open64 compiler ... also, all libraries are easily rebuilt in the appropriate "flavor"
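For concreteness, the flags mentioned above look roughly like this on each platform (the flags are real; the file names are placeholders, and this is a sketch rather than a build recipe):

```shell
# MSVC (Windows): nothing Bulldozer-specific, just generic tuning.
cl /O2 /favor:AMD64 /arch:AVX app.cpp

# GCC (Linux): target the Bulldozer (bdver1) ISA and scheduling model.
gcc -O2 -march=bdver1 -o app app.c
```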


----------



## FordGT90Concept (Oct 5, 2016)

This uses WPF so the only way it would work on Linux is emulated.


----------



## cdawall (Oct 5, 2016)

I used it on my 5960x and couldn't get consistent results...


----------



## BiggieShady (Oct 5, 2016)

FordGT90Concept said:


> This uses WPF so the only way it would work on Linux is emulated.


Of course; the best you can do is port the long-running thread-job part of your app to C or C++, build a win32 DLL using MinGW with GCC 4.7 and Bulldozer compile flags, then use [DllImport] in your .NET app, and I wouldn't wish that on anyone.

Also WinForms is not WPF.

Coincidentally, as I felt like I had seen this before, I managed to find an almost year-old similar case http://www.leagle.com/decision/In FDCO 20160408M22/DICKEY v. ADVANCED MICRO DEVICES, INC.
and it was dismissed:


> ...the court GRANTS defendant's motion to dismiss with leave to amend.


----------



## cdawall (Oct 5, 2016)

BiggieShady said:


> Of course, the best you can do is port the long-running thread-job part of your app to C or C++, build a Win32 DLL using MinGW with GCC 4.7 and Bulldozer compile flags, then use [DllImport] in your .NET WPF app ... and I wouldn't wish that on anyone.
> 
> Also WinForms is not WPF.
> 
> ...






California court said:

> _*C. Fraud-Based Claims*
> Assuming that California law applies to plaintiff's claims, defendant argues that plaintiff's CLRA, UCL, FAL, fraudulent inducement, and negligent misrepresentation causes of action must be dismissed due to plaintiff's failure to plead key elements of fraud. Dkt. No. 27 at 13. The Ninth Circuit has held that "where a complaint includes allegations of fraud, Federal Rule of Civil Procedure 9(b) requires more specificity including an account of the time, place, and specific content of the false representations as well as the identities of the parties to the misrepresentations." Swartz v. KPMG LLP, 476 F.3d 756, 764 (9th Cir. 2007) (citation omitted). Plaintiff's claims sound in fraud, and are thus subject to Rule 9(b)'s pleading requirements. See Pirozzi v. Apple Inc., 913 F.Supp.2d 840, 850 (N.D. Cal. 2012) ("Plaintiff's claims under the UCL, FAL, CLRA, and for Negligent Misrepresentation . . . sound in fraud, and are subject to the heightened pleading requirements of Rule 9(b).") (citing Kearns v. Ford Motor Co., 567 F.3d 1120, 1127 (9th Cir. 2009)).
> 
> "n order to be deceived, members of the public must have had an expectation or an assumption about the matter in question." Daugherty v. Am. Honda Motor Co., 144 Cal.App.4th 824, 838 (2006) (citation omitted). Under California law, "a class representative proceeding on a claim of misrepresentation as the basis of his or her UCL action must demonstrate actual reliance on the allegedly deceptive or misleading statements, in accordance with well-settled principles regarding the element of reliance in ordinary fraud actions." In re Tobacco II Cases, 46 Cal.4th 298, 306 (2009)
> ...



And this section is why it will just get dismissed again.


----------



## FordGT90Concept (Oct 5, 2016)

BiggieShady said:


> Coincidentally, as I felt like I have seen this before I managed to find an almost year old similar case http://www.leagle.com/decision/In FDCO 20160408M22/DICKEY v. ADVANCED MICRO DEVICES, INC.
> and it was dismissed:


That's the same case, but it appears that Dickey is in Alabama while the lawsuit was filed in California.  As a result, the claim had to be amended and the suit refiled.  Here's the whole quote:


> For the foregoing reasons, the court GRANTS defendant's motion to dismiss with leave to amend. Within 14 days, plaintiff shall submit an amended complaint that corrects the deficiencies identified in this order. Furthermore, a case management conference will be held on May 13, 2016 at 10:30 a.m. The parties shall submit a joint case management statement by May 6, 2016.


That's why it is on-going.


Dickey et al. need to expand on this: 





> The court agrees with defendant that the alleged statements by AMD cited in the complaint do not suggest that AMD told consumers that a core had to be completely independent from other cores and could not share any resources.


I'd argue that the word "core" explicitly means independence to the public.  To say otherwise is to let AMD define the word in a way that is inconsistent with competitors' offerings and even its own previous offerings.


----------



## cdawall (Oct 5, 2016)

FordGT90Concept said:


> I'd argue that the word "core" explicitly means independence to the public. To say otherwise, is to let AMD define the word in a way that is inconsistent with competitor offerings and even their own previous offerings.



L1, L2, L3, and L4/eDRAM would all be shared resources.


----------



## FordGT90Concept (Oct 5, 2016)

L1 usually isn't shared.  L2 can belong to a single core, but sometimes doesn't.  L3 and L4 are never claimed by a single core at this point.

It's pretty easy to tell what is shared and what isn't by comparing the number of pools of L2 and up with the number of cores in the system.  It should be 1:1.  Bulldozer split the L1 data cache from what should be one 32-64 KB cache into two 16 KB caches.


----------



## BiggieShady (Oct 5, 2016)

Ha, so it's the same Dickey, winning the case by citing Tom's Hardware 
It was funny until I remembered the same company now owns AnandTech ... now it's just sad


----------



## cdawall (Oct 5, 2016)

FordGT90Concept said:


> L1 usually isn't shared.  L2 can belong to a single core but sometimes not.  L3 and L4 are never claimed by a single core at this point.
> 
> It's pretty easy to tell what is shared and what isn't by comparing number of pools of L2 and up with number of cores in the system.  It should be 1:1.  Bulldozer split the L1 data cache from what should be 32-64 KB to two 16 KB caches.



Read what you quoted.



> The court agrees with defendant that the alleged statements by AMD cited in the complaint do not suggest that AMD told consumers that a core had to be completely independent from other cores and could not share *any* resources.



ANY resources. I know what the argument is aimed at (the shared FPU), but a resource is a resource; you can't nitpick.


----------



## FordGT90Concept (Oct 5, 2016)

Bulldozer shares:
-L1 instruction
-Instruction Fetch
-Branch Prediction
-Predecode/Pick
-Instruction decoder
-Dispatch
-FPU
-Write Coalescing Cache
-Core Interface Unit







If you plucked the integer cluster out of the module and tried to use it by itself, you'd find it capable of little more than running a calculator.  It needs that shared instruction decoder or it is completely worthless in an x86 system.


----------



## cdawall (Oct 5, 2016)

As far as I know, all of that can handle more than one instruction at a time... so what difference does it make if they are shared? That part of the performance wouldn't change. The integer cores themselves are the weak point, not the ability to force instructions down them.


----------



## FordGT90Concept (Oct 5, 2016)

Because nothing is shared in a core.  A core is incomplete if it can't go from instruction all the way to result.  If an instruction decoder malfunctions, you lose 25% of your processor, not 12.5%.


----------



## cdawall (Oct 5, 2016)

FordGT90Concept said:


> Because nothing is shared in a core.  A core is incomplete if it can't go from instruction all of the way to result.  If an instruction decoder malfunctions, you lose 25% of your processor, not 12.5%.



According to whom?


----------



## FordGT90Concept (Oct 5, 2016)

Everything that is not a Bulldozer.  Even the UltraSPARC T1 went from instruction to result without leaving the core (page 16):
http://www.oracle.com/technetwork/systems/opensparc/t1-08-ust1-uasuppl-draft-p-ext-1537736.html
Floating-point work went to the "external interface" to reach the FPU.


----------



## Aquinus (Oct 5, 2016)

cdawall said:


> According to whom?


I think we know where this is going.


> _The problem with plaintiff's argument is that while plaintiff alleges that he relied on AMD's representations about the number of cores on a chip, the complaint does not allege that plaintiff believed that a core could not share resources *or that plaintiff even had a particular understanding of what constitutes a "core."* Accordingly, the court finds that plaintiff's allegations regarding reliance are insufficient to state a claim for fraud._


In other words, Dickey didn't know what the heck he was talking about, and AMD's usage of the terminology was valid: there is no precedent for assuming that a core is purely independent without any shared resources, nor did AMD claim that the core doesn't have shared resources.


FordGT90Concept said:


> This is all very irrelevant anyway because a core is a core, not an integer cluster. AMD, at best, is going to settle which means they don't admit guilt to misleading the public. At worse, it will go to court, AMD will lose, and they'll likely have to pay out hundreds of millions or billions for making consumers think they got twice what they got.


Remember when you said that? The court actually said practically the opposite.


> _Second, defendant argues, none of the statements by AMD that plaintiff cites asserts that a "core" must be an "independent processing unit" without any shared resources. Dkt. No. 27 at 5. Defendant points out that one of the industry articles cited in the complaint actually suggests that AMD's use of the term "core" was appropriate. See Compl. ¶ 34 n. 23 (citing http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043-3.html) ("To best accommodate its Bulldozer module, the company is saying that anything with its own integer execution pipelines qualifies as a core (no surprise there, right?), if only because most processor workloads emphasize integer math. I don't personally have any problem with that definition."). *The court agrees with defendant that the alleged statements by AMD cited in the complaint do not suggest that AMD told consumers that a core had to be completely independent from other cores and could not share any resources. While an alleged statement by AMD's competitor Intel cited in the complaint refers to "independent central processing units in a single computing environment," the complaint does not describe any statements by AMD that suggest complete independence of cores.* See Compl. ¶¶ 19-21._



Yeah, so you want to say this again?


FordGT90Concept said:


> Because nothing is shared in a core.



Sorry Ford, your hardline stance on what constitutes a core doesn't seem to hold up, just as Dickey's didn't, not that I expect this to change your mind.

In other words, what constitutes a core depends on the hardware, its architecture, and any claims made by the company. No surprise there to be honest.


----------



## FordGT90Concept (Oct 5, 2016)

> Defendant moves to dismiss on several grounds. Defendant argues that (1) Alabama law, not California law, governs plaintiff's claims, and so the UCL, FAL, and CRLA claims should be dismissed; (2) even under California law, plaintiff fails to state a claim for fraud; (3) the breach of warranty allegations fail to state a claim; and (4) plaintiff's unjust enrichment allegations fail to state a claim. Dkt. No. 27 at 2-3. The court addresses each of these arguments below.


It was dismissed for:
1, 3, 4) legalese
2) plaintiff should have gotten IBM, Intel, ARM, and Sun to testify instead of relying on Tom's Hardware
Which leads to this:


> For the foregoing reasons, the court GRANTS defendant's motion to dismiss with leave to amend. Within 14 days, plaintiff shall submit an amended complaint that *corrects the deficiencies* identified in this order.


Which is why the case is still open:
https://www.pacermonitor.com/public/case/9674725/Dickey_v_Advanced_Micro_Devices,_Inc


Dickey did a terrible job of making his case:


> Defendant argues that plaintiff fails to plead specific facts to show the basis for his expectations about "cores." Dkt. No. 27 at 13. First, defendant argues that plaintiff does not allege that he personally saw or relied on any statements by AMD indicating that a core is an independent processing unit without any shared resources. Id. at 5. *Nor does plaintiff allege that he personally believed that the cores on Bulldozer chips would not share resources when he bought them.* Id. Plaintiff responds by arguing that the complaint specifically alleges that plaintiff saw AMD's advertising about the number of cores in its Bulldozer chips and that he relied on that advertising in making his purchase. Dkt. No. 30 at 21-22 (citing Compl. ¶¶ 42-46). The problem with plaintiff's argument is that while plaintiff alleges that he relied on AMD's representations about the number of cores on a chip, *the complaint does not allege that plaintiff believed that a core could not share resources or that plaintiff even had a particular understanding of what constitutes a "core."* Accordingly, the court finds that plaintiff's allegations regarding reliance are insufficient to state a claim for fraud.


Dickey clearly didn't do his homework before filing suit.


Edit: I'm not certain he can win his case because of that testimony.  He had to have gone into the purchase believing the cores "would not share resources."  The logical conclusion is that he realized the performance was poor, did some searching on the internet, found the Tom's Hardware article, and sued without really thinking it through or consulting experts on how to make the strongest case against AMD.  It doesn't prove AMD right; it proves Dickey didn't make his case.


----------



## cdawall (Oct 5, 2016)

FordGT90Concept said:


> Edit: I'm not certain he can win his case because of that testimony. He had to have gone into the purchase believing the cores "would not share resources." The logical conclusion is that he realized the performance was poor, did some searching on the internet, found the Tom's Hardware article, and sued without really thinking it through or consulting experts on how to make the strongest case against AMD. It doesn't prove AMD right; it proves Dickey didn't make his case.



Actually, winning a court case that says it is an 8-core CPU would mean they were correct. That's kinda how this works.


----------



## FordGT90Concept (Oct 5, 2016)

Dismissal isn't winning unless the plaintiff gives up (Dickey has not) or the judge closes the door on trying again (which didn't happen).  The fact that AMD and Dickey are trying to settle now suggests that AMD thinks there is a case to be made and wants to stop it.  AMD wants this behind them for Zen's launch.


----------



## BiggieShady (Oct 5, 2016)

FordGT90Concept said:


> Because nothing is shared in a core.


... it's shared in the module, not in the core; each core owns the shared module resources, either all of them half the time (front end) or half of them all the time (FPU) ... regarding the definition of what a core is, why would we suddenly start making fixed definitions in rapidly changing tech fields?


----------



## FordGT90Concept (Oct 5, 2016)

As I said, the "integer clusters" are useless without the hardware at the "module" level; ergo, the "module" represents the "core."

Because all processors require decoders, a prefetcher, ALUs, AGUs, and, in most cases, FPUs to qualify as a general processor.  If we start defining cores by integer clusters, there's going to be a race to cram as many integer clusters as possible into a processor--all the blocking components be damned.  This is not a good path to travel down for consumers.  This is why Xeon Phi never made it to consumers.  Dumbed-down cores aren't very useful to the public.


----------



## cdawall (Oct 5, 2016)

FordGT90Concept said:


> As I said, the "integer clusters" are useless without using hardware at "module" level; ergo, the "module" represents the "core."
> 
> Because all processors require decoders, prefetcher, ALUs, AGUs, and in most cases, FPUs, to qualify as a general processor.  If we start defining processors by integer clusters, there's going to be a race to cram as many integer clusters as they can into a processor--all of the blocking components be damned.  This is not a good path to travel down for consumers.  This is why Xeon Phi never made it to consumers.  Dumbed down cores aren't very useful to the public.



Bulldozer was designed to sit in HPC clusters, not in consumer hands. AMD foolishly released it for consumers because, well, they have to have a consumer-level CPU.


----------



## Aquinus (Oct 6, 2016)

FordGT90Concept said:


> Dickey clearly didn't do his homework before filing suit.


That first highlighted part is him admitting that he knew there were shared resources before he bought it, which makes fraud a hard nut to crack, because AMD most definitely disclosed that Bulldozer was going to have shared resources. It doesn't really matter if you consider it a core or not; he knew what he was buying regardless of what your definition of a core is. He didn't just fail to do his homework; he doesn't seem to understand that fraud requires being deceived, and being deceived doesn't mean being ignorant about what you're reading.


FordGT90Concept said:


> Because all processors require decoders, prefetcher, ALUs, AGUs, and in most cases, FPUs, to qualify as a general processor.


Sure, but, invoking the unwritten rule that all of those things must be dedicated hardware to constitute a core is still the primary problem.


FordGT90Concept said:


> This is why Xeon Phi never made it to consumers. Dumbed down cores aren't very useful to the public.


The Xeon Phi never made it to the consumer because it's like GPGPU. Consumers don't really care about machine learning or HPC applications. To think consumers would have benefited from having a Xeon Phi in their system (a co-processor, mind you) is a pretty big stretch.


BiggieShady said:


> regarding definition of what core is, why would we suddenly start making fixed definitions in rapidly changing tech fields?


We wouldn't? I think this is really all just to help Ford sleep at night. Simply put, Ford was pretty insistent, since the first page of this now 19-page thread, that not only would Dickey win but that it would be a crushing defeat for AMD, and it wasn't. I suspect that isn't the only thing Ford is wrong about, regardless of whether he wants to admit it to himself or not, and I think the hardliner attitude only makes it that much more apparent. Honestly, people who actually work in the field and are good at it understand that you can't think this way. As a developer, the specifics of the core don't really matter. If I can expect core-like performance characteristics as opposed to SMT-like characteristics, then I consider it a core. If BD were like hyper-threading, in the sense that the speedup is 0-40%, I would agree with Ford, but it's not. There is almost always a speedup, and it's more often than not greater than 50%, which is better than what Intel's SMT implementation is capable of on a good day.


FordGT90Concept said:


> Dismissing isn't winning unless the plaintiff gives up (Dickey has not) or the judge closes the door to trying again (which was not).  The fact AMD and Dickey are trying to settle now suggests that AMD thinks there is a case to be made and they want to stop it.  AMD wants this behind them for Zen's launch.


It was dismissed because the court agreed with AMD that Dickey's argument wasn't strong enough (to put it lightly). Heck, they even used his own sources against him. That does not mean they're trying to settle; it means it's at a standstill until Dickey makes a better case. AMD isn't going to settle if Dickey can't make a half-decent argument, given the court was willing to dismiss at AMD's request. Until Dickey actually tries to force the issue (and I would love him to try), it might not be a win for AMD, but it's most definitely not a loss, considering there has been practically no press on this as of late. Probably for very good reason (it's a dead end).

Don't forget that the lawsuit isn't about whether AMD has 8 fully independent cores or not (the court seems to have accepted that they are independent enough to call cores, using Dickey's own sources), but rather whether AMD misled the public with regard to how the CPU operates. Simply put, core or not, AMD stated from the get-go, well before release, that it would have shared components. That alone is enough to throw away half of the case.


----------



## FordGT90Concept (Oct 6, 2016)

Aquinus said:


> Sure, but, invoking the unwritten rule that all of those things must be dedicated hardware to constitute a core is still the primary problem.


Courts work on precedent, and you certainly can't deny the precedent is there.



Aquinus said:


> The Xeon Phi never made it to the consumer because it's like GPGPU. Consumers don't really care about machine learning or HPC applications. To think consumers would have benefited from having a Xeon Phi in their system (a co-processor mind you,) is a pretty big stretch.


They could only sell it to consumers if it could run DirectX 9 and be competitive with NVIDIA and AMD offerings.  Intel abandoned that project, and with it the idea of bringing it to consumers.  Had Intel succeeded, it would have performed competently at both GPU and CPU tasks.



Aquinus said:


> Simply put, Ford was pretty insistent that not only Dickey was going to win but, that it would be a crushing defeat for AMD when it wasn't since even the first page of this now 19 page thread.


He still could.



Aquinus said:


> If I can expect core-like performance characteristics as opposed to SMT-like characteristics, then I consider it a core.


You're forgetting I'm a programmer too, and probably 25-50% of my programs utilize async multithreading.  I was actually surprised to see HTT boost Random Password Generator by about 30% per logical core.



Aquinus said:


> If BD was like hyper-threading in the sense that speed up is 0-40%, I would agree with Ford but, it's not.


My understanding of a core is that it represents a 100% clone of the hardware, which translates to an 80-100% performance increase per core versus a single thread.  My testing shows this in threads 1-4.  It's threads 5-8 where HTT and Bulldozer's design expose themselves.  I'm eagerly awaiting that data to compare, because I suspect Bulldozer will not show 80-100% in threads 5-8 like it does in 1-4.  That would be proof they aren't cores but SMT inside a core.



Aquinus said:


> There is almost always speed up and it's more often than not, more than 50% which is better speed up versus what Intel's SMT implementation is capable of on a good day.


There's nothing wrong with that.  What is wrong is calling it a "core" when it isn't.



Aquinus said:


> That does not mean that they're trying to settle, it means that it's at a standstill until Dickey makes a better case.


Negative; the latest court documents say they are pursuing alternative dispute resolution, which means they are trying to reach an agreement (settle) before it ends up back in court.



Aquinus said:


> ...but it's most definitely not a loss considering there has been practically no press on this as of late.


There's only press coverage of these things when they start and when they end--not in between.


----------



## FR@NK (Oct 6, 2016)

FordGT90Concept said:


> L3 and L4 are never claimed by a single core at this point.



On Intel chips, each core has its own L3 cache slice. Each core can also access another core's L3 slice if the data it needs is stored there, instead of going out to RAM, though at increased latency compared to its own slice. A bi-directional ring bus connects the cores and their caches together. The cores are clearly independent, even including the "shared" last-level cache.








FordGT90Concept said:


> There's nothing wrong about that. What is wrong is calling it a "core" when it isn't.



This is where AMD failed. If Bulldozer had been marketed as a 4c/8t chip, it may not have been such a bust, because without a doubt its performance running 5-8 threads is much better than a hyper-threaded 4c/8t Intel chip under most workloads. Yet calling it an 8-core chip set the bar higher, and threads 5-8 couldn't reach that mark.

I wonder what the settlement will be -- $25 off next year's Zen chips? In the end, only the lawyers will come out of this suit with any real money.


----------



## FordGT90Concept (Oct 6, 2016)

FR@NK said:


> On intel chips each core has its own L3 cache. But each core can also access another core's L3 cache if the data it needs is stored there instead of accessing the RAM but it has increased latency to do so compared to its own L3 cache. A bi-directional ring bus connects the cores and their caches together. The cores are clearly independent even including the "shared" last level cache.


That's kind of my point: memory has to be shared at some point because the cores can't coexist without it.  That said, I never knew Xeon had such a crazy internal bus.



FR@NK said:


> This is where AMD failed. If Bulldozer was marketed as a 4c/8t chip it may not have been such a bust; because without a doubt its performance running 5-8 threads is much better compared to a hyperthreaded 4c/8t intel chip under most workloads. Yet calling it an 8 core chip set the bar higher and threads 5-8 couldnt reach that mark.


Exactly!  They shot themselves in the foot.


----------



## BiggieShady (Oct 6, 2016)

Aquinus said:


> We wouldn't? I think this is really all just to help Ford sleep at night. Simply put, Ford was pretty insistent that not only Dickey was going to win but, that it would be a crushing defeat for AMD when it wasn't since even the first page of this now 19 page thread. I suspect that isn't the only thing Ford is wrong about regardless of whether we wants to admit that to himself or not and I think the hardliner attitude only makes it that much more apparent. Honestly, people who actually work in the field and are good at it understand that you can't think this way. As a developer, it doesn't really matter about the specifics about the core. If I can expect core-like performance characteristics as opposed to SMT-like characteristics, then I consider it a core. If BD was like hyper-threading in the sense that speed up is 0-40%, I would agree with Ford but, it's not. There is almost always speed up and it's more often than not, more than 50% which is better speed up versus what Intel's SMT implementation is capable of on a good day.



Well, AMD has been pretty outspoken about the Bulldozer design coming from this IEEE paper: http://www.microarch.org/micro37/papers/18_Kumar-Conjoined-Core.pdf

Abstract:


> This paper proposes *conjoined-core* chip multiprocessing – topologically feasible resource sharing between* adjacent cores* of a chip multiprocessor to reduce die area with minimal impact on performance and hence improving the overall computational efficiency.



Figure 2 in the paper looks almost exactly like a BD module.

So, @FordGT90Concept, this type of core has a scientific name: it's a conjoined core ... the keyword here is core.

AMD gave us the branch-misprediction penalty of a Northwood, the L1D cache of a Prescott, and conjoined cores from an IEEE paper ... the last one being the least offensive ... too bad they can't be punished for the first two instead


----------



## FordGT90Concept (Oct 6, 2016)

I was calling it "hybrid," "conjoined" works too.  Anyway you shake, AMD still omitted a key word in "8-core" they advertised it as.  I'd be okay with "8-conjoined core" or "8-integer core."  They need to make it clear to the public it is different.

Not sharing the L1D cache is bewildering.  Even in that IEEE paper, they show a 64 KiB shared L1D cache.  If they were going to keep it separate, they should have had at least 32 KiB in each.


Edit: The paper says there is a performance cost:


> We show that, given a set of novel optimizations that reduce the negative impacts of this sharing, we can reduce area requirements by more than 50%, *while achieving performance within 9-12% of conventional cores* without conjoining. Alternatively, by only sharing floating point units and crossbar ports, core area can be reduced by more than 23% while achieving performance within 2% of conventional cores without conjoining.


I think that's something the public has the right to know because that is significant.


----------



## cdawall (Oct 6, 2016)

Yeah, let's add words, because that makes consumers happy. I like how you keep calling them cores, however.


----------



## FordGT90Concept (Oct 6, 2016)

Only when there's a preceding word that clarifies it is not traditional.  Example: compact SUV or pygmy goat.


----------



## cdawall (Oct 6, 2016)

FordGT90Concept said:


> Only when there's a preceding word that clarifies it is not traditional.  Example: compact SUV or pygmy goat.



Using your own example, would a compact SUV still not be an SUV? By your own words, AMD provided users 8 cores in their truest form.


----------



## FordGT90Concept (Oct 6, 2016)

A compact SUV is much, much smaller than a typical SUV; likewise, a pygmy goat is much smaller than a typical goat.  The former is a "type of" the latter but not the same as.  A conjoined core is a type of core but not the same as a (conventional) core.  So no, they didn't provide "8 cores," they provided "8 conjoined cores."  That's an important distinction between the two.

Additionally, the first of a type usually defines what the normal is.  For example, the popularity of the Ford Explorer (mid-sized, five passenger, front engine, rear wheel drive) established it as the class of SUV.  Core was defined a decade ago in the x86 world by the Athlon 64 X2.

If you phoned a Ford dealer, told them to send you an SUV, and an Escape (compact SUV) showed up, would you not be disappointed?


----------



## cdawall (Oct 6, 2016)

No because I asked for an SUV and that is by definition an SUV. This is also why everyone thinks your argument is incorrect.

I would be mad if I called for an SUV and they showed up with a Fiesta, those are two different things. Notice how the analogy works.


----------



## FordGT90Concept (Oct 6, 2016)

Do note that the Ford Escape (compact SUV) starts at $23,600 while the Ford Explorer (mid-sized SUV) starts at $31,660.  They cheated you out of $8,060, or 25% of your vehicle.  The exact same argument can easily be made against Bulldozer.  If you can't see that, I guess we're going to have to agree to disagree on that point.


----------



## cdawall (Oct 6, 2016)

You didn't ask for a specific SUV; you said SUV. Thing is, when you purchased an AMD you didn't buy an Intel, so asking for Intel's cores would be, well, asinine. You can disagree all you want, but so far the courts agree that an SUV is an SUV.


----------



## FordGT90Concept (Oct 7, 2016)

When I'm sold "8 cores," I expect the equivalent of two Phenom II X4s, not 10-30% performance loss because of the conjoining.

Ask yourself this: if AMD correctly labeled their Bulldozer products, would they have sold fewer of said products?  If yes, AMD defrauded the public.  If no, it's water under the bridge.


----------



## cdawall (Oct 7, 2016)

FordGT90Concept said:


> When I'm sold "8 cores," I expect the equivalent of two Phenom II X4s, not 10-30% performance loss because of the conjoining.
> 
> Ask yourself this: if AMD correctly labeled their Bulldozer products, would they have sold fewer of said products?  If yes, AMD defrauded the public.  If no, it's water under the bridge.



I expect nothing more than what I got, because I have the ability to read reviews prior to purchase. The performance loss was also only in single-core IPC or poorly threaded applications, neither of which has to do with conjoining and everything to do with the 2-ALU/2-AGU setup, tiny cache, and overall piss-poor design of the individual cores. 

You know what I equate this to? Suing Intel because their 3 GHz P4 wasn't as fast as AMD's Athlon 64. Is there a definition of what a GHz should equate to? Is a small tire not a tire because it doesn't go on a dump truck? They are independent cores whether you see it that way or not, and if this ends up back in court rather than dismissed because it isn't worth the court's time, AMD will end up winning. Because guess what? It's still a core even without an FPU. Or do you want everyone to start labeling boxes as "8 integer cores + 4 floating point units"? Hey, you know something funny: the integer section is called a core and the floating point a unit... wonder if that's because one of them is required in a core and the other is not.


----------



## FordGT90Concept (Oct 7, 2016)

People that go to Best Buy and the like generally don't read reviews.  They look at the labels and make their decision based on them.


Hz is a very scientific unit of measure (cycles per second).  The difference is in what it does with clocks and that's not something easily differentiated by a product label.  Perhaps there should be a standard made to make this clear--akin to horsepower and torque ratings on engines.

A tire on a bicycle shares little in common with a tire on a Caterpillar 797.  The distinction is very important.

They are not "independent cores" (even AMD said they never made that claim in their arguments) they are very clearly "conjoined cores"--a very important distinction.


----------



## BiggieShady (Oct 7, 2016)

cdawall said:


> I like how you keep calling them cores however.


We should all call them ATFKAC (a thing formerly known as core) after Dickey wins the case 


FordGT90Concept said:


> They are not "independent cores" (even AMD said they never made that claim in their arguments) they are very clearly "conjoined cores"--a very important distinction.


Every execution engine needs a frontend and cache/memory, in all architectures. It's not a huge paradigm shift for two execution engines to need only a single frontend.
It works like having two frontends at half clockspeed (yes, uops are dispatched left, right, left, right ... alternating between both cores every other cycle). 
Knowing that, how can you argue dependency? 
Hint: you could argue performance, but not dependency.
It's entirely possible that you are using the adjective "independent of something" to mean "will work without something", and not referring to the execution of uops at all.
If that's the case, let's do something stupid ... let's make, say, pin #917 part of every core, because none of the cores will work without pin #917 ... because one depends on the other.
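The alternating-frontend idea above can be put in code. This is a toy model only: the strict left/right alternation, the function names, and the queue sizes are simplifications for illustration, not AMD's actual dispatch logic.

```python
from collections import deque

# Toy model: one shared frontend feeding two integer cores in
# alternation, one core per cycle (a simplification of the idea
# described above, not AMD's real microarchitecture).
def run_module(uop_queues, cycles):
    """uop_queues: one deque of pending uops per integer core."""
    executed = [0, 0]
    for cycle in range(cycles):
        core = cycle % 2          # left, right, left, right ...
        if uop_queues[core]:
            uop_queues[core].popleft()
            executed[core] += 1
    return executed

# Two cores, each with plenty of work: each gets the frontend every
# other cycle, so each retires cycles/2 uops.
queues = [deque(range(100)), deque(range(100))]
print(run_module(queues, 100))   # → [50, 50]
```

In this simplified model an idle sibling still costs the busy core half the dispatch cycles; real hardware can hand the whole frontend to one thread when the other is idle, which is part of why the performance argument (rather than the dependency argument) is the interesting one.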


----------



## cdawall (Oct 7, 2016)

FordGT90Concept said:


> People that go to Best Buy and the like generally don't read reviews.  They look at the labels and make their decision based on them.



People who shop at Best Buy don't care, and wouldn't know one was faster than the other unless you handed them a graph.




FordGT90Concept said:


> Hz is a very scientific unit of measure (cycles per second).  The difference is in what it does with clocks and that's not something easily differentiated by a product label.



Oh yeah? Where on the label does it state that a 3 GHz Pentium 4 is slower than a 3 GHz Athlon 64?









FordGT90Concept said:


> Perhaps there should be a standard made to make this clear--akin to horsepower and torque ratings on engines.



There would be no way to standardize what a GHz can do. It isn't HP/TQ, which is an actual measurement of work done. That would be something akin to GFLOPS--you know, a measurement of work, not speed. GHz would be like measuring MPH: some cars take more HP/TQ to hit the same speed.
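The GHz-vs-GFLOPS distinction is easy to show with the usual back-of-the-envelope formula. The FLOPs-per-cycle figures below are made-up illustrations, not spec-sheet numbers for any real chip:

```python
# Clock speed alone doesn't measure work: theoretical peak GFLOPS is
# roughly cores x clock (GHz) x FLOPs per core per cycle.
def peak_gflops(cores, ghz, flops_per_cycle):
    return cores * ghz * flops_per_cycle

# Two hypothetical 3 GHz quad-cores: same "speed" on the label,
# very different work rates (illustrative numbers only).
chip_a = peak_gflops(cores=4, ghz=3.0, flops_per_cycle=4)   # 48.0 GFLOPS
chip_b = peak_gflops(cores=4, ghz=3.0, flops_per_cycle=16)  # 192.0 GFLOPS
print(chip_a, chip_b)
```

Same clocks, 4x difference in theoretical throughput, which is the MPH-vs-horsepower point above.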



FordGT90Concept said:


> A tire on a bicycle shares little in common with a tire on a Caterpillar 797.  The distinction is very important.



Or is it assumed that they are not the same, because people aren't oblivious to the world?



FordGT90Concept said:


> They are not "independent cores" (even AMD said they never made that claim in their arguments) they are very clearly "conjoined cores"--a very important distinction.



Prove it? They are clocked independently, perform instructions independently etc.


----------



## Recon-UK (Oct 7, 2016)

WOW, I've never seen this thread before.

I really dislike the FX series (I'm a huge fan of AMD though), but taking them to court over what they wanted to call cores?

REALLY, you are going to take a top-tier CPU maker to court over a subjective term?


LMFAO


----------



## FordGT90Concept (Oct 7, 2016)

cdawall said:


> People who shop at Best Buy don't care, and wouldn't know one was faster than the other unless you handed them a graph.


Dickey proves this wrong.



cdawall said:


> Oh yeah? Where on the label does it state that a 3 GHz Pentium 4 is slower than a 3 GHz Athlon 64?


Never said it did.  I said they should.



cdawall said:


> There would be no way to standardize what a GHz can do. It isn't HP/TQ, which is an actual measurement of work done. That would be something akin to GFLOPS--you know, a measurement of work, not speed. GHz would be like measuring MPH: some cars take more HP/TQ to hit the same speed.


I think an IEEE standard for measuring instructions/second would suffice (would include a mix of standard instructions) not unlike how SAE measures towing and payload capacities now.



cdawall said:


> Or is it assumed that they are not the same, because people aren't oblivious to the world?


You clearly have too much faith in the common consumer.  Remember, the population we're talking about are the type that buy Beats Audio products just because some celebrity was paid to say it's good.



cdawall said:


> Prove it? They are clocked independently, perform instructions independently etc.


@BiggieShady's linked IEEE paper + AMD's published core diagrams prove it.  They practically copied Kumar, Jouppi, & Tullsen's 2004 design with some tweaks (many of which hurt performance).  That same paper explains why two "conjoined" cores will never perform on par with two "conventional" cores.


----------



## BiggieShady (Oct 7, 2016)

FordGT90Concept said:


> ... many of which hurt performance ...


It's mentioned to show how little performance is hurt compared to how much die space is saved. You are interpreting it as a bad thing.


----------



## FordGT90Concept (Oct 7, 2016)

What I was alluding to was suffocating it of L1 data cache.  Cutting back on cache always saves die space but it also reduces performance--not unique to Bulldozer.

Yes, saving die space is the main advantage of conjoined cores.  That's pretty much the only reason why any chip manufacturer would do it.


----------



## cdawall (Oct 7, 2016)

FordGT90Concept said:


> Dickey proves this wrong.



He cited Tom's Hardware as his only source of evidence proving it wasn't an 8-core CPU. If you would like, I can post a source mentioning how it is an 8-core CPU and we can cite that instead?




FordGT90Concept said:


> Never said it did.  I said they should.



And therein lies the problem. AMD should mention that it uses a module design as opposed to the standard design of old, but you know what? If every technology company had to write a dissertation or face a lawsuit every single time they attempted to innovate, then guess what you will stop seeing. Remember, Hyperthreading was an innovation salvaged from an atrocious NetBurst pile-of-shit design, and it currently exists in every single high-end chip Intel sells, almost as a status quo.



FordGT90Concept said:


> I think an IEEE standard for measuring instructions/second would suffice (would include a mix of standard instructions) not unlike how SAE measures towing and payload capacities now.



Standard instructions would show that AMD chip in a favorable light. Remember, AVX isn't a standard instruction; most would consider it hardly used, actually. Also, how often do you update instruction sets? Every new generation of CPU typically has a new instruction set or two. A simple GFLOPS listing would suffice, even if it is theoretical, as they do with GPUs. It isn't AMD's or Intel's fault if current software doesn't utilize the chips.



FordGT90Concept said:


> You clearly have too much faith in the common consumer.  Remember, the population we're talking about are the type that buy Beats Audio products just because some celebrity was paid to say it's good.



I actually have a simple view on this: if the consumer is too stupid to realize what they are buying, they really have no place to sue. Companies aren't there to inform ignorant people of how life works; if they were, Microsoft would have been put out of business over Windows 10 upgrades. Quite a few stupid people just clicked next without reading what they did.



FordGT90Concept said:


> @BiggieShady's linked IEEE paper + AMD's published core diagrams prove it.  They practically copied Kumar, Jouppi, & Tullsen's 2004 design with some tweaks (many of which hurt performance).  That same paper explains why two "conjoined" cores will never perform on par with two "conventional" cores.



Quite a few HPC articles show near-perfect scaling on the design. Remember, these were designed for an HPC environment and then loosely adapted for use in a consumer-level product. The cores show more scaling than any multithreaded design ever could, because they are cores, not just the ability to process another instruction.


----------



## FordGT90Concept (Oct 7, 2016)

cdawall said:


> Quite a few HPC articles show near-perfect scaling on the design. Remember, these were designed for an HPC environment and then loosely adapted for use in a consumer-level product. The cores show more scaling than any multithreaded design ever could, because they are cores, not just the ability to process another instruction.


It's not 100% and never will be, which Kumar et al. showed.  Benchmarks show the same.


----------



## cdawall (Oct 7, 2016)

FordGT90Concept said:


> It's not 100% and never will be, which Kumar et al. showed.



There isn't 100% scaling across two cores on any CPU. Software overhead will always prevent that.
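That ceiling is just Amdahl's law: any serial fraction of the work (the "software overhead" here) caps the speedup no matter how many cores you add. A minimal sketch:

```python
# Amdahl's law: if a fraction p of the work parallelizes, the speedup
# on n cores is 1 / ((1 - p) + p / n). Even a small serial portion
# keeps scaling below 100%.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

# 95% parallel work on 2 cores: ~1.90x, not 2.00x.
print(round(amdahl_speedup(0.95, 2), 2))
```

The parallel fraction 0.95 is an arbitrary example value; the point is only that p < 1 makes perfect 2x scaling impossible.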


----------



## FordGT90Concept (Oct 7, 2016)

Software overhead is pretty consistent and easy to adjust for.


----------



## cdawall (Oct 7, 2016)

FordGT90Concept said:


> Software overhead is pretty consistent and easy to adjust for.



Constant or not, it is still not 100% scaling. Adjusting numbers to compensate for that is actually telling one of those lie things. When you do that, it can get you sued by consumers.


----------



## 64K (Oct 8, 2016)

cdawall said:


> I actually have a simple view on this: if the consumer is too stupid to realize what they are buying, they really have no place to sue. Companies aren't there to inform ignorant people of how life works; if they were, Microsoft would have been put out of business over Windows 10 upgrades. Quite a few stupid people just clicked next without reading what they did.



I always do my research before buying too, but this guy may not be responsible for his mistake if he can convince the jury that AMD failed at "truth in advertising", and that the burden is on AMD and not the customer to advertise their CPUs honestly and clearly.


----------



## FordGT90Concept (Oct 8, 2016)

cdawall said:


> Constant or not, it is still not 100% scaling. Adjusting numbers to compensate for that is actually telling one of those lie things. When you do that, it can get you sued by consumers.


You're missing (avoiding?) the point.  Bulldozer shows a decline that is not consistent with "conventional" cores.




-Yellow line represents ideal circumstances (800% for 8 threads, 700% for 7 threads, and so on)
-Maroon line represents software overhead.  Most of that comes from the main thread (UI updates), which created a minor conflict at four threads and repeated at eight threads (two worker threads on the same core as the main thread).
-Orange line is: time of one thread / time of n threads (e.g. 339% for 4 threads)

Note how far orange deviates from maroon after 4 threads; that is the result of SMT--a performance gain over a processor without it, but a far cry from the throughput of eight conventional cores.  I fully expect Bulldozer to land between those two lines.


----------



## Aquinus (Oct 8, 2016)

FordGT90Concept said:


> You're missing (avoiding?) the point. Bulldozer shows a decline that is not consistent with "conventional" cores.


Have you used a CPU with more than 4 "real cores" by your own definition, to show that your application is actually capable of speeding up to a reasonable extent past 4 threads? Most applications don't speed up very well and start hitting some form of limitation when it comes to concurrent processing of data, even more so if the full calculation is merely run in parallel rather than using different threads to handle different stages of the task. I suspect that if the trends in the times for both the 8350 and the 6700K are the same, the way it's written simply doesn't speed up past so many cores. This doesn't make AMD's CPUs not have real cores; it's just part of the reality of designing software.

If you don't believe me, then maybe you should send the binary to @cdawall to run on his Opterons to see if it scales past 4 or 5 threads, and maybe to another member who has a 6c or 8c Intel CPU to generate some numbers for us. This is a claim that can be validated, so it should be, because not all software scales, and unless it has been tested on a machine fitting your "real core" criteria with more than 4 of them, I would say that you have insufficient data to assert that your benchmark is even capable of showing such optimistic speedup with additional cores.


----------



## Recon-UK (Oct 8, 2016)

PCSX2 can use an infinite number of threads using GSdx software mode; however, it won't really scale past 4 threads at all, even with HTT.


----------



## FordGT90Concept (Oct 8, 2016)

I had a dual quad-core Xeon server a while back.  Motherboard died.  Anyone that has access to a system with 8 cores is free to try it and I'll graph it:
https://www.techpowerup.com/forums/threads/random-password-generator.164777/

1. Set max threads to 8.
2. Set random seed to 0.
3. Set minimum characters to 32.
4. Set "Maximum number of attempts to generate a password" to 10000000 (default is 1 million, add a 0 to make it 10 million).
5. Uncheck "include special characters."
6. Check "number of special characters required."

Run it, note the time in the corner and the number of threads, change max threads to 7, and rinse and repeat down to 1.
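Ford's app is a Windows binary, but the measurement loop behind those steps can be sketched generically. The workload below is a stand-in CPU-bound task, not his password generator, and note that CPython threads won't show real CPU-bound scaling because of the GIL (a process pool would):

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in CPU-bound workload (NOT Ford's password generator):
# repeatedly hash a buffer a fixed number of times.
def workload(chunk):
    h = b"seed"
    for _ in range(chunk):
        h = hashlib.sha256(h).digest()
    return h

# Split a fixed total amount of work across n threads and time it,
# mirroring the "set max threads, run, note the time" steps above.
def timed_run(threads, total_work=200_000):
    chunk = total_work // threads
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(workload, [chunk] * threads))
    return time.perf_counter() - start

for n in range(8, 0, -1):        # 8 threads down to 1, like the steps
    print(n, "threads:", round(timed_run(n), 3), "s")
```

The trend of those times per thread count is what the graph plots; for honest CPU-bound scaling numbers in Python, swap `ThreadPoolExecutor` for `ProcessPoolExecutor`.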


----------



## FR@NK (Oct 8, 2016)

I'll test it on my 8 core.

I will need to turn off turbo boost and lock all the cores to the same speed. Did you do this when you tested it ford?

Update:

1 threads: 29.8149422
2 threads: 15.7981694
3 threads: 10.7508806
4 threads:   8.2506762
5 threads:   6.7012710
6 threads:   5.6937791
7 threads:   5.0004093
8 threads:   4.5316209
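Turning FR@NK's times into speedup and scaling efficiency is one division per thread count; the 8-thread run works out to roughly a 6.6x speedup, i.e. about 82% of ideal, consistent with Ford's ~80%-scaling expectation:

```python
# Speedup and per-thread scaling efficiency from FR@NK's posted times.
times = [29.8149422, 15.7981694, 10.7508806, 8.2506762,
         6.7012710, 5.6937791, 5.0004093, 4.5316209]

for n, t in enumerate(times, start=1):
    speedup = times[0] / t       # single-thread time / n-thread time
    efficiency = speedup / n     # 1.0 would be perfect scaling
    print(f"{n} threads: {speedup:.2f}x ({efficiency:.0%} of ideal)")
```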


----------



## cdawall (Oct 8, 2016)

Aquinus said:


> Have you used a CPU with more than 4 "real cores" by your own definition, to show that your application is actually capable of speeding up to a reasonable extent past 4 threads? Most applications don't speed up very well and start hitting some form of limitation when it comes to concurrent processing of data, even more so if the full calculation is merely run in parallel rather than using different threads to handle different stages of the task. I suspect that if the trends in the times for both the 8350 and the 6700K are the same, the way it's written simply doesn't speed up past so many cores. This doesn't make AMD's CPUs not have real cores; it's just part of the reality of designing software.
> 
> If you don't believe me, then maybe you should send the binary to @cdawall to run on his Opterons to see if it scales past 4 or 5 threads, and maybe to another member who has a 6c or 8c Intel CPU to generate some numbers for us. This is a claim that can be validated, so it should be, because not all software scales, and unless it has been tested on a machine fitting your "real core" criteria with more than 4 of them, I would say that you have insufficient data to assert that your benchmark is even capable of showing such optimistic speedup with additional cores.



I have a 5960x as well as the opteron. When testing on the 5960x I couldn't get consistent results so I didn't post them.



64K said:


> I always do my research before buying too, but this guy may not be responsible for his mistake if he can convince the jury that AMD failed at "truth in advertising", and that the burden is on AMD and not the customer to advertise their CPUs honestly and clearly.



They have 8 cores; that's why this will go in the trash. Without a previously written definition there is no malicious intent, nor is there a way to prove these aren't cores. Hell, even Ford calls them cores; he just tried to preface it and make it seem like he disagrees.


----------



## FordGT90Concept (Oct 8, 2016)

FR@NK said:


> I'll test it on my 8 core.
> 
> I will need to turn off turbo boost and lock all the cores to the same speed. Did you do this when you tested it ford?
> 
> ...


Big surprise here, it's right where I expected it to be (compare "8-core" column to 80% scaling column).  I'm taking a screenshot of the raw data to prove I'm not bullshitting anyone:





What was this 8-core by the way?  I mean, model of the processor(s) so I can make it match.

This proves not only that the application is extremely predictable, but also that the curve you see where it deviates upward at 2-7 threads is mirrored on the quad-core (deviating upward at 2-3 threads).  This is deliberate, again, to account for the UI overhead that doesn't show until all of the physical cores are loaded.


I tested with turbo enabled.  Turbo being enabled might explain why it is 20% and not, say, 10%, because I'm using the single thread as the point of reference.  If it overclocked that single core, it would exaggerate the single-thread test, making the multicore look worse.  That said, I'm not really worried about it because FX processors would do the same.


Out of curiosity, is this octo-core system running about 3.246 GHz?



cdawall said:


> I have a 5960x as well as the opteron. When testing on the 5960x I couldn't get consistent results so I didn't post them.


My previous instructions didn't change the random seed.  These instructions should get a more consistent result because that was a pretty big omission on my part.

You'll also get wild results if anything CPU intensive is running.

It will be kind of all over the place when running the same test over and over.  A single test isn't really important... it's the trend that matters: generally shorter with each thread added.  Exception: the 4th and 8th (with SMT) thread on quad-core processors may be about equal to the preceding one, and the same goes for the 8th and 16th (with SMT) thread on octo-core processors, because of the UI thread.


----------



## FR@NK (Oct 8, 2016)

FordGT90Concept said:


> Out of curiosity, is this octo-core system running about 3.246 GHz?



Yeah, it's a 6900K with all cores set to 3.2 GHz.


----------



## FordGT90Concept (Oct 8, 2016)

I added names + clockspeeds to spreadsheet.


----------



## FordGT90Concept (Dec 7, 2016)

I'm going to change OS SSDs soon and am cleaning up my computer desktop.  Here's the incomplete OpenOffice spreadsheet, should it ever be completed...


----------



## Aquinus (May 29, 2017)

You can all thank @FordGT90Concept for reminding me of this thread, because I had honestly forgotten. It sounds like:


> For the foregoing reasons, the court GRANTS defendant's motion to dismiss plaintiffs' claims. Within 14 days, plaintiff shall submit an amended complaint that corrects the deficiencies identified in this order.
> 
> IT IS SO ORDERED. Dated: October 31, 2016


https://casetext.com/case/dickey-v-advanced-micro-devices-inc-1

Just wanted to throw that out there since this happened after the last post on the matter.


----------



## FordGT90Concept (May 29, 2017)

> Defendant argues that (1) Alabama law applies to Dickey's claims, so Dickey's California claims should be dismissed; (2) plaintiffs fail to state a claim for fraud; (3) plaintiffs fail to state a claim for breach of warranty; and (4) plaintiffs lack standing to seek injunctive relief. Dkt. No. 52.


1. Declined
2. Granted


> *Lack of Factual Basis for Plaintiffs' Expectations*
> The court is not convinced that Dickey and Parmer were required to identify a particular statement by AMD or anyone else representing AMD's cores as completely independent so long as plaintiffs alleged a particular, plausible understanding of the term "core" as independent such that AMD's use of the term would be misleading. However, as discussed below, plaintiffs have failed to allege their expectations with sufficient particularity.





> *Lack of Specificity of Plaintiffs' Expectations and Industry Standards*
> The court finds that plaintiffs' amended allegations fail to cure the deficiencies previously identified by the court, *in particular whether plaintiffs believed that a core could not share resources as well as plaintiffs' particular understanding of what constitutes a core*. _See_ Dkt. No. 46 at 7. Accordingly, the court grants defendant's motion to dismiss plaintiffs' fraud-based claims. Plaintiffs will be given one final opportunity to amend to attempt to cure these deficiencies.


3. Granted because of #2.
4. Not addressed because of #2 and #3.

In other words, as observed before, the plaintiff doesn't understand the technology well enough to prosecute the case.  I think he could win if he argued that the dispatcher can create blocking scenarios which degrade the performance of both integer clusters, thereby proving they are not independent.

Example:
http://pds.ucdenver.edu/document/hardware/AMDbulldozer-IEEE-Computer-2011.pdf
The authors use "core" in contexts that are confusing and inconsistent.

It opens by saying, "*It combines two independent cores* intended to deliver high per-thread throughput with improved area and power efficiency."  Core refers to the integer clusters here (the article usually refers to them as "integer cores").  Also note the word "combines," which contradicts the word "independent."  They were apart, but now they are together.

Look on page 9, it says "Figure 4 shows how the Bulldozer core uses these different mechanisms."  Core refers to the AMD's so-called "module" here and it repeats in the Figure 4 text: "Multithreading model that shows how the Bulldozer core uses different mechanisms."

If a bunch of geeks are using the word "core" interchangeably to describe two very different things, how is Joe Public supposed to know the difference when they see "8-core" in marketing?  AMD never made the distinction themselves on their products. I think someone could easily make this argument and win the lawsuit but Dickey doesn't understand the tech well enough to.


I believe the case is no longer being pressed because Dickey didn't motion to appeal within the 14 day limit.


----------



## Aquinus (May 29, 2017)

FordGT90Concept said:


> If a bunch of geeks are using the word "core" interchangeably to describe two very different things, how is Joe Public supposed to know the difference when they see "8-core" in marketing? AMD never made the distinction themselves on their products. I think someone could easily make this argument and win the lawsuit but Dickey doesn't understand the tech well enough to.


You say that, but his legal team was literally twice the size of AMD's. I find it hard to believe that someone who dumps that much time and money into something would just give up if there were options to "easily win." All things considered, there was a motion to appeal earlier and the court still struck it down. Not appealing it is basically saying that they concede, and your entire premise runs on the assumption that he or his team didn't know what they were doing, but that doesn't seem to be the case. Even if that was the case, it could also indicate that their argument was flawed in the first place... but if you're such an expert on it, maybe Dickey should have added you to his team.


----------



## qubit (May 29, 2017)

So, is the breakdown of this case good or bad from the consumer's viewpoint?

Personally, I always thought counting those siamesed cores as two was a bit of a stretch, because the operation of one hampered the other due to those shared resources, so I can see why the lawsuit happened. Whether it's enough to call it fraudulent and worthy of suing over, I'm not so sure.


----------



## FordGT90Concept (May 30, 2017)

Aquinus said:


> I find it hard to believe that someone who dumps that much time and money into something would just give up if there were options to "easily win."


They literally cited only two sources, which the judge reviewed:
http://www.tomshardware.com/reviews/fx-8150-zambezi-bulldozer-990fx,3043.html
http://www.anandtech.com/show/4955/the-bulldozer-review-amd-fx8150-tested



Aquinus said:


> All things considered, there was a motion to appeal earlier and the court still struck it down.


Because of the lack of standing (#1).  They fixed that in this appeal and added a second plaintiff to bolster the case, but they failed to furnish evidence explaining why calling it a "core" is misleading.



Aquinus said:


> Not appealing it is basically saying that they concede, and your entire premise runs on the assumption that he or his team didn't know what they were doing, but that doesn't seem to be the case.


Actually, it does.  With any court case, standing has to be proven, and they had to appeal to prove standing.  The law firm that handled this was clearly treading in untested waters.  A law firm that deals with class action suits regularly wouldn't make that novice mistake.

Dickey is not wrong: people see "8-core" and the price compared to Intel's 8-core offerings and they think it's a steal.  AMD did that deliberately too.



qubit said:


> So, is the breakdown of this case good or bad from the consumer's viewpoint?


Largely irrelevant, because Bulldozer/Piledriver/Excavator were the beginning and end of that processor design.  It's shit and I doubt any other chip manufacturer will revive it.  Because this case wasn't dismissed on specific substance, it likely wouldn't be cited against future misleading designs.


----------

