Oh boy... Here we go.
they are try to fix the single thread performance hit due to the smaller l1 data/instruction.
As if they would have had any problems slapping in L1s as large as or larger than Hammer's... It's not like this is AMD's first CPU architecture ever, or that adding such an amount would be of any die area concern. And for comparison, Nehalem has 32kB per core, 16kB per thread AND a tiny 256kB L2 - I bet Intel must be struggling with a similar performance hit.
each core "only" had 8kb l1 data
Err... No.
Each Bulldozer module has two sets of integer pipelines, and each set has its own dedicated 16kB L1D. That's 16+16kB in total per module, 16kB per thread.
while the instruction cache is share by module which just only 64kb "2 way" in cache(could have be less...i think...)
Bulldozer's L1I is 64kB, that's been public for some time now. About the bracketed comment: do you think it could have been smaller, or are you unsure what size it is?
which is roughly 40kb per core compare to core's 64kb per core. big disadvantage.
If you say so...
so all they can do is add more l3 cache to increase the performance (...) same thing intel did when realized northwood its poor l1 cache will drag down performance they increase l2 cache from 256kb to 512kb.
And by coincidence, Intel is doing the same. "Obviously" they too must be patching Core m-arch's "poor L1s and L2s" by adding cache levels and continuously increasing their size.
however orochi is 8 module 16 core processor
No. Orochi is a 4-module, 8-thread chip.
so featuring 16mb l3 meant each core can use up to 1mb l3. still way below nehalem's 2mb per core.
Durrr...
Bulldozer does not have a 16MB L3; even reading the thread title should give away that the L3 is 8MB. That is 2MB L2 + 2MB L3 per module. Thus Orochi has 8× as much L2 per module as Nehalem has per core, and an equal L3 ratio.
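For anyone who wants to check that arithmetic, here is a quick Python sketch. The Orochi numbers and Nehalem's 256kB L2 and 2MB L3 per core are the ones quoted in this thread; treating Nehalem as a 4-core part with 8MB of L3 is my assumption, and the dictionary layout is just for illustration.

```python
# Cache sizes in kB. A "unit" is a module for Orochi and a core for Nehalem.
KB = 1
MB = 1024 * KB

# Figures from this thread: 4 modules, 2MB L2 per module, 8MB shared L3.
orochi = {"units": 4, "l2_per_unit": 2 * MB, "l3_total": 8 * MB}

# 256kB L2 per core is from this thread; 4 cores sharing 8MB L3 is assumed,
# which matches the "2MB per core" figure quoted above.
nehalem = {"units": 4, "l2_per_unit": 256 * KB, "l3_total": 8 * MB}


def l3_per_unit(chip):
    """Shared L3 divided evenly across modules or cores."""
    return chip["l3_total"] // chip["units"]


l2_ratio = orochi["l2_per_unit"] / nehalem["l2_per_unit"]
l3_ratio = l3_per_unit(orochi) / l3_per_unit(nehalem)

print(f"L2 per unit, Orochi vs. Nehalem: {l2_ratio:.0f}x")
print(f"L3 per unit, Orochi vs. Nehalem: {l3_ratio:.0f}x")
```

Note that this compares a two-thread module against a two-thread core, which is the comparison I am making above.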
also unlike intel's architecture amd's cache heavily determine by the stage pipeline.
Strange conclusion, considering the public (that includes me and you) doesn't know Bulldozer's exact pipeline length yet.
lower stage pipeline won't take advantage on bigger cache. but since bulldozer will featuring 4+ghz i doubt this will be at least 20+ stage pipeline in this processor.
Broken sentence. What are you trying to say?
Do you believe it has 20+ stages, or do you not?
Also, the clock rates are completely unknown to the public.
but despite all these feature as long as intel decide to increase ivy bridge's l2 cache from 256k per core to 512k per core amd will experience same horror they faced when core 2 came out.
Oh really? Now one can only wonder why Intel didn't see such a shortcoming in their L2 before taping out Nehalem, Sandy Bridge... They must have missed the fact that their chips' L2 had shrunk to a fraction of its size in Conroe and Penryn.
PS.
In case you find some parts of my reply sarcastic, it is highly likely you are right.
Abstract for those with "TL;DR" syndrome:
Burger, please get your facts straight. The factual errors I've pointed out are public knowledge, so go read up on them. And please do pay attention to writing proper English; it is often impossible to figure out what you're trying to say, as many of your sentences are missing words and the words that are there are often misspelled.