That... is the most aggressive post I've ever read...
First off, Intel's L1 is not 16 KB per thread as you may think. A core that can run two threads doesn't necessarily split its L1 cache in half, and not all Nehalem processors come with Hyper-Threading like the i7 does... and you pretty much have no understanding of Hyper-Threading. Hyper-Threading is about pipeline utilization: when it's enabled, the core uses the unused parts of each clock cycle/pipeline to simulate a "fake" core. So each thread can technically still use the full 32 KB of L1 data cache.
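To make the sharing argument concrete, here is a toy Python model. It is not a description of Nehalem's real cache: it assumes a fully associative LRU cache (real L1s are set-associative), 64-byte lines, and two threads that each sweep a hypothetical working set. It shows that one thread alone can use the whole 32 KB, while two threads with large working sets compete for the same capacity rather than each getting a fixed private half.

```python
from collections import OrderedDict

LINE_BYTES = 64  # assumed cache-line size


class LRUCache:
    """Fully associative LRU cache model (a simplification; real L1s are set-associative)."""

    def __init__(self, size_bytes, line_bytes=LINE_BYTES):
        self.capacity = size_bytes // line_bytes  # number of lines the cache can hold
        self.line_bytes = line_bytes
        self.lines = OrderedDict()  # line tag -> None, ordered from least to most recently used

    def access(self, addr):
        """Return True on a hit, False on a miss (and fill the line on a miss)."""
        tag = addr // self.line_bytes
        if tag in self.lines:
            self.lines.move_to_end(tag)  # mark as most recently used
            return True
        self.lines[tag] = None
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used line
        return False


def sweep(base, working_set_bytes, line_bytes=LINE_BYTES):
    """Addresses that touch each line of a working set once, in order."""
    return range(base, base + working_set_bytes, line_bytes)


def hit_rate(cache_bytes, working_sets, passes=10):
    """Interleave one access per thread per step through a single shared cache.

    Each thread repeatedly sweeps its own working set (separate address ranges),
    and the function returns the per-thread hit rate over all passes.
    """
    cache = LRUCache(cache_bytes)
    hits = [0] * len(working_sets)
    total = [0] * len(working_sets)
    streams = [list(sweep(i * (1 << 30), ws)) for i, ws in enumerate(working_sets)]
    for _ in range(passes):
        for step in range(max(len(s) for s in streams)):
            for t, s in enumerate(streams):
                if step < len(s):
                    total[t] += 1
                    hits[t] += cache.access(s[step])
    return [h / n for h, n in zip(hits, total)]


if __name__ == "__main__":
    print(hit_rate(32 * 1024, [24 * 1024]))             # one thread: [0.9]
    print(hit_rate(32 * 1024, [24 * 1024, 24 * 1024]))  # two threads: [0.0, 0.0]
```

With these assumed sizes, a single 24 KB working set misses only on the first pass and hits on every pass after it, while two interleaved 24 KB working sets (48 KB total) thrash the shared 32 KB cache under LRU.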
The hard fact is that Bulldozer cores are NOT split out of a module: they are individual cores, wrapped in pairs into each module, with an L1 instruction cache in the middle and wiring between the two cores. So the instruction cache is definitely uncore (whereas Nehalem's L1/L2 are built into each core, leaving only a larger L3 cache outside the cores, connected by a ring bus). Why did I say 8 KB? Because it was rumored to be between 8 and 16 KB in early 2009... Since most people said it would use a far smaller cache than K10 (128 KB!!), some sites went with the smaller number, but so what? It's still the speculation period, and who knows, AMD might increase the cache to 32 KB or even 64 KB of L1 data per core. Plus, even at 16 KB, running two threads would split the cache into 8 KB x 2, because AMD doesn't have anything like Intel's Hyper-Threading, which optimizes a single core for multithreading without losing too much performance...
Now, before you start hammering me with your ignorance... you have to understand one thing:
UNCORE MEANS ANYTHING THAT IS NOT BUILT INSIDE THE CORE! Even if these caches are on the same die/module, that doesn't change the fact that they are uncore... so I'm not wrong on this point! It makes it look like each core only has 8~16 KB of L1 data cache and no built-in L1I!
Before calling me a troll, you'd better check how much you know about microarchitecture first...
Bulldozer has the same latency as Nehalem? That's news to me... no! AMD has a long history of poor cache performance because of low-quality silicon yields in production and bad wiring across the die. Why do you think AMD bothered with a 128 KB L1 cache design if their caches were so powerful? Just so you know, on Phenom the L1 cache latency can be as high as 10~12 clock cycles, while on Core 2 it's only 2.5 clock cycles. That wasn't long ago, so what makes you think it got tweaked overnight, and with such a small cache? That sounds even more ridiculous than Cayman having only 32 ROPs... A larger and faster L1 cache means better performance, and AMD won't get better results if they keep such a garbage design on Bulldozer. All they need to do is put 128 KB of L1 data in each core, and Bulldozer will trump Nehalem for sure. In the IT field, TDP means shit! Performance is ALWAYS measured by die size and transistor count. Shrinking the cache would be stupid, because a smaller L1 cache = performance loss. They also still have a lot of room to put cache in their cores, because their new cores are only 1/2~1/3 the size of a single Nehalem core. If they want to win this, they'd better increase their die size by adding more cache, like 256 KB L1D + 128 KB L1I per core, 4 MB of L2 per module and 32 MB of shared L3, on a 400 mm² die.
Oh, about the pipeline, you can check this:
http://www.amdzone.com/phpbb3/viewt...id=1b04a4780b037a6ab2c09efd2ffe3f19&start=350
Supposedly it's been confirmed that they're working on a longer pipeline to reach higher frequencies... I don't know if that's true, but that would be stupid too.
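Cycle counts only turn into real time once you fix the clock frequency, which is why a pipeline change aimed at higher clocks matters when comparing cache latencies. A minimal Python sketch of the conversion; the frequencies and cycle counts are made-up illustrative values, not real Bulldozer or Nehalem clocks.

```python
def latency_ns(cycles, freq_ghz):
    """Convert a latency in core clock cycles to nanoseconds at a given clock (GHz)."""
    # One cycle lasts 1 / f seconds; at f GHz that is 1 / f nanoseconds.
    return cycles / freq_ghz


# Hypothetical comparison: illustrative numbers only, not real product clocks.
slow_clock = latency_ns(2, 3.0)  # a 2-cycle L1 on a 3.0 GHz core
fast_clock = latency_ns(3, 4.5)  # a 3-cycle L1 on a higher-clocked 4.5 GHz core
print(f"{slow_clock:.3f} ns vs {fast_clock:.3f} ns")  # prints "0.667 ns vs 0.667 ns"
```

Under these assumed clocks, the extra cycle is exactly offset by the shorter cycle time; with other numbers it can go either way.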
Also, the L1 cache has been confirmed at 3 cycles rather than the previously assumed 2 cycles. I mean, seriously, if they can't outperform Intel, why don't they just add more cache instead? If 128 KB can't beat Intel's 64 KB, then why not 256 KB or even 512 KB of L1 cache?
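The "bigger vs. faster" L1 question is usually reasoned about with the standard average memory access time formula, AMAT = hit time + miss rate × miss penalty. Here is a minimal Python sketch; every number in it (hit times, miss rates, penalties) is a hypothetical placeholder, not a measured Bulldozer or Nehalem figure. It shows that whether a larger but slower L1 wins depends on whether its lower miss rate outweighs its extra hit latency.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """Average memory access time in cycles: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical L1 designs: a smaller 3-cycle cache with a 5% miss rate
# versus a larger 4-cycle cache with a 3% miss rate (illustrative values only).
for penalty in (20, 60):  # assumed cycles to service an L1 miss
    small = amat(3, 0.05, penalty)
    large = amat(4, 0.03, penalty)
    print(f"penalty {penalty}: smaller/faster {small:.2f}, larger/slower {large:.2f}")
```

With these assumed numbers, the smaller/faster cache comes out ahead at a 20-cycle miss penalty (4.00 vs 4.60 cycles), while the larger/slower one wins at 60 cycles (6.00 vs 5.80), so the answer depends on the workload's miss rates and on what sits behind the L1.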