Longer compared to what, no cache? If the data is not in the cache, the CPU has to fetch it from main memory, which is easily one to two orders of magnitude slower than fetching it from cache (a few cycles for an L1 hit versus a couple hundred cycles for a trip to DRAM). The more cache you have, the more of the frequently accessed data can live in it, and the faster fetches and CPU operations can happen.
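If you want to see that gap on your own machine, here's a minimal sketch (my own illustration, not from anything above): it chases random pointers through a buffer that fits in L1, then through one that only fits in DRAM. The buffer sizes and the exact ratio you'll measure are machine dependent.

```c
/* cc -O2 chase.c && ./a.out -- rough cache vs DRAM latency demo */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Walk a random cycle of 'n' indices for 'steps' dependent loads and
 * return nanoseconds per load. Each load depends on the previous one,
 * so the hardware prefetcher can't hide the latency. */
static double chase(size_t n, size_t steps) {
    size_t *next = malloc(n * sizeof *next);
    for (size_t i = 0; i < n; i++) next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {   /* Sattolo's shuffle: one big cycle */
        size_t j = rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t p = 0;
    for (size_t s = 0; s < steps; s++) p = next[p];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    volatile size_t sink = p; (void)sink;  /* keep the loop from being optimized out */
    free(next);
    return ((t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec)) / steps;
}

int main(void) {
    printf("32 KB buffer (fits in L1):  %.1f ns/load\n", chase(32u * 1024 / 8, 10000000));
    printf("256 MB buffer (DRAM-bound): %.1f ns/load\n", chase(256u * 1024 * 1024 / 8, 10000000));
    return 0;
}
```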
How does the CPU know what is in the cache? Magic?
No, it has to look it up, and a simple way to think about it is: the larger the cache, the longer the lookup takes.
But to detail this a bit: caches do not cache "data", that is a misconception. The CPU at that level is only aware of instructions and memory addresses. The way a cache works is by caching memory regions (cache lines).
An ultra-fast lookup would be to cache just one contiguous memory region. The lookup would simply be: is this memory address in that region? Yes/no, done. The thing is, caching a single 3 MB region would have a disastrous cache hit ratio, so it wouldn't make sense to do it. Instead, caches hold many smaller regions of memory, and this is where the trade-off happens.
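Here's a minimal sketch of what that lookup actually does in a set-associative cache. All the geometry here (64-byte lines, 64 sets, 8 ways, roughly an L1-shaped layout) is assumed for illustration, not taken from any specific CPU: the address gets carved into an offset, a set index, and a tag, and "is this address cached?" boils down to comparing tags within one set.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdbool.h>

#define LINE_BYTES 64      /* size of one cached "region" (cache line)    */
#define NUM_SETS   64      /* regions are grouped into sets by address bits */
#define WAYS        8      /* each set holds up to 8 regions               */

struct line  { bool valid; uint64_t tag; };
struct cache { struct line sets[NUM_SETS][WAYS]; };

/* The lookup described above: carve the address into offset/index/tag,
 * then compare the tag against every line in one set. */
static bool lookup(const struct cache *c, uint64_t addr) {
    uint64_t index = (addr / LINE_BYTES) % NUM_SETS;  /* which set to search   */
    uint64_t tag   = (addr / LINE_BYTES) / NUM_SETS;  /* identifies the region */
    for (int way = 0; way < WAYS; way++)
        if (c->sets[index][way].valid && c->sets[index][way].tag == tag)
            return true;   /* hit: the region containing addr is cached */
    return false;          /* miss: go to the next cache level or memory */
}

int main(void) {
    struct cache c = {0};
    uint64_t addr = 0x7ffc1234;   /* arbitrary example address */
    c.sets[(addr / LINE_BYTES) % NUM_SETS][0] =
        (struct line){ true, (addr / LINE_BYTES) / NUM_SETS };
    printf("0x%llx cached? %d\n", (unsigned long long)addr, lookup(&c, addr));
    printf("0x%llx cached? %d\n", (unsigned long long)(addr + 4096), lookup(&c, addr + 4096));
    return 0;
}
```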
The smaller the regions, the higher the hit ratio will be, but at the same time there are more regions to track, so the longer it takes to see if the data is in there. Working with caches isn't just "more is better"; it's a balance you strike when you design a CPU. For example, AMD frequently went with larger L1 and L2 caches but at slower cache speeds. And the Core 2 Duo had 3 MB of L2 per core (merged into a 6 MB shared L2), so a 3 MB L2 isn't new, but at that time there was no L3.
The thing with caches is that you have to look them up every time. If you get an L1 hit, perfect: that's the only lookup you paid for. If you miss at every level, you still had to check whether the data was in L1, then L2, then L3, and only then do you access it from memory. So there is a miss penalty compared to not having a cache at all, but if you build your stuff properly, it can way outperform hitting memory all the time. The way you do it can greatly impact performance, and you need to find the right balance for your architecture.
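You can put rough numbers on that with the classic average-memory-access-time formula. Every latency and miss rate below is made up but plausible, purely to show how the lookups stack up: AMAT = L1 + m1 × (L2 + m2 × (L3 + m3 × MEM)).

```c
#include <stdio.h>

int main(void) {
    double l1 = 4, l2 = 12, l3 = 40, mem = 200;  /* cycles (assumed values)    */
    double m1 = 0.05, m2 = 0.30, m3 = 0.40;      /* miss rates (assumed values) */
    /* Average cost per access: each deeper level's cost is only paid on a miss,
     * but the lookup cost of every level you missed in is always paid. */
    double amat = l1 + m1 * (l2 + m2 * (l3 + m3 * mem));
    printf("avg access: %.1f cycles vs %.0f cycles for memory-only\n", amat, mem);
    /* Worst case for a single access: l1 + l2 + l3 + mem = 256 cycles -- this
     * is the miss penalty over not having any cache at all. */
    return 0;
}
```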
Generally, L1 is ultra fast, very low latency, and sits very close to the core itself. L2 is generally dedicated to the core and contains a fair amount of the data that core needs, while L3 is shared across all cores and is generally a victim cache (it contains the data that got evicted from L2). That setup has worked well.
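To make the victim-cache idea concrete, here's a toy sketch of that L2 → L3 flow. It's deliberately simplified (tiny, fully associative, FIFO replacement, all names invented for illustration); real hardware is set-associative with smarter replacement, but the key move is the same: a line evicted from L2 lands in L3 instead of being thrown away.

```c
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

#define L2_LINES 4
#define L3_LINES 8

static uint64_t l2[L2_LINES], l3[L3_LINES];
static int l2_used, l3_used, l2_next, l3_next;

static bool find(const uint64_t *arr, int used, uint64_t line) {
    for (int i = 0; i < used; i++) if (arr[i] == line) return true;
    return false;
}

static void access_line(uint64_t line) {
    if (find(l2, l2_used, line)) { puts("L2 hit"); return; }
    if (find(l3, l3_used, line)) { puts("L3 hit (found among L2's victims)"); }
    else                         { puts("miss: fetched from memory"); }
    /* Install in L2; if L2 is full, the evicted line becomes an L3 victim. */
    if (l2_used < L2_LINES) { l2[l2_used++] = line; return; }
    l3[l3_next] = l2[l2_next];            /* evicted L2 line moves into L3 */
    l3_next = (l3_next + 1) % L3_LINES;
    if (l3_used < L3_LINES) l3_used++;
    l2[l2_next] = line;
    l2_next = (l2_next + 1) % L2_LINES;
}

int main(void) {
    for (uint64_t i = 0; i < 5; i++) access_line(i);  /* fill L2, evict line 0 */
    access_line(0);  /* line 0 was evicted from L2 -- now found in L3 */
    return 0;
}
```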
Intel isn't stupid, so they must think the new core needs a larger cache to keep it fed. It's possible they are mitigating the larger cache's latency with a longer pipeline or other techniques. In the end it all takes transistors, and the more you put in, the larger your chip is and the more it costs.
Designing is always about trade-offs, and there I was just wondering if that was the way to go. In the past, architectures that were near their last redesign frequently had larger caches than their successors, because at that point adding cache was the best thing to do without a full redesign.