Just wanted to make that clear, because your comment seems to insinuate that the cache is ineffective for such a workload, which isn't true.
Ah, didn't mean to insinuate that ... less effective, not ineffective. I was assuming that all instructions are indeed cached in this special case, and was thinking of data only. In retrospect, following the chain of replies, I see where the implication comes from, since the discussion started on RAM speed affecting the CPU's IPC.
Sure, it may be less effective, but the speedup comes only from the application code, not from the memory accesses themselves in a scan. A cache doesn't do you any good if you're reading all of memory sequentially; that's not why it exists.
The bolded part is exactly the reason I picked that example (and called it extreme): to point out how much the type of application matters. Of course memtest is not written to make optimal use of the cache, but rather to test RAM.
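To make the scan point concrete, here's a toy sketch (not a model of any real CPU). It assumes a fully associative LRU cache, counts one access per cache line, and ignores hardware prefetching and reuse within a line. The names `hit_rate`, `scan` and the 512-line cache size are made up for illustration.

```python
from collections import OrderedDict

def hit_rate(accesses, cache_lines):
    """Fraction of accesses that hit in a fully associative LRU cache."""
    cache = OrderedDict()  # keys are line numbers, ordered oldest to newest
    hits = 0
    total = 0
    for line in accesses:
        total += 1
        if line in cache:
            hits += 1
            cache.move_to_end(line)        # mark as most recently used
        else:
            cache[line] = True
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return hits / total if total else 0.0

def scan(n_lines, passes):
    """Sequential sweep over n_lines distinct lines, repeated `passes` times."""
    return [line for _ in range(passes) for line in range(n_lines)]

CACHE_LINES = 512  # hypothetical: e.g. 32 KiB of 64-byte lines

# memtest-like: sweep a region 8x larger than the cache, four times over
big_scan = hit_rate(scan(4096, 4), CACHE_LINES)
# loop-like: keep re-reading a region that fits in the cache
small_loop = hit_rate(scan(256, 4), CACHE_LINES)

print(f"large sequential scan: {big_scan:.2f}")
print(f"small working set:     {small_loop:.2f}")
```

In this simplified model the large sweep never finds a line still resident by the time it comes back around, while the small working set misses only on its first pass.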
Now, the question is to what extent it matters for IPC, since superscalar processors execute other instructions while waiting on the memory controller or cache (and several of them per clock if they are mutually independent) ... that would depend on how many memory read/write instructions there are compared to ALU/FPU instructions in total ... which is also the reason I was talking about memtest (lots of memory reads/writes, and almost no compute).
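A back-of-the-envelope way to see that dependence is the usual textbook stall model: effective CPI = base CPI + (fraction of memory instructions) x (miss rate) x (miss penalty). This is only a sketch: it assumes every miss fully stalls the pipeline, so it overstates the cost on a core that overlaps misses with other work, and all the numbers below are hypothetical.

```python
def effective_ipc(base_cpi, mem_fraction, miss_rate, miss_penalty):
    """IPC under a simple stall model where every cache miss blocks the core.

    base_cpi     -- cycles per instruction with a perfect cache
    mem_fraction -- share of instructions that access memory (0..1)
    miss_rate    -- share of those accesses that miss the cache (0..1)
    miss_penalty -- cycles lost per miss
    """
    stall_cpi = mem_fraction * miss_rate * miss_penalty
    return 1.0 / (base_cpi + stall_cpi)

# Hypothetical inputs, chosen only to show the trend.
BASE_CPI = 0.5       # i.e. 2 instructions per clock when nothing stalls
MISS_PENALTY = 200   # cycles to reach RAM

compute_heavy = effective_ipc(BASE_CPI, mem_fraction=0.1, miss_rate=0.02,
                              miss_penalty=MISS_PENALTY)
memtest_like = effective_ipc(BASE_CPI, mem_fraction=0.9, miss_rate=0.5,
                             miss_penalty=MISS_PENALTY)

print(f"compute-heavy IPC: {compute_heavy:.3f}")
print(f"memtest-like IPC:  {memtest_like:.3f}")
```

The absolute values mean little, but the gap between the two cases shows why the instruction mix matters as much as the cache itself.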
Just to be clear, I do agree that if the instructions (or at least the performance-critical sections) can't fit in the cache, having to fetch them from RAM takes the greatest toll on IPC. Let's just say I'm thankful for the cache hierarchy ... and also curious how Broadwells with the L4 eDRAM cache fare in that regard.