Cache and CPU performance
Consider two processors, A and B, both running at 2.5 GHz, i.e. 2,500,000,000 clock cycles per second. A basic CPU operation takes one clock cycle.
Processor A has the larger L2 cache, say 512 KB. Processor B has the smaller L2 cache, say 256 KB.
The L1 and L2 caches hold frequently used data for the CPU temporarily, until new data has to be swapped in from main memory and old data has to be swapped out to it. The processor can read from and write to the cache in very few clock cycles (the cache latency).
Main memory (aka L3 in a PC) can store far more data (e.g. 1 GB of main memory is about 2,000 times the size of a 512 KB L2). Reading or writing main memory takes far more CPU cycles, say 30 - 80 times as many as a cache access.
The hard drive (aka L4 in a PC) can store even more data, basically the whole universe of data in your system, but it takes even more time to access. It comes into play during paging, when data is not found in main memory.
L1 cache, L2 cache, main memory (L3) and hard disk (L4) form the so-called memory hierarchy.
The larger the cache, the higher the chance (probability) of finding data there. Analysis shows that once the cache is above a certain size for a given CPU architecture, CPI and cache latency, the probability levels off. Typically, the probability is around 85 - 95% for an L2 ranging from 256 KB to 512 KB or even 1 MB.
Reading or writing main memory typically takes many more CPU cycles than a cache access (see the earlier number). So if the CPU needs data that is not in the cache (called a cache miss), it has to wait until the data arrives in the cache from main memory, many more cycles later than if it had been found in the cache.
Even though CPU A and CPU B run at the same frequency of 2.5 GHz, CPU A will finish a given job sooner than CPU B, since the probability of CPU A finding data in its cache is higher than that of CPU B. CPU A has fewer cache misses than CPU B.
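To put rough numbers on this, here is a minimal sketch of the average cost of one memory access for the two CPUs. The hit rates (95% for A, 90% for B), the 3-cycle cache latency and the 50x memory penalty are assumptions picked from within the ranges mentioned above, not measurements, and the function name is made up for illustration. It models only a single cache level in front of main memory.

```python
def avg_access_cycles(hit_rate, cache_cycles, memory_cycles):
    """Average cycles per access: hits cost cache_cycles, misses cost memory_cycles."""
    miss_rate = 1.0 - hit_rate
    return hit_rate * cache_cycles + miss_rate * memory_cycles


# Assumed, illustrative values (not measured on any real CPU).
CACHE_CYCLES = 3                    # "very few clock cycles" for a cache hit
MEMORY_CYCLES = CACHE_CYCLES * 50   # main memory taken as 50x slower (within 30 - 80x)

cpu_a = avg_access_cycles(0.95, CACHE_CYCLES, MEMORY_CYCLES)  # larger L2, assumed 95% hits
cpu_b = avg_access_cycles(0.90, CACHE_CYCLES, MEMORY_CYCLES)  # smaller L2, assumed 90% hits

print(f"CPU A: {cpu_a:.2f} cycles per access on average")
print(f"CPU B: {cpu_b:.2f} cycles per access on average")
```

Note that this is the cost per memory access only. A real program spends much of its time on work that never leaves the registers or L1, so the gain for a whole job is far smaller than the per-access gap suggests.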
Analysis has shown that doubling the L2 cache size improves overall performance by 0 - 10%+ over a wide range of applications, some more and some less, typically averaging about 5%.
That is why we usually say a Barton (512 KB L2) performs 5% better than a 1700+ (256 KB L2) running at the same frequency, or that the 1700+ has to run 125 MHz faster to break even with a Barton at 2.5 GHz. A few months ago, a Tbred B DLT3C 1700+/1800+ overclocked about 100 MHz higher than a desktop Barton, so they were about tied. But recently the mobile Barton overclocks equally well, and often even higher than the 1700+/1800+, so the mobile Barton is the better choice for performance (apart from the price difference).
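The 125 MHz break-even figure is just 5% of 2.5 GHz. A small sketch of that arithmetic follows, assuming performance scales linearly with clock frequency (a simplification). The function name is hypothetical.

```python
def break_even_mhz(reference_mhz, cache_gain):
    """Clock the smaller-cache CPU needs to match the larger-cache CPU,
    assuming performance scales linearly with frequency."""
    return reference_mhz * (1.0 + cache_gain)


barton_mhz = 2500.0   # Barton (512 KB L2) at 2.5 GHz
cache_gain = 0.05     # typical ~5% gain from doubling the L2

needed_mhz = break_even_mhz(barton_mhz, cache_gain)
extra_mhz = needed_mhz - barton_mhz

print(f"1700+ must run at about {needed_mhz:.0f} MHz, i.e. {extra_mhz:.0f} MHz faster")
```

For an application at the 10% end of the range, the same calculation gives a 250 MHz gap. For one at the 0% end, the two chips are even at the same clock.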
What happens to programs running on CPUs with smaller and bigger L2 caches (page 17)
Some remarks on cache latency, cache size, memory latency and memory bandwidth (for A64's) (page 19)