Cache and more Cache

– What kind, how fast and how much? –

Cache is an important issue in a PC nowadays. When Intel came out with their Celeron, the whole world fell on top of them because it wasn’t equipped with L2 cache. People didn’t want a CPU without L2 cache. Overclockers, on the other hand, loved it, since the absence of L2 cache made the CPU very overclockable.

Intel improved their Celeron by adding 128kB of L2 cache. This new cache runs at full CPU speed, which turned out to improve performance by making the cache more efficient per kB. Intel proved that more isn’t always the only solution. There seems to be more to it than just looking at size.

This article tries to give you a better impression of what cache is and what it does. We’ll look at some aspects of cache memory like size, speed and type of cache.

Introduction to cache

What does cache do?

First, let’s take a look at what cache does. Cache is nothing but fast and efficient memory. However, cache memory is also more expensive, and therefore only small amounts of it are used when designing and building a PC.

Cache and how it works

While you use your computer, all active programs and their data (documents, spreadsheets, games, etc.) are stored in active memory (RAM). Some parts of the memory – usually the parts used very recently – are also stored in the cache memory. When the CPU needs data, it first looks in the fast cache memory before looking in the slower main memory.

If the CPU needs to do several calculations with the same group of data, it’s likely that the required information will be in the cache. In such cases the CPU does not need to access the slower RAM. Business applications especially benefit from this, but AMD’s 3DNow technology also requires fast and efficient cache to keep the 3DNow units busy.
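The lookup order described above can be sketched with a toy model. This is only an illustration under assumptions that are not from the measurements in this article: the cache holds a fixed number of entries and throws out the least recently used one when it is full, and the names TinyCache and main_memory are made up for the example. Real CPU caches work on fixed-size lines in hardware and may use other replacement rules.

```python
from collections import OrderedDict


class TinyCache:
    """Toy cache: a few entries, least recently used entry is evicted first."""

    def __init__(self, capacity):
        self.capacity = capacity      # number of entries the cache can hold
        self.lines = OrderedDict()    # address -> data, oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, address, main_memory):
        if address in self.lines:
            self.hits += 1
            self.lines.move_to_end(address)   # mark as most recently used
            return self.lines[address]
        self.misses += 1
        data = main_memory[address]           # slow path: fetch from RAM
        self.lines[address] = data
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used
        return data


# Hypothetical "RAM" with 1000 addresses.
main_memory = {addr: addr * 10 for addr in range(1000)}
cache = TinyCache(capacity=8)

# A loop that keeps working on the same four addresses.
for _ in range(100):
    for addr in (0, 1, 2, 3):
        cache.read(addr, main_memory)

print(cache.hits, cache.misses)
```

Of the 400 reads in this loop only the first four have to go to main memory. All the others are served from the cache, which is the effect described above for calculations on the same group of data.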

What kinds of cache are there?

There are different kinds of cache. Every CPU has a small amount of cache in the CPU. It’s called Level 1 (L1) cache because it’s the first place the CPU will look for information.

Besides the L1 cache there’s often a second layer of cache, called level 2 cache (L2). This cache is slower than the L1 cache but still much faster than regular memory. If information isn’t found in the L1 cache the CPU will then look in the L2 cache. If the information is still not found the CPU will finally get the information from the main memory.
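A rough way to see what a second layer buys is to compute the average cost of one memory access. The sketch below assumes the CPU checks the levels one after the other, as described above. All timings and hit rates in it are hypothetical values chosen for illustration, not measurements of any real CPU, and the function name average_access_time is made up for the example.

```python
def average_access_time(levels, memory_time):
    """Average cost of one memory access.

    levels: list of (access_time, hit_rate) pairs, ordered from L1 outward.
    memory_time: cost of going all the way to main memory.
    """
    total = 0.0
    reach = 1.0  # fraction of accesses that get as far as this level
    for access_time, hit_rate in levels:
        total += reach * access_time   # every access that gets here pays the lookup
        reach *= 1.0 - hit_rate        # only the misses continue outward
    return total + reach * memory_time


# Hypothetical timings in CPU cycles and hypothetical hit rates.
l1 = (1, 0.90)
l2 = (5, 0.80)
memory_time = 50

l1_only = average_access_time([l1], memory_time)
l1_and_l2 = average_access_time([l1, l2], memory_time)
print(round(l1_only, 2), round(l1_and_l2, 2))
```

With these made-up numbers an access costs about 6 cycles on average with L1 alone and about 2.5 cycles with L1 and L2 together. The exact figures depend entirely on the assumed hit rates and timings.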

In theory there can be more layers – the AMD K6-3 will have L3 cache – but the performance gain gets smaller with every layer, and an extra layer could even decrease performance in some cases, as we will see.

The different kinds of cache

L1 Cache

The level 1 cache is the most important cache. It’s located very close to the CPU and can therefore be accessed very quickly. L1 cache is usually not very big, 32kB for a Pentium II system and 64kB for an AMD K6-2. Although L1 cache is very small it’s very important. I did a small test to illustrate this.

System:

  • AMD K6-2 350 MHz, ASUS P5A (512kB L2 Cache), 128MB PC100 memory and a Matrox Millennium G200 8MB video card.
  • The tests were made with Final Reality 1.01 running on Windows 98 with DirectX 6 and G200 drivers version 4.26.
  • I ran the test 4 times: once with the L1 cache enabled and once with it disabled, and then both again with the L2 cache disabled too.

Results:

Figure 1, Cache and Performance.

As can be seen from this picture, L1 cache is very important. Without it your system will be incredibly slow. The lack of L2 cache also has an impact, but not even remotely close to that of the L1 cache.

The L1 cache runs at CPU speed, so speed differences between different CPUs are not an issue. Therefore one is tempted to say more cache is better. Of course this is true, but there’s more to it. Intel Pentium CPUs have L1 cache that is 4-way associative. AMD’s cache is bigger but ‘only’ 2-way. In a 4-way cache each memory address can be stored in four different places instead of two, so fewer useful entries get pushed out when several addresses compete for the same spot. This means that with random access of the L1 cache the Intel will be slightly faster.
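The effect of associativity can be shown with a small simulation. This is a sketch under assumptions: the caches are tiny toy models rather than the real 32kB and 64kB caches, each set replaces its least recently used entry, and the class name SetAssociativeCache and the access pattern are invented for the example. The pattern is deliberately a worst case for the 2-way cache.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy set-associative cache that counts misses (LRU within each set)."""

    def __init__(self, num_sets, ways):
        self.num_sets = num_sets
        self.ways = ways
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.misses = 0

    def access(self, line_number):
        index = line_number % self.num_sets    # which set the line maps to
        tag = line_number // self.num_sets     # identifies the line within the set
        entries = self.sets[index]
        if tag in entries:
            entries.move_to_end(tag)           # hit: mark as most recently used
            return True
        self.misses += 1
        entries[tag] = None
        if len(entries) > self.ways:
            entries.popitem(last=False)        # set is full: evict the oldest
        return False


two_way = SetAssociativeCache(num_sets=8, ways=2)    # 16 lines in total
four_way = SetAssociativeCache(num_sets=2, ways=4)   # 8 lines in total

# Three lines that all map to the same set in both caches, used in turn.
for _ in range(10):
    for line in (0, 8, 16):
        two_way.access(line)
        four_way.access(line)

print(two_way.misses, four_way.misses)
```

In this run the bigger 2-way cache misses on all 30 accesses, because three lines keep fighting over two places, while the smaller 4-way cache misses only on the first three. Real programs are rarely this unlucky, which fits the observation above that the difference is slight.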

Another option to make cache more efficient is write allocation. If information isn’t found in the L1 cache, it has to be looked up efficiently in the L2 cache. The better – more ‘intelligent’ – this is done, the better the performance. AMD’s K6-2 400, for instance, has a different core than the slower models. Even if you clock the K6-2 400 at 350MHz it will still outperform the original 350. This is mainly because the new core has improved write allocation.

L2 Cache

The next step is to look at the Level 2 cache. The exact amount and type of L2 cache vary widely between different types of CPU. What also varies is the speed of the L2 cache.

Intel Pentium II systems are all equipped with 512kB L2 cache running at half CPU speed. To lower costs Intel designed a new CPU for the low-budget market: Intel Celeron. The first Intel Celerons (266 and 300) had no L2 cache at all. Because this lowered business performance to levels below those of the old Pentium MMX, the later versions of the Celeron (300A and 333) were equipped with 128kB of cache running at full speed.

So a Pentium II has 4 times as much L2 cache, but it’s running ‘only’ at half CPU speed. The question is which is more efficient. Because the Celeron core is (virtually) identical to the Pentium II Deschutes core, the effects of L2 cache can be seen very easily.

See figures 2 and 3 for a comparison between a Pentium II 300 and a Celeron 300A, each with 128MB PC100 memory, a Matrox Millennium G200 8MB and an ASUS P2B. For the Quake II test a Diamond Monster II 12MB card was also used. Both systems were again running Windows 98 with DirectX 6 and G200 drivers 4.26.

Figure 2, Speed vs Size – Winstone.

Figure 3, Speed vs Size – Quake II.

So big is beautiful, but size is not the only important factor. The Celeron with its fast 128kB keeps up with its bigger brother, offering better value for money in most situations.

However, there are situations where L2 cache size is the primary issue. With disk access, for instance, size is more important than speed, so network file servers are better off with more cache rather than faster cache. Having both is of course ideal: Intel’s Pentium II Xeon machines have 512kB or more L2 cache running at CPU speed.

When we look at AMD we see that the L2 cache isn’t located close to the CPU. Instead it’s located on the mainboard and runs at FSB speed, normally 100MHz with the K6-2. So compared to the Intel Pentium II L2 cache it’s much slower. At lower CPU speeds the difference will not be noticed that quickly: a Pentium II 300MHz has its cache running at 150MHz, while an AMD K6-2 on an ASUS P5A has it running at 100MHz. At the 400MHz level this picture changes: 200MHz for Intel vs ‘only’ 100MHz for AMD, so the Intel cache is twice as fast. This is one of the main reasons the K6-2 has trouble keeping up with the Pentium II and Celeron.
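The clock arithmetic in this comparison is simple enough to write down. The sketch below uses only the rules stated in the text (half CPU speed for the Pentium II, full CPU speed for the Celeron 300A/333, FSB speed for the K6-2). The function name l2_clock_mhz and the labels for the three arrangements are made up for the example.

```python
def l2_clock_mhz(cpu_mhz, fsb_mhz, l2_type):
    """Return the L2 cache clock for the three arrangements in the text."""
    if l2_type == "half_speed":    # Pentium II: half the CPU clock
        return cpu_mhz / 2
    if l2_type == "full_speed":    # Celeron 300A/333: full CPU clock
        return cpu_mhz
    if l2_type == "mainboard":     # K6-2: cache on the mainboard at FSB speed
        return fsb_mhz
    raise ValueError(f"unknown L2 type: {l2_type}")


for cpu_mhz in (300, 400):
    intel = l2_clock_mhz(cpu_mhz, 100, "half_speed")
    amd = l2_clock_mhz(cpu_mhz, 100, "mainboard")
    # CPU clock, Pentium II L2 clock, K6-2 L2 clock, ratio between the two
    print(cpu_mhz, intel, amd, intel / amd)
```

The gap grows with the CPU clock: at 300MHz the Pentium II cache runs 1.5 times as fast as the mainboard cache, at 400MHz twice as fast, because the mainboard cache stays at 100MHz.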

The K6-2’s successor, the K6-3, will be equipped with 256kB of cache running at full CPU speed, on-die just like the Celeron’s. It seems that AMD took the best of both worlds: twice as much cache as a Celeron and twice as fast as that of an Intel Pentium II. First test results indeed show that the K6-3 beats an equally clocked Pentium II system in both gaming and business areas. Because the K6-3 has the same core as the K6-2 400, the performance gain can be attributed entirely to the fast L2 cache.

L3 Cache

The K6-3 brings us to another interesting point. Since the K6-3 will have its own L2 cache, the L2 cache on the mainboard will move up to being L3 cache. The problem is that this L3 cache will not add much extra performance on top of the on-die L2 cache.

Why? Well, the ratio L3/L2 is important, just as the ratio L2/L1 is important. If information is not found in a specific level the CPU will try the next level, which is slower but also bigger. However, if the next level isn’t that much bigger, chances that the required information will be found there are small. So the CPU will spend time looking for nothing, and could in theory even end up being slower instead of faster. So if the L3 cache is not significantly bigger, the benefits are small. A factor of at least 4:1 is often used. Look for instance at the L2/L1 values of the Celeron with 128:32 and the K6-3 with 256:64: both are exactly 4:1.
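The argument that an extra level can cost more than it gains can be put into numbers. The sketch below assumes the CPU first searches the extra level and only then goes to main memory, as described above. The timings and hit rates are hypothetical, and the function names are made up for the example. Only the cache sizes come from the text.

```python
def size_ratio(outer_kb, inner_kb):
    """Size of a cache level relative to the level before it."""
    return outer_kb / inner_kb


def miss_cost(extra_level_time, extra_level_hit_rate, memory_time):
    """Average cost of an L2 miss when one more cache level sits before RAM."""
    return extra_level_time + (1.0 - extra_level_hit_rate) * memory_time


def break_even_hit_rate(extra_level_time, memory_time):
    """Hit rate the extra level needs before it beats going straight to RAM."""
    return extra_level_time / memory_time


# L2:L1 ratios from the text: Celeron 128kB:32kB, K6-3 256kB:64kB.
print(size_ratio(128, 32), size_ratio(256, 64))

# Hypothetical timings in CPU cycles.
memory_time = 50
l3_time = 10

print(break_even_hit_rate(l3_time, memory_time))
rarely_hits = miss_cost(l3_time, 0.05, memory_time)   # L3 hardly bigger than L2
often_hits = miss_cost(l3_time, 0.60, memory_time)    # L3 much bigger than L2
print(rarely_hits, often_hits)
```

With these made-up timings the extra level has to answer more than one in five of the requests that reach it. An L3 that rarely holds what the L2 missed makes each miss more expensive than the 50 cycles of going straight to memory, while one that hits often brings the cost well below that.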

AMD claims an extra performance gain of 3-4% in combination with 1MB of cache on the mainboard, so L3/L2 = 4:1. Fine, but since most Super 7 boards are ‘only’ equipped with 512kB of cache, this also means that those boards will probably gain nothing. I don’t think it will hurt performance, but disabling it might have one positive effect: increased overclockability.

Conclusions

Big is beautiful when it comes to cache, but speed is just as important. With that in mind, when we look at Intel’s Celeron we see a CPU offering excellent value for money, and in combination with its excellent overclockability we could just as well say it’s offering even better value for money than an equally clocked Pentium II system.

From this point of view the future for the K6-3 looks bright too: superior performance while still keeping prices acceptable.

Finally, table 1 lists some of the features of L1, L2 and L3 cache.

                         K6-2     K6-3     K7       Celeron  Xeon     PII
L1 Size                  64kB     64kB     128kB    32kB     32kB     32kB
L1 associativity (ways)  2        2        2        4        4        4
L2 Size                  512kB+   256kB    512kB+   128kB    512kB+   512kB
L2 Speed                 FSB      1xCPU    1/3xCPU  1xCPU    1xCPU    1/2xCPU
L3 Size                           512kB+
L3 Speed                          FSB
Memory Access            FSB      FSB      2xFSB    FSB      FSB      FSB

Table 1, Cache and memory features of different CPUs.

Written by A.A.Gerritsen

for the CPU Site,

December ’98

