
What is L1 cache


asmodeansedai

Member
Joined
Apr 9, 2004
Location
Montana
I have never really thought about it before and just took it for granted what L1 cache does, and then yesterday a friend asked me what it does, and I'm just like... uh... hmm... I get the basic idea of what L2 is, though. Also, why do almost all procs that I've seen only have 64/64 or 128 KB of L1 cache?
 
L1 cache is basically a storage shed on board the processor. It stores frequently used data so that, instead of having to go out to RAM or the hard drive each time, the CPU has it on hand in storage (the cache). L1 is faster than L2, and L2 is faster than L3.
 
Level 1 cache is a memory cache (a holding area for data that is likely to be needed soon, and thus accessed) packaged within the same module as the CPU. Also known as the "primary cache," the L1 cache is the memory closest to the CPU.

The Level 2 cache is basically the same as above, but it acts as the "feed" to the Level 1 cache.

Think of it like a backpack when you go to school. Let's say that the books you might need for the day are at home. You (The Bus) carry the books to the school and leave some of them that you might need in the locker (Level 2 Cache) and the rest in your backpack (Level 1 Cache) for possible need.

Now your mind (the CPU) has decided early on (branch prediction) what books (the data) you will need or possibly need. Then, keeping this view in your mind (prefetch logic), you try to get the necessary books (data) as close as possible (data fetch).

R
 
There are different methods for accessing data, and different instructions to access it with. AMD uses a much larger L1 cache (and a different addressing scheme) than Intel. So cache sizes do vary, but there is a sweet spot of useful size for each chip.

Intel's Level 2 cache has changed many times as its pipeline has deepened, and this has brought performance increases. AMD's L2 cache has grown as well, but since AMD's original design and pipeline haven't changed as drastically, the increases haven't been as dramatic.

Often there is just no need to make the L1 cache bigger: when the L1 is made larger, latency (the time it takes to reach into the bag or get to the locker) usually increases, and that slowdown can cancel out the benefit of the extra size.

R
 
hmm... that makes sense, because that means it checks the L1 cache before it checks anything else, correct? And if there's more to look through in L1, it ends up slowing down.
 
asmodeansedai said:
hmm... that makes sense, because that means it checks the L1 cache before it checks anything else, correct? And if there's more to look through in L1, it ends up slowing down.
You've got a good understanding, asmodeansedai :) Basically, both AMD and Intel have found the best sizes for their architectures, and barring "keeping up with the Joneses" increases, they are at that point with their robust architectures.

R
 
OK, wish I would have known this the other day... don't like it when someone asks me a question about comps and I realize I have no answer for it.
 
asmodeansedai said:
don't like it when someone asks me a question about comps and I realize I have no answer for it.
And it is this characteristic that will keep you questioning and finding out the needed information.

A very good characteristic in my opinion. :)
 
Just as another drop in the bucket...

L1 is typically split into an instruction cache and a data cache.

Cache control and efficiency are actually more important than the overall size, but the differences between AMD's and Intel's cache controllers aren't publicly available, and I have no clue about the details at the moment since I haven't read up on that stuff lately. I do know that keeping the cache filled with pertinent data is based on, as above, spatial locality and repeated use (temporal locality). (It's also kinda cool how the cache controller works, but that's beyond the scope of this forum.)

The bigger a memory gets, the longer it takes to access it. This is one reason why L1s don't get bigger all the time, since they need to run fast to keep up with the flow of instructions and data.
 
Even though I fell asleep halfway through that, I do get what you're saying, and it is exactly what I said, just in more words.
 
L1 is meant to capture small programming loops that repeat many times. It turns out that in programming such small loops represent a large portion of what a computer usually does. Even a very large program often uses only small areas at any one time hence enormous speed can be achieved by capturing these loops. Once the loop is in L1 the computer operates at full speed. However, to be fast, L1 must be small because of the time needed to find data in the cache and present it to the processor. But, that's ok because most loops are small and easily fit in L1.

L2 is bigger, and so it takes longer to search, which makes it slower and less suitable for program looping. However, L2 can contain many loops from several programs or parts of the same program. Each loop can be presented as needed to L1 for fast execution. Essentially, L2 is a reservoir of information that is faster than main memory but slower than L1. So L2 makes multitasking much faster, but L1 makes individual loops faster.

NOTE: It's interesting that L1 was 16k in the PIII but was reduced to 8k in the early P4 to speed up the memory access and thus keep pace with the faster processor speed.
 
What's also interesting is the trace cache that replaced the L1 I-cache in the P4. It stores instructions in their decoded form, decreasing latency and increasing bandwidth by avoiding having to run through the x86 instruction decoder (the P4 has only one, while the Athlon has three, making it less of a bottleneck).

You can see the trade-off between speed and size in different processors. The Athlon 64 has a 64 KB L1 D-cache with a latency of 3 cycles, while the P4 has only an 8 KB cache with a latency of 2 cycles, or 16 KB with a latency of 4 cycles (depending on the core). Remember that the cycles on a P4 may be much shorter than those of an Athlon 64 (since it can reach higher clock speeds).

A larger, and therefore slower, L1 cache on the P4 would mean more cycles lost while waiting on data. This is one of the disadvantages of a deeply pipelined, high-clock-speed architecture. To combat this, Intel uses a fast L1 cache, a huge L2 cache, massive amounts of memory bandwidth, and Hyper-Threading. This way fewer cycles are wasted waiting for data because the wait is shorter, and Hyper-Threading gives the core something to do while waiting for data.
 