You have to understand which cache does what. (Rick, cover your eyes.)
L1 holds code and data in separate caches, right next to the core for the fastest access.
L2 holds code and data the CPU is about to use, or has just used and may need again. (This gets really technical from here, but I'll spare you.)
L3 is now added as a buffer to RAM and for paging between cores. Buffering is just a way to move blocks fast while other operations continue; it's there when needed. As for the paging/sharing part, that is where cores use data and pass it around. In some cases code is loaded there and stays when it is used often or on several cores.
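To make the L1/L2/L3 picture a bit more concrete, here is a toy sketch of a lookup falling through the levels. It is only an illustration: the class names, the tiny capacities (counted in cache lines), the 64-byte line size and the simple LRU/fill-everything policy are all my assumptions, not how any real AMD or Intel part is wired.

```python
from collections import OrderedDict


class Level:
    """One cache level with LRU replacement (hypothetical toy model)."""

    def __init__(self, name, capacity_lines):
        self.name = name
        self.capacity = capacity_lines
        self.lines = OrderedDict()  # oldest entry first, newest last

    def lookup(self, line):
        if line in self.lines:
            self.lines.move_to_end(line)  # mark as most recently used
            return True
        return False

    def fill(self, line):
        self.lines[line] = True
        self.lines.move_to_end(line)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False)  # evict the least recently used


class Hierarchy:
    """Levels are checked nearest-first; a miss everywhere goes to RAM."""

    def __init__(self, levels, line_size=64):  # 64-byte lines assumed
        self.levels = levels
        self.line_size = line_size

    def access(self, address):
        line = address // self.line_size
        for i, level in enumerate(self.levels):
            if level.lookup(line):
                # Pull the line into the levels closer to the core.
                for nearer in self.levels[:i]:
                    nearer.fill(line)
                return level.name
        # Missed every level: fetch from RAM and fill all levels.
        for level in self.levels:
            level.fill(line)
        return "RAM"


def demo():
    # Capacities are deliberately tiny so evictions show up quickly.
    h = Hierarchy([Level("L1", 2), Level("L2", 4), Level("L3", 8)])
    return [h.access(a) for a in (0, 0, 64, 128, 0)]


if __name__ == "__main__":
    print(demo())
```

In the demo the last access to address 0 has already been pushed out of the two-line L1 but is still sitting in L2, which is the "just been in use but may be needed again" case.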
How much cache gets used depends on the formula that was devised to determine usage.
Continued 3 hours later!
As said a few posts back, L1 is expensive and power-hungry. I think they figured 64K+64K covered most needs, as Intel was running 16K and 32K for a long time. This just depends on the nature of the programs. What is needed here and now amounts to 16K-64K, which is inside the low-order address space. (This one is for you, Archer.)
L2 is where the most swapping occurs. It is the first stop after L1 and sits close by. In most cases this is the staging area to get code and data close to the cores.
I think 1 MB per core would improve some things, but the average code block or data chunk is in the 16K, 64K and 256K range. A lot of this is because so much was built on 16-bit and 32-bit systems that those rules still apply. Some programs that access large data sets may have an advantage with more L2. Some games would be included here, but 1 MB would still only be a drop in the bucket anyway. It comes down to trade-offs.
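A quick way to play with the sizes above is to ask which level a given chunk would fit in. This is just a rough sketch: the function name is made up, the 512K L2 figure is my assumption for illustration, and real caches don't behave this cleanly (associativity, sharing and other running code all eat into the space).

```python
KIB = 1024
MIB = 1024 * KIB

# Assumed per-core sizes for illustration only: 64K L1 data,
# 512K L2 (an assumption), 6M shared L3.
CACHE_SIZES = [("L1", 64 * KIB), ("L2", 512 * KIB), ("L3", 6 * MIB)]


def smallest_fit(working_set_bytes, caches=CACHE_SIZES):
    """Return the nearest level whose capacity covers the working set."""
    for name, size in caches:
        if working_set_bytes <= size:
            return name
    return "RAM"  # too big for any cache level


if __name__ == "__main__":
    for chunk in (16 * KIB, 64 * KIB, 256 * KIB, 768 * KIB, 16 * MIB):
        print(chunk // KIB, "K ->", smallest_fit(chunk))
```

Swapping the L2 entry for 1 MB only changes the answer for chunks between 512K and 1 MB, which is the trade-off in a nutshell: the common 16K-256K chunks were already covered.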
For servers and business apps, L3 pays off more from the sharing aspect with multi-threaded apps.
For gamers, L3 is mostly a buffer, but it greatly improves some games by keeping some code and data close at hand.
Both companies see this clearly, and this is why cache sizes are what they are. Cost and die real estate vs. app needs drive what gets on die. A die shrink frees up space and costs less, so this is why we see the larger caches. The cores also need less power, leaving power available for the larger caches. Shanghai/Deneb are supposed to be able to shut down sections of cache to save power, but I don't know if that feature made it to production.
I see 2M of L3 doing a lot for most of what we run here. 4-8M does help many games and other apps.
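One way to put a number on "does help" is the usual average-access-time arithmetic: every access pays the L1 latency, the misses pay for L2, and so on out to RAM. The sketch below does that sum. The function name is mine, and any hit rates and cycle counts you feed it (including the ones in the example) are made-up illustration values, not measurements of any chip.

```python
def average_access_cycles(levels, ram_cycles):
    """Average cycles per access for a cache hierarchy.

    levels: list of (hit_rate, latency_cycles), nearest level first.
            hit_rate is local: the share of accesses reaching that
            level that hit in it.
    ram_cycles: latency paid by accesses that miss every level.
    """
    total = 0.0
    reach = 1.0  # fraction of all accesses that get this far
    for hit_rate, latency in levels:
        total += reach * latency       # everyone who gets here pays the lookup
        reach *= (1.0 - hit_rate)      # only the misses go further out
    return total + reach * ram_cycles


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    without_l3 = average_access_cycles([(0.9, 3), (0.5, 15)], ram_cycles=200)
    with_l3 = average_access_cycles([(0.9, 3), (0.5, 15), (0.5, 40)], ram_cycles=200)
    print(without_l3, with_l3)
```

With those invented numbers the L3 only has to catch half of what gets past L2 to noticeably cut the average, because a trip to RAM costs so much more than anything on die.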
Just looking at the tri- and dual-cores, having the full 6M of L3 makes a lot of sense. We used to have 64K to 4M of memory on many of our first "PCs". Some programs may still run inside those limits but use more data. It's just a good-size cache, like having on-die RAM.