The new "Pentium M" 1MB cache

markodude

Member
Joined
Jul 15, 2002
Location
Europe
The new "Pentium M" 1MB cache -56k warning!

I have one coming this evening "on loan" for a while, no bull.

I hope it will work in desktop boards. Does anyone have any ideas or info they would care to share before I pop it in my board and blow it up? I don't even know what voltage it is meant to run at, and the guy who has it for me doesn't even know if it is socket 478 :D
 
Interesting. Given how minimal the improvement was when the P4 L2 was doubled from 256K to 512K, a 1MB L2 may not be worth the effort for a desktop system. Notebooks still tend to have low main memory performance, amplifying the effect of the L2 cache hit ratio. I would expect the improvement to be only incremental for all but applications that process a small amount of data to a large extent, like SETI. But you never know, and if given one I'd surely be flogging the little sucker too. I hope you don't have the 12X multiplier issue the other mobile P4s suffer from.
 
Thanks NookieN, I was looking for that on Intel.com but couldn't find it!

Hmm, it seems there are two versions: "the Intel Pentium M processor is available in 478-pin Micro FCPGA and 479-ball Micro FCBGA packages"

Hope I get the 478 pin one then!

Larva - I don't fully understand your viewpoint. The Pentium M is essentially a PIII (more ops per clock cycle than P4) with 1MB cache and a 478 pin package running on a 100(400) FSB. Ideally I want as low a multi as possible so I can run the FSB high. I am hoping that the multi will be unlocked on this chip anyway.

So I could end up with a 1MB cache PIII that clocks as high as 2GHz on 1.75v (total guess), and I may be able to run 200x10 or something like that, to get DDR400. That would outperform most chips on the market I think, as I have heard that a 1.6GHz Pentium M benches like a 2.5GHz Pentium 4.

All theory and best case scenario though!
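
The clock arithmetic behind that best case can be sketched in a few lines of Python. This is only a rough illustration: the function names are made up for this example, and it assumes the memory runs 1:1 with the FSB, which the post does not state.

```python
# Rough clock arithmetic for the 200x10 speculation above.
# Function names are illustrative only, not taken from any real tool.

def core_clock_mhz(fsb_mhz: float, multiplier: float) -> float:
    """Core clock is the base FSB clock times the CPU multiplier."""
    return fsb_mhz * multiplier

def quad_pumped_rate(fsb_mhz: float) -> float:
    """A quad-pumped bus moves four transfers per base clock."""
    return fsb_mhz * 4

def ddr_rating(mem_clock_mhz: float) -> float:
    """DDR memory moves two transfers per memory clock."""
    return mem_clock_mhz * 2

stock = core_clock_mhz(100, 16)  # a 1.6GHz part on a 100MHz bus
hoped = core_clock_mhz(200, 10)  # the hoped-for 200x10 setting
print(f"stock: {stock:.0f} MHz, hoped for: {hoped:.0f} MHz")
print(f"100MHz FSB, quad pumped: {quad_pumped_rate(100):.0f}")  # the "100(400)" bus
print(f"memory at 200MHz (assumed 1:1 with FSB): DDR{ddr_rating(200):.0f}")
```

The low multiplier matters because the core clock is fixed by the product, so a lower multiplier leaves room for a higher FSB (and faster memory) at the same core speed.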
 
I didn't even realize the chip was a P3 core, but the point about cache size remains. In the presence of a high performance memory subsystem, 512K should perform very close to the level attained with a 1MB cache. To me the really attractive aspect of this experiment would be to get a P3 core running at competitive clock rates, and on the competent DDR platform that was never made available for S370 processors. P3s do indeed do more work per clock cycle than P4s do, so if one were to get a P3 running at, say, 2GHz, the results might well be impressive.

The multiplier issues on the P4-Ms arise from the fact that Intel uses the multiplier to throttle the chip's clock speed based on a variety of factors. Because the chip's multiplier is set dynamically via signals generated by the BIOS and chipset, it behaves differently from the multiplier on desktop chips. The lowest multiplier the P4-Ms support is 12x, and that is what they default to in the absence of the appropriate signals from the chipset/BIOS. This is not high enough to maximize the results from a P4-M, as we run out of FSB capability before the chip hits its limits.

It is likely that the multiplier range would be different for the P3 based CPUs. Whether it is optimal or not would be as much a matter of luck as anything else.

Given that the architecture of the chip is completely different from other S478 CPUs, I would be shocked if it will operate on a desktop board without an expressly developed BIOS and perhaps even chipset modifications. But this is conjecture on my part; your pending experience with the actual hardware should be more definitive.
 
It's not really a P3......

The Pentium-M is a mobile design which is pretty much laid out from the ground up. It uses a lot of the P3's design philosophies and layouts but the general design of the chip is pretty different.

If anything, it's more of a hybrid between the P4 and the P3. It uses a 15-stage integer pipeline with a branch prediction algorithm more advanced than that of the P4. A 5-issue execution port coupled with micro-ops fusion is quite different from either the P4 or the P3.

BTW, the boost to 512KB of L2 cache helped the P4 quite a bit.
With the ratio of processor/memory clock ever growing, more cache definitely helps.
 
Perhaps the semantics could have been clearer, but I think people would be surprised how little impact doubling the L2 cache size again to 1MB would have on the P4. I agree, bigger is nearly always better when it comes to cache, but there definitely is a point of diminishing returns in this case that would minimize the benefit realized.

Looking at the S478 Celeron's performance with the L2 cut down to 128KB is the context in which I would label the difference between 256 and 512KB as "minimal". This view would appear to be shared by Intel as they reduced the cache to 128KB from 256 on the Celeron to maintain a clear performance distinction between the otherwise identical S478 Celeron and P4 chips.
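
One way to picture the diminishing returns being argued here is a toy model. The sketch below assumes the common rule of thumb that miss rate falls roughly with the square root of cache size; that rule, the 8% base miss rate, and the 200-cycle penalty are all assumptions brought in for illustration, not figures from this thread or from Intel.

```python
import math

def miss_rate(size_kb: float, base_kb: float = 128, base_rate: float = 0.08) -> float:
    """Rule-of-thumb scaling: miss rate falls as 1/sqrt(cache size).
    base_rate is a made-up figure, not a measurement."""
    return base_rate * math.sqrt(base_kb / size_kb)

def stall_cycles_per_access(size_kb: float, miss_penalty: float = 200) -> float:
    """Average cycles lost to main memory per L2 access (hypothetical penalty)."""
    return miss_rate(size_kb) * miss_penalty

sizes = [128, 256, 512, 1024]
stalls = [stall_cycles_per_access(s) for s in sizes]

# Compare what each successive doubling buys in absolute terms.
for s_prev, s_cur, prev, cur in zip(sizes, sizes[1:], stalls, stalls[1:]):
    print(f"{s_prev}KB -> {s_cur}KB saves {prev - cur:.2f} stall cycles per access")
```

Under this assumed rule every doubling saves about 71% of what the previous doubling saved, so 512KB to 1MB helps, just less than 256KB to 512KB did. Real workloads do not follow a smooth curve: a working set that suddenly fits in cache can produce a much larger jump.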
 
I would be surprised if this chip will work on a desktop board, as the pinout is different, so it may not fit, and it is not compatible with the P4-M's chipset, which makes me wonder if it will be compatible with a P4 desktop chipset. But I hope you prove me wrong :), as that chip will flat fly!

This 1MB of L2 is important, as it allows the chip to use a "deeper" sleep and causes the FSB to be used less, which saves battery while keeping the same performance. This is the real reason for the 1MB L2: it helps keep performance up while using more power-saving features on the northbridge and the chip itself. You cannot look at the P-M like a desktop chip, as you could with the older mobile chips (which were basically desktop chips running at low volts), because it was designed with mobile processing in mind. Take trace cache, for instance. A normal desktop chip with a pipeline as long as the P-M's would normally have trace cache to limit the performance impact of branch misprediction, but trace cache is gate heavy (and therefore power hungry), so it was left out. This makes branch prediction more important, which is why they redesigned the branch prediction units. It also makes high speed cache more important, which is another reason the big L2 is necessary. The moral of the story is that you cannot look at the P-M like a normal chip, as it doesn't follow the same generalizations.

whew,
there's my .02
 
If Intel shares this view, why would they bother giving the Dothan a 2MB L2 cache? Surely that would be severe overkill if the gain from 512K to 1MB is minimal.

But I think their primary goal was to create a high performing mobile processor with a nice array of power saving features, like that ultra-low version which runs at what, 7W was it? The low being something like 12W and the norm being 22-24W.

It uses a 15-stage integer pipeline with a branch prediction algorithm more advanced than that of the P4.

The P4 didn't need it though; it had its Execution Trace Cache, which they found wasn't particularly viable with this processor. But is its branch prediction more advanced? I thought the micro-ops fusion just sent them in groups rather than one at a time, hardly a revolution.
 
The P4 most certainly could benefit from a better branch prediction algorithm. One of the major features of the P4 upon its release was the advanced branch prediction algorithms, which helped avoid many of the clock stalls due to branch penalties. Of course, they can't be avoided entirely, but in non-branchy software we're seeing throughput limited by other things rather than branch penalties, such as the relatively weak x87 FPU pipeline. The Pentium-M came with an even better branch prediction algorithm set (no surprise, it came almost 3 years later than the P4), and it definitely helps in avoiding branch penalties and would help the P4 significantly. The execution trace cache is really something else entirely, although it does help offset branch penalties a bit as well.
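
A back-of-the-envelope model shows why prediction accuracy matters more as pipelines get longer. In the sketch below, the branch fraction, the accuracy figures, and the use of pipeline depth as the flush penalty are all placeholder assumptions chosen to show the trend; none of them are measured values for either chip.

```python
def mispredict_cpi(branch_fraction: float, accuracy: float, flush_penalty: int) -> float:
    """Extra cycles per instruction lost to mispredicted branches.

    branch_fraction: share of instructions that are branches (assumed)
    accuracy:        fraction of branches predicted correctly (assumed)
    flush_penalty:   cycles lost per misprediction, approximated here
                     by the pipeline depth (a simplification)
    """
    return branch_fraction * (1.0 - accuracy) * flush_penalty

# Placeholder inputs: 20% branches, two pipeline depths, three accuracies.
for accuracy in (0.90, 0.95, 0.98):
    shorter = mispredict_cpi(0.20, accuracy, flush_penalty=15)
    longer = mispredict_cpi(0.20, accuracy, flush_penalty=20)
    print(f"accuracy {accuracy:.0%}: 15-stage {shorter:.3f}, "
          f"20-stage {longer:.3f} extra cycles per instruction")
```

The penalty scales with both the misprediction rate and the depth of the flush, so a better predictor pays off on either design, and pays off most on the longer pipeline.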

As for the point of diminishing returns for cache: as the clock speed of a processor increases and the ratio of the processor/memory clock grows higher, larger caches will have bigger effects. So it's not really a "ceiling" that we'll hit someday, after which we won't need more cache or will get less benefit from more cache. It's just a matter of balancing out the right amount of cache with the right frequency range for that processor.
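
The clock ratio argument can be made concrete with a small sketch. It assumes a fixed main memory latency of 100ns and a cache upgrade that cuts the miss rate from 4% to 3%; both numbers are hypothetical and only there to show the direction of the effect.

```python
def miss_penalty_cycles(memory_latency_ns: float, core_clock_ghz: float) -> float:
    """A fixed memory latency costs more core cycles as the clock rises
    (at 1GHz one cycle takes 1ns)."""
    return memory_latency_ns * core_clock_ghz

def cycles_saved_per_1000_accesses(rate_before: float, rate_after: float,
                                   memory_latency_ns: float,
                                   core_clock_ghz: float) -> float:
    """Core cycles recovered per 1000 cache accesses when a bigger cache
    lowers the miss rate from rate_before to rate_after."""
    misses_avoided = (rate_before - rate_after) * 1000
    return misses_avoided * miss_penalty_cycles(memory_latency_ns, core_clock_ghz)

LATENCY_NS = 100.0  # hypothetical main memory latency, held constant

for clock_ghz in (1.0, 2.0, 3.0):
    penalty = miss_penalty_cycles(LATENCY_NS, clock_ghz)
    saved = cycles_saved_per_1000_accesses(0.04, 0.03, LATENCY_NS, clock_ghz)
    print(f"{clock_ghz:.1f}GHz: miss costs {penalty:.0f} cycles, "
          f"upgrade saves {saved:.0f} cycles per 1000 accesses")
```

The same miss rate improvement recovers more core cycles at a higher clock, which is the sense in which a larger cache has a bigger effect as the processor/memory ratio grows.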
 
OK - so this guy got on the frontside bus last night and handed me a 1.4 Pentium M Centrino and another sample of unknown origin or type.
As you will notice from the picture below, the central P4 Mobile is the only one with a standard socket 478 fitment. Fitting the other two in the P4 socket by bending the extra pin did not make them work; the board refused to boot.
I would be interested to hear if anyone has any clue as to what the "unknown" chip is, and also if anyone has a spare Intel 855 board or knows where to get hold of one, even for a short period of time!!
I'd love to benchmark this. The problem is, even if I got a Pentium M compatible board, it's highly unlikely I could overclock with it. At least not without some PLL mods ;)
p4edited.jpg

p42edited.jpg
 
Further things I have noticed
1. The unknown chip would probably fit in a Pentium 'M' board, as I think its socket still has a hole for pin B2 even though it is depopulated on the Pentium M.
2. The die size is different on all 3 chips.
3. The only one with a copyright of '02 is the Pentium M, perhaps the other one is an early Pentium M?
 
*BUMP* Where are all the Intel employees? Someone can surely tell me what that chip is?? An early Dothan sample ;)
What about the i855 board situation - will we ever see it or be able to buy a board that has this chipset and is ATX?
 
On the topic of the ratio between memory clock and CPU clock: although it does tend to increase as time progresses, there are bumps in this roadmap. The increase of the FSB to 200MHz on imminent Intel products, as well as the adoption of faster memory types, works counter to the general trend.

A 200fsb 2.4GHz P4 will obviously utilize a 12x multiplier, lower than any desktop P4 to date. While the technical difficulties in running the motherboard at increasing clockspeeds tend to discourage the notion of the motherboard and memory subsystem keeping pace with the rapid advancements in cpu technology, at some point the makers are forced to address the issue rather than always counting on brute-force caching to alleviate the effects.
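
The multiplier figure follows directly from dividing the core clock by the bus clock. A minimal sketch, with a function name invented for this example:

```python
def required_multiplier(core_mhz: float, fsb_mhz: float) -> float:
    """Multiplier needed to reach a given core clock on a given FSB."""
    return core_mhz / fsb_mhz

# The same 2.4GHz core on a 100MHz bus and on a 200MHz bus.
for fsb in (100, 200):
    print(f"{fsb}MHz FSB -> {required_multiplier(2400, fsb):.0f}x")
```

Doubling the bus clock halves the core-to-bus ratio for the same core speed, which is the "bump" against the general trend described above.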

I like big caches and agree that in general larger caches are favorable and beneficial, but there is indeed a point of diminishing returns that keeps the current P4 cache at the 512KB level it has settled on. Enlarging the L1 caches, increasing the level of set associativity, and refining the general architecture and branch prediction are more beneficial than continuing to double the L2 cache beyond a certain point. After all, Alphas supported 8MB of L2 cache (externally) years ago, but caches this large are only attractive for certain applications, and have not and will not become the de facto standard for general purpose processors until long after the passing of the P4.
 
Hmmm, I have no idea what that chip is. The fellow that gave you this stuff didn't know either?

I don't know of any intention from Intel or board manufacturers to release the 855 as a desktop board, as it contains performance compromises that are unnecessary for a desktop. That is the reason why the P-M will probably not be released for the desktop either.

Larva:
I don't believe that the 8meg cache of the Alpha ran at full clock speed (correct me if I am wrong), and for that reason it is not the best example to look at (the Slot 1 PIIIs also had big cache, but it was off die, and therefore not as useful). I think that a better example would be the 2meg Xeon chips. In this case the main reason that Intel would not have cache like that on a P4 (on the current process) is that cache is a huge die hog (look at chip maps, like 1/2 of the Northwood and Barton is L2 and L1), and thus would drastically increase production costs for no advantage in marketing (which is more important than speed, really). But the server chips do gain a big advantage from the big L2; look at some server benchmarks comparing the 1meg and 2meg parts, the difference is substantial. Also, the apps run on the desktop do not benefit from increased cache as much, which is another reason that we don't see 2meg L2 on a P4. BUT I think that my original point is still valid: an increase in L2 may increase battery life (if implemented in a certain way, like the P-M's is), which is the reason that these new, mobile specific chips have monster L2s.
 
seamadan000 said:

I don't believe that the 8meg cache of the Alpha ran at full clock speed (correct me if I am wrong) ... But the server chips do gain a big advantage from the big L2; look at some server benchmarks comparing the 1meg and 2meg parts, the difference is substantial. Also, the apps run on the desktop do not benefit from increased cache as much, which is another reason that we don't see 2meg L2 on a P4. BUT I think that my original point is still valid: an increase in L2 may increase battery life (if implemented in a certain way, like the P-M's is), which is the reason that these new, mobile specific chips have monster L2s.

I noted the Alpha L2 is indeed off chip, and yes, it is not full speed. And I agree a 2MB Xeon is another good example. But the point was that a truly large L2 is indeed important for typical server tasks, but not nearly so important for typical desktop tasks. As such the Northwood does pretty darn well with its 512KB L2, and I maintain that people would be surprised at how limited the scope of the improvement would be if we had access to a 1MB version for the types of usage most of us subject our hobby machines to. SETI aside.

I also agree that the large L2 has significantly more allure for mobile applications. My point was that typical notebook designs suffer from reduced memory subsystem performance compared to their desktop counterparts, and therefore stand to benefit more substantially from increases in the L2 size. But your points concerning battery life are valid as well, and only strengthen the argument that the value of the 1MB L2 on this particular chip would be substantially more pronounced for mobile applications than it would be for the vast majority of desktop machines.
 