
Processor Architecture and Types Guide

Great guide. I think you are wrong about the Pentium 4-M, though. It's not really that much more efficient than a regular Pentium 4; it just has SpeedStep. I think you mean the Pentium M (like Banias and Dothan), which is much more efficient in both IPC and power consumption. A bit on CG A64s would be nice too. I think all the DTRs and Mobiles have CG steppings, and most Newcastles right now do too.
 
The only thing I have to ask about is the Xeon. It's more of a workstation/mid-range server chip; Intel's "heavy duty" server chip is the Itanium series.
 
Update: I have finally figured out how long one of the "micro-operations" for the Pentium 4 is. Intel does not specify the actual size of its level 1 instruction trace cache, instead describing it as an n-Kuop trace cache. An x86 micro-operation is 72 bits long. Most P4s have a trace cache of 12K uops, which works out to 864 Kbits of trace cache. (Conjecture.)
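The arithmetic above is easy to check directly. Note that both the 72-bit micro-op width and the binary-K reading of "12K uops" are conjecture, as the post says, not Intel specifications:

```python
# Back-of-the-envelope trace-cache size, assuming a 72-bit micro-op
# and "12K" meaning 12 * 1024 entries (both assumptions, not Intel specs).
UOP_BITS = 72
TRACE_CACHE_UOPS = 12 * 1024

total_bits = TRACE_CACHE_UOPS * UOP_BITS
print(total_bits // 1024, "Kbit")      # -> 864 Kbit
print(total_bits // 8 // 1024, "KiB")  # -> 108 KiB
```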

I would be very interested in knowing how you actually found this out. Intel certainly hasn't released any documentation hinting at its micro-op implementation. It's been quite a mystery to many people.

Execution Pipeline The pipeline is the sequence of stages a CPU goes through to execute an instruction from start to finish. Complex pipelining concepts, such as data hazards and branch hazards, are beyond the scope of this document, but the general rule of thumb is that the longer the execution pipeline, the higher the clock frequency has to be to compete with a processor running at a lower frequency with a shorter, more efficient pipeline. This, in part, is how AMD processors match Intel Pentium 4 processors that run at higher internal frequencies. There are a handful of very fundamental problems with long pipelines that demonstrate the microprocessor design maxim that less is more, but first, branch prediction must be explained.

Branch Prediction One could think of a microprocessor as an assembly line, with instructions at various stages of completion throughout the execution pipeline (below). The problem is that, unlike an assembly line, the CPU's direction of execution can change, through both conditional branches and unconditional jumps. We will discuss the former. To reduce performance loss, pipelined processors use branch prediction to guess whether execution should continue at the instruction after the branch instruction or at the branch target address. Modern branch prediction is roughly 95% accurate. However, that 5% miss rate is still significant, because on a miss the processor must flush--clear out--all the instructions in the pipeline and start anew. The longer the pipeline, the longer it takes to flush (one clock per stage).
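To make the idea concrete, here is a minimal 2-bit saturating-counter predictor--the classic textbook scheme, not the actual predictor in any shipping CPU:

```python
# Sketch of a 2-bit saturating-counter branch predictor.
# States 0-1 predict not-taken, states 2-3 predict taken; the counter
# moves one step toward each actual outcome, saturating at 0 and 3.
def predict_run(outcomes):
    state = 2  # start in "weakly taken"
    correct = 0
    for taken in outcomes:
        predicted = state >= 2
        correct += (predicted == taken)
        state = min(3, state + 1) if taken else max(0, state - 1)
    return correct / len(outcomes)

# A loop branch taken 9 times, then falling through once,
# is mispredicted only on the final iteration.
print(predict_run([True] * 9 + [False]))  # -> 0.9
```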

Problems with long pipelining. Thus we come to the first problem with long pipelining: a longer pipeline takes longer to flush on a missed branch prediction.
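The cost of those flushes can be sketched with the standard effective-CPI formula; the branch frequency and miss rate below are illustrative numbers, not measurements of any real chip:

```python
def effective_cpi(base_cpi, branch_frac, mispredict_rate, flush_penalty):
    """Average cycles per instruction once mispredict flushes are charged
    against an otherwise ideal pipeline (flush_penalty ~ pipeline depth)."""
    return base_cpi + branch_frac * mispredict_rate * flush_penalty

# Hypothetical workload: 20% branches, 5% of them mispredicted.
print(effective_cpi(1.0, 0.20, 0.05, 10))  # ~1.1 for a 10-stage pipeline
print(effective_cpi(1.0, 0.20, 0.05, 20))  # ~1.2 for a 20-stage pipeline
```

Doubling the pipeline depth here doubles the mispredict penalty, which is exactly why a longer pipeline needs a better predictor (or a higher clock) just to break even.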

The second problem is that, when one lengthens the processor pipeline (as with Prescott) and keeps the clock speed constant, performance will drop, since an instruction takes longer to traverse the pipeline.

A minor correction. Your "second" problem is not really true at all. The whole point of pipelining is that instruction execution can be overlapped, so that instruction latencies do not affect throughput. Assuming you mean "performance" to be CPI (or IPC, as they're calling it nowadays--which, according to every computer architect out there, it isn't), then unless you run into a branch mispredict, instruction throughput would not decrease: once the pipeline is full, instructions come out of it at the rate at which they entered, no matter how long the pipeline. So a 10-stage pipeline design, once the pipeline is full (which takes 10 clock cycles at the beginning of the program), will spit out a completed instruction (assuming a scalar design) every clock cycle. A 20-stage pipeline design, once the pipeline is full (which takes 20 cycles at the beginning of the program), will spit out a completed instruction every cycle just the same. There is a 10-cycle difference, but when we're talking about programs that run for trillions of clock cycles, that bit of one-time difference doesn't really matter.
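The steady-state argument above can be sketched numerically, assuming an ideal scalar, in-order, hazard-free pipeline:

```python
def cycles_to_retire(n_instructions, pipeline_depth):
    # Ideal scalar pipeline: the first instruction needs `pipeline_depth`
    # cycles to reach the end, then one instruction retires per cycle.
    return pipeline_depth + (n_instructions - 1)

for depth in (10, 20):
    total = cycles_to_retire(1_000_000, depth)
    print(f"{depth}-stage: {total} cycles, IPC ~ {1_000_000 / total:.5f}")
```

Over a million instructions the two depths differ by exactly ten cycles of one-time fill latency, so their throughput is essentially identical--until a mispredict forces a refill.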
 
Captain Newbie said:
I was comparing relative theoretical CPI of a longer pipeline at an equivalent frequency (a la Prescott) to that of a Northwood.

Yes, and theoretically, the CPI of a longer-pipelined processor would not rise except in the case of a branch mispredict. You could argue that CPI increases at higher clock speeds, but as you've mentioned, we're assuming the same clock speed. Again, read my explanation above.
 
imposter said:
I don't know about anyone else, but this thread really helped me out

Me too, I love P4's and it was nice to read up on AMD's for once. Also....I LOVE MASH! :D . Thanks for the info dude....by far one of the most helpful threads. :clap:
 
imgod2u said:
Yes, and theoretically, the CPI of a longer-pipelined processor would not rise except in the case of a branch mispredict. You could argue that CPI increases at higher clock speeds, but as you've mentioned, we're assuming the same clock speed. Again, read my explanation above.

I'll write a revision tomorrow when I'm conscious. Thanks :clap:
 
Made good reading; I've been out of the loop for a while. Indeed, I'm using my pop's rig now. Should be back up on mine within a couple of months and looking to upgrade, so thanks for the info.
Didn't realise how far behind I'd fallen!! :drool:
 