Execution Pipeline The pipeline is the sequence of stages a CPU carries an instruction through from start to finish. Complex pipelining concepts, such as data hazards and branch hazards, are beyond the scope of this document, but the general rule of thumb is that the longer the execution pipeline, the higher the clock frequency has to be to compete with a processor running at a lower frequency with a short, compact and efficient pipeline. This, in part, is how AMD processors keep pace with Intel Pentium processors that run at higher internal frequencies. There are a handful of very fundamental problems with long pipelines that demonstrate the microprocessor design rule that less is more, but first, branch prediction must be explained.
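A back-of-the-envelope model makes the trade-off concrete. The sketch below is a minimal illustration, not a cycle-accurate simulation: the pipeline depths (12 and 31 stages), clock speeds, branch frequency and miss rate are all assumed numbers, and the only penalty modeled is a full pipeline flush on every mispredicted branch. Under these assumptions, the deep design needs roughly a 20% clock advantage just to pull even.

/* Minimal sketch: effective throughput of two hypothetical designs.
 * All numbers (depths, clocks, branch statistics) are illustrative
 * assumptions, not measurements of any real CPU. */
#include <stdio.h>

/* Instructions per second, assuming an ideal CPI of 1 plus a full
 * pipeline flush (one cycle per stage) on every mispredicted branch. */
static double perf(double ghz, int stages, double branch_freq, double miss_rate)
{
    double cpi = 1.0 + branch_freq * miss_rate * stages; /* average cycles per instruction */
    return ghz * 1e9 / cpi;                              /* instructions per second */
}

int main(void)
{
    double branch_freq = 0.20; /* assume 1 in 5 instructions is a branch */
    double miss_rate   = 0.05; /* ~95% prediction accuracy */

    double shallow = perf(2.0, 12, branch_freq, miss_rate); /* short pipeline, lower clock */
    double deep    = perf(2.4, 31, branch_freq, miss_rate); /* long pipeline, higher clock */

    printf("12-stage pipeline @ 2.0 GHz: %.2f billion instructions/s\n", shallow / 1e9);
    printf("31-stage pipeline @ 2.4 GHz: %.2f billion instructions/s\n", deep    / 1e9);
    return 0;
}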
Branch Prediction One could think of a microprocessor as an assembly line, with instructions in various stages of execution spread across the stages of the execution pipeline (below). The problem is that, unlike an assembly line, the CPU's direction of execution can change, through both conditional branches and unconditional jumps; we will discuss the former. To reduce performance loss, pipelined processors use branch prediction to guess whether execution should continue at the instruction after the branch instruction, or at the branch target address. Modern branch prediction is roughly 95% accurate. However, that 5% miss rate is still significant, because on every miss the processor must flush--clear out--all the instructions in the pipeline and start anew. The longer the pipeline, the longer it takes to flush (one clock per stage).
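The cost of mispredictions can be observed from ordinary code. The sketch below, a small C program assuming nothing beyond the standard library, sums only the array elements that are at least 128: on random data that branch is essentially a coin flip and the predictor misses constantly, while on sorted data the same branch becomes almost perfectly predictable and the loop typically runs noticeably faster. (Depending on the compiler and optimization level, the branch may be turned into a branchless conditional move, which hides the effect.)

/* Minimal sketch: the same data-dependent branch, unpredictable vs predictable. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (1 << 20)

static int cmp_int(const void *a, const void *b)
{
    return *(const int *)a - *(const int *)b;
}

static long long sum_big(const int *v, int n)
{
    long long sum = 0;
    for (int i = 0; i < n; i++)
        if (v[i] >= 128)   /* data-dependent conditional branch */
            sum += v[i];
    return sum;
}

static double time_sum(const int *v, int n)
{
    clock_t t0 = clock();
    volatile long long s = 0;
    for (int r = 0; r < 100; r++)
        s += sum_big(v, n);
    (void)s;
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    int *v = malloc(N * sizeof *v);
    if (!v) return 1;
    for (int i = 0; i < N; i++)
        v[i] = rand() % 256;         /* random values: branch is unpredictable */

    double t_random = time_sum(v, N);
    qsort(v, N, sizeof *v, cmp_int); /* sorted values: branch is predictable */
    double t_sorted = time_sum(v, N);

    printf("random data: %.3f s\n", t_random);
    printf("sorted data: %.3f s\n", t_sorted);
    free(v);
    return 0;
}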
Problems with long pipelining. This brings us to the first problem with long pipelining: a longer pipeline takes longer to flush on a missed branch prediction.
The second problem is that, when one lengthens the processor pipeline (as Intel did with Prescott) while keeping the clock speed constant, performance drops, since each instruction takes longer to traverse the pipeline.
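To put rough numbers on both problems, the sketch below compares two pipeline depths at the same assumed clock. The 2.8 GHz figure is arbitrary, and the 20- and 31-stage depths are the commonly cited figures for Northwood and Prescott; the point is only that, at a fixed clock, both the flush cost and the time an instruction needs to traverse the pipeline grow directly with depth.

/* Minimal sketch of the two problems at a constant clock. The clock and
 * the 20- vs 31-stage depths are used purely for illustration. */
#include <stdio.h>

int main(void)
{
    double ghz = 2.8;            /* same clock for both designs */
    double cycle_ns = 1.0 / ghz; /* duration of one pipeline stage */
    int depths[] = { 20, 31 };

    for (int i = 0; i < 2; i++) {
        int stages = depths[i];
        /* Problem 1: a mispredicted branch flushes one clock per stage.   */
        /* Problem 2: an instruction needs one clock per stage to traverse. */
        printf("%2d stages @ %.1f GHz: flush and traversal each cost %d cycles (%.1f ns)\n",
               stages, ghz, stages, stages * cycle_ns);
    }
    return 0;
}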