Branch Prediction
When the instruction stream branches, the CPU tries to predict which way the program will go, so that decoding is not interrupted and can continue along the most probable path. Branch prediction algorithms are used to choose the next block of instructions to fetch. K8 processors use a two-level adaptive algorithm for branch prediction. This algorithm takes into account the history not only of the current branch, but also of the 8 preceding branches. The main drawback of the K8 branch prediction algorithms was their inability to predict indirect branches with dynamically alternating addresses.
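To make the idea of a two-level adaptive predictor more concrete, here is a minimal sketch in Python. It is not the actual K8 design: the XOR of branch address and history (gshare-style indexing), the 2-bit saturating counters, and all names are assumptions chosen for illustration. Only the 8 bits of global history come from the text above.

```python
class TwoLevelPredictor:
    """Global-history predictor with 2-bit saturating counters (illustrative)."""

    def __init__(self, history_bits=8):
        self.mask = (1 << history_bits) - 1
        self.history = 0  # outcomes of the most recent branches, one bit each
        # One counter per history pattern: 0-1 predict not taken, 2-3 predict taken.
        self.counters = [1] * (1 << history_bits)

    def _index(self, pc):
        # Assumption: branch address XOR global history (gshare-style).
        return (pc ^ self.history) & self.mask

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
        # Shift the newest outcome into the history register.
        self.history = ((self.history << 1) | int(taken)) & self.mask


def mispredictions(predictor, pc, outcomes):
    """Count wrong predictions for one branch over a sequence of outcomes."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A branch that repeats: taken, taken, not taken.
pattern = [True, True, False] * 200
no_history_misses = mispredictions(TwoLevelPredictor(0), 0x40, pattern)
two_level_misses = mispredictions(TwoLevelPredictor(8), 0x40, pattern)
```

With zero history bits the sketch degenerates into a single counter, which keeps mispredicting the "not taken" iteration of the pattern. With 8 bits of history each position in the pattern gets its own counter, so mispredictions are confined to the warm-up phase.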
Indirect branches are branches whose target is taken from a pointer that is calculated dynamically during program execution. Compilers usually generate them for switch-case constructions. They are also used for calls through function pointers and for virtual function calls in object-oriented programming. The K8 processor always uses the last target address of the branch to select the block of code to fetch. If the address has changed, the pipeline is flushed. If the branch address keeps alternating, the processor mispredicts over and over again. Prediction of dynamically changing addresses for indirect branches was first introduced in the Pentium M processor. Since K8 CPUs have no such algorithm, they are less efficient in object-oriented code.
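The difference between the two approaches can be illustrated with a small Python simulation. This is a sketch under stated assumptions, not the algorithm of any real CPU: the "last target" predictor follows the K8 behaviour described above, while the history-based predictor is an idealized, unbounded table keyed by the branch address and the two most recent targets. All class names and addresses are hypothetical.

```python
class LastTargetPredictor:
    """Predicts that an indirect branch goes where it went last time."""

    def __init__(self):
        self.last = {}  # branch address -> most recent target

    def predict(self, pc):
        return self.last.get(pc)

    def update(self, pc, target):
        self.last[pc] = target


class HistoryTargetPredictor:
    """Idealized predictor keyed by branch address and recent targets."""

    def __init__(self, history_len=2):
        self.history_len = history_len  # assumed to be at least 1
        self.table = {}                 # (pc, recent targets) -> next target
        self.history = ()               # recent targets, oldest first

    def predict(self, pc):
        return self.table.get((pc, self.history))

    def update(self, pc, target):
        self.table[(pc, self.history)] = target
        self.history = (self.history + (target,))[-self.history_len:]


def target_mispredictions(predictor, pc, targets):
    """Count wrong target predictions for one indirect branch."""
    misses = 0
    for target in targets:
        if predictor.predict(pc) != target:
            misses += 1
        predictor.update(pc, target)
    return misses


# Hypothetical virtual call that alternates between two implementations.
targets = [0x1000, 0x2000] * 100
last_target_misses = target_mispredictions(LastTargetPredictor(), 0x40, targets)
history_misses = target_mispredictions(HistoryTargetPredictor(), 0x40, targets)
```

On this strictly alternating sequence the last-target scheme is wrong on every call, while the history-based scheme only misses until it has seen each target pattern once.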
As we expected, K10 boasts improved branch prediction algorithms:
- It gained a prediction algorithm for indirect branches with dynamically changing addresses. This algorithm uses a table of 512 entries.
- The global history register grew from 8 to 12 bits. It records the outcome history of the preceding branch instructions.
- The depth of the return-address stack increased from 12 to 24 entries. This stack supplies the function return address quickly, so that fetching can continue without waiting for the ret instruction to read the return address from the stack.
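The effect of a deeper return-address stack can be sketched in Python as follows. The model is deliberately simple and rests on an assumption that the text does not confirm: when the stack is full, the oldest entry is discarded. The return addresses and function names are hypothetical.

```python
from collections import deque


class ReturnAddressStack:
    """Fixed-depth stack of predicted return addresses (illustrative)."""

    def __init__(self, depth):
        # Assumption: on overflow the oldest entry is silently dropped.
        self.stack = deque(maxlen=depth)

    def on_call(self, return_address):
        self.stack.append(return_address)

    def on_return(self):
        # Returns None when the predictor has no entry left.
        return self.stack.pop() if self.stack else None


def return_mispredictions(depth, nesting):
    """Nest `nesting` calls, then unwind and count wrong return predictions."""
    ras = ReturnAddressStack(depth)
    real_stack = []
    for level in range(nesting):
        address = 0x1000 + 4 * level  # hypothetical return address
        real_stack.append(address)
        ras.on_call(address)
    misses = 0
    while real_stack:
        actual = real_stack.pop()
        if ras.on_return() != actual:
            misses += 1
    return misses


misses_depth_12 = return_mispredictions(12, 20)
misses_depth_24 = return_mispredictions(24, 20)
```

In this model a call chain 20 levels deep overflows a 12-entry stack, so the 8 outermost returns cannot be predicted, whereas a 24-entry stack covers the whole chain.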
These improvements should help K10 execute programs written in high-level object-oriented code much faster. Unfortunately, it is very hard to estimate the efficiency of the K10 branch prediction unit objectively, but according to some data, it may in some cases be lower than that of Intel processors.