EM64T in Core 2 Duo: What’s the Theory?
All claims of relatively low Core 2 Duo performance in 64-bit mode rest on two limitations of EM64T support in the Core microarchitecture, both confirmed by Intel representatives. Firstly, Core 2 Duo processors do not support Macrofusion in 64-bit mode. Secondly, instruction decoding may slow down on instructions that use the additional registers available only with EM64T enabled. Let’s try to get to the root of these two problems.
Thanks to Intel’s marketing people, Macrofusion is known as one of the key features of the new Core microarchitecture. This technology increases the number of instructions processed per clock cycle: the processor merges certain pairs of adjacent x86 instructions into a single micro-operation. A typical example is a comparison (CMP or TEST) followed by a conditional branch. The scheduler and the execution units treat the fused micro-operation as a single command and process it accordingly. As a result, code is processed faster, and at best the CPU can handle up to five x86 instructions per clock cycle.
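To make the pairing idea concrete, here is a minimal Python sketch of how a decoder might count micro-operations when it fuses a compare with the conditional jump that follows it. It assumes, purely for illustration, that every instruction decodes to one micro-operation, that only CMP or TEST followed by a conditional jump is fused, and that fusion is switched off in 64-bit mode as described above. The names `count_uops` and `loop_body` are hypothetical, and real Core 2 fusion rules carry further restrictions not modeled here.

```python
FUSIBLE_FIRST = {"cmp", "test"}  # first half of a fusible pair (simplified)

def is_conditional_jump(mnemonic: str) -> bool:
    # Conditional jumps are "j" + condition (je, jne, jl, ...); "jmp" is unconditional.
    return mnemonic.startswith("j") and mnemonic != "jmp"

def count_uops(mnemonics: list[str], long_mode: bool) -> int:
    """Count micro-ops, assuming one per instruction unless a pair is fused."""
    uops = 0
    i = 0
    while i < len(mnemonics):
        fusible = (
            not long_mode  # assumption from the text: no fusion in 64-bit mode
            and mnemonics[i] in FUSIBLE_FIRST
            and i + 1 < len(mnemonics)
            and is_conditional_jump(mnemonics[i + 1])
        )
        uops += 1
        i += 2 if fusible else 1  # a fused pair consumes two x86 instructions
    return uops

loop_body = ["mov", "add", "inc", "cmp", "jne"]
print(count_uops(loop_body, long_mode=False))  # 4
print(count_uops(loop_body, long_mode=True))   # 5
```

The five-instruction loop body shrinks to four micro-operations only when fusion is allowed, which is exactly the extra decode slot that 64-bit mode gives up.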
However, the lack of Macrofusion in 64-bit mode can hardly affect CPU performance that dramatically. In the ideal case, with one branch for every five x86 instructions and all five falling within the 16-byte block decoded in a single clock cycle, the theoretical speedup reaches 25%. In reality, though, the technology delivers a steady performance gain only when a whole set of conditions is met. For one thing, the branch frequency described above is not realistic at all. Moreover, Macrofusion is really efficient only if the average instruction length is below 4 bytes. As a result, the engineers estimate the possible gain at 3-5% at most. In other words, the absence of Macrofusion in EM64T mode is no reason to panic, because it does not affect performance that much.
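The 25% figure is simple arithmetic: without fusion the decoders handle at most four x86 instructions per clock, and with one fused compare-and-branch pair they can cover five, so peak throughput rises by 5/4 = 1.25. The short Python sketch below reproduces that ratio and also shows why the best case is so narrow: for the full five-per-clock rate, five instructions must fit in one 16-byte block, which requires an average length of 3.2 bytes or less. The constants are the figures quoted in the text; the variable names are illustrative only.

```python
FETCH_BYTES = 16   # code bytes decoded per clock, as quoted in the text
DECODE_WIDTH = 4   # x86 instructions per clock without Macrofusion
FUSED_WIDTH = 5    # x86 instructions per clock with one fused pair

# Best-case gain: five instructions per clock instead of four.
speedup = FUSED_WIDTH / DECODE_WIDTH - 1

# Longest average instruction that still lets five of them fit in one block.
max_avg_length = FETCH_BYTES / FUSED_WIDTH

print(f"best-case speedup: {speedup:.0%}")            # best-case speedup: 25%
print(f"max average length: {max_avg_length} bytes")  # max average length: 3.2 bytes
```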
As for the slowdown caused by instructions that use the additional registers, it stems from the single-byte REX prefix added to instructions that use 64-bit operands or the additional registers. This prefix increases the average length of the instructions the CPU processes in 64-bit mode, so fewer instructions may fit into the 16-byte block of code that is fetched from the L1 instruction cache and decoded in a single clock cycle. Typical x86 code averages about 2.5-3.5 bytes per instruction, and in 64-bit mode this figure grows because of the REX prefix. Once the average instruction length exceeds 4 bytes, the CPU may lose its ability to decode 4 instructions per clock.
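The effect of the REX prefix is easy to see on a single instruction. The byte sequences below are the standard x86-64 encodings of the same register-to-register ADD in three forms; the sketch then estimates, from an average instruction length alone, how many instructions one 16-byte block can supply to a four-wide decoder. Treating every block as holding floor(16 / average length) instructions is a simplifying assumption (real code mixes lengths and straddles block boundaries), and the helper name `instructions_per_block` is hypothetical.

```python
# Standard x86-64 encodings of the same register-to-register ADD.
ENCODINGS = {
    "add eax, ebx": bytes([0x01, 0xD8]),        # 32-bit operands, no prefix
    "add rax, rbx": bytes([0x48, 0x01, 0xD8]),  # REX.W selects 64-bit operands
    "add r8, r9":   bytes([0x4D, 0x01, 0xC8]),  # REX.W + REX.R + REX.B for r8-r15
}

FETCH_BYTES = 16   # bytes decoded per clock, as quoted in the text
DECODE_WIDTH = 4   # Core 2 Duo decode width, as quoted in the text

def instructions_per_block(avg_length: float, width: int = DECODE_WIDTH) -> int:
    """Instructions one 16-byte block can feed, assuming uniform lengths."""
    return min(width, int(FETCH_BYTES // avg_length))

for text, code in ENCODINGS.items():
    print(f"{text:14} {code.hex(' ')}  ({len(code)} bytes)")

for avg in (3.0, 3.5, 4.0, 4.5):
    print(avg, "->", instructions_per_block(avg))
```

Under this simplification the decoder stays fully fed up to an average of 4 bytes and drops to three instructions per clock once the average passes that mark.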
To be fair, the longer instructions caused by the REX prefix affect not only Intel’s CPUs on the new Core microarchitecture but also AMD’s K8 processors. The difference is that K8 needs at most 3 instructions from each 16-byte block to keep its execution units fully loaded, whereas Core 2 Duo can process 4 instructions per clock cycle thanks to Intel Wide Dynamic Execution technology. Consequently, K8 keeps its full decode rate up to an average instruction length of about 5.3 bytes (16/3), while Core 2 Duo may start losing throughput above 4 bytes (16/4).
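Using the same uniform-length simplification, the difference between the two designs comes down to the longest average instruction each can tolerate before a 16-byte block stops filling its decoders. The Python sketch below compares both processors at a few average lengths; the decode widths are the figures given in the text, while the function and dictionary names are illustrative only.

```python
FETCH_BYTES = 16
DECODE_WIDTHS = {"Core 2 Duo": 4, "K8": 3}  # x86 instructions per clock, per the text

def max_avg_length(width: int) -> float:
    """Longest average instruction that still lets one block fill every decoder."""
    return FETCH_BYTES / width

def decoded_per_clock(avg_length: float, width: int) -> int:
    """Instructions per clock under the uniform-length simplification."""
    return min(width, int(FETCH_BYTES // avg_length))

for cpu, width in DECODE_WIDTHS.items():
    print(f"{cpu}: full rate up to {max_avg_length(width):.2f} bytes")
    for avg in (3.0, 4.5, 5.5):
        print(f"  avg {avg} bytes -> {decoded_per_clock(avg, width)} per clock")
```

In this model Core 2 Duo never decodes fewer instructions per clock than K8; once the average length passes 4 bytes it merely loses its one-instruction lead.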
So we don’t think the EM64T implementation issues discussed above are that serious for Core-based Intel processors. 64-bit code is essentially similar to regular 32-bit code, and Core 2 Duo processes it only slightly slower because Macrofusion is unavailable. As for the slowdown caused by the longer 64-bit instructions, the CPU’s ability to work with more registers of greater width should make up for it.
Therefore, we are not inclined to dramatize the drawbacks of 64-bit support in the new Intel microarchitecture, although they will, of course, have some influence on performance. Rather than spread panic, let’s take a closer look at how Core 2 Duo and Core 2 Extreme processors perform with 64-bit applications in Windows XP Professional x64 Edition and compare the results with those obtained in the 32-bit Windows XP Professional environment.