NUMA and Node Interleave
Since AMD Quad FX is a full-fledged dual-socket system, it has two dual-channel DDR2 SDRAM controllers, one per processor. As a result, it theoretically offers twice the memory bus bandwidth of a regular desktop platform, which helps relieve some of the bottlenecks in data transfers between the cores. Communication between the processors and the memory follows the NUMA model (Non-Uniform Memory Access), which lets both processors operate within the same address space. In other words, each processor can address the memory of the other processor via the HyperTransport bus between them, and no special tricks are required. Functionally, it doesn’t matter to either CPU in which memory the requested data is located. However, if the data is located in the memory of the other CPU, the latency of the access will be significantly higher than if the data were in the processor’s own memory.
Quad FX platforms can use the shared memory in two ways, selected with the Node Interleave option in the BIOS Setup. A node here is a processor together with its memory controller and the DDR2 SDRAM modules connected to it. With Node Interleave enabled, the memory fills up evenly no matter which CPU initiates the writes, so the data ends up “spread” over the memory subsystems of both CPUs. Of course, this results in higher memory latency than in regular single-processor systems, because half of all memory requests go through the memory controller of the “other” CPU.
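The effect of the two memory layouts can be illustrated with a minimal Python sketch. It is only a model, not a description of the actual hardware mapping: the 4 KB interleave granularity, the 1 GB of memory per node and the function names are assumptions made for the example.

```python
GRANULE = 4096        # assumed interleave granularity in bytes (hypothetical)
NODE_SIZE = 1 << 30   # assumed 1 GB of DDR2 attached to each CPU (hypothetical)
NODES = 2             # two sockets, as on Quad FX


def home_node(addr, interleave):
    """Return the node (0 or 1) whose memory controller owns this address."""
    if interleave:
        # Consecutive granules alternate between the two controllers.
        return (addr // GRANULE) % NODES
    # Contiguous layout: the first NODE_SIZE bytes belong to node 0, the rest to node 1.
    return min(addr // NODE_SIZE, NODES - 1)


def remote_share(cpu, start, length, interleave):
    """Fraction of granules in [start, start + length) owned by another node."""
    granules = range(start, start + length, GRANULE)
    remote = sum(1 for addr in granules if home_node(addr, interleave) != cpu)
    return remote / len(granules)


if __name__ == "__main__":
    buf = 64 * 1024 * 1024  # a 64 MB buffer starting at address 0
    print(remote_share(0, 0, buf, interleave=True))   # half the granules are remote
    print(remote_share(0, 0, buf, interleave=False))  # everything is local to CPU 0
```

In this model, a buffer that CPU 0 walks through is half remote with interleaving on, and fully local with interleaving off as long as the task stays on CPU 0.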
With Node Interleave disabled, a processor writing data into the memory uses its own controller first. However, this approach doesn’t allow Quad FX systems to reach the efficiency of single-processor platforms either. The reason is that the Microsoft Windows XP operating system does its best to load all CPUs of a multi-processor system evenly and hence constantly switches tasks between the processors. As a result, in almost half the cases the CPU will still have to address the memory subsystem of the other processor, with higher latency.
The bottom line is that, from the memory performance standpoint, single-processor AMD systems are overall faster than multi-processor ones. The launch of Microsoft Windows Vista with its enhanced scheduler should make things better for the Quad FX platform, because this scheduler will support NUMA and will not shift tasks back and forth between the logical processors all the time.
That is why AMD recommends enabling Node Interleave for Windows XP and disabling this feature for Windows Vista on its Quad FX platform.
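The latency argument of this section can be put into rough numbers with a small Python sketch. The latency values and the remote-access fractions are illustrative assumptions, not measurements; the fractions simply encode the “half of all requests” reasoning above, and the NUMA-aware case is idealized as having no remote accesses at all.

```python
LOCAL_NS = 60.0    # assumed latency of an access to the CPU's own memory (hypothetical)
REMOTE_NS = 100.0  # assumed latency of an access over HyperTransport (hypothetical)


def average_latency(remote_fraction):
    """Expected latency when the given fraction of accesses goes to the other node."""
    return (1 - remote_fraction) * LOCAL_NS + remote_fraction * REMOTE_NS


# Fraction of remote accesses in each scenario, following the reasoning in the text.
scenarios = {
    "single-processor system": 0.0,
    "Node Interleave enabled": 0.5,
    "Node Interleave disabled, tasks migrate (Windows XP)": 0.5,
    "Node Interleave disabled, NUMA-aware scheduler (idealized)": 0.0,
}

if __name__ == "__main__":
    for name, fraction in scenarios.items():
        print(f"{name}: {average_latency(fraction):.0f} ns")
```

The sketch covers latency only. It says nothing about bandwidth, and on a real system the share of remote accesses depends on the workload.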