New Instructions
The K10 processor gained a few new instructions that expand its functionality:
Extended bit operations on general-purpose registers:
- LZCNT (Count Leading Zeros): counts the number of leading zero bits in the operand;
- POPCNT (Bit Population Count): counts the number of bits having a value of 1 in the operand.
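The two bit operations can be illustrated with a small software reference model. This is only a sketch of the results the instructions produce, not of how the hardware computes them; the function names `lzcnt` and `popcnt` and the `width` parameter are illustrative choices, not part of any AMD API.

```python
def _check_operand(value: int, width: int) -> None:
    """Reject values that do not fit in an unsigned operand of `width` bits."""
    if not 0 <= value < (1 << width):
        raise ValueError("value does not fit in the operand width")


def lzcnt(value: int, width: int = 32) -> int:
    """Count the leading zero bits of `value`, viewed as a `width`-bit operand."""
    _check_operand(value, width)
    # bit_length() is the index of the highest set bit plus one (0 for zero),
    # so a zero operand gives `width`, which is also what LZCNT returns.
    return width - value.bit_length()


def popcnt(value: int, width: int = 32) -> int:
    """Count the bits set to 1 in `value`, viewed as a `width`-bit operand."""
    _check_operand(value, width)
    return bin(value).count("1")
```

For example, `lzcnt(1)` gives 31 for a 32-bit operand, and `popcnt(0xFF)` gives 8.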
SSE register processing, also known as SSE4a instructions:
- EXTRQ: extracts the specified number of bits from the given position in the lower 64 bits of an SSE register;
- INSERTQ: inserts the specified number of bits at the given position in the lower 64 bits of an SSE register;
- MOVNTSS, MOVNTSD: streaming stores (bypassing the cache) of scalar floating-point values.
The SSE4a extension does not overlap in any way with Intel's new SSE4.1 and SSE4.2 instructions.
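The bit-field behavior of EXTRQ and INSERTQ described above can be sketched on plain integers. This is a simplified model under stated assumptions: it covers only the lower 64 bits of the register, it requires the field to lie entirely inside those 64 bits, and it does not model special encodings of the real instructions. The names `extract_bits` and `insert_bits` are hypothetical.

```python
MASK64 = (1 << 64) - 1  # the lower 64-bit part of an SSE register


def _check_field(length: int, index: int) -> None:
    """Require a non-empty bit field that lies inside the 64-bit quadword."""
    if not (1 <= length <= 64 and index >= 0 and index + length <= 64):
        raise ValueError("bit field must lie inside the 64-bit quadword")


def extract_bits(low_qword: int, length: int, index: int) -> int:
    """EXTRQ-like: return `length` bits starting at bit `index`, right-justified."""
    _check_field(length, index)
    field_mask = (1 << length) - 1
    return (low_qword >> index) & field_mask


def insert_bits(dest: int, src: int, length: int, index: int) -> int:
    """INSERTQ-like: copy the low `length` bits of `src` into `dest` at `index`."""
    _check_field(length, index)
    field_mask = (1 << length) - 1
    cleared = dest & ~(field_mask << index) & MASK64  # zero the target field
    return cleared | ((src & field_mask) << index)
```

Extracting 8 bits at position 4 from `0xABCD` yields `0xBC`, and inserting that field back at the same position into a zeroed value yields `0xBC0`.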
Virtualization
AMD has continued to improve its virtualization technology, which allows several operating systems to run on a single PC. One of the most significant improvements is the use of Nested Paging. In this scheme, the virtual machines' pages are nested in the hypervisor's global page table. If the TLB holds no entry for a page, the CPU performs all the table translations automatically, unlike Shadow Paging, which requires a lot of resources to manage the virtual machines' table translations.
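The two-stage translation behind Nested Paging can be sketched with a toy model. This is a heavy simplification under stated assumptions: real page tables are multi-level structures, while here each table is a flat dictionary, and the class name `NestedPagingMMU`, the 4 KiB page size, and the sample mappings are illustrative only. The point it shows is that a TLB miss is resolved by walking the guest table and then the hypervisor's table, with no hypervisor code involved.

```python
PAGE_SIZE = 4096  # assumed page size for this sketch


class NestedPagingMMU:
    """Toy model: guest virtual -> guest physical -> host physical."""

    def __init__(self, guest_table: dict, nested_table: dict):
        self.guest_table = guest_table    # guest virtual page -> guest physical page
        self.nested_table = nested_table  # guest physical page -> host physical page
        self.tlb = {}                     # guest virtual page -> host physical page
        self.hardware_walks = 0           # number of TLB misses resolved by walking

    def translate(self, guest_virtual_addr: int) -> int:
        page, offset = divmod(guest_virtual_addr, PAGE_SIZE)
        if page not in self.tlb:
            # TLB miss: both tables are walked "in hardware" and the
            # combined result is cached for later accesses.
            guest_physical_page = self.guest_table[page]
            self.tlb[page] = self.nested_table[guest_physical_page]
            self.hardware_walks += 1
        return self.tlb[page] * PAGE_SIZE + offset


# Hypothetical mappings, chosen only for illustration.
mmu = NestedPagingMMU(guest_table={0: 5, 1: 7}, nested_table={5: 42, 7: 13})
host_addr = mmu.translate(0x1010)  # guest page 1 -> guest physical 7 -> host 13
```

A second access to the same guest page is served from the TLB, so `hardware_walks` stays at 1.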
Pic.7a: Shadow Paging mode: when switching between virtual systems, the hypervisor switches between page tables and clears the TLB at the same time.
Pic.7b: Nested Paging mode: when switching between virtual systems, the hypervisor doesn't need to get involved in switching between page tables, and the TLB is not cleared.
Some data suggest that Nested Paging improves application performance on a virtual system by 40% compared with Shadow Paging mode.
Power and Frequency Management
The new K10 processors will have a new power and frequency management system. Each core will now work independently of the others, at its own frequency, which may change dynamically depending on the load on that core.
Pic.8: Independent core frequency management in K10 processors.
However, it is not yet clear how the performance of the shared L3 cache will be adjusted in this case. The core voltage is the same for all cores and is determined by the core under the heaviest workload. The memory controller manages its voltage independently of the cores and may lower it when the load is light.
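The relationship between per-core frequencies and the shared core voltage can be sketched as follows. Everything concrete here is an assumption made for illustration: the `P_STATES` table, its frequencies and voltages, and the rule that maps load to a state are hypothetical and are not taken from AMD documentation. The sketch only shows the principle that each core picks its own frequency while the voltage follows the most demanding core.

```python
# Hypothetical table of (frequency in MHz, minimum core voltage in volts),
# ordered from slowest to fastest. The numbers are illustrative only.
P_STATES = [(1000, 1.00), (1500, 1.10), (2000, 1.20)]


def pick_p_state(load: float) -> tuple:
    """Map a load in the range 0.0-1.0 to a state: higher load, faster state."""
    if not 0.0 <= load <= 1.0:
        raise ValueError("load must be between 0.0 and 1.0")
    index = min(int(load * len(P_STATES)), len(P_STATES) - 1)
    return P_STATES[index]


def plan_power(loads: list) -> tuple:
    """Return (per-core frequencies, shared core voltage) for the given loads."""
    if not loads:
        raise ValueError("at least one core is required")
    states = [pick_p_state(load) for load in loads]
    frequencies = [frequency for frequency, _ in states]
    # All cores share one voltage, so it must satisfy the busiest core.
    shared_voltage = max(voltage for _, voltage in states)
    return frequencies, shared_voltage
```

With loads of `[0.1, 0.9, 0.5, 0.0]`, this model runs the four cores at 1000, 2000, 1500, and 1000 MHz, while the shared voltage is set by the second core.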