AMD K10 Micro-Architecture --- TheThirdMedia Hardware

Decoding

The blocks received from the instructions cache are copied into the Predecode/Pick Buffer , where instructions are singled out from the block, their types are defined, and then they are sent to the corresponding decoder pipes. Simple instructions that can be decoded with one (Single) or two (Double) micro-operations are sent to the ?simple?decoder called DirectPath . Complex instructions that require 3 or more micro-operations to be decoded, are sent to the micro-program decoder aka VectorPath .

Pic.2: Decoder

Up to 3 macro-operations (MOPs) may leave decoder pipes each clock cycle. Every clock cycle DirectPath decoder may process 3 simple single-MOP instructions, or one 2-MOP instruction and one single-MOP instruction, or 1.5 2-MOP instructions (three 2-MOP instructions in two clocks). Decoding of complex instructions may require more than 3 MOPs that is why they may take a few clocks to complete. To avoid conflicts on leaving the decoder pipes, K8 and K10 simple and complex instructions may be sent for decoding simultaneously.

MOPs consist of two micro-operations (micro-ops): one integer or floating point arithmetic operation and one memory address request. Micro-operations are singled out from the MOPs by the scheduler, which then sends them to be executed independently from one another.

MOPs leaving the decoder every clock are combined into groups of three. Sometimes the decoder may generate a group of 2 or even only 1 MOP because of the alternating DirectPath and VectorPath instructions or different delays in the selection of instructions for decoding. An incomplete group like that is filled with empty MOPs to make three, and then is sent to be executed.

Vector SSE, SSE2 and SSE3 instructions in K8 processor are split into MOP pairs that process separately the upper and lower 64-bit halves of the 128-bit SSE register in 64-bit devices. It slows down the instructions decoding by half and cuts down in half the number of instructions in the scheduler queue.

Thanks to larger 128-bit FPU units in K10 processors, there is no need to split vector SSE-instructions into 2 MOPs any more. Most SSE-instructions that K8 used to decode as DirectPath Double, are now decoded in K10 as DirectPath Single in 1 MOP. Moreover, some SSE-instructions that used to be decoded through K8 micro-program VectorPath decoder, are now decoded in K10 through simple DirectPath decoder with fewer generated MOPs: 1 or 2 depending on the operation.

Decoding of stack instructions has also been simplified. Most stack operation instructions that are usually used for CALL-RET and PUSH-POP functions are now also processed by a simple decoder in a single MOP. Moreover, special Sideband Stack Optimizer scheme transforms these instructions into an independent chain of micro-operations that can be executed in parallel.

[Pages]
   Last Page
   [1]· AMD K10 Micro-Architecture
   [2]· AMD K10 Micro-Architecture - 2
   [3]· AMD K10 Micro-Architecture - 3
   [4]· AMD K10 Micro-Architecture - 4
   [5]· AMD K10 Micro-Architecture - 5
   [6]· AMD K10 Micro-Architecture - 6
   [7]· AMD K10 Micro-Architecture - 7
   [8]· AMD K10 Micro-Architecture - 8
   [9]· AMD K10 Micro-Architecture - 9
   [10]· AMD K10 Micro-Architecture - 10
   [11]· AMD K10 Micro-Architecture - 11
   Next Page



	TheThirdMedia Hardware → CPU Guide → CPU Article > AMD K10 Micro-Architecture