
AMD's Next Generation Microarchitecture Preview: from K8 to K8L

Date: 2006-8-23

[Abstract]
   On July 27, 2006, Intel officially introduced its new Core 2 processor to the public. Based on the Conroe core, it proved to be a breakthrough in CPU performance. AMD just doesn't...

[Content]


Decoding

The x86 instructions extracted from the block of bytes are decoded into macro-operations. A macro-op consists of two micro-operations: an integer or floating-point arithmetic micro-op and an address operation for memory access. The splitting into micro-ops is done by the scheduler prior to sending them for execution. The decoder of K8 processors distinguishes between three types of instructions:

  • DirectPath Single instructions are decoded into one macro-op in the hardware decoder.
  • DirectPath Double instructions are decoded into two macro-ops in the hardware decoder.
  • VectorPath instructions are decoded into three or more macro-ops using the on-chip microcode-engine ROM.
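The three decode types above can be summarized by the number of macro-ops an instruction produces. The following is a minimal sketch of that mapping only; the function name `decode_path` is hypothetical, and in real hardware the type of each instruction is fixed by the processor design rather than computed from a count.

```python
def decode_path(macro_op_count):
    """Map a macro-op count to the K8 decode type described in the text.

    Illustrative only: the count is taken as given, not derived from
    a real x86 instruction.
    """
    if macro_op_count < 1:
        raise ValueError("an instruction decodes into at least one macro-op")
    if macro_op_count == 1:
        return "DirectPath Single"   # one macro-op, hardware decoder
    if macro_op_count == 2:
        return "DirectPath Double"   # two macro-ops, hardware decoder
    return "VectorPath"              # three or more, microcode-engine ROM
```

For example, `decode_path(2)` yields the DirectPath Double type, while any count of three or more falls through to VectorPath.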

    In a K8 processor, DirectPath and VectorPath instructions cannot be dispatched simultaneously. The decoders issue their results at a rate of 3 macro-ops per cycle. Thus, in one cycle the hardware decoder can handle 3 Single instructions, 1 Double and 1 Single instruction, or on average 1.5 Double instructions (3 Double instructions every two cycles). Since one VectorPath instruction can be decoded into more than 3 macro-ops, decoding such an instruction can take more than 1 cycle.
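The decode-rate arithmetic in this paragraph can be checked with a small idealized model. It is a sketch under stated assumptions, not a description of the actual decoder. It assumes that DirectPath macro-ops fill the 3 issue slots back to back (so the two macro-ops of a Double may straddle a cycle boundary, as the "1.5 double instructions" figure implies), and that the microcode engine also delivers at most 3 macro-ops per cycle. The name `decode_cycles` is hypothetical.

```python
import math

SLOTS_PER_CYCLE = 3  # macro-ops issued per cycle, as stated in the text


def decode_cycles(macro_op_counts):
    """Estimate decode cycles for a sequence of instructions.

    Each entry is the number of macro-ops one instruction decodes into:
    1 or 2 means DirectPath, 3 or more means VectorPath.
    """
    cycles = 0
    pending = 0  # DirectPath macro-ops not yet assigned to a cycle
    for n in macro_op_counts:
        if n < 1:
            raise ValueError("macro-op count must be at least 1")
        if n >= 3:
            # VectorPath cannot share a cycle with DirectPath,
            # so pending DirectPath macro-ops are issued first.
            cycles += math.ceil(pending / SLOTS_PER_CYCLE)
            pending = 0
            # Assumption: microcode emits up to 3 macro-ops per cycle.
            cycles += math.ceil(n / SLOTS_PER_CYCLE)
        else:
            pending += n
    cycles += math.ceil(pending / SLOTS_PER_CYCLE)
    return cycles
```

Under these assumptions, `decode_cycles([1, 1, 1])` and `decode_cycles([2, 1])` each take one cycle, and `decode_cycles([2, 2, 2])` takes two, matching the figures quoted above.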

    The macro-ops produced by the decoder in each clock cycle are combined into groups. A group may contain only 2 or even 1 macro-op, because DirectPath and VectorPath instructions alternate and instruction fetch latencies vary. Such a group is padded with empty macro-ops so that it contains three macro-ops in total, and is then dispatched.
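The padding step can be pictured as follows. This is a minimal sketch: `pad_group` and the use of `None` as the empty macro-op are illustrative choices, not anything taken from AMD documentation.

```python
GROUP_SIZE = 3   # macro-ops per dispatched group, as stated in the text
EMPTY = None     # stand-in for an empty macro-op (an assumption)


def pad_group(macro_ops):
    """Return a full group of three, filling unused slots with EMPTY."""
    if len(macro_ops) > GROUP_SIZE:
        raise ValueError("a group holds at most three macro-ops")
    return list(macro_ops) + [EMPTY] * (GROUP_SIZE - len(macro_ops))
```

A short group such as `pad_group(["add"])` comes back with two empty slots appended, so every dispatched group has the same fixed width.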

    Vector (packed) instructions from the SSE, SSE2 and SSE3 sets are split in the K8 processor into pairs of macro-ops that separately process the top and bottom 64-bit halves of a 128-bit SSE register on 64-bit execution units. That's why such instructions are decoded in the K8 processor at a rate of 3 instructions per 2 clock cycles. The SSE execution units in the future K8L processor will be widened to 128 bits, so there will be no need to split vector instructions in two. The decoding algorithm will evidently be changed so that vector instructions can be decoded into single 128-bit macro-ops at a rate of 3 instructions per cycle.
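The split into 64-bit halves and its effect on the decode rate can be sketched as below. The example uses a packed add of two 64-bit lanes (similar in spirit to PADDQ) purely as an illustration; the function names are hypothetical, and the rate calculation simply divides the 3 macro-op slots per cycle by the number of macro-ops each instruction needs.

```python
from fractions import Fraction

MASK64 = (1 << 64) - 1
DECODE_SLOTS = 3  # macro-ops per cycle, as stated in the text


def split_halves(value128):
    """Return the (bottom, top) 64-bit halves of a 128-bit integer."""
    return value128 & MASK64, (value128 >> 64) & MASK64


def packed_add_on_64bit_units(a, b):
    """Add two 128-bit registers as two independent 64-bit lanes,
    one half at a time, as a 64-bit-wide unit would."""
    a_lo, a_hi = split_halves(a)
    b_lo, b_hi = split_halves(b)
    lo = (a_lo + b_lo) & MASK64   # macro-op 1: bottom half
    hi = (a_hi + b_hi) & MASK64   # macro-op 2: top half
    return (hi << 64) | lo


def sse_decode_rate(unit_width_bits):
    """Instructions per cycle for 128-bit operations on units of a given width."""
    macro_ops_per_instruction = 128 // unit_width_bits
    return Fraction(DECODE_SLOTS, macro_ops_per_instruction)
```

With 64-bit units the rate works out to 3/2 instructions per cycle, and with 128-bit units to 3 per cycle, which is the difference between K8 and K8L described above.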

    Although the decoder of the K8L processor may not be able to decode 4-5 instructions per cycle, as Conroe can under favorable conditions, this will not hinder program execution, because instructions are on average executed at a rate of less than 3 per cycle. K8 usually decodes one x86 instruction into fewer macro-operations than a Conroe CPU would. This, together with the 32-byte instruction fetch, makes its decoder highly efficient.





