MIPS - R10000

Overview/History

  • Four-way superscalar microprocessor (one that can fetch, execute, and complete more than one instruction in parallel), introduced in 1995.
    • Fetches and executes up to four instructions per cycle.
    • Clock speeds range from 200 MHz to 250 MHz.
  • 6.8 million transistors.
  • Five backward compatible revisions of the MIPS instruction set exist:
    • MIPS I, MIPS II, MIPS III, MIPS IV (which the R10000 uses), and MIPS 32/64.
  • MIPS chips are used in numerous devices, such as the Silicon Graphics (SGI) product line, the Nintendo 64, the Sony PlayStation 2, Cisco routers, and the PSP.
  • In 1981, John Hennessy at Stanford University began working with a team that would eventually create the first MIPS processor.
  • The main design goal of the MIPS chip was efficiency through instruction pipelining.
    • The next instruction is fetched and begins executing before the current instruction has finished.

ALU

  • The MIPS R10000 has two separate integer ALUs.
    • Each ALU has a 64-bit adder and a logic unit, which handle address computation as well as arithmetic and logical operations on integers. In addition to the adder and logic unit, each ALU contains the following:
      • ALU1
        • 64-bit shifter and branch condition logic.
      • ALU2
        • Integer multiplier and divider.
        • Booth’s algorithm is used for multiplication and a nonrestoring algorithm is used for division.
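
As a rough illustration of the multiplication technique named above, here is a minimal sketch of textbook radix-2 Booth multiplication. It is not a model of the R10000's actual multiplier circuit; the function name, the default 8-bit width, and the use of an unbounded Python integer for the accumulator are assumptions made for the example.

```python
def booth_multiply(multiplicand, multiplier, bits=8):
    """Multiply two signed `bits`-wide integers with radix-2 Booth recoding.

    Hypothetical helper for illustration only.
    """
    mask = (1 << bits) - 1
    a = 0                    # accumulator: upper half of the product (signed)
    q = multiplier & mask    # multiplier bits: lower half of the product
    q_minus1 = 0             # extra bit to the right of q

    for _ in range(bits):
        pair = (q & 1, q_minus1)
        if pair == (1, 0):   # start of a run of 1s: subtract the multiplicand
            a -= multiplicand
        elif pair == (0, 1): # end of a run of 1s: add the multiplicand
            a += multiplicand
        # Arithmetic shift right of the combined (a, q, q_minus1) register.
        q_minus1 = q & 1
        q = (q >> 1) | ((a & 1) << (bits - 1))
        a >>= 1              # Python's >> on negative ints is arithmetic

    return (a << bits) | q   # signed double-width product


if __name__ == "__main__":
    print(booth_multiply(7, 5, bits=4))
    print(booth_multiply(3, -4, bits=4))
```

Runs of consecutive 1 bits in the multiplier cost only one subtraction and one addition, which is the property that makes Booth recoding attractive in hardware.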

Accumulator

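  • The Hi and Lo registers together act as the accumulator for multiply and divide results.
  • Multiplication
    • The double-length product is split across both registers: the high-order half goes to Hi and the low-order half to Lo.
  • Division
    • The quotient is stored in the Lo register and the remainder in the Hi register.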
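
The following sketch shows how a 32-bit multiply and divide would fill the two registers under the standard MIPS convention (Lo holds the low-order product word or the quotient, Hi holds the high-order product word or the remainder). The function names are hypothetical, and the operands are assumed to be signed values that fit in 32 bits.

```python
WORD_MASK = 0xFFFFFFFF

def mult32(rs, rt):
    """Signed 32x32-bit multiply. Returns (hi, lo) as unsigned 32-bit words."""
    product = (rs * rt) & ((1 << 64) - 1)   # 64-bit two's complement product
    return (product >> 32) & WORD_MASK, product & WORD_MASK

def div32(rs, rt):
    """Signed divide, truncating toward zero. Returns (hi, lo) = (remainder, quotient).

    Raises ZeroDivisionError for rt == 0; on real hardware the result of a
    divide by zero is simply undefined.
    """
    quotient = abs(rs) // abs(rt)
    if (rs < 0) != (rt < 0):
        quotient = -quotient
    remainder = rs - quotient * rt          # takes the sign of the dividend
    return remainder & WORD_MASK, quotient & WORD_MASK


if __name__ == "__main__":
    hi, lo = div32(-7, 2)
    print(hex(hi), hex(lo))
```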

Addressable Memory

  • MIPS R10000 memory is byte addressable with a 64-bit address. A mode bit allows software to select either big-endian or little-endian byte ordering.
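
To make the two byte orderings concrete, this small sketch lays out the same 32-bit word both ways using Python's standard struct module. The example value is arbitrary, and nothing here models the R10000's actual mode bit.

```python
import struct

word = 0x12345678

big = struct.pack(">I", word)      # big-endian: most significant byte at the lowest address
little = struct.pack("<I", word)   # little-endian: least significant byte at the lowest address

# Reading big-endian bytes as if they were little-endian yields a different value.
misread = struct.unpack("<I", big)[0]

if __name__ == "__main__":
    print(big.hex(), little.hex(), hex(misread))
```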

Number Representation

  • MIPS R10000 stores integers in two’s complement.
  • MIPS R10000 stores floating point values in IEEE Standard 754 representation.
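
The two representations above can be inspected with a short sketch. The helper names are hypothetical; the bit layouts themselves (32-bit two's complement, and IEEE 754 single precision with 1 sign bit, 8 exponent bits, and 23 fraction bits) are standard.

```python
import struct

def twos_complement(value, bits=32):
    """Bit pattern of a signed integer in `bits`-wide two's complement."""
    return value & ((1 << bits) - 1)

def from_twos_complement(pattern, bits=32):
    """Signed value of a `bits`-wide two's complement bit pattern."""
    if pattern >> (bits - 1):
        return pattern - (1 << bits)
    return pattern

def float_bits(x):
    """Bit pattern of x as an IEEE 754 single-precision value."""
    return struct.unpack(">I", struct.pack(">f", x))[0]

def float_fields(x):
    """Split a single-precision value into (sign, biased exponent, fraction)."""
    bits = float_bits(x)
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


if __name__ == "__main__":
    print(hex(twos_complement(-5, 8)))
    print(hex(float_bits(1.0)), float_fields(-0.75))
```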

Instruction Formats

  • There are 3 types of CPU instructions, each of which is a single 32-bit aligned word.
    • I-type (Immediate)
    • J-type (Jump)
    • R-type (Register)
  • Opcode
    • 6-bit operation code
    • There are 3 different register specifiers:
      • RD - 5-bit destination register
      • RS - 5-bit source register
      • RT - 5-bit target register
  • Other
    • instr_index - 26-bit index, which is shifted 2 bits to the left to form the low-order bits of the jump target address (used in the jump instructions)
    • SA - 5-bit shift amount
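
The field layout above can be illustrated with a minimal decoder sketch. The function name is hypothetical. The 16-bit immediate and 6-bit funct fields are not listed above but are part of the standard MIPS encoding, and the sample word is the standard encoding of ADD $t0, $t1, $t2.

```python
def decode(word):
    """Extract every field position from a 32-bit MIPS instruction word.

    Which fields are meaningful depends on the format:
    R-type uses rs, rt, rd, sa, funct; I-type uses rs, rt, immediate;
    J-type uses instr_index.
    """
    return {
        "opcode": (word >> 26) & 0x3F,      # bits 31..26
        "rs": (word >> 21) & 0x1F,          # bits 25..21
        "rt": (word >> 16) & 0x1F,          # bits 20..16
        "rd": (word >> 11) & 0x1F,          # bits 15..11
        "sa": (word >> 6) & 0x1F,           # bits 10..6
        "funct": word & 0x3F,               # bits 5..0
        "immediate": word & 0xFFFF,         # bits 15..0
        "instr_index": word & 0x3FFFFFF,    # bits 25..0
    }


if __name__ == "__main__":
    fields = decode(0x012A4020)   # ADD $t0, $t1, $t2
    print(fields["opcode"], fields["rs"], fields["rt"], fields["rd"], hex(fields["funct"]))
```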

Instruction Formats

Addressing Modes

  • Register Addressing
    • The operand is in the register specified
      • Example: ADD
  • Immediate Addressing
    • The operand is in the instruction
      • Example: ADDI
  • Displacement Addressing
    • The displacement in the instruction is added to the register to determine the memory address
      • Example: LB, LW
  • PC-relative Addressing
    • The branch target address is the sum of the PC and the sign-extended constant in the instruction (shifted left 2 bits)
      • Example: BEQ
  • Pseudo-direct Addressing
    • The jump address is the 26 bits of the instruction, shifted left 2 bits, concatenated with the upper bits of the PC
      • Example: J, JAL
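
The three address calculations above (displacement, PC-relative, and pseudo-direct) can be sketched as follows. This is a simplified model that assumes 32-bit addresses for readability, even though the R10000 is a 64-bit processor. It follows the standard MIPS rule that branch and jump targets are computed relative to the instruction after the branch (PC + 4). All function names are hypothetical.

```python
ADDR_MASK = 0xFFFFFFFF   # assumption: 32-bit addresses, for illustration only

def sign_extend16(imm):
    """Interpret a 16-bit field as a signed value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm

def displacement_address(base, offset16):
    """LB/LW style: base register plus sign-extended 16-bit displacement."""
    return (base + sign_extend16(offset16)) & ADDR_MASK

def branch_target(pc, offset16):
    """BEQ style: (PC + 4) plus the sign-extended offset shifted left 2 bits."""
    return (pc + 4 + (sign_extend16(offset16) << 2)) & ADDR_MASK

def jump_target(pc, instr_index):
    """J/JAL style: upper 4 bits of (PC + 4) joined with instr_index shifted left 2 bits."""
    return ((pc + 4) & 0xF0000000) | ((instr_index & 0x3FFFFFF) << 2)


if __name__ == "__main__":
    print(hex(displacement_address(0x1000, 0xFFFC)))
    print(hex(branch_target(0x00400000, 3)))
    print(hex(jump_target(0x00400000, 0x100000)))
```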

Instruction Set

Pipelining

  • The MIPS R10000 features a 4-way superscalar pipeline.
  • This means that at each stage, up to 4 instructions are handled in parallel.
  • During the first clock cycle, 4 instructions are fetched.
  • During the second clock cycle, the next 4 instructions are fetched and the previous 4 instructions are decoded.
  • The pipeline clock is 200 MHz.
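
The cycle-by-cycle behavior described above can be sketched with an idealized timing model that ignores stalls, dependencies, and branches. The stage names and function names below are hypothetical placeholders, not the real R10000 stage list. Setting width to 1 gives the single-issue (linear) case for comparison.

```python
import math

# Hypothetical stage names for illustration; the real R10000 pipeline differs.
STAGES = ("fetch", "decode", "issue", "execute", "write-back")

def cycles_to_finish(num_instructions, width, num_stages):
    """Cycles for an ideal pipeline that moves `width` instructions per stage per cycle."""
    groups = math.ceil(num_instructions / width)
    if groups == 0:
        return 0
    return groups + num_stages - 1

def schedule(num_instructions, width, stages=STAGES):
    """Map cycle number -> {stage name: [instruction numbers in that stage]}."""
    timeline = {}
    for i in range(num_instructions):
        group = i // width                  # instructions travel in groups of `width`
        for s, name in enumerate(stages):
            cycle = group + s + 1           # cycles are numbered from 1
            timeline.setdefault(cycle, {}).setdefault(name, []).append(i)
    return timeline


if __name__ == "__main__":
    print(schedule(8, 4)[2])   # second cycle: one group fetched, the previous group decoded
    print(cycles_to_finish(16, 4, len(STAGES)), cycles_to_finish(16, 1, len(STAGES)))
```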

4-way superscalar pipeline structure

 

In linear pipelining, by contrast, only one instruction is executed during each cycle of the pipeline clock rather than 4.

Interconnection Structures

Cluster Bus

  • Used in multiprocessor systems.
  • The “cluster coordinator performs the cluster bus arbitration and data flow management.”
  • A processor request issued by the master processor is observed as an external request by all slave R10000 processors.

Cluster Bus

 

Uniprocessor System

 

Multiprocessor System (using dedicated external agents)

Multiprocessor System (using the cluster bus)

Microprocessing

  • Each instruction is pre-decoded as it is loaded into the I-cache.
  • After being decoded, the instruction is placed into one of three 16-entry instruction queues (integer/branch, floating point, or address) for scheduling.
  • The instructions are executed out of order; the active list then tracks them so that they graduate (commit their results) in the original program order.
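
The flow above can be sketched with a toy model: each instruction is routed to one of three queues, may finish out of order, and has its result committed in program order. Everything about the timing is an assumption made for illustration (each queue issues one instruction per cycle, there are no dependencies, and the latencies are invented), and the function and key names are hypothetical.

```python
from collections import deque

QUEUE_CAPACITY = 16   # each of the three queues holds 16 entries

def route(program):
    """Place each (name, kind, latency) instruction into the queue for its kind."""
    queues = {"integer": deque(), "floating_point": deque(), "address": deque()}
    for name, kind, _latency in program:
        if len(queues[kind]) >= QUEUE_CAPACITY:
            raise RuntimeError(f"{kind} queue is full")
        queues[kind].append(name)
    return queues

def simulate(program):
    """Return (completion order, {name: cycle at which it commits})."""
    next_issue = {"integer": 0, "floating_point": 0, "address": 0}
    done = {}
    for name, kind, latency in program:
        done[name] = next_issue[kind] + latency   # issue cycle plus latency
        next_issue[kind] += 1                     # assumed: one issue per queue per cycle
    # Out-of-order completion: sort by finish time (ties keep program order).
    completion_order = sorted(done, key=lambda n: done[n])
    # In-order commit: an instruction waits for every older instruction.
    commit, latest = {}, 0
    for name, _kind, _latency in program:
        latest = max(latest, done[name])
        commit[name] = latest
    return completion_order, commit


if __name__ == "__main__":
    program = [("mul.d", "floating_point", 3), ("add", "integer", 1),
               ("lw", "address", 2), ("sub", "integer", 1)]
    print(simulate(program))
```

In this model the slow floating-point multiply finishes last even though it is first in program order, so the younger instructions complete early but cannot commit until it does.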

R10000 Structure