Four-way superscalar (can fetch, execute, and complete more than one instruction in parallel) microprocessor introduced in 1995.
Fetches and executes four instructions per cycle.
Clock speeds range from 200 MHz to 250 MHz.
6.8 million transistors.
Five backward-compatible revisions of the MIPS instruction set exist: MIPS I, MIPS II, MIPS III, MIPS IV (which the R10000 uses), and MIPS32/MIPS64.
MIPS chips are used in numerous devices, such as the SGI (Silicon Graphics) product line, the Nintendo 64, the Sony PlayStation 2, Cisco routers, and the PSP.
John Hennessy of Stanford University began working in 1981 with a team that would eventually create the first MIPS processor.
The main design goal of the MIPS chip was to be as efficient as possible through the use of instruction pipelining.
The next instruction is fetched, and work on it begins, before the current instruction has finished.
Each ALU has a 64-bit adder and a logic unit, both capable of address computation and of arithmetic and logical operations on integers. In addition to the adder and logic unit, each ALU contains the following:
ALU1 - 64-bit shifter and branch condition logic.
ALU2 - Integer multiplier and divider. Booth's algorithm is used for multiplication and a non-restoring algorithm is used for division.
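The two algorithms named for ALU2 can be sketched in software. The following is a minimal textbook illustration, not a description of the R10000's actual circuitry: it assumes the basic radix-2 form of Booth's algorithm and an unsigned non-restoring divider, and the function names and the 8-bit default width are hypothetical choices made for readability.

```python
def booth_multiply(multiplicand: int, multiplier: int, bits: int = 8) -> int:
    """Radix-2 Booth multiplication of two signed `bits`-wide integers.

    Assumption: the multiplicand is not the most negative value
    (-2**(bits-1)), which would overflow the `bits`-wide accumulator.
    """
    mask = (1 << bits) - 1
    m = multiplicand & mask    # multiplicand in two's complement
    a = 0                      # accumulator: upper half of the product
    q = multiplier & mask      # multiplier: becomes the lower half
    q_minus1 = 0               # extra bit to the right of q

    for _ in range(bits):
        pair = (q & 1, q_minus1)
        if pair == (1, 0):     # start of a run of 1s: subtract
            a = (a - m) & mask
        elif pair == (0, 1):   # end of a run of 1s: add
            a = (a + m) & mask
        # Arithmetic right shift of the combined register (a, q, q_minus1).
        q_minus1 = q & 1
        q = ((q >> 1) | ((a & 1) << (bits - 1))) & mask
        sign = a >> (bits - 1)
        a = ((a >> 1) | (sign << (bits - 1))) & mask

    product = (a << bits) | q
    if product >> (2 * bits - 1):          # reinterpret as a signed value
        product -= 1 << (2 * bits)
    return product


def nonrestoring_divide(dividend: int, divisor: int, bits: int = 8):
    """Non-restoring division of unsigned `bits`-wide integers.

    Returns (quotient, remainder).
    """
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    mask = (1 << bits) - 1
    a = 0                      # partial remainder, allowed to go negative
    q = dividend & mask
    for _ in range(bits):
        was_negative = a < 0
        # Shift (a, q) left by one bit.
        a = 2 * a + ((q >> (bits - 1)) & 1)
        q = (q << 1) & mask
        # Add back instead of restoring when the remainder was negative.
        a = a + divisor if was_negative else a - divisor
        if a >= 0:
            q |= 1             # quotient bit is 1 for a non-negative remainder
    if a < 0:                  # single correction step at the very end
        a += divisor
    return q, a
```

For instance, `booth_multiply(3, -4)` yields `-12`, and `nonrestoring_divide(7, 2)` yields the pair `(3, 1)`. The point of the non-restoring form is that a negative partial remainder is fixed by adding on the next step, so each iteration needs only one add or subtract.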
There are three types of CPU instructions, each of which is a 32-bit aligned word:
I-type (Immediate)
J-type (Jump)
R-type (Register)
Opcode - 6-bit operation code
There are three different register specifiers:
RD - 5-bit destination register
RS - 5-bit source register
RT - 5-bit target register
Other fields:
instr_index - 26-bit index which is shifted 2 bits to the left and makes up the jump target address (used in the jump instruction)
SA - 5-bit shift amount
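The fields listed above can be pulled out of a 32-bit instruction word with shifts and masks. The sketch below assumes the standard MIPS bit positions (opcode in bits 31..26, RS in 25..21, RT in 20..16, RD in 15..11, SA in 10..6). The `funct` and `immediate` fields, and the use of the upper four bits of PC + 4 in the jump target, come from the standard MIPS encoding rather than from the list above, and the function names are hypothetical.

```python
def decode_fields(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its fields.

    Every field is extracted regardless of instruction type; which ones
    are meaningful depends on whether the word is I-, J-, or R-type.
    """
    word &= 0xFFFFFFFF
    return {
        "opcode": (word >> 26) & 0x3F,      # 6-bit operation code
        "rs": (word >> 21) & 0x1F,          # 5-bit source register
        "rt": (word >> 16) & 0x1F,          # 5-bit target register
        "rd": (word >> 11) & 0x1F,          # 5-bit destination register
        "sa": (word >> 6) & 0x1F,           # 5-bit shift amount
        "funct": word & 0x3F,               # R-type function code (assumed, standard MIPS)
        "immediate": word & 0xFFFF,         # I-type 16-bit immediate (assumed, standard MIPS)
        "instr_index": word & 0x03FFFFFF,   # J-type 26-bit index
    }


def jump_target(word: int, pc: int) -> int:
    """Jump target: instr_index shifted left 2 bits, combined with the
    upper 4 bits of the address following the jump (pc + 4)."""
    index = word & 0x03FFFFFF
    return ((pc + 4) & 0xF0000000) | (index << 2)


# R-type example: add $t0, $t1, $t2 encodes as 0x012A4020.
fields = decode_fields(0x012A4020)
print(fields["rs"], fields["rt"], fields["rd"])   # registers 9, 10 and 8

# J-type example: a jump whose instr_index is 0x100000.
print(hex(jump_target(0x08100000, pc=0x00400010)))
```

Shifting the 26-bit index left by two works because every instruction is a 32-bit aligned word, so the low two bits of any instruction address are always zero.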
The MIPS processor features a 4-way superscalar pipeline.
This simply means that at each stage, four instructions are handled in parallel.
During the first clock cycle, four instructions are fetched.
During the second clock cycle, the next four instructions are fetched and the previous four instructions are decoded.
The pipeline clock is 200 MHz.
4-way superscalar pipeline structure
In the pipeline shown here, one instruction is executed during each cycle of the pipeline clock rather than four. This type of pipelining is referred to as linear pipelining.
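The difference between the 4-way and the linear pipeline can be made concrete with a small timing model. This is only a sketch of an ideal pipeline with no stalls or dependencies. The text names just the fetch and decode stages, so the remaining stage names in `STAGES`, the stage count, and the function names are placeholders chosen for illustration.

```python
# Hypothetical stage list; only "fetch" and "decode" are named in the text.
STAGES = ["fetch", "decode", "issue", "execute", "write"]


def stage_of(instruction: int, cycle: int, width: int, stages=STAGES):
    """Stage occupied by `instruction` (0-based) during `cycle` (1-based).

    `width` is the number of instructions handled per stage per cycle:
    4 for the 4-way superscalar pipeline, 1 for a linear pipeline.
    Returns None if the instruction is not in the pipeline in that cycle.
    """
    group = instruction // width          # instructions advance in groups
    index = cycle - 1 - group
    return stages[index] if 0 <= index < len(stages) else None


def cycles_to_finish(num_instructions: int, width: int, num_stages: int) -> int:
    """Cycles until the last instruction leaves an ideal, stall-free pipeline."""
    groups = -(-num_instructions // width)   # ceiling division
    return num_stages + groups - 1


# Cycle 2 of the 4-way pipeline: instructions 4..7 are being fetched
# while instructions 0..3 are being decoded.
print([stage_of(i, cycle=2, width=4) for i in range(8)])

# Sixteen instructions through five stages: 4-way versus linear.
print(cycles_to_finish(16, width=4, num_stages=len(STAGES)))
print(cycles_to_finish(16, width=1, num_stages=len(STAGES)))
```

Under these idealized assumptions the 4-way pipeline finishes sixteen instructions in 8 cycles, while the linear pipeline needs 20. Real programs fall short of this because of dependencies and stalls.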
Each instruction is pre-decoded as it is loaded into the I-cache.
After being decoded, the instruction is placed into one of three 16-entry instruction queues (integer/branch, floating point, or address) for scheduling.
The instructions are executed out of order, then sent to a separate buffer where they are put back into program order.
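The queueing and re-ordering steps can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the R10000's actual control logic: the queue names follow the text, while the function names, the stall-on-full behavior, and the representation of instructions as strings are hypothetical.

```python
from collections import deque

QUEUE_CAPACITY = 16   # each of the three queues holds 16 entries


def make_queues() -> dict:
    """One queue per instruction class named in the text."""
    return {"integer/branch": deque(), "floating point": deque(), "address": deque()}


def dispatch(queues: dict, kind: str, name: str) -> bool:
    """Place a decoded instruction into the queue for its class.

    Returns False (a stall, in this model) when that queue is full.
    """
    queue = queues[kind]
    if len(queue) >= QUEUE_CAPACITY:
        return False
    queue.append(name)
    return True


def retire_in_order(program: list, completion_order: list) -> list:
    """Re-order out-of-order completions back into program order.

    `program` lists instruction names in program order;
    `completion_order` lists the same names in the order they finish.
    An instruction retires only once every older instruction has retired.
    """
    pending = deque(program)      # oldest instruction on the left
    done = set()
    retired = []
    for name in completion_order:
        done.add(name)
        while pending and pending[0] in done:
            retired.append(pending.popleft())
    return retired


queues = make_queues()
dispatch(queues, "integer/branch", "i0")
dispatch(queues, "floating point", "i1")

program = ["i0", "i1", "i2", "i3"]
print(retire_in_order(program, ["i2", "i0", "i3", "i1"]))
```

Even though `i2` finishes first in this example, nothing retires until `i0` is done, and `i2` and `i3` must wait for `i1`, so the retired sequence matches the original program order.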