F21 CPU

Updated 6/4/97

The F21 CPU is based on the MuP21, but it improves on it in many ways: a CPU clock 5 times faster, deeper stacks, more instructions, more branch instruction modes, more SRAM addressing, and more coprocessors on chip. In addition to MuP21's video output, F21 offers video I/O, analog I/O, serial network I/O, and a parallel I/O port on chip. F21 has a transistor count of about 15,000, versus about 7,000 for MuP21.

The F21 CPU contains an on-chip DATA STACK with 18 levels, an on-chip RETURN STACK with 17 levels, a memory addressing register A, and a program counter. Instructions are 5 bits wide, and 4 instructions are packed into each 20-bit word, so the CPU can run at up to four times the speed of memory access when executing sequential stack-based instructions. Each instruction executes in 2 nanoseconds. Internally the CPU operates at 500 MIPS, but memory access timing limits the actual throughput of the F21 CPU to about 333 MIPS in ROM, 200 MIPS in SRAM, and 100 MIPS in DRAM. Memory bandwidth is shared by the CPU and the I/O coprocessors, so video or other coprocessor operation further reduces CPU access to memory.

CPU Instructions are based on Forth:


JMP   unconditional jump           ( 3 types, 10 bit, 14 bit, home page)
T0    branch if TOP of stack =0    ( DUP IF ) ( 3 types)
C0    branch if CARRY bit not set  ( 3 types)
CALL  subroutine call              ( 3 types)
RET   subroutine return
#     literal, place immediate word into top of stack
@A+   place memory contents pointed to by register A in TOP, increment A
@R+   place memory contents pointed to by register R in TOP, increment R 
@A    place memory contents pointed to by register A in TOP of stack
!A+   store TOP of stack into memory pointed to by A, increment A
!R+   store TOP of stack into memory pointed to by R, increment R
!A    store TOP of stack into memory pointed to by A
COM   complement TOP of stack
AND   AND TOP of stack with NEXT and leave result in TOP
-OR   EXCLUSIVE OR TOP of stack with NEXT and leave result in TOP
+     ADD TOP of stack to NEXT and leave result in TOP
2*    left shift TOP
2/    right shift TOP
+*    ADD TOP of stack to NEXT and leave result in TOP, NEXT unchanged
      (perform add only if the least significant bit of T = 1)
A     copy A to TOP of stack
A!    move TOP of stack to A
DUP   duplicate TOP of stack
DROP  discard TOP of stack
OVER  duplicate the second item to the TOP of Data stack
PUSH  TOP of DATA stack to TOP of RETURN stack
POP   TOP of RETURN stack to TOP of DATA stack
NOP   No CPU operation

F21 STACK PROCESSOR CPU DESCRIPTION

ARCHITECTURE

This is a 21-bit Forth engine with 2 push-down stacks:
data stack (S) - 18 deep
return stack (R) - 17 deep.
It has a 20-bit memory bus, the 21st bit serving as address or carry.

A register (T) acts as the top of the data stack. All data are placed in T; its prior contents are pushed onto S.
The ALU acts upon T and S, leaves its result in T and pops S for binary operations (+ -or and).
A register (A) is used to address data.
A program counter (P) is used to address instructions.
The return stack stores subroutine return addresses (and occasional data).
A configuration register (C) specifies timing and addressing options.

F21 uses a physical bus that represents the number or address 00000 with positive logic on even bits and negative logic on odd bits. This means that the package pins show AAAAA for the number or address 00000: alternate bits on the pins are complemented. To determine the pattern for a number, exclusive-or (-or) it with 0AAAAA. Thus, the number 00100F has the pattern 0ABAA5 on the pins. The ALU acts upon numbers; addresses are numbers. The configuration register stores patterns; the package pins display patterns.
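The number-to-pattern rule above can be sketched in a few lines of Python. This is only an illustration of the exclusive-or described in the text; the function names `to_pattern` and `to_number` are made up for this example and are not part of any F21 tool.

```python
# Sketch of the number <-> pin pattern conversion described in the text.
# The function names are hypothetical, chosen for this illustration only.

PATTERN_MASK = 0x0AAAAA  # alternate bits of the 20-bit bus are complemented


def to_pattern(number: int) -> int:
    """Return the pin pattern for a number or address."""
    return number ^ PATTERN_MASK


def to_number(pattern: int) -> int:
    """Return the number for a pin pattern (exclusive-or is its own inverse)."""
    return pattern ^ PATTERN_MASK


if __name__ == "__main__":
    # The example from the text: number 00100F appears as 0ABAA5 on the pins.
    print(f"{to_pattern(0x00100F):06X}")
```

Because the same exclusive-or converts in both directions, applying it twice returns the original number.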

INSTRUCTIONS

The CPU powers up at address 1AAAAA (pin pattern 100000, slow ROM). Boot code will normally copy a program from 8-bit FLASH or ROM to 20-bit RAM or DRAM. 5-bit instructions are packed 4 per 20-bit word. Jump instructions have a 10 or 15-bit address. The 15-bit form jumps either within the current 14-bit page, or to a home page in RAM or DRAM.
           bit 20 ...15 ...10  ....5 ....0
                  slot0 slot1  slot2 slot3

   10-bit jump    slot0  jump aa aaaa aaaa
       address  p pppp pppp ppaa aaaa aaaa   (p from P register)

   14-bit jump     jump 0aa aaaa aaaa aaaa
       address  p pppp ppaa aaaa aaaa aaaa

     home jump     jump 1aa aaaa aaaa aaaa
       address  c 0c00 00aa aaaa aaaa aaaa   (c is C17)
The contents of slots 1-3 must be complemented, whether instructions or address. The third jump format, the home page jump, facilitates jumping from DRAM into SRAM, since the home page location may be set to DRAM or SRAM by setting memory configuration register bit 17 (C17). A full 21-bit jump requires pushing an address into R and executing the ; instruction.

If configuration register bit C17 is set, the home page address becomes 140000, which is high speed SRAM; if it is not set, the home page address is 0 in DRAM. The 10-bit jumps are faster than off-page jumps in DRAM and free the first instruction slot for use by another opcode. Single cell branch instructions can cover a range of 16k words in DRAM and 8k words in SRAM, and subroutine returns move freely between SRAM and DRAM since the return stack is 21 bits wide.
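As a rough illustration of the three address layouts drawn above, the following Python sketch builds a 21-bit target address from the P register (or C17) and the address field of the jump. It follows only the bit diagrams; the function names are hypothetical, the address field is assumed to be already decoded (the complementing of slots 1-3 is not modeled), and all values are treated as numbers rather than pin patterns.

```python
# Sketch of jump target formation, following the bit diagrams in the text.
# Function names are hypothetical; the address field is assumed to be
# already un-complemented, and values are numbers, not pin patterns.

ADDR_MASK = 0x1FFFFF  # 21-bit addresses


def jump_10(p: int, field: int) -> int:
    """10-bit jump: upper 11 bits come from P, low 10 bits from the field."""
    return (p & ADDR_MASK & ~0x3FF) | (field & 0x3FF)


def jump_14(p: int, field: int) -> int:
    """14-bit jump: upper 7 bits come from P, low 14 bits from the field."""
    return (p & ADDR_MASK & ~0x3FFF) | (field & 0x3FFF)


def jump_home(c17: int, field: int) -> int:
    """Home page jump: C17 supplies bits 20 and 18, other page bits are 0."""
    c = c17 & 1
    return (c << 20) | (c << 18) | (field & 0x3FFF)


if __name__ == "__main__":
    # With C17 set, offset 0 of the home page is address 140000 (SRAM).
    print(f"{jump_home(1, 0):06X}")
    # With C17 clear, the home page starts at address 0 (DRAM).
    print(f"{jump_home(0, 0):06X}")
```

With C17 set and a zero address field, the sketch yields 140000, which agrees with the home page address given in the text.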

The 27 instruction codes are:

00 else unconditional jump   08 @R+  fetch, address in R, increment R
01 T=0  jump if T0-19 zero   09 @A+  fetch, address in A, increment A
02 call push P+1 to R, jump  0A #    fetch 20-bit in-line literal
03 C=0  jump if T20 zero     0B @A   fetch, address in A
04                           0C !R+  store, address in R, increment R
05                           0D !A+  store, address in A, increment A
06 ret  pop P from R         0E
07                           0F !A   store, address in A

10 com  complement T         18 pop  pop R, push into T
11 2*   shift T, 0 to T0     19 A@   push A into T
12 2/   shift T, T20 to T19  1A dup  push T into T
13 +*   add S to T if T0 one 1B over push S into T
14 -or  exclusive-or S to T  1C push pop T, push into R
15 and  and S to T           1D A!   pop T into A
16                           1E nop
17 +    add S to T           1F drop pop T


Code Name  Description            As Forth (where A is a variable)

  00 else  unconditional jump                  ELSE
  01 T=0   jump if T0-19 zero                  DUP IF
  02 call  push P+1 to R, jump                 :
  03 C=0   jump if T20 zero                    CARRY? IF
  04
  05
  06 ret   pop P from R                        ;
  07
  08 @R+   fetch, address in R, increment R    R @ R> 1+ >R
  09 @A+   fetch, address in A, increment A    A @ @ 1 A +!
  0A #     fetch 20-bit in-line literal        LIT
  0B @A    fetch, address in A                 A @ @
  0C !R+   store, address in R, increment R    R ! R> 1+ >R
  0D !A+   store, address in A, increment A    A @ ! 1 A +!
  0E
  0F !A    store, address in A                 A @ !
  10 com   complement T                        -1 XOR
  11 2*    shift T, 0 to T0                    2*
  12 2/    shift T, T20 to T19                 2/
  13 +*    add S to T if T0 one                DUP 1 AND IF OVER + THEN
  14 -or   exclusive-or S to T                 XOR
  15 and   and S to T                          AND
  16
  17 +     add S to T                          +
  18 pop   pop R, push into T                  R>
  19 A@    push A into T                       A @
  1A dup   push T into T                       DUP
  1B over  push S into T                       OVER
  1C push  pop T, push into R                  >R
  1D A!    pop T into A                        A !
  1E nop                                        NOP
  1F drop  pop T                               DROP
                        Forth macros
     A! @A                                     @
     A! !A                                     !
     dup dup -or com                           -1
     dup dup -or                               0
     over com and -or                          OR
     A! push A@ pop                            SWAP
     # (com) push ;                            long_jump
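
The macro definitions above can be checked with a very small stack model in Python. This is a sketch only: it models a handful of the stack and ALU opcodes on 21-bit cells, ignores memory, timing, and stack depth limits, and the `run` helper is invented for this illustration.

```python
# Minimal sketch of a few F21 stack/ALU opcodes, enough to check the macros.
# Assumptions: all cells are 21 bits wide; memory, timing, and stack depth
# limits are not modeled. The run() helper is hypothetical.

MASK = 0x1FFFFF  # 21-bit cell


def run(source: str, data=()):
    """Execute space-separated opcodes; return the data stack (top is last)."""
    d, r, a = list(data), [], 0
    for op in source.split():
        if op == "com":
            d[-1] ^= MASK                      # complement T
        elif op == "and":
            t = d.pop()
            d[-1] &= t                         # and S to T, pop S
        elif op == "-or":
            t = d.pop()
            d[-1] ^= t                         # exclusive-or S to T, pop S
        elif op == "+":
            t = d.pop()
            d[-1] = (d[-1] + t) & MASK         # add S to T, pop S
        elif op == "+*":
            if d[-1] & 1:                      # add only if T0 is one
                d[-1] = (d[-1] + d[-2]) & MASK  # S is left unchanged
        elif op == "dup":
            d.append(d[-1])
        elif op == "over":
            d.append(d[-2])
        elif op == "drop":
            d.pop()
        elif op == "push":
            r.append(d.pop())                  # T to return stack
        elif op == "pop":
            d.append(r.pop())                  # return stack to T
        elif op == "A@":
            d.append(a)
        elif op == "A!":
            a = d.pop()
        else:
            raise ValueError(f"opcode not modeled: {op}")
    return d


if __name__ == "__main__":
    print(run("over com and -or", [0b1100, 0b1010]))  # OR macro
    print(run("A! push A@ pop", [1, 2]))              # SWAP macro
```

Running the OR macro on 1100 and 1010 (binary) leaves 1110, and the SWAP macro exchanges the top two items, as the table claims.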

Ripple Carry

The F21 CPU uses ripple carry. This makes the + and +* instructions potentially slower than other instructions (if the carry bit must propagate through more than a few bits). An add instruction must be coded " nop + " if carry needs to propagate 4 places, " nop nop + " for carry to move 6 places, or " nop nop nop + " for carry to propagate about 8 places. This is not required if the " + " instruction is in the first instruction slot. An instruction in the first slot is preceded by the delay for loading the instruction word from memory, which provides enough time for the carry to ripple through all 20 bits.
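
To get a feel for how far a carry has to travel in a given addition, the following Python sketch counts the longest run of bit positions that one carry ripples through. It is an illustration of ripple-carry behavior in general, not a model of the F21 adder's actual timing; the function name and the 20-bit default width are assumptions made for this example.

```python
# Sketch: length of the longest carry ripple when adding two numbers.
# This illustrates ripple carry in general; it is not a timing model of the
# F21 adder. The function name and default width are assumptions.


def longest_carry_chain(a: int, b: int, width: int = 20) -> int:
    """Return the most bit positions any single carry travels in a + b."""
    carry = 0
    run = best = 0
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        if x & y:
            # Both bits set: a fresh carry is generated here.
            carry, run = 1, 1
        elif (x ^ y) and carry:
            # Exactly one bit set: an incoming carry is passed along.
            run += 1
        else:
            # The carry, if any, is absorbed at this position.
            carry, run = 0, 0
        best = max(best, run)
    return best


if __name__ == "__main__":
    # Adding 1 to 000FF makes the carry ripple through 8 positions.
    print(longest_carry_chain(0x000FF, 0x00001))
```

For 000FF + 00001 the chain length is 8, which by the rule in the text would call for " nop nop nop + " or placing the + in the first slot.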

INTERRUPTS

An interrupt can occur when an instruction word is fetched. The requested instruction is replaced by a 15-bit call to 00000 in the home page, in either DRAM or SRAM. The current address will be pushed onto R when the call is executed.

At least 3 stack positions must be available (reserved) for interrupts, 2 on data and 1 on return, because a useful interrupt service routine will need at least two data and one return stack positions. Register A must be saved and restored in the interrupt service routine.

The cause of the interrupt is in configuration register bits 2 through 0 (C2-0). The configuration register (C) must be read to determine the interrupt source. The interrupt is cleared when C is rewritten, which may only occur once. It is intended that this code be executed at the end of interrupt processing (say for C0):

A 015554 # com A! ( pattern 1 1110 0000 0000 0000 0--1)
@A !A A! ;
The address bits A2-0 specify the interrupt(s) to be cleared. @A !A A! ; must all be in the same word. Another interrupt may occur immediately.

Interrupts are edge-triggered. If one is repeated before being cleared, it's lost.

SPEED

Up to 4 instructions are obtained with each fetch. Each instruction is executed in 2ns plus memory access time, for a maximum internal speed of 500 MIPS. Sustained rate depends upon the number of memory-access instructions. With no data memory accesses but with instruction memory setup and access:
                      RAM                     DRAM          ROM
Memory speed (ns)     10    12    15    25    40    140       4
MIPS                 200   180   160   115    80     27     333

With 1 instruction accessing data in the same memory:
MIPS                       110                40            150
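
The RAM and DRAM figures for the case with no data accesses are roughly reproduced by a simple model: one memory access per 4-instruction word, 2 ns per instruction, plus a fixed setup time. The Python sketch below uses a 2 ns setup time, which is an assumption chosen only because it fits the table; it is not a documented F21 parameter, and the ROM column (4 ns, 333 MIPS) does not fit the same constant.

```python
# Sketch of a throughput model for the "no data memory accesses" case.
# SETUP_NS is a hypothetical value chosen to fit the RAM and DRAM columns of
# the table; it is not a documented F21 parameter and does not fit the ROM
# column.

INSTR_NS = 2.0   # execution time per instruction, from the text
SLOTS = 4        # instructions fetched per 20-bit word, from the text
SETUP_NS = 2.0   # assumed fixed setup time per instruction fetch


def mips_no_data(mem_ns: float) -> float:
    """Estimate sustained MIPS when only instruction fetches touch memory."""
    word_ns = mem_ns + SETUP_NS + SLOTS * INSTR_NS
    return SLOTS / word_ns * 1000.0  # instructions per ns -> MIPS


if __name__ == "__main__":
    for ns in (10, 12, 15, 25, 40, 140):
        print(f"{ns:4d} ns  {mips_no_data(ns):6.1f} MIPS")
```

Under this assumption the model stays within about 2 MIPS of every RAM and DRAM entry in the table, which suggests the published numbers were rounded from a similar calculation.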

