FPU Accumulator

From Foenix F256 / Wildbits/K2 Wiki
Revision as of 03:31, 8 March 2026 by WF (talk | contribs)

This is a design for a better FPU block for 8/16-bit access, which also integrates 32-bit integer processing. Although values are 32 bits wide, copying 32-bit values around with the CPU is mostly avoided.

There are 256 32-bit zeropage-like registers and one 32-bit accumulator. Any of the registers can hold useful constants (0, 1, etc.) for reference. The arrangement is similar to the 6502's accumulator and zero page.

(TBD: should the registers live in RAM or in the FPGA? RAM allows swapping in new register sets, but needs more complicated bus mastering and steals CPU cycles.)

Commands are issued by writing a byte to an I/O location. The location identifies the command to trigger, and the byte written is often a selector for which register number to use, sometimes a literal integer value, or ignored.
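As a rough illustration of the command interface described above, the Python model below treats each command as an I/O address and the written byte as its parameter. The base address `0xDE00`, the command offsets, and the class name `FpuModel` are placeholders invented for this sketch; the design has not assigned command numbers yet.

```python
# Behavioural sketch only: the address and command numbers are made up.
FPU_BASE = 0xDE00                          # hypothetical I/O base address
CMD_LOAD, CMD_STORE, CMD_SWAP = 0, 1, 2    # hypothetical command offsets


class FpuModel:
    def __init__(self):
        self.regs = [0] * 256   # 256 registers, 32 bits each
        self.acc = 0            # 32-bit accumulator

    def write(self, address, value):
        """Model one CPU byte write to an FPU command location."""
        cmd = address - FPU_BASE
        reg = value & 0xFF      # the written byte selects a register
        if cmd == CMD_LOAD:
            self.acc = self.regs[reg]
        elif cmd == CMD_STORE:
            self.regs[reg] = self.acc
        elif cmd == CMD_SWAP:
            self.acc, self.regs[reg] = self.regs[reg], self.acc
        else:
            raise ValueError(f"unassigned command {cmd}")


fpu = FpuModel()
fpu.regs[1] = 0x3F800000                # register 1 holds the constant 1.0f
fpu.write(FPU_BASE + CMD_LOAD, 1)       # A = Reg 1
fpu.write(FPU_BASE + CMD_STORE, 7)      # Reg 7 = A
```

The CPU only ever writes single bytes here; the 32-bit value moves between register 1 and register 7 without passing through the CPU.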

Status bits (TBD) are available to read. They report the errors listed below, plus Z/N/V/C flags for both integer and fp results, just like the 6502. V and C apply only to integer ops.

  • Overflow/underflow (fp/int range, as well as f2i)
  • Division by zero
  • Int conversion lost fractional portion
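The integer flags could follow the 6502 conventions, widened to 32 bits. The sketch below shows one way Z/N/V/C might be derived for an add; the function name `iadd_flags` and the dictionary layout are illustrative only, since the actual status bit layout is still TBD.

```python
MASK32 = 0xFFFFFFFF
SIGN32 = 0x80000000


def iadd_flags(a, b, carry_in=0):
    """Add two 32-bit values and derive 6502-style flags (assumed semantics)."""
    total = a + b + carry_in
    result = total & MASK32
    flags = {
        "Z": result == 0,                 # result is zero
        "N": bool(result & SIGN32),       # bit 31 of the result is set
        "C": total > MASK32,              # unsigned carry out of bit 31
        # Signed overflow: both operands share a sign that the result lacks.
        "V": bool(~(a ^ b) & (a ^ result) & SIGN32),
    }
    return result, flags


wrapped, wrapped_flags = iadd_flags(0xFFFFFFFF, 1)     # unsigned wrap to zero
signed, signed_flags = iadd_flags(0x7FFFFFFF, 1)       # signed overflow
```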

The integer fixed-point position (TBD) is configurable, and affects multiply, divide, and conversion to and from floats.
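To make the effect of the fixed-point position concrete, here is a minimal sketch of how a fractional-bit count might change multiply, divide, and float conversion. The function names are hypothetical, values are treated as unsigned, and results are truncated rather than rounded; none of those choices are settled by the design.

```python
MASK32 = 0xFFFFFFFF


def fx_mul(a, b, frac_bits):
    """Fixed-point multiply: the raw product carries 2 * frac_bits fraction bits."""
    return ((a * b) >> frac_bits) & MASK32


def fx_div(a, b, frac_bits):
    """Fixed-point divide: pre-shift the dividend to keep frac_bits of fraction."""
    return ((a << frac_bits) // b) & MASK32


def fx_from_float(x, frac_bits):
    """Float to fixed point, truncating any bits below the fixed-point position."""
    return int(x * (1 << frac_bits)) & MASK32


def fx_to_float(a, frac_bits):
    """Fixed point to float."""
    return a / (1 << frac_bits)


FRAC = 16                                   # 16.16 format for this example
one_and_half = fx_from_float(1.5, FRAC)     # 0x00018000
two = fx_from_float(2.0, FRAC)              # 0x00020000
product = fx_mul(one_and_half, two, FRAC)   # 3.0 in 16.16
quotient = fx_div(product, two, FRAC)       # back to 1.5 in 16.16
```

With zero fractional bits these reduce to plain integer multiply and divide, which is presumably why a single setting can cover both cases.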

TODO - should there be separate signed & unsigned variants of integer operations (including F2I), or a mode bit for signedness? Should all integer commands respect a single signedness control flag?

Scripting

To optimize applying a stream of mathematical ops in repetitive ways, you can save bytecodes in the FPGA registers. These generally take the form <op> <param>, meaning a write of the byte <param> to op location <op>. This is fine for very fixed pipelines with input data always at one location. For general use, however, it will require its own flow control and register indices to loop over sets of registers, indirect through computed register values, or receive "pointers" to a register and apply offsets.

Ops: run <reg>, and exit. These could nest in general to support subroutines, though it's not clear how to handle the stack.

Control flow would be unconditional loops, with tests to exit the loop when Accumulator == 0 or offset == val.
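The scripting idea can be sketched as a tiny interpreter over <op> <param> pairs, with an unconditional loop and an exit test on Accumulator == 0. The op numbers, the names `EXITZ` and `LOOP`, and the `max_steps` guard are all assumptions made for illustration; the design does not define them.

```python
MASK32 = 0xFFFFFFFF

# Hypothetical op numbers; the real command numbers are not assigned yet.
LOAD, STORE, IADD, ISUB, EXITZ, LOOP = range(6)


def run_script(script, regs, max_steps=10_000):
    """Run a list of (op, param) pairs against a register list; return A."""
    acc = 0
    pc = 0
    for _ in range(max_steps):          # guard against runaway loops
        if pc >= len(script):
            break
        op, param = script[pc]
        pc += 1
        if op == LOAD:
            acc = regs[param]
        elif op == STORE:
            regs[param] = acc
        elif op == IADD:
            acc = (acc + regs[param]) & MASK32
        elif op == ISUB:
            acc = (acc - regs[param]) & MASK32
        elif op == EXITZ:               # leave the script when A == 0
            if acc == 0:
                break
        elif op == LOOP:                # unconditional jump to script index
            pc = param
        else:
            raise ValueError(f"unknown op {op}")
    return acc


regs = [0] * 256
regs[0], regs[1], regs[2], regs[3] = 3, 1, 0, 5   # counter, one, sum, step
script = [
    (LOAD, 2), (IADD, 3), (STORE, 2),   # sum += step
    (LOAD, 0), (ISUB, 1), (STORE, 0),   # counter -= 1
    (EXITZ, 0),                         # exit when the counter reaches zero
    (LOOP, 0),                          # otherwise repeat from the top
]
run_script(script, regs)
```

This version always touches the same registers on every pass, which is exactly the limitation that the offset and indirection options are meant to address.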

Option 1: Offset register

Add another 8-bit "offset" register, and an enable bit in the config. Ops exist to write, set mask bits, or reset mask bits in the config register. When enabled, every register number has the offset added to it. Further ops set the offset, or add a signed 8-bit value to it. This allows a script to advance through the register set.
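A minimal sketch of the offset arithmetic, assuming register numbers wrap modulo 256 (the design does not say what happens on overflow). Both function names are hypothetical.

```python
def effective_reg(reg, offset, enabled):
    """Register number actually accessed once the offset is applied."""
    if enabled:
        return (reg + offset) & 0xFF    # assumption: wraps within 256 registers
    return reg


def add_signed_offset(offset, delta_byte):
    """Add a signed 8-bit value, given as a raw byte, to the offset register."""
    delta = delta_byte - 256 if delta_byte & 0x80 else delta_byte
    return (offset + delta) & 0xFF


offset = 4
forward = effective_reg(0x10, offset, enabled=True)     # 0x14
wrapped = effective_reg(0xFE, offset, enabled=True)     # wraps to 0x02
stepped_back = add_signed_offset(offset, 0xFF)          # 0xFF is -1, giving 3
```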

Option 2: Indirection

Have a mode bit for indirection. With it set, any register access indirects through the named register, but would we need an offset for that as well? It would be annoying to swap between indirect and non-indirect access, unless we move to 128 regs per bank?
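One possible reading of the indirection mode is sketched below: the named register's low byte supplies the index of the register that is actually accessed. That interpretation, and the function name `resolve`, are assumptions; the text above leaves the details open.

```python
def resolve(regs, reg, indirect):
    """Return the index of the register an access would actually touch."""
    if indirect:
        return regs[reg] & 0xFF     # assumption: low byte holds the target index
    return reg


regs = [0] * 256
regs[5] = 0x00000042                # register 5 "points at" register 0x42
direct = resolve(regs, 5, indirect=False)
pointed = resolve(regs, 5, indirect=True)
```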


TODO - directly support matrix multiplication and matrix addition, with width/height registers. Should hopefully be applicable to vector ops as well. BRAM is dual-ported, so we could read 2 registers to multiply per cycle, though multiplication is probably a couple of cycles. Could instantiate a couple of multipliers to really pipeline this hard.

The given register (offset ignored!) is input A, the "matrix param" is input B, and register A + offset is the output register? But then we can't iterate matrices with the output register. We also likely can't overwrite a matrix as we compute, because we're using it as an input. Don't want to waste LUT space copying a matrix into a reserved area. We could use regs 0-8 as a temp 3x3 matrix? Nah, that's just another matrix copy, a dummy operation. This could be a 4-byte op instead of 2: MATMUL, A, B, Result? Or have 3 registers, where writing to Result triggers the operation? Writing a register should capture whether it's in RAM and the register bank number.
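The three-register variant (set A, set B, then writing Result triggers the multiply) might behave like the sketch below. It assumes square n x n matrices stored row-major in consecutive registers, uses Python floats instead of 32-bit floats, and rejects a result area that overlaps an input rather than copying; all of these are illustrative choices, not decisions from the design.

```python
def matmul(regs, mat_a, mat_b, result, n):
    """Multiply two n x n row-major matrices held in consecutive registers."""
    size = n * n
    for src in (mat_a, mat_b):
        # Writing over an input mid-computation would corrupt later products.
        if result < src + size and src < result + size:
            raise ValueError("result overlaps an input matrix")
    for row in range(n):
        for col in range(n):
            regs[result + row * n + col] = sum(
                regs[mat_a + row * n + k] * regs[mat_b + k * n + col]
                for k in range(n)
            )


regs = [0.0] * 256
regs[16:20] = [1.0, 2.0, 3.0, 4.0]      # matrix A at register 16
regs[32:36] = [5.0, 6.0, 7.0, 8.0]      # matrix B at register 32
matmul(regs, 16, 32, 48, 2)             # result lands at register 48
```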

Config registers

Register         Fields (bit 7 down to bit 0)
---------------  -------------------------------------------------
Integer Control  Signed, 64bit reg0, Fixed point frac bits (0-31)
Mode             RAM, Regnum Hi (0-1 for fpga)
Register offset  Offset
Matrix Size      Rows, Columns
Offset Enable    All non-matrix, Result, MATB, MATA
Bitmask          Mask, for setting/resetting
RAM Start        24-bit address? Or in multiples of 4kB banks?

Shared Operations

Cmd #  Name    Parameter  Description
-----  ------  ---------  ---------------------------
       LOAD    Reg        A = Reg
       STORE   Reg        Reg = A
       SWAP    Reg        (A, Reg) = (Reg, A)
       LEXTRA  Ignored    A = Extra
       MASKW   Reg        Write mask to register
       MASKS   Reg        Set mask bits in register
       MASKC   Reg        Clear mask bits in register

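The three mask ops could work against the config registers as sketched below. This assumes the mask value comes from the Bitmask config register, that the parameter selects an 8-bit config register, and that the Bitmask register sits at index 5; all three are guesses made for the sake of the example.

```python
BITMASK_REG = 5     # hypothetical index of the Bitmask config register


def maskw(config, reg):
    """MASKW: write the mask value to the selected config register."""
    config[reg] = config[BITMASK_REG]


def masks(config, reg):
    """MASKS: set the mask bits in the selected config register."""
    config[reg] |= config[BITMASK_REG]


def maskc(config, reg):
    """MASKC: clear the mask bits in the selected config register."""
    config[reg] &= ~config[BITMASK_REG] & 0xFF


config = [0] * 8
config[BITMASK_REG] = 0b00000101
config[4] = 0b10000001
masks(config, 4)
after_set = config[4]       # 0b10000101
maskc(config, 4)
after_clear = config[4]     # 0b10000000
maskw(config, 4)
after_write = config[4]     # 0b00000101
```

Set and clear let a script flip individual enable bits without a read-modify-write cycle on the CPU side.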
Floating Point Operations

Cmd #  Name    Parameter    Description
-----  ------  -----------  ---------------------------------------
       FADD    Reg          A = A + Reg
       FSUB    Reg          A = A - Reg
       FRSUB   Reg          A = Reg - A
       FMUL    Reg          A = A * Reg
       FDIV    Reg          A = A / Reg
       FRDIV   Reg          A = Reg / A
       F2I     Signed flag  A = Int(A) or UInt(A)
       FMIN    Reg          A = Min(A, Reg)
       FMAX    Reg          A = Max(A, Reg)
       FABS    Ignored      A = Abs(A)

Maybe

Cmd #  Name    Parameter    Description
-----  ------  -----------  ---------------------------------------
       FADDB   Byte         A = A + Val
       FSUBB   Byte         A = A - Val
       FMULB   Byte         A = A * Val
       FDIVB   Byte         A = A / Val
       FCLR    Ignored      A = 0.0 (LOAD instead)
       FNEG    Ignored      A = -A (FRSUB instead)
       FRECIP  Ignored      A = 1.0/A (FRDIV instead)
       FPOW    Reg          A = A ^ Reg
       FSQRT   Ignored      A = Sqrt(A)
       FTRIG   0            A = Sin(A)
               1            A = Cos(A)
               2            A = Tan(A)
       FATAN   Reg          A = Atan(A, Reg)
                            A = Sec(A)
                            A = Csc(A)
                            A = Cot(A)
       FACOT   Reg          A = Arccot(A, Reg)
                            A = Arcsin(A)
                            A = Arccos(A)
       FFLOOR  Reg          A = Floor(A), Reg = Frac(A) (options?)
       FCEIL   Reg          A = Ceil(A), Reg = Frac(A) (options?)

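The reversed ops (FRSUB, FRDIV) exist so that the accumulator can stay put while a constant register supplies the left operand. The sketch below models a few of the ops from the table and evaluates 1 / (1 - x) that way. The class name `FpAcc` and the `f32` rounding helper are illustrative; the model rounds to 32-bit floats via `struct`, and it lets Python raise on division by zero where the hardware would presumably set a status bit.

```python
import struct


def f32(x):
    """Round a Python float to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


class FpAcc:
    def __init__(self, regs):
        self.regs = [f32(r) for r in regs]
        self.a = 0.0

    def load(self, r):
        self.a = self.regs[r]

    def frsub(self, r):             # A = Reg - A
        self.a = f32(self.regs[r] - self.a)

    def frdiv(self, r):             # A = Reg / A
        self.a = f32(self.regs[r] / self.a)

    def fmin(self, r):              # A = Min(A, Reg)
        self.a = min(self.a, self.regs[r])

    def fmax(self, r):              # A = Max(A, Reg)
        self.a = max(self.a, self.regs[r])


# Register 0 holds 0.0, register 1 holds 1.0, register 2 holds x = 0.5.
fpu = FpAcc([0.0, 1.0, 0.5])
fpu.load(2)         # A = x
fpu.frsub(1)        # A = 1.0 - x
fpu.frdiv(1)        # A = 1.0 / (1.0 - x)
result = fpu.a
fpu.fmax(0)         # clamp to at least 0.0
fpu.fmin(1)         # clamp to at most 1.0
clamped = fpu.a
```

The FMAX/FMIN pair at the end shows the usual clamp idiom, again using constant registers rather than loading literals.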
Matrix Operations

Cmd #  Name    Parameter  Description
-----  ------  ---------  -----------------
       MATA    Reg        Set param A
       MATB    Reg        Set param B
       MATMUL  Reg        Reg = MATA * MATB
       MATADD  Reg        Reg = MATA + MATB

Integer Operations

Cmd #  Name   Parameter    Description
-----  -----  -----------  ---------------------------------------------------------
       IADD   Reg          A = A + Reg
       IADDC  Reg          A = A + Reg + C
       ISUB   Reg          A = A - Reg
       ISUBC  Reg          A = A - Reg - !C
       IRSUB  Reg          A = Reg - A
       ISETC  0 or 1       C = Value
       IMUL   Reg          A = A * Reg (TBD - 64-bit result? Could auto-write Reg 0?)
       IDIV   Reg          A = A / Reg (TBD - combined DIVMOD? 64/32 DIV?)
       IMOD   Reg          A = A % Reg
       ICMP   Reg          Status = A - Reg
       INEG   0            A = -A
       IAND   Reg          A = A & Reg
       IOR    Reg          A = A | Reg
       IXOR   Reg          A = A ^ Reg
       I2F    Signed Flag  A = Float(Int(A)) or Float(UInt(A))
       ISHL   Count        A = A << Value
       ISHR   Count        A = A >> Value
       ISSHR  Count        A = A >>(signed) Value
       IROLL  Count        A = (A << Count) | (A >> (32 - Count))
       IUMIN  Reg          A = Min(A, Reg)
       ISMIN  Reg          A = Min(A, Reg)
       IUMAX  Reg          A = Max(A, Reg)
       ISMAX  Reg          A = Max(A, Reg)
       IABS   Ignored      A = Abs(A)
       ISET0  Byte         Set byte 0 of accumulator
       ISET1  Byte         Set byte 1 of accumulator
       ISET2  Byte         Set byte 2 of accumulator
       ISET3  Byte         Set byte 3 of accumulator
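A few of the integer ops are easier to follow with concrete values. The sketch below models ISET0-3, IADDC, ISUBC, and IROLL as plain functions. It assumes byte 0 is the least significant byte, that the carry flag follows the 6502 convention (set means "no borrow" for subtraction), and that rotate counts are taken modulo 32; the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def iset(acc, index, byte):
    """ISET0..ISET3: replace one byte of A (assumption: byte 0 is the LSB)."""
    shift = 8 * index
    return (acc & ~(0xFF << shift) & MASK32) | ((byte & 0xFF) << shift)


def iaddc(a, reg, carry):
    """IADDC: A = A + Reg + C. Returns (result, carry_out)."""
    total = a + reg + carry
    return total & MASK32, total >> 32


def isubc(a, reg, carry):
    """ISUBC: A = A - Reg - !C. Carry out is 1 when no borrow occurred."""
    total = a - reg - (1 - carry)
    return total & MASK32, 1 if total >= 0 else 0


def iroll(a, count):
    """IROLL: rotate A left by count bits (assumption: count modulo 32)."""
    count &= 31
    return ((a << count) | (a >> (32 - count))) & MASK32


# Build the literal 0x12345678 one byte at a time, as the CPU would.
acc = 0
for index, byte in enumerate([0x78, 0x56, 0x34, 0x12]):
    acc = iset(acc, index, byte)

# 64-bit add of 0x00000000_FFFFFFFF + 1 using two 32-bit adds with carry.
low, carry = iaddc(0xFFFFFFFF, 1, 0)
high, carry_out = iaddc(0, 0, carry)

borrowed = isubc(0, 1, 1)
rotated = iroll(0x80000001, 1)
```

Chaining IADDC through the carry flag is how wider-than-32-bit arithmetic would be built, in the same way the 6502 chains ADC across bytes.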