FPU Accumulator

This is a design for a better FPU block for 8/16-bit access, which also integrates 32-bit integer processing. Although values are 32 bits wide, copying 32-bit values around with the CPU is mostly avoided.

There are 256 32-bit zeropage-like registers and one 32-bit accumulator. Any register can hold a useful constant (0, 1, and so on) for reference. The arrangement is similar to the 6502's accumulator and zero page.

(TBD - registers in RAM or FPGA? The former allows swapping in new register sets, but needs more complicated bus mastering and steals CPU cycles.)

Commands are issued by writing a byte to an I/O location. The location identifies the command to trigger, and the byte written is often a selector for which register number to use, sometimes a literal integer value, or ignored.
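The write-triggered command model can be sketched as a small software model. This is only an illustration: the op numbers (`OP_LOAD`, `OP_STORE`, `OP_IADD`) and the `FpuModel` class are hypothetical, since this page does not assign command addresses yet.

```python
# Minimal software model of the write-triggered command interface.
# The op numbers below are hypothetical placeholders, not assigned values.
OP_LOAD, OP_STORE, OP_IADD = 0x00, 0x01, 0x02

MASK32 = 0xFFFFFFFF


class FpuModel:
    def __init__(self):
        self.acc = 0               # the single 32-bit accumulator
        self.regs = [0] * 256      # 256 zeropage-like 32-bit registers

    def write(self, op, value):
        """Model the CPU writing byte `value` to the I/O location for `op`."""
        reg = value & 0xFF         # for these ops the byte selects a register
        if op == OP_LOAD:
            self.acc = self.regs[reg]
        elif op == OP_STORE:
            self.regs[reg] = self.acc
        elif op == OP_IADD:
            self.acc = (self.acc + self.regs[reg]) & MASK32
        else:
            raise ValueError(f"unknown op {op:#x}")


fpu = FpuModel()
fpu.regs[1] = 1                    # register 1 holds the constant 1
fpu.regs[2] = 41
fpu.write(OP_LOAD, 2)              # A = Reg 2
fpu.write(OP_IADD, 1)              # A = A + Reg 1
fpu.write(OP_STORE, 3)             # Reg 3 = A
```

Each command costs the CPU a single byte write, and the 32-bit values themselves never cross the bus.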

Status bits (TBD) are available to read. They include integer as well as fp Z/N/V/C flags, just like the 6502 (V and C are only for integer ops), plus the following error conditions:

Overflow/underflow (fp/int range, as well as f2i)
Division by zero
Int conversion lost fractional portion

The integer fixed point location can be set (TBD), which affects multiply, divide, and conversion to and from floats.
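As a rough illustration of why the fixed point location matters for multiply and float conversion, here is a sketch that assumes the low `frac_bits` bits of an integer hold the fraction. The function names are hypothetical, only non-negative values are shown, and the actual hardware behavior is still TBD.

```python
def float_to_fixed(x, frac_bits):
    """Scale a float into an integer with `frac_bits` fractional bits."""
    return int(round(x * (1 << frac_bits)))


def fixed_to_float(a, frac_bits):
    """Convert a fixed-point integer back to a float."""
    return a / (1 << frac_bits)


def fixed_mul(a, b, frac_bits):
    """Multiply two fixed-point values and rescale the wide product."""
    return (a * b) >> frac_bits


FRAC = 8
a = float_to_fixed(1.5, FRAC)      # 384
b = float_to_fixed(2.25, FRAC)     # 576
product = fixed_mul(a, b, FRAC)    # 864, which represents 3.375
```

Without the rescaling shift, the product would carry twice as many fractional bits as its inputs, which is why multiply and divide need to know the point location.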

TODO - should there be separate signed & unsigned variants of integer operations (including F2I), or a mode bit for signedness? Should all integer commands respect a single signedness control flag?

Scripting

To optimize applying a stream of mathematical ops in repetitive ways, you can save bytecodes in the FPGA registers. These are generally of the form <op> <param>, equivalent to writing the byte <param> to op location <op>. This is fine for very fixed pipelines whose input data is always at one location. For general use, however, it will require its own flow control and register indices in order to loop over sets of registers, indirect through computed register values, or receive "pointers" to a register and use offsets from them.
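A fixed pipeline of <op> <param> pairs could be replayed roughly as follows. This is a sketch only: the op numbers and the `run_script` helper are hypothetical, and it covers none of the flow control discussed here.

```python
# Hypothetical op numbers; this page does not assign them.
OP_LOAD, OP_STORE, OP_IADD = 0x00, 0x01, 0x02


def run_script(script, regs, acc=0):
    """Replay a flat byte sequence of <op> <param> pairs; return the accumulator."""
    if len(script) % 2:
        raise ValueError("script must consist of whole <op> <param> pairs")
    for i in range(0, len(script), 2):
        op, param = script[i], script[i + 1]
        if op == OP_LOAD:
            acc = regs[param]
        elif op == OP_STORE:
            regs[param] = acc
        elif op == OP_IADD:
            acc = (acc + regs[param]) & 0xFFFFFFFF
        else:
            raise ValueError(f"unknown op {op:#x}")
    return acc


regs = [0] * 256
regs[10], regs[11] = 7, 5
script = bytes([OP_LOAD, 10, OP_IADD, 11, OP_STORE, 12])
result = run_script(script, regs)  # Reg 12 = Reg 10 + Reg 11
```

Because every register number is baked into the script, this form only works when the data always lives at the same registers.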

The ops would be run <reg> and exit, although these could nest more generally to support subroutines. It's not clear yet how to handle the stack.

Control flow would be unconditional loops, with tests to exit the loop on accumulator == 0 or offset == val.

Option 1: Offset register

Add another 8-bit "offset" register, and an enable bit in the config. There are ops to write, set mask bits, or clear mask bits in the config register. When enabled, every register number has the offset added to it. There are also ops to set the offset, or to add a signed 8-bit value to it. This allows scripting to advance through the register set.
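The offset mechanism might behave like the sketch below. The `OffsetUnit` class and its method names are hypothetical, and wrapping register numbers modulo 256 is an assumption rather than something this page specifies.

```python
def to_signed8(byte):
    """Interpret a byte as a signed 8-bit value (two's complement)."""
    return byte - 256 if byte & 0x80 else byte


class OffsetUnit:
    def __init__(self):
        self.offset = 0        # the 8-bit offset register
        self.enabled = False   # the enable bit from the config

    def set_offset(self, value):
        self.offset = value & 0xFF

    def add_offset(self, delta_byte):
        # The op parameter is a byte treated as a signed 8-bit delta.
        self.offset = (self.offset + to_signed8(delta_byte)) & 0xFF

    def effective(self, reg):
        """Register number actually accessed (assumed to wrap at 256)."""
        if self.enabled:
            return (reg + self.offset) & 0xFF
        return reg & 0xFF


unit = OffsetUnit()
unit.enabled = True
unit.set_offset(16)
first = unit.effective(0)      # register 16
unit.add_offset(4)
second = unit.effective(0)     # register 20
unit.add_offset(0xFC)          # 0xFC is -4 as a signed byte
third = unit.effective(0)      # back to register 16
```

A script that always names register 0 can then walk through a block of registers by adding to the offset on each loop iteration.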

Option 2: Indirection

Have a mode bit for indirection. Any register access will then indirect through that register, but will we need an offset for that as well? It would be annoying to swap between indirect and non-indirect modes, unless we move to 128 regs per bank?

TODO - directly support matrix multiplication and matrix addition, with width/height registers. Should hopefully be applicable to vector ops as well. BRAM is dual-ported, so we could read 2 registers to multiply per cycle, though multiplication is probably a couple of cycles. Could instantiate a couple of multipliers to really pipeline this hard.

The given register (offset ignored!) is input A, the "matrix param" is input B, and register A + offset is the output register? But then we can't iterate over matrices with the output register. We also likely can't overwrite a matrix as we compute, because we're using it as an input. We don't want to waste LUT space copying a matrix into a reserved area. We could use regs 0-8 as a temp 3x3 matrix, but that's just another dummy matrix copy. This could be a 4-byte op instead of 2: MATMUL, A, B, Result? Or have 3 registers, where writing to Result triggers the operation? Writing one of these registers should capture whether it's in RAM and the register bank number.

Config registers
	7	6	5	4	3	2	1	0
Integer Control	Signed	64bit reg0	—	Fixed point frac bits (0-31)
Mode	RAM				Regnum Hi (0-1 for fpga)
Register offset	Offset
Matrix Size	Rows				Columns
Offset Enable	All non-matrix	—	—	—	—	Result	MATB	MATA
Bitmask	Mask, for setting/resetting
RAM Start	24-bit address? Or in multiples of 4kB banks?
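One reading of the Integer Control row is Signed in bit 7, the 64-bit reg0 flag in bit 6, bit 5 unused, and the fractional bit count in bits 4-0. The sketch below packs and unpacks a byte under that assumed layout; the function names are hypothetical and the bit positions should be checked against the final table.

```python
def pack_integer_control(signed, wide_reg0, frac_bits):
    """Build an Integer Control byte (assumed layout: S W - FFFFF)."""
    if not 0 <= frac_bits <= 31:
        raise ValueError("frac_bits must be in 0-31")
    return (int(signed) << 7) | (int(wide_reg0) << 6) | frac_bits


def unpack_integer_control(byte):
    """Return (signed, wide_reg0, frac_bits) from an Integer Control byte."""
    return bool(byte & 0x80), bool(byte & 0x40), byte & 0x1F


ctrl = pack_integer_control(signed=True, wide_reg0=False, frac_bits=16)
fields = unpack_integer_control(ctrl)
```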

Shared Operations
Name	Parameter	Description
LOAD	Reg	A = Reg
STORE	Reg	Reg = A
SWAP	Reg	(A, Reg) = (Reg, A)
LEXTRA	Ignored	A = Extra
MASKW	Reg	Write mask to register
MASKS	Reg	Set mask bits in register
MASKC	Reg	Clear mask bits in register

Floating Point Operations
Cmd #	Name	Parameter	Description
	FADD	Reg	A = A + Reg
	FSUB	Reg	A = A - Reg
	FRSUB	Reg	A = Reg - A
	FMUL	Reg	A = A * Reg
	FDIV	Reg	A = A / Reg
	FRDIV	Reg	A = Reg / A
	F2I	Signed flag	A = Int(A) or UInt(A)
	FMIN	Reg	A = Min(A, Reg)
	FMAX	Reg	A = Max(A, Reg)
	FABS	Ignored	A = Abs(A)
Maybe
	FADDB	Byte	A = A + Val
	FSUBB	Byte	A = A - Val
	FMULB	Byte	A = A * Val
	FDIVB	Byte	A = A / Val
	FCLR	Ignored	A = 0.0 (LOAD instead)
	FNEG	Ignored	A = -A (FRSUB instead)
	FRECIP	Ignored	A = 1.0/A (FRDIV instead)
	FPOW	Reg	A = A ^ Reg
	FSQRT	Ignored	A = Sqrt(A)
	FTRIG	0	A = Sin(A)
		1	A = Cos(A)
		2	A = Tan(A)
	FATAN	Reg	A = Atan(A, Reg)
			A = Sec(A)
			A = Csc(A)
			A = Cot(A)
	FACOT	Reg	A = Arccot(A, Reg)
			A = Arcsin(A)
			A = Arccos(A)
	FFLOOR	Reg	A = Floor(A), Reg = Frac(A) (options?)
	FCEIL	Reg	A = Ceil(A), Reg = Frac(A) (options?)

Matrix Operations
Name	Parameter	Description
MATA	Reg	Set param A
MATB	Reg	Set param B
MATMUL	Reg	Reg = MATA * MATB
MATADD	Reg	Reg = MATA + MATB
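A behavioral sketch of MATMUL over the register file follows. It assumes square n x n matrices stored row-major in consecutive registers, with MATA, MATB and the result each given as a base register number. None of that layout is fixed by this page, and `matmul_regs` is a hypothetical name.

```python
def matmul_regs(regs, mat_a, mat_b, result, n):
    """Reg[result..] = MATA * MATB for n x n row-major matrices in `regs`."""
    out = []
    for r in range(n):
        for c in range(n):
            # Dot product of row r of A with column c of B.
            out.append(sum(regs[mat_a + r * n + k] * regs[mat_b + k * n + c]
                           for k in range(n)))
    # Written only after all reads, so inputs are never clobbered mid-compute.
    regs[result:result + n * n] = out


regs = [0.0] * 256
regs[16:20] = [1.0, 2.0, 3.0, 4.0]   # A = [[1, 2], [3, 4]]
regs[20:24] = [0.0, 1.0, 1.0, 0.0]   # B = [[0, 1], [1, 0]] swaps columns
matmul_regs(regs, mat_a=16, mat_b=20, result=24, n=2)
```

The model buffers the whole result before writing, which sidesteps the overwrite question; real hardware would more likely require the result range not to overlap either input.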

Integer Operations
Name	Parameter	Description
IADD	Reg	A = A + Reg
IADDC	Reg	A = A + Reg + C
ISUB	Reg	A = A - Reg
ISUBC	Reg	A = A - Reg - !C
IRSUB	Reg	A = Reg - A
ISETC	0 or 1	C = Value
IMUL	Reg	A = A * Reg (TBD - 64-bit result? Could auto-write Reg 0?)
IDIV	Reg	A = A / Reg (TBD - combined DIVMOD? 64/32 DIV?)
IMOD	Reg	A = A % Reg
ICMP	Reg	Status = A - Reg
INEG	0	A = -A
IAND	Reg	A = A & Reg
IOR	Reg	A = A | Reg
IXOR	Reg	A = A ^ Reg
I2F	Signed Flag	A = Float(Int(A)) or Float(UInt(A))
ISHL	Count	A = A << Count
ISHR	Count	A = A >> Count
ISSHR	Count	A = A >>(signed) Count
IROLL	Count	A = (A << Count) | (A >> (32 - Count))
IUMIN	Reg	A = Min(A, Reg), unsigned
ISMIN	Reg	A = Min(A, Reg), signed
IUMAX	Reg	A = Max(A, Reg), unsigned
ISMAX	Reg	A = Max(A, Reg), signed
IABS	Ignored	A = Abs(A)
ISET0	Byte	Set byte 0 of accumulator
ISET1	Byte	Set byte 1 of accumulator
ISET2	Byte	Set byte 2 of accumulator
ISET3	Byte	Set byte 3 of accumulator
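The carry ops allow wider arithmetic to be chained from 32-bit pieces, 6502-style. The sketch below models ISETC, IADDC and ISUBC as plain functions (hypothetical names) and assumes C is the carry out of bit 31 for adds and "no borrow" for subtracts, as on the 6502.

```python
MASK32 = 0xFFFFFFFF


def iaddc(a, reg, carry):
    """A = A + Reg + C; returns (new A, new C)."""
    total = a + reg + carry
    return total & MASK32, total >> 32


def isubc(a, reg, carry):
    """A = A - Reg - !C; returns (new A, new C), where C = 1 means no borrow."""
    total = a - reg - (1 - carry)
    return total & MASK32, 0 if total < 0 else 1


def add64(x, y):
    """Add two 64-bit values as low and high 32-bit halves."""
    carry = 0                                          # ISETC 0
    lo, carry = iaddc(x & MASK32, y & MASK32, carry)   # IADDC on the low words
    hi, carry = iaddc(x >> 32, y >> 32, carry)         # IADDC on the high words
    return (hi << 32) | lo, carry


total, carry_out = add64(0x00000001FFFFFFFF, 1)        # carry ripples into the high word
diff, no_borrow = isubc(3, 5, 1)                       # 3 - 5 wraps and clears C
```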
