WF16 Video Architecture

From Foenix F256 / Wildbits/K2 Wiki
Revision as of 23:44, 3 February 2026

Global settings

CRT emulation, only for low resolution layers.

640×480 for 4:3 output, or 960×540 for 16:9 output if the bandwidth allows it. 960×540 needs 1.6875× the bandwidth of 640×480 (518,400 vs 307,200 pixels).

Non-integer pixel aspect flags, again only for low res layers. Match 320×200 and 256×200 non-square aspects, blended on a 480p/540p base output. Keep it at 60Hz.

Select 50 or 60 Hz in any resolution. Ditch 70Hz, as nothing syncs to that in the PC space for compatibility. 50Hz is much lower priority, but can be done by extending vblank time and keeping pixel clock the same. If 640×400 res is still used, have it at 60Hz, again same pixel clock but longer vblank.
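
As a rough check on the "same pixel clock, longer vblank" idea, here's the arithmetic as a small sketch. The timing constants are an assumption (standard VGA 640×480: 25.175 MHz pixel clock, 800 pixels per line including blanking); these notes don't say which clock the hardware would actually use, and the function names are made up.

```python
# Assumed standard VGA 640x480 timing (not stated in these notes).
PIXEL_CLOCK_HZ = 25_175_000   # pixel clock, kept the same for every refresh rate
H_TOTAL = 800                 # pixels per line, including horizontal blanking
V_ACTIVE = 480                # visible lines


def total_lines_for(refresh_hz):
    """Total lines per frame (active + vblank) that land closest to refresh_hz."""
    return round(PIXEL_CLOCK_HZ / (H_TOTAL * refresh_hz))


def refresh_for(total_lines):
    """Actual refresh rate produced by a given total line count."""
    return PIXEL_CLOCK_HZ / (H_TOTAL * total_lines)


def vblank_lines(refresh_hz, active_lines=V_ACTIVE):
    """Lines of vertical blanking left over after the visible lines."""
    return total_lines_for(refresh_hz) - active_lines


if __name__ == "__main__":
    print(total_lines_for(50), vblank_lines(50), refresh_for(total_lines_for(50)))
    print(vblank_lines(60, active_lines=400))
```

Under those assumed numbers, 50Hz only needs the frame stretched with extra blank lines, and 640×400 at 60Hz is the same trick with fewer visible lines.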

Global scroll register, for setting where 0,0 is on the display. This might also change how the raster lines are counted, going from say -100 to 380 instead of 0 to 480, to line up with borders and such.

Let the mouse pointer pick a CLUT, instead of being locked to grayscale. Or have its own dedicated 16-color one.

Palettes

Reduce from 24-bit to 16-bit, better suited to 65816, and makes a lot of addressing simpler.

5-5-5-1 masked, or 4-4-4-4 RGBA? Leaning towards the latter. Using transparency 0=opaque, 15=fully transparent is probably easier.

Have an FPGA block that separates and combines the 4 channel values into 1 word, all as R/W registers, to avoid all the shifting. Include signed clamping when converting to the single RGB word.
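
A software model of what that block might do, as a minimal sketch. The nibble order (transparency in the top nibble, then R, G, B) is an assumption taken from the $F000 "fully transparent black" value mentioned under Alpha Transparency; the function names are hypothetical.

```python
def clamp4(value):
    """Clamp a signed channel value into the 4-bit range 0..15."""
    return max(0, min(15, value))


def combine_trgb(t, r, g, b):
    """Combine four (possibly out-of-range, signed) channels into one 16-bit word.

    Assumed layout: tttt rrrr gggg bbbb, with t = transparency (0 = opaque).
    """
    return (clamp4(t) << 12) | (clamp4(r) << 8) | (clamp4(g) << 4) | clamp4(b)


def separate_trgb(word):
    """Split a 16-bit 4-4-4-4 word back into (t, r, g, b)."""
    return ((word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF)


if __name__ == "__main__":
    # A fade that overshoots: red goes past 15, green goes negative.
    print(hex(combine_trgb(0, 20, -3, 7)))
    print(separate_trgb(0xF000))
```

The clamping is what lets fades and brightness ramps be done with plain adds on the separated registers, without the CPU checking for overflow on every channel.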

Rougher ideas

Palettes are always 4-4-4-4, but direct color 5-5-5-1 or 4-4-4-4 can be used for bitmap layers? Probably best to keep it the same, but given clear displays today, the 5-5-5 would look better for full-color backgrounds.

4bpp Palette Configurations
Entries Depth Palettes Tiles Sprites Notes
SNES 256 555 16×16 0-7 8-15
Genesis 64 333 4×16 0-3 0-3
TG16 512 333 32×16 0-15 16-31
CX16 256 444 16×16 0-15 0-15
Neo Geo 4096 444 256×16 0-255 Everything is sprites
Commando 256 444 16×16 0-7 8-15
CPS-1 3072 4444 192×16 0-31 + layer 0-31 4bit brightness, each layer/type has its own 32 palettes

HDMA

After a line has been rendered to the line buffer, run the HDMA list. If the HDMA list is done, then trigger the EOL interrupt if enabled. The timing that a line & HDMA takes is dynamic. The EOL should always be fired for a line even if there is no time left.

Ideally, if the NMI line could be connected, the EOL interrupt can be dedicated there for minimizing latency.

At first, an HDMA list entry should contain a line number to wait for, a count followed by that many address/data pairs to write in video IO space, and a flag for whether to fire an interrupt.
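
A minimal sketch of that basic list format and how it might be stepped once per line. The entry layout, the names, and the choice to treat "wait for line N" as "line N or later" are all assumptions for illustration, not a settled design.

```python
from dataclasses import dataclass, field


@dataclass
class HdmaEntry:
    line: int                                   # raster line to wait for
    writes: list = field(default_factory=list)  # (address, data) pairs in video IO space
    irq: bool = False                           # fire an interrupt when this entry runs


def run_hdma(entries, pc, line, regs):
    """Run every entry whose target line has been reached.

    Called after `line` has been rendered to the line buffer.
    Returns the new HDMA "program counter" and whether an interrupt was requested.
    """
    irq = False
    while pc < len(entries) and entries[pc].line <= line:
        for addr, data in entries[pc].writes:
            regs[addr] = data
        irq = irq or entries[pc].irq
        pc += 1
    return pc, irq


if __name__ == "__main__":
    hdma_list = [
        HdmaEntry(line=10, writes=[(0x100, 5)]),   # register write, no interrupt
        HdmaEntry(line=20, irq=True),              # interrupt only, no DMA
    ]
    regs, pc = {}, 0
    for line in range(24):
        pc, irq = run_hdma(hdma_list, pc, line, regs)
        if irq:
            print("interrupt after line", line)
```

The second entry has no writes at all, which is the case where the list stands in for a plain rasterline interrupt.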

Advanced features would be to load a value from a table, offset by the raster number. And run the HDMA (with optional interrupt) every line until the target line is reached.

There could be some BRAM dedicated to HDMA use or other on-chip variable storage. Would free up the bus, and take smaller indexing for where to copy from. Also, could set some state vars.

The rasterline would likely be the graphics line, not the hi-res line but this could also be an option.

The HDMA "program counter" is visible and editable, and can be safely modified after EOL interrupt. If it's $000000, then it's disabled? Should HDMA lists be on-chip? Have HDMA variables on-chip, referred to by the copies by small index?

This has the effect of externalizing and obsoleting the rasterline interrupt registers, as the HDMA could just fire them on a list of lines instead, without running any actual DMA.

SNES sets up the destination as part of the HDMA config, then each line only has the data to write. Saves cycles, and is probably reasonable to implement, given what these tend to be used for. Examples at https://snes.nesdev.org/wiki/HDMA_examples

While it might be a lot for HDMA, DMA'ing in a chunk of sprite registers from SRAM would be nice. This might be something for normal DMA during VBLANK.

Line Buffers

Use 36bit BRAMs, storing 3 12-bit pixels per word. This is the final output buffer, no alpha necessary.

2 line buffers would be used for CRT emulations, or 3 if we need a work one. 320px wide / 3 = 107 words per line, 321 words total (3 linebuffers), 11,556 bits.
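
The packing and the size arithmetic can be sketched like this. The slot order inside a word (pixel 0 in the low 12 bits) and the function names are assumptions.

```python
PIXELS_PER_WORD = 3
WORD_BITS = 36


def words_per_line(width):
    """Number of 36-bit words needed for `width` 12-bit pixels (rounded up)."""
    return -(-width // PIXELS_PER_WORD)


def write_pixel(buf, x, rgb12):
    """Store a 12-bit pixel at column x without disturbing its two neighbours."""
    word, slot = divmod(x, PIXELS_PER_WORD)
    shift = slot * 12
    buf[word] = (buf[word] & ~(0xFFF << shift)) | ((rgb12 & 0xFFF) << shift)


def read_pixel(buf, x):
    """Fetch the 12-bit pixel at column x."""
    word, slot = divmod(x, PIXELS_PER_WORD)
    return (buf[word] >> (slot * 12)) & 0xFFF


if __name__ == "__main__":
    n = words_per_line(320)
    print(n, 3 * n, 3 * n * WORD_BITS)   # words per line, words for 3 buffers, bits
    line = [0] * n
    write_pixel(line, 319, 0xABC)
    print(hex(read_pixel(line, 319)))
```

Since 320 isn't a multiple of 3, the last word of each line has one unused pixel slot.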

Layers

Every layer def has these:

  • Type (tile, bitmap, RLE, sprite?)
  • Base pointer (could be page-aligned 16-bit, for 16MB range?, else 32-bit pointer)
  • x/y pixel scroll (16-bit wrapping). Could share these in scroll groups, but not a big deal to duplicate these. This includes scrolling sprite layers
  • CLUT selection
  • Bit depth?
  • Clip window?
  • Last layer flag? Meaning color 0 isn't transparent and uses the CLUT color to fill in the background
  • High or low resolution? Some modes and bit depths are low enough bandwidth to do at 480p

64-byte cache per layer, for retaining sprite/tile graphics for reuse (more if bpp is smaller), or a readahead buffer for DDR3 for RLE and bitmap modes, which don't reuse anything anyway

No-overdraw bandwidth reduction

Have multiple hardware instances of layer renderers, all fighting for external bandwidth. Render front-to-back, with transparent pixels causing the next layer underneath to want to draw that pixel. Each layer only requests individual pixels that it needs to draw, and keeps some cache for redrawing the same sprite/tile on the same line. The first layer gets all pixels requested. The only wasted reads are when a read pixel is transparent and dispatches deeper, and 16-bit wide reads where not all the bits are used.
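
A toy model of the front-to-back idea, to show where the reads go. Each layer is just a function from x to a color (or None for transparent), which is a stand-in for the real fetch logic; caching and 16-bit read granularity are ignored.

```python
def render_line(layers, width):
    """Render one line front to back.

    `layers` is ordered front to back; each is a function x -> color, or None
    when that pixel is transparent. Returns (line, number_of_pixel_fetches).
    """
    line = [None] * width
    fetches = 0
    pending = list(range(width))        # the first layer gets every pixel requested
    for fetch in layers:
        still_open = []
        for x in pending:
            fetches += 1
            color = fetch(x)
            if color is None:
                still_open.append(x)    # wasted read: dispatch this pixel deeper
            else:
                line[x] = color
        pending = still_open
        if not pending:
            break                       # every pixel is covered, deeper layers read nothing
    return line, fetches


if __name__ == "__main__":
    front = lambda x: 1 if x < 2 else None   # covers the left half only
    back = lambda x: 2                       # fully opaque background
    print(render_line([front, back], 4))
```

In this example a back-to-front renderer would fetch all 8 pixels, while the front-to-back version fetches 6, and 2 of those are the transparent reads that dispatch deeper.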

Tilemap data is probably fetched regardless if it's in DDR3. For SRAM, just grabbing & caching the current tile is fine. Tile 0 assumed blank means avoiding fetching any of those pixels.

Alpha Transparency

A pixel entry in the line buffer contains a 12-bit ARGB value, and passes through up to 3 states: uninitialized, holding transparent, and opaque, with holding transparent being optional.

Action → Next State
Init State | Empty Pixel | Transparent Pixel | Opaque Pixel
Uninit | No-op → Uninit | Set → Transparent | Set → Opaque
Transparent | No-op → Transparent | Blend (or no-op) → Transparent | Blend → Opaque

When transitioning to opaque, the pixel is complete. The common case is uninit to opaque, which shouldn't cost extra clock cycles unless the blend is constant and pipelined.

For simplicity, the 'alpha' is actually a transparency channel, with 0 = opaque, and 15 = fully transparent. A layer can override transparency, or any individual palette entry can have non-zero transparency. For most cases, a pixel index value of 0 is a skipped pixel and the equivalent of $F000 (fully transparent black).
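
The state table can be modelled as a small step function. This is only a sketch: pixels are (t, r, g, b) tuples with t as the transparency channel described above, transparent-on-transparent takes the "no-op" option from the table, and the linear blend formula is an assumption (no blend mode has been picked yet).

```python
UNINIT, TRANSPARENT, OPAQUE = "uninit", "transparent", "opaque"


def blend(front, back):
    """Mix a held transparent pixel over an opaque pixel arriving from behind it."""
    ft, fr, fg, fb = front
    _, br, bg, bb = back

    def mix(f, b):
        # t = 0 keeps the front color, t = 15 shows only the back color
        return (f * (15 - ft) + b * ft) // 15

    return (0, mix(fr, br), mix(fg, bg), mix(fb, bb))


def step(state, held, pixel):
    """Apply one incoming pixel (None = empty) and return (next_state, held_pixel)."""
    if state == OPAQUE or pixel is None or pixel[0] == 15:
        return state, held                      # no-op
    incoming_opaque = pixel[0] == 0
    if state == UNINIT:
        return (OPAQUE if incoming_opaque else TRANSPARENT), pixel   # set
    if not incoming_opaque:
        return TRANSPARENT, held                # transparent on transparent: no-op option
    return OPAQUE, blend(held, pixel)           # blend, pixel is now complete


if __name__ == "__main__":
    state, held = step(UNINIT, None, (8, 15, 0, 0))    # half-transparent red in front
    state, held = step(state, held, (0, 0, 0, 15))     # opaque blue behind it
    print(state, held)
```

Only the Transparent → Opaque transition does any arithmetic, so the common uninit-to-opaque path stays a plain store.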

Individual layers, sprites, can have a transparency override, ignoring the palette alpha value. If they don't, transparency from the palette per color can still be obeyed. Maybe palette transparency needs to be enabled as well, just using the 12-bit color by default, meaning we can still use 0→f as clear→opaque as standard.

TODO

Buffering alpha sprites with no-overdraw is harder, probably needs its own linebuffer. Both the alpha and opaque layers need to track their own layer depth per pixel when merging together, as maybe only the alpha is visible but not the further back opaque sprite. Simplifications would be that alpha sprites don't show any sprites underneath, only tiles, but that's lame. Another would be that alpha sprites that overlap other sprites always combine with sprites and can't have tiles underneath. That is lame as well. The only real solution is to have the f2b trace through the sprite priority layers instead of having a single combined sprite line buffer, or split out sprite transparency into its own line buffer, with each pixel having its own priority number in both sprite line buffers. Or just support alpha transparency in tile/bitmap/rle layers and not sprites. Again lame, but probably the most workable solution. A small bitmap layer basically acts like a sprite anyway.

Oh, I guess we need an alpha mode as well. Averaging, brightening, dimming, threshold, gel, etc.

Indexed Bit Depth

Currently, everything is 8bpp, which is high bandwidth, more work to create artwork, and doesn't port as easily from other systems that use multiple palette swaps onscreen.

For tiles, sprites, and bitmaps, choose 1/2/4/8 bpp. Direct color 16-bit bitmaps would be separate from paletted bitmaps.

Each layer can select a CLUT. Each sprite or tile points to either a starting palette entry, or maybe a bitmask to OR into it. Pixels that are affected by local color would have those bits be 0, while static colors have those bits all 1. This can drastically reduce the potential colors available, though. If you have 8 shades of the selected color, that means only 32 static colors in the rest. Very wasteful, but functional.
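
Here's the OR-mask variant as a sketch. The position and width of the mask field (top 3 bits of an 8bpp index) are hypothetical; with that split, 32 indices have every mask bit set, which lines up with the "32 static colors" figure, though whether that's the split intended above is a guess.

```python
MASK_BITS = 3                                        # hypothetical mask field width
MASK_SHIFT = 8 - MASK_BITS                           # field sits in the top bits
MASK_FIELD = ((1 << MASK_BITS) - 1) << MASK_SHIFT    # 0xE0


def resolve_index(pixel, sprite_mask):
    """OR the per-sprite (or per-tile) mask into the pixel's palette index."""
    return pixel | (sprite_mask & MASK_FIELD)


def is_static(pixel):
    """Static colors already have every mask bit set, so the OR can't change them."""
    return (pixel & MASK_FIELD) == MASK_FIELD


def static_color_count():
    """How many palette indices are immune to the mask."""
    return 1 << MASK_SHIFT


if __name__ == "__main__":
    print(hex(resolve_index(0x05, 0x40)))   # local color: moved into the selected range
    print(hex(resolve_index(0xE5, 0x40)))   # static color: unchanged
    print(static_color_count())
```

The waste the text mentions shows up directly: any index with only some of the mask bits set is neither cleanly swappable nor static.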

Bitmaps

Option to wrap. Else, it shows blank pixels outside its range. Divmod can be done once per line.

Size x/y pixels

Small bitmaps with wrapping could make for easy wipes and parallax effects, covering the entire screen with a pattern.

During render, recompute the starting address to allow raster effects including Y stretch/squash.
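
A sketch of the per-line logic: the row and starting column are worked out once at the start of the line (the only modulo), then the column just increments and resets. The flat pixel list, parameter names, and blank value are assumptions for illustration.

```python
def bitmap_line(pixels, width, height, scroll_x, scroll_y, line, out_width,
                wrap=True, blank=0):
    """Return the pixels of one output line from a width x height bitmap."""
    y = line + scroll_y
    if wrap:
        y %= height
    elif not 0 <= y < height:
        return [blank] * out_width          # whole line is outside the bitmap
    row = pixels[y * width:(y + 1) * width]  # starting address recomputed per line
    x = scroll_x % width if wrap else scroll_x
    out = []
    for _ in range(out_width):
        if wrap:
            out.append(row[x])
            x += 1
            if x == width:                  # wrap by compare-and-reset, no division
                x = 0
        else:
            out.append(row[x] if 0 <= x < width else blank)
            x += 1
    return out


if __name__ == "__main__":
    tiny = list(range(8))                   # a 4x2 bitmap: rows [0..3] and [4..7]
    print(bitmap_line(tiny, 4, 2, scroll_x=3, scroll_y=1, line=0, out_width=6))
    print(bitmap_line(tiny, 4, 2, -1, 0, 0, 6, wrap=False, blank=-1))
```

Because the row is looked up again on every line, changing scroll_y between lines (from HDMA, say) gives the Y stretch/squash effect for free.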

1/2/4/8 bpp for indexed bitmaps, option for 4-4-4-4 or 5-5-5-1 for direct color bitmaps.

DDR3 suits bitmaps the best, so make sure that gets supported. A 640×480×8bpp (307,200 = $4B000 bytes) single layer mode would also be nice for GUI and PC game ports, especially with blitter support. Scrolling can still be supported, and more than 1 layer might be doable as well.

Tiles

Layer: Base pointer = Tilemap pointer, add tilegfx pointer

Map size x/y tiles

Option to wrap.

By default, tile 0 is assumed empty and no pixels are read, saving bandwidth. Can be used for metadata. Option to have tile 0 non-empty? Maybe it should be tile $ffff (or whatever the max index is, given flip bits?) That would never be in the tileset anyway.

For DDR3 tilemaps, it reads an entire row into a buffer. For SRAM, it fetches each tilemap entry as it becomes visible, then fetches the pixels separately, and caches them.

Tile attributes of various platforms
15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
SNES VFlip HFlip Priority Palette (0-7) Tile (0-1023)
Genesis Priority Palette (0-3) VFlip HFlip Tile (0-2047)
TG16 Palette (0-15) Tile (0-4095)
CX16 Palette (0-15) VFlip HFlip Tile (0-1023)
Neo Geo attr Palette (0-255) Tile MSB (0-1,048,575) Auto-anim 3b Auto-anim 2b VFlip HFlip
Commando Tile MSB (0-1023) VFlip HFlip Palette (0-15) Tile LSB
CPS-1 attr Priority VFlip HFlip Palette (0-31)

Neo Geo and CPS-1 use 32 bits per tile: a 16-bit tile number (LSB) word, which is not shown, plus the 16-bit attribute word above.
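
For reference, decoding one of these words is just shifts and masks. This sketch uses the SNES row, where the field widths follow from the value ranges (3 single-bit flags, 3-bit palette, 10-bit tile = 16 bits); the function name is made up.

```python
def decode_snes_tile(word):
    """Split a 16-bit SNES-style tilemap entry into its fields.

    Layout, bit 15 down to bit 0: VFlip, HFlip, Priority, Palette (3 bits), Tile (10 bits).
    """
    return {
        "vflip": bool((word >> 15) & 1),
        "hflip": bool((word >> 14) & 1),
        "priority": bool((word >> 13) & 1),
        "palette": (word >> 10) & 0x7,
        "tile": word & 0x3FF,
    }


if __name__ == "__main__":
    print(decode_snes_tile(0xC401))
```

Any layout chosen for this design would decode the same way, just with different shift and mask constants.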

Tilegfx

These are self-describing, with a header that takes up tile 0's pixel space? Having tile $FFFF be transparent actually would be easier for tileset creation, because that would always exist and take no memory or extra flags.

bpp (1,2,4,8), size (8×8, 16×16 (32×32, 64×64?)), bitmap mode, which would give it a stride of 256 bytes no matter the bpp (or take a stride byte/word value, more complex computation but just once per rasterline).

1bpp tiles are interesting, in that we could have transparency modes like text: fg only (direct color selection?), fg/bg (palette index where the 2 start from), fg/bgp (bg0 is always transparent, rest are solid).

I've never seen a platform with the Foenix's notion of multiple tilesets addressable from a single tilemap. I'm not sure that's really ever used anywhere either. It's interesting, but I'm not sure how useful it would be to keep, vs the >8bit tile indices above. The only time it would be specifically useful is if 2 tile layers share the same partial tileset, but have additional differences. I don't really see a super pressing need for that. I also think that the size, bpp, etc should be part of the tilemap/layer definition, not separated off into its own table of tilesets.

Rougher ideas

Neo Geo has auto-animating tiles/sprites. 4 or 8 tiles in a row in the tileset can be cycled through for animation (cycling through the low 2 or 3 bits of the tile index). Global or layer-specific config for how many frames per step.
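
The index cycling can be sketched as replacing the low bits of the tile index with a frame-derived counter. The parameter names and the "replace the low bits" reading are assumptions.

```python
def animated_tile(tile, anim_bits, frame, frames_per_step):
    """Return the tile index to draw on a given frame.

    anim_bits is 0 (no animation), 2 (4-tile cycle) or 3 (8-tile cycle).
    """
    if anim_bits == 0:
        return tile
    mask = (1 << anim_bits) - 1
    step = (frame // frames_per_step) & mask   # global animation counter
    return (tile & ~mask) | step               # low bits come from the counter


if __name__ == "__main__":
    for frame in (0, 8, 16, 24, 32):
        print(frame, hex(animated_tile(0x124, 2, frame, frames_per_step=8)))
```

This means an animated group has to start on a 4- or 8-tile boundary in the tileset, which is a constraint on how the art gets laid out.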

Meta-tilemap mode, so the tilemap holds entries that are 2×2 or 4×4 hardware tilemap entries.

Priorities aren't all that often used, but might be a good feature to have, especially to save on tilemap layers and scads of empty entries in those additional layers. Place the "background" in front of the sprite layer, with the tile either high priority drawing, or deferring to the sprite layer below it. If the priority is low, it passes the color to the sprite layer, and if the sprite layer has no pixel, it draws that bg pixel. If the tile pixel is transparent, then transparent behavior applies regardless of priority and nothing is different.

A CPS-1 style priority bit setting (16-bit, priority per color in that tile) might be interesting, but I'd say a half-and-half mode (lower 8 colors are lower priority, higher 8 colors are higher priority), or higher 4 colors etc, might be more compact and useful. It puts more constraints on the colors, but I would think certain foreground objects might have their own colors anyway?

Sprites

H-flip at the very minimum. If V-flip and 90° rotation are added (possible since all sprites/tiles are square), then all 8 orientations and flips are possible. Rotation is only available in SRAM.

16-bit sprite image selection from base pointer, based on bpp & size. Flip bits might be at MSBs of the word. 3 flip bits means 8192 sprite images.
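
A sketch of how the image word and the three orientation bits could work together. The bit positions (V-flip, H-flip, and 90° in bits 15-13) and the function names are hypothetical; the text only says the flip bits might be at the MSBs.

```python
def decode_sprite_word(word):
    """Split a 16-bit sprite image word: 3 orientation bits on top, 13-bit image index."""
    return {
        "vflip": bool(word & 0x8000),
        "hflip": bool(word & 0x4000),
        "rot90": bool(word & 0x2000),
        "image": word & 0x1FFF,          # 13 bits = 8192 images
    }


def image_address(base, image, size, bpp):
    """Address of a sprite image, based on bpp and size, from the base pointer."""
    return base + image * (size * size * bpp // 8)


def source_coord(x, y, size, hflip, vflip, rot90):
    """Map a destination pixel (x, y) to the source pixel in a size x size image."""
    if rot90:
        x, y = y, size - 1 - x
    if hflip:
        x = size - 1 - x
    if vflip:
        y = size - 1 - y
    return x, y


if __name__ == "__main__":
    print(decode_sprite_word(0xE005))
    print(image_address(0, 0x1FFF, size=16, bpp=4))
    print(source_coord(0, 0, 16, hflip=True, vflip=False, rot90=False))
```

With rotation the source is read column-wise instead of row-wise, which fits the note that rotation is only available from SRAM.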

Sprite sheet mode (kinda like square for tilesets), with a declared stride.

Color selection, direct for 1bpp, starting palette offset for 2/4bpp. Note that both of these could be expressed by OR'ing a mask on top of the color.

Need to figure out something for 8bpp color, to have palette swaps for limited ranges. The layer could have a color range/mask where all colors less than that use the sprite color. But this can probably be done with just an OR mask (or XOR for fun?). 8bpp sprites would normally have a 0 'color', palette swaps would leave a range of bits open and use the individual sprite color to set that, but we don't want that done on all bits, only those that are in "palette range", hence the layer setting. Maybe also only when the color is nonzero? or non-FF? Or if the color is 0-15, then it has 16 color selections as a 4bpp color? Colors 16-255 would be normal. Use the top bit of the color byte to select this coloring mode.

8×8, 16×16, 32×32, 64×64 sizes (2-bit selection, forget 24×24). TG16 had sprites up to 32×64 and did interesting pseudo-layer and huge enemy stuff with them.

1,2,4,8 bpp (2 bit selection)

(Or should bpp & size be for the layer? That might make for a simpler implementation, but varying sprite sizes are probably good. Bpp might still be a consideration for layer config.)

Overdraw avoidance eliminates pulling in unseen pixels, which prevents hardware collision detection, which is fine.

Sprite definitions of various platforms
VRAM Num Bytes 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
SNES 64kB 128 Y pos (0-255) X Pos (0-255)
VFlip HFlip Priority Palette (0-7) Tile (0-511)
(packed with other sprite x msb & size) X Pos MSB Size
Genesis 64kB 80 8 Y Pos (0-1023) -256
Width (1-4)×8 Height (1-4)×8 Link (next sprite in draw order)
Priority Palette (0-3) VFlip HFlip Tile (0-2047)
X Pos (0-511) -128
TG16 64kB 64 8 Y Pos (0-1023) relative
X Pos (0-1023) relative
Tile (0-2047), address = tile×32 regardless of sprite size
VFlip Height (16,32,64) HFlip Width (16,32) Priority Palette (0-15)
CX16 128kB 128 8 8bpp Tile (0-4095), address = tile×32 regardless of sprite size
X Pos (0-511)
Y Pos (0-511)
Height (8,16,32,64) Width (8,16,32,64) Palette (0-15) Collision mask Priority VFlip HFlip
Neo Geo ROM 380 134 Tile LSB
Palette (0-255) Tile MSB (0-1,048,575) Auto-anim 3b Auto-anim 2b VFlip HFlip
HShrink (15-0) VShrink (255-0)
Y Pos (0-511) Sticky Height in tiles (0-32, 33 = special wrap mode)
X Pos (0-511)
Commando ROM 128 4 Tile MSB (0-2047) X Pos MSB HFlip Palette (0-7) Tile LSB
X Pos (0-255) Y Pos (0-255)
CPS-1 ROM 256 8 X Pos (0-65535?)
Y Pos (0-65535?)
Tile (0-65535)
Height (in sprites) Width (in sprites) YFlip XFlip Palette (0-31)

SNES has 2 sprite sizes globally selectable, and the per-sprite bit sets which to use.

One Neo Geo sprite is a tower of up to 32 tiles (first 2 words above), which makes the attribute size that large: 2 words × 32 tiles, plus 3 other attribute/position words.

"ROM is listed for "VRAM" when there's no RAM for pixel data, and that's stored in ROM. VRAM tends to describe register tables and the char matrix in these platforms.

CPS-1 sprite tiles are all 16x16 pixels.

Rougher ideas

Are all sprites independently defined and layered? Are they all from the same base pointer? Would need multiple layer definitions to give multiple base pointers, and that might be a good idea? Each layer involved could take 64 sprites out of 256, max 4 sprite "layers". Maybe that's not something that should be layer-based, but global to the sprite system. 4 base pointers, 4 groups of 64 sprites, 4 hardware elements scanning the sprite registers in parallel.

Unlimited height sprites? fixed width

8bpp color register could be used to bank a subset of colors, maybe a color range (0-7) can be cycled while others are fixed. Think of Age of Empires 1 recoloring for instance. Or just use that to select a CLUT, overriding the layer's selection.

Select the color to be transparent? If using a fixed smaller palette, like DB16/32, then each sprite could pick a different one. Pico-8 has a 16-bit mask for which colors to include or not, which is interesting.

Cut-out sprites wouldn't display, but would clear any pixel from sprites above it, allowing sprites below to show through. Or, it could skip the sprite immediately below if it has a pixel, masking a single sprite, which might be easier to implement.

Figure out sprite zooming. No rotation, just scaling, and no inverting with this? Can grow or shrink independently in x & y. Maybe Bresenham? Probably want sub-pixel accuracy, 16-bit with fixed point? Or full 32-bit fixed? x1/x2/y1/y2 dest rectangle maybe?
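
One of the options above, a fixed-point step per destination pixel, looks like this for a single row. This sketch uses 16.16 fixed point and nearest-neighbour sampling, both picked arbitrarily; the same loop applied to rows gives independent y scaling.

```python
FRAC_BITS = 16   # assumed 16.16 fixed point


def scaled_row(src_row, dest_width):
    """Grow or shrink one row of pixels to dest_width using a fixed-point step."""
    step = (len(src_row) << FRAC_BITS) // dest_width   # source pixels per dest pixel
    pos = 0
    out = []
    for _ in range(dest_width):
        out.append(src_row[pos >> FRAC_BITS])          # integer part selects the pixel
        pos += step
    return out


if __name__ == "__main__":
    row = [1, 2, 3, 4]
    print(scaled_row(row, 8))   # grow to 2x
    print(scaled_row(row, 2))   # shrink to half
    print(scaled_row(row, 3))   # non-integer ratio
```

Defining the zoom by destination size (the x1/x2 rectangle idea) keeps the step computation to one divide per sprite per axis, with only an add per pixel after that.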

RLE

Layer has a height (width is undefined).

The base pointer points to an array of height × 16-bit offsets, one for each line, indexed from the base pointer. Each line is a non-demarcated concatenation of tags:

0lllllll cccccccc: length + color pair, color 0 is typically transparent

1lllllll c...: length + literal pixels.
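
A decoder for one line of this format, as a sketch. Several details are assumptions since they aren't pinned down above: offsets are little-endian, a line is decoded until the output width is filled (there's no end marker), and a zero-length tag stops the line and pads with color 0.

```python
def decode_rle_line(data, base, line, out_width):
    """Decode one RLE line into a list of out_width color indices."""
    # Per-line 16-bit offset, assumed little-endian, relative to the base pointer.
    offset = data[base + 2 * line] | (data[base + 2 * line + 1] << 8)
    pos = base + offset
    out = []
    while len(out) < out_width:
        tag = data[pos]
        length = tag & 0x7F
        if length == 0:                        # assumption: empty tag ends the line
            out.extend([0] * (out_width - len(out)))
            break
        if tag & 0x80:                         # 1lllllll c...: literal pixels
            out.extend(data[pos + 1:pos + 1 + length])
            pos += 1 + length
        else:                                  # 0lllllll cccccccc: run of one color
            out.extend([data[pos + 1]] * length)
            pos += 2
    return out[:out_width]


if __name__ == "__main__":
    layer = bytes([
        4, 0, 6, 0,           # offset table for 2 lines
        0x05, 0x09,           # line 0: run of 5 pixels of color 9
        0x82, 1, 2,           # line 1: 2 literal pixels
        0x03, 0x00,           #         then a run of 3 transparent pixels
    ])
    print(decode_rle_line(layer, 0, 0, 5))
    print(decode_rle_line(layer, 0, 1, 5))
```

A solid-color line costs 2 bytes per 127 pixels, which is why wipes and borders are cheap in this mode.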

Uses:

  • Wipes
  • Solid color borders
  • Raster bars without interrupts
  • SNES-style explosion or light ray effects (esp with transparency)
  • Compressed cel style images or animations
  • Polygon filling, especially with multiple layers instead of merging spans into 1 layer
  • Cheap enough to run in hi-res?