WF16 Video Architecture

Latest revision as of 00:48, 21 February 2026

Global settings

CRT emulation, only for low resolution layers.

640×480 for 4:3 output, or 960×540 for 16:9 output, if bandwidth can run it. Requires 1.6875× the bandwidth, which should on its own be feasible.

{| class="wikitable"
!
!Dot clock
!HDMI Serializer
|-
|640x480
|25.175 MHz
|125.875 MHz
|-
|960x540
|37.125 MHz
|185.625 MHz
|}
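
As a quick check of the numbers above, here is a minimal sketch that recomputes the pixel-count ratio between the two base resolutions and a serializer clock for each dot clock. It assumes the serializer runs at 5× the dot clock (10 serialized bits per pixel, shifted out on both clock edges), which matches the 640x480 row. `MODES`, `serializer_mhz` and `pixel_ratio` are names made up for this example.

```python
from fractions import Fraction

# Active size and dot clock (MHz) for each base output mode, from the table above.
MODES = {
    "640x480": (640, 480, Fraction("25.175")),
    "960x540": (960, 540, Fraction("37.125")),
}

# Assumption: 10 serialized bits per pixel, two bits per serializer clock (DDR).
SERIALIZER_RATIO = 5


def serializer_mhz(mode):
    """Serializer clock in MHz for a mode, under the 5x assumption."""
    return MODES[mode][2] * SERIALIZER_RATIO


def pixel_ratio(mode_a, mode_b):
    """How many times more active pixels mode_a has than mode_b."""
    wa, ha, _ = MODES[mode_a]
    wb, hb, _ = MODES[mode_b]
    return Fraction(wa * ha, wb * hb)


if __name__ == "__main__":
    for name in MODES:
        print(name, float(serializer_mhz(name)), "MHz serializer")
    print("960x540 / 640x480 =", float(pixel_ratio("960x540", "640x480")))
```

Exact fractions are used so the MHz values do not pick up floating-point noise before they are printed.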

Non-integer pixel aspect flags, again only for low res layers. Match 320×200 and 256×200 non-square aspects blended on a 480p/540p base output. Keep it at 60Hz.

Select 50 or 60 Hz in any resolution. Ditch 70Hz, as nothing syncs to that in the PC space for compatibility. 50Hz is much lower priority, but can be done by extending vblank time and keeping pixel clock the same. If 640×400 res is still used, have it at 60Hz, again same pixel clock but longer vblank.
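
To put rough numbers on the "same pixel clock, longer vblank" idea, here is a small sketch. It assumes the usual 640×480 VGA timing of 800 total pixels per line and 525 total lines per frame, which this page does not state; the function names are hypothetical.

```python
PIXEL_CLOCK_HZ = 25_175_000   # 640x480 dot clock from the table on this page
H_TOTAL = 800                 # assumption: standard VGA total pixels per line
V_TOTAL_60 = 525              # assumption: standard VGA total lines at ~60 Hz


def v_total_for(refresh_hz):
    """Total lines per frame that gets closest to refresh_hz at the fixed pixel clock."""
    return round(PIXEL_CLOCK_HZ / (H_TOTAL * refresh_hz))


def refresh_for(v_total):
    """Actual refresh rate in Hz for a given total line count."""
    return PIXEL_CLOCK_HZ / (H_TOTAL * v_total)


def blank_lines(v_total, active_lines):
    """Lines per frame spent outside the active area."""
    return v_total - active_lines


if __name__ == "__main__":
    v50 = v_total_for(50)
    print("50 Hz:", v50, "total lines,", blank_lines(v50, 480), "blank lines")
    print("640x400 at ~60 Hz:", blank_lines(V_TOTAL_60, 400), "blank lines")
```

Under these assumptions, 50Hz needs about 629 total lines, and a 640×400 mode at 60Hz keeps the 525-line frame with the extra 80 lines folded into blanking.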

Global scroll register, for setting where 0,0 is on the display. This might also change how the raster lines are counted, going from say -100 to 380 instead of 0 to 480, to line up with borders and such.

Let the mouse pointer pick a CLUT, instead of being locked to grayscale. Or have its own dedicated 16-color one.

Palettes

Reduce from 24-bit to 16-bit, better suited to 65816, and makes a lot of addressing simpler.

5-5-5-1 masked, or 4-4-4-4 RGBA? Leaning towards the latter. Using transparency 0=opaque, 15=fully transparent is probably easier.

Have an FPGA block which separates one color word into its 4 channel values and combines them back into 1, all as R/W registers, to avoid all the shifting. Include signed clamping when converting to the single RGB word.
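
A software model of what that block would do might look like the sketch below. It assumes the 4-4-4-4 format with the transparency nibble on top, followed by R, G and B (consistent with the $F000 "fully transparent black" example elsewhere on this page); the function names are made up here.

```python
def clamp4(value):
    """Clamp a signed intermediate result into the 0..15 range of one channel."""
    return max(0, min(15, value))


def pack_trgb(t, r, g, b):
    """Combine four (possibly out-of-range) channel values into one 16-bit word."""
    return (clamp4(t) << 12) | (clamp4(r) << 8) | (clamp4(g) << 4) | clamp4(b)


def unpack_trgb(word):
    """Split a 16-bit word back into its four 4-bit channels."""
    return ((word >> 12) & 0xF, (word >> 8) & 0xF, (word >> 4) & 0xF, word & 0xF)


if __name__ == "__main__":
    # A fade step that overshoots red and undershoots green gets clamped.
    print(hex(pack_trgb(0, 20, -3, 7)))
```

The point of the clamping is that fade or brighten math on the CPU can add or subtract per channel without checking for overflow first.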

Rougher ideas

Palettes are always 4-4-4-4, but direct color 5-5-5-1 or 4-4-4-4 can be used for bitmap layers? Probably best to keep it the same, but given clear displays today, the 5-5-5 would look better for full-color backgrounds.

{| class="wikitable"
|+4bpp Palette Configurations
!
!Entries
!Depth
!Palettes
!Tiles
!Sprites
!Notes
|-
|'''SNES'''
|256
|555
|16×16
|0-7
|8-15
|
|-
|'''Genesis'''
|64
|333
|4×16
|0-3
|0-3
|
|-
|'''TG16'''
|512
|333
|32×16
|0-15
|16-31
|
|-
|'''F256'''
|1024
|888
|4×256
|0-3
|0-3
|No 4bpp, this is all 8bpp
|-
|'''CX16'''
|256
|444
|16×16
|0-15
|0-15
|
|-
|'''Neo Geo'''
|4096
|666
|256×16
|
|0-255
|Everything is sprites. RGB channels share low bit, 5551 = 16 bits
|-
|'''Commando'''
|256
|444
|16×16
|0-7
|8-15
|
|-
|'''CPS-1'''
|3072
|4444
|192×16
|0-31 + layer
|0-31
|4bit brightness, each layer/type has its own 32 palettes
|}

HDMA

After a line has been rendered to the line buffer, run the HDMA list. If the HDMA list is done, then trigger the EOL interrupt if enabled. The timing that a line & HDMA takes is dynamic. The EOL should always be fired for a line even if there is no time left.

Ideally, if the NMI line could be connected, the EOL interrupt can be dedicated there for minimizing latency.

At first, each HDMA list entry should contain a line number to wait for, a count followed by that many address/data pairs to write into video IO space, and a flag for whether to fire an interrupt.
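
A minimal sketch of that first-cut list format, as a software model: `HdmaEntry`, `run_hdma` and the dictionary standing in for video IO space are all hypothetical, and running entries whose line has already passed is an assumption about how a late list should behave.

```python
from dataclasses import dataclass, field


@dataclass
class HdmaEntry:
    wait_line: int                               # raster line this entry waits for
    writes: list = field(default_factory=list)   # (address, data) pairs in video IO space
    fire_irq: bool = False                       # raise an interrupt after the writes


def run_hdma(entries, pc, line, video_io):
    """Run every entry due on `line`; return the new program counter and an IRQ flag."""
    irq = False
    while pc < len(entries) and entries[pc].wait_line <= line:
        entry = entries[pc]
        for address, data in entry.writes:
            video_io[address] = data
        irq = irq or entry.fire_irq
        pc += 1
    return pc, irq


if __name__ == "__main__":
    program = [
        HdmaEntry(wait_line=10, writes=[(0x20, 5)]),
        HdmaEntry(wait_line=12, fire_irq=True),   # no writes: acts as a raster interrupt
    ]
    io_space, pc = {}, 0
    for line in range(16):
        pc, irq = run_hdma(program, pc, line, io_space)
        if irq:
            print("interrupt on line", line)
```

An entry with no writes and the interrupt flag set behaves like a plain rasterline interrupt, which is the case where the list stands in for dedicated raster interrupt registers.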

Advanced features would be to load a value from a table, offset by the raster number. And run the HDMA (with optional interrupt) every line until the target line is reached.

There could be some BRAM dedicated to HDMA use or other on-chip variable storage. Would free up the bus, and take smaller indexing for where to copy from. Also, could set some state vars.

The rasterline would likely be the graphics line, not the hi-res line, but the hi-res line could also be an option.

The HDMA "program counter" is visible and editable, and can be safely modified after EOL interrupt. If it's $000000, then it's disabled? Should HDMA lists be on-chip? Have HDMA variables on-chip, referred to by the copies by small index?

This has the effect of externalizing and obsoleting the rasterline interrupt registers, as the HDMA could just fire them on a list of lines instead, without running any actual DMA.

SNES sets up the destination as part of the HDMA config, then each line only has the data to write. Saves cycles, and is probably reasonable to implement, given what these tend to be used for. Examples at https://snes.nesdev.org/wiki/HDMA_examples

While it might be a lot for HDMA, DMA'ing in a chunk of sprite registers from SRAM would be nice. This might be something for normal DMA during VBLANK.

Line Buffers

Use 36bit BRAMs, storing 3 12-bit pixels per word. This is the final output buffer, no alpha necessary.

2 line buffers would be used for CRT emulations, or 3 if we need a work one. 320px wide / 3 = 107 words per line, 321 words total (3 linebuffers), 11,556 bits.
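
The packing and the size estimate can be checked with a short sketch. Putting the leftmost pixel in the low 12 bits of the word is an assumption made here, as are the function names.

```python
BITS_PER_PIXEL = 12
PIXELS_PER_WORD = 3          # three 12-bit pixels fill one 36-bit BRAM word


def words_per_line(width_px):
    """Number of 36-bit words needed for one line (ceiling division)."""
    return -(-width_px // PIXELS_PER_WORD)


def pack_word(p0, p1, p2):
    """Pack three pixels into one word. Assumption: leftmost pixel in the low bits."""
    return (p0 & 0xFFF) | ((p1 & 0xFFF) << 12) | ((p2 & 0xFFF) << 24)


def read_pixel(line_words, x):
    """Fetch the 12-bit pixel at column x from a packed line."""
    word = line_words[x // PIXELS_PER_WORD]
    return (word >> (BITS_PER_PIXEL * (x % PIXELS_PER_WORD))) & 0xFFF


if __name__ == "__main__":
    words = words_per_line(320)
    print(words, "words per line,", 3 * words * 36, "bits for 3 line buffers")
```

The last word of a 320-pixel line only carries two pixels, which is why 320 / 3 rounds up to 107.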

Layers

Every layer def has these:

  • Type (tile, bitmap, RLE, sprite?)
  • Base pointer (could be page-aligned 16-bit, for 16MB range?, else 32-bit pointer)
  • x/y pixel scroll (16-bit wrapping). Could share these in scroll groups, but not a big deal to duplicate these. This includes scrolling sprite layers
  • CLUT selection
  • Bit depth?
  • Clip window?
  • Masking enable & layer target
  • Last layer flag? Meaning color 0 isn't transparent and uses the CLUT color to fill in the background
  • High or low resolution? Some modes and bit depths are low enough bandwidth to do at 480p

64-byte cache per layer, for retaining sprite/tile graphics for reuse (more if bpp is smaller), or a readahead buffer for DDR3 for RLE and bitmap modes, which don't reuse anything anyway

No-overdraw bandwidth reduction

Have multiple hardware instances of layer renderers, all fighting for external bandwidth. Render front-to-back, with transparent pixels causing the next layer underneath to want to draw that pixel. Each layer only requests individual pixels that it needs to draw, and keeps some cache for redrawing the same sprite/tile on the same line. The first layer gets all pixels requested. The only wasted reads are when a read pixel is transparent and dispatches deeper, and 16-bit wide reads where not all the bits are used.

Tilemap data is probably fetched regardless if it's in DDR3. For SRAM, just grabbing & caching the current tile is fine. Tile 0 assumed blank means avoiding fetching any of those pixels.
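
The bandwidth argument can be illustrated with a toy model that counts pixel fetches per layer. Layers are plain lists with index 0 as the front layer and pixel value 0 as transparent; caching and 16-bit wide reads are left out, and the function name is hypothetical.

```python
def render_line_front_to_back(layers, width, background=0):
    """Composite one line front to back, counting how many pixels each layer fetches."""
    out = [background] * width
    fetches = [0] * len(layers)
    for x in range(width):
        for depth, layer in enumerate(layers):
            fetches[depth] += 1          # this layer was asked for pixel x
            pixel = layer[x]
            if pixel != 0:               # opaque: nothing deeper needs to be read
                out[x] = pixel
                break
    return out, fetches


if __name__ == "__main__":
    front = [1, 0, 0, 4]
    back = [5, 6, 0, 7]
    print(render_line_front_to_back([front, back], 4))
```

In the example the back layer is only asked for the two pixels where the front layer is transparent, so covered pixels cost no reads at all.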

Alpha Transparency

A pixel entry in the line buffer contains a 12-bit ARGB value, and goes through potentially 3 states: Uninitialized, holding transparent, opaque, with holding transparent being optional.

{| class="wikitable"
|+Action → Next State
!Init State
!Empty Pixel
!Transparent Pixel
!Opaque Pixel
|-
|Uninit
|No-op → Uninit
|Set → Transparent
|Set → Opaque
|-
|Transparent
|No-op → Transparent
|Blend (or no-op) → Transparent
|Blend → Opaque
|}

When transitioning to opaque, the pixel is complete. The common case is uninit to opaque, and shouldn't be done with extra clock cycles, unless the blend is constant and pipelined.
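
Here is a sketch of those transitions as a small state machine. Pixels are `(t, r, g, b)` tuples with 4-bit channels, `None` stands for an empty pixel, and rendering is assumed to run front to back so the held pixel is in front of the incoming one. The blend formula and the choice of the no-op option for transparent-on-transparent are assumptions for illustration only.

```python
UNINIT, TRANSPARENT, OPAQUE = "uninit", "transparent", "opaque"


def blend(front, back):
    """Blend a held transparent pixel over the pixel behind it, per 4-bit channel."""
    t = front[0]                      # 0 = opaque, 15 = fully transparent
    rgb = tuple((f * (15 - t) + b * t) // 15 for f, b in zip(front[1:], back[1:]))
    return (0,) + rgb


def step(state, held, incoming):
    """Apply one incoming pixel (or None for an empty one) to a line buffer entry."""
    if state == OPAQUE or incoming is None:
        return state, held                         # already complete, or a no-op
    if incoming[0] > 0:                            # incoming pixel is transparent
        if state == UNINIT:
            return TRANSPARENT, incoming           # set
        return TRANSPARENT, held                   # the no-op option from the table
    if state == UNINIT:
        return OPAQUE, incoming                    # set: the common case
    return OPAQUE, blend(held, incoming)           # blend, then the pixel is complete


if __name__ == "__main__":
    state, held = step(UNINIT, None, (8, 15, 15, 15))   # half-transparent white
    state, held = step(state, held, (0, 0, 0, 0))       # opaque black behind it
    print(state, held)
```

Only the last branch needs arithmetic, which lines up with the goal of keeping the uninit-to-opaque path free of extra cycles.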

For simplicity, the 'alpha' is actually a transparency channel, with 0 = opaque, and 15 = fully transparent. A layer can override transparency, or any individual palette entry can have non-zero transparency. For most cases, a pixel index value of 0 is a skipped pixel and the equivalent of $F000 (fully transparent black).

Individual layers, sprites, can have a transparency override, ignoring the palette alpha value. If they don't, transparency from the palette per color can still be obeyed. Maybe palette transparency needs to be enabled as well, just using the 12-bit color by default, meaning we can still use 0→f as clear→opaque as standard.

Masking Layers

Any layer can be used as a mask, to clip out layers underneath it. Wherever a solid pixel would be drawn, this would cause the pixel rendering to skip down to a layer lower in the stack. RLE layers would probably be the best for this, but any layer can be used this way to stencil out graphics. Instead of just masking out the immediate layer below it, it can choose which layer to skip to, so it can mask out an entire consecutive stack of layers.
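
A toy version of the skip-to-layer behaviour, with layers as dictionaries and index 0 as the front layer. The `mask_target` field name is hypothetical, and requiring the target to be deeper than the mask itself is an assumption that keeps the walk finite.

```python
def composite_pixel(layers, x):
    """Return the visible pixel at column x, honouring mask layers."""
    depth = 0
    while depth < len(layers):
        layer = layers[depth]
        pixel = layer["pixels"][x]
        target = layer.get("mask_target")
        if pixel != 0:
            if target is None:
                return pixel             # ordinary layer: solid pixel wins
            assert target > depth        # a mask may only skip to a deeper layer
            depth = target               # mask layer: jump over the layers in between
            continue
        depth += 1
    return 0                             # nothing drawn at this column


if __name__ == "__main__":
    stack = [
        {"pixels": [1, 0], "mask_target": 2},   # mask: solid at x=0, open at x=1
        {"pixels": [5, 5]},                     # hidden wherever the mask is solid
        {"pixels": [9, 9]},
    ]
    print([composite_pixel(stack, x) for x in range(2)])
```

Setting the target further down the stack stencils out several consecutive layers with a single mask.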

TODO

Buffering alpha sprites with no-overdraw is harder, and probably needs its own linebuffer. Both the alpha and opaque layers need to track their own layer depth per pixel when merging together, since it may be that only the alpha sprite is visible and not the opaque sprite further back. One simplification would be that alpha sprites don't show any sprites underneath, only tiles, but that's lame. Another would be that alpha sprites that overlap other sprites always combine with sprites and can't have tiles underneath, which is lame as well. The only real solution is to have the front-to-back (f2b) pass trace through the sprite priority layers instead of having a single combined sprite line buffer, or to split out sprite transparency into its own line buffer, with each pixel having its own priority number in both sprite line buffers. Or just support alpha transparency in tile/bitmap/RLE layers and not sprites. Again lame, but probably the most workable solution. A small bitmap layer basically acts like a sprite anyway.

Oh, I guess we need an alpha mode as well. Averaging, brightening, dimming, threshold, gel, etc.

Indexed Bit Depth

Currently, everything is 8bpp, which is high bandwidth, more work to create artwork, and doesn't port as easily from other systems that use multiple palette swaps onscreen.

For tiles, sprites, and bitmaps, choose 1/2/4/8 bpp. Direct color 16-bit bitmaps would be separate from paletted bitmaps.

Each layer can select a CLUT. Each sprite or tile points to either a starting palette entry, or maybe a bitmask to OR into it. Pixels that are affected by local color would have those bits be 0, while static colors have those bits all 1. This can drastically reduce the potential colors available, though. If you have 8 shades of the selected color, that means only 32 static colors in the rest. Very wasteful, but functional.
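
The OR-mask variant can be sketched as follows, assuming 8bpp pixels and a hypothetical 3-bit local-color field in the top bits (mask `0xE0`). The function names are made up for this example.

```python
FIELD_MASK = 0xE0   # assumption: the local-color field is the top 3 bits of the pixel


def clut_index(pixel, local_color_bits):
    """OR the sprite/tile's local color bits into an 8bpp pixel value."""
    return (pixel | (local_color_bits & FIELD_MASK)) & 0xFF


def static_color_count(field_mask, bpp=8):
    """Static colors need every field bit set, leaving only the other bits free."""
    field_bits = bin(field_mask).count("1")
    return 1 << (bpp - field_bits)


if __name__ == "__main__":
    print(hex(clut_index(0x05, 0x40)))      # field bits were 0: local color applies
    print(hex(clut_index(0xE5, 0x40)))      # field bits all 1: color stays static
    print(static_color_count(FIELD_MASK))   # static colors left with a 3-bit field
```

With a 3-bit field, only the 32 indices that have all three field bits set are immune to the OR, which is where the cost in static colors comes from.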

Bitmaps

Option to wrap. Else, it shows blank pixels outside its range. Divmod can be done once per line.

Size x/y pixels

Small bitmaps with wrapping could make for easy wipes and parallax effects, covering the entire screen with a pattern.

During render, recompute the starting address to allow raster effects including Y stretch/squash.
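
Roughly, the per-line work for a bitmap layer could look like the sketch below. The bitmap is a flat list of pixel values, the modulo is done once per line and the inner loop only increments and resets, and out-of-range pixels read as 0 when wrapping is off. All names and the argument order are assumptions.

```python
def bitmap_line(bitmap, width, height, scroll_x, scroll_y, line, out_width, wrap=True):
    """Return out_width pixels of one display line from a width x height bitmap."""
    out = []
    if wrap:
        row = (line + scroll_y) % height      # the once-per-line divmod
        x = scroll_x % width
        base = row * width                    # start address recomputed every line
        for _ in range(out_width):
            out.append(bitmap[base + x])
            x += 1
            if x == width:                    # wrap with a compare, not a divide
                x = 0
    else:
        row = line + scroll_y
        for i in range(out_width):
            x = scroll_x + i
            inside = 0 <= row < height and 0 <= x < width
            out.append(bitmap[row * width + x] if inside else 0)
    return out


if __name__ == "__main__":
    pattern = [1, 2,
               3, 4]
    print(bitmap_line(pattern, 2, 2, 1, 1, 0, 6))
```

Because the starting address is recomputed per line, changing the scroll or source row between lines gives raster effects such as Y stretch without touching the inner loop.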

1/2/4/8 bpp for indexed bitmaps, option for 4-4-4-4 or 5-5-5-1 for direct color bitmaps.

DDR3 suits bitmaps the best, so ensure that gets supported. A 640×480×8bpp (307,200 = $4B000 bytes) single layer mode would also be nice for GUI and PC game ports, especially with blitter support. Scrolling can still be supported, and more than 1 layer might be doable as well.

Tiles

Layer: Base pointer = Tilemap pointer, add tilegfx pointer

Map size x/y tiles

Option to wrap.

By default, tile 0 is assumed empty and no pixels are read, saving bandwidth. Can be used for metadata. Option to have tile 0 non-empty? Maybe it should be tile $ffff (or whatever the max index is, given flip bits?) That would never be in the tileset anyway.

Need to ensure that tile layers are max 65536 pixels wide/tall, since that's what the scroll layers deal with.

For DDR3 tilemaps, it reads an entire row into a buffer. For SRAM, it fetches each tilemap entry as it becomes visible, then fetches the pixels separately, and caches them.

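{| class="wikitable"
|+Tile attributes of various platforms
!
!Fields (bit 15 → bit 0)
|-
|'''SNES'''
|VFlip [15], HFlip [14], Priority [13], Palette (0-7) [12:10], Tile (0-1023) [9:0]
|-
|'''Genesis'''
|Priority [15], Palette (0-3) [14:13], VFlip [12], HFlip [11], Tile (0-2047) [10:0]
|-
|'''TG16'''
|Palette (0-15) [15:12], Tile (0-4095) [11:0]
|-
|'''F256'''
|— [15:13], Palette (0-3) [12:11], Tileset (0-7) [10:8], Tile (0-255) [7:0]
|-
|'''CX16'''
|Palette (0-15) [15:12], VFlip [11], HFlip [10], Tile (0-1023) [9:0]
|-
|'''Neo Geo attr'''
|Palette (0-255) [15:8], Tile MSB (0-1,048,575) [7:4], Auto-anim 3b [3], Auto-anim 2b [2], VFlip [1], HFlip [0]
|-
|'''Commando'''
|Tile MSB (0-1023) [15:14], VFlip [13], HFlip [12], Palette (0-15) [11:8], Tile LSB [7:0]
|-
|'''CPS-1 attr'''
|Priority, VFlip, HFlip, Palette (0-31)
|}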

Neo Geo and CPS-1 use 32 bits per tile: a 16-bit tile number LSB word, which is not shown, plus the 16-bit attribute word above.
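
As an example of how one of these 16-bit entries breaks down, here is a decoder for the SNES-style layout. The bit positions are taken from the field order and ranges in the table (VFlip, HFlip, Priority, 3-bit palette, 10-bit tile, from the top bit down); the function and key names are hypothetical.

```python
def decode_snes_style(entry):
    """Split a 16-bit tilemap entry laid out as V H P ppp tttttttttt."""
    return {
        "vflip": bool(entry & 0x8000),
        "hflip": bool(entry & 0x4000),
        "priority": (entry >> 13) & 0x1,
        "palette": (entry >> 10) & 0x7,
        "tile": entry & 0x3FF,
    }


def encode_snes_style(vflip, hflip, priority, palette, tile):
    """Inverse of decode_snes_style, masking each field to its width."""
    return ((int(vflip) << 15) | (int(hflip) << 14) | ((priority & 1) << 13)
            | ((palette & 0x7) << 10) | (tile & 0x3FF))


if __name__ == "__main__":
    print(decode_snes_style(0xB403))
```

A renderer only needs the tile field to find pixels; the other fields can be latched once per tile and applied while shifting pixels out.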

Tilegfx

These are self-describing, with a header that takes up tile 0's pixel space? Having tile $FFFF be transparent actually would be easier for tileset creation, because that would always exist and take no memory or extra flags.

bpp (1,2,4,8), size (8×8, 16×16 (32×32, 64×64?)), bitmap mode, which would give it a stride of 256 bytes no matter the bpp (or take a stride byte/word value, more complex computation but just once per rasterline).

1bpp tiles are interesting, in that we could have transparency modes like text: fg only (direct color selection?), fg/bg (palette index where the 2 start from), fg/bgp (bg0 is always transparent, rest are solid).

I've never seen a platform with the Foenix's notion of multiple tilesets addressable from a single tilemap. I'm not sure that's really ever used anywhere either. It's interesting, but I'm not sure how useful it would be to keep, vs the >8bit tile indices above. The only time it would be specifically useful is if 2 tile layers share the same partial tileset but have additional differences. I don't really see a super pressing need for that. I also think that the size, bpp, etc should be part of the tilemap/layer definition, not separated off into its own table of tilesets.

Rougher ideas

Neo Geo has auto-animating tiles/sprites. 4 or 8 tiles in a row in the tileset can be cycled through for animation (cycling through the low 2 or 3 bits of the tile index). Global or layer-specific config for how many frames per step.
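
The tile index substitution can be sketched in a few lines. The frame counter, the `frames_per_step` setting and the function name are assumptions about how the global or per-layer config would be exposed.

```python
def animated_tile(tile, frame_counter, frames_per_step, anim_bits):
    """Replace the low anim_bits (0, 2 or 3) of a tile index with the animation step."""
    if anim_bits == 0:
        return tile                               # tile is not auto-animated
    mask = (1 << anim_bits) - 1
    step = (frame_counter // frames_per_step) & mask
    return (tile & ~mask) | step


if __name__ == "__main__":
    # A 4-frame animation advancing every 4 video frames.
    print([hex(animated_tile(0x120, f, 4, 2)) for f in range(0, 20, 4)])
```

Since only the low bits change, the animated frames have to sit in an aligned group of 4 or 8 consecutive tiles in the tileset.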

Meta-tilemap mode, so the tilemap holds entries that are 2×2 or 4×4 hardware tilemap entries.

Priorities aren't used all that often, but might be a good feature to have, especially to save on tilemap layers and scads of empty entries in those additional layers. Place the "background" in front of the sprite layer, with the tile either drawing at high priority, or deferring to the sprite layer below it. If the priority is low, it passes the color to the sprite layer, and if the sprite layer has no pixel, it draws that bg pixel. If the tile pixel is transparent, then transparent behavior applies regardless of priority and nothing is different.

A CPS-1 style priority bit setting (16-bit, priority per color in that tile) might be interesting, but I'd say a half-and-half mode (lower 8 colors are lower priority, higher 8 colors are higher priority), or higher 4 colors etc, might be more compact and useful. It puts more constraints on the colors, but I would think certain foreground objects might have their own colors anyway?

Sprites

H-flip at the very minimum. If V-flip and a 90° rotation are added (workable since all sprites/tiles are square), then all 8 orientations and flips are possible. Rotation is only available in SRAM.
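
To see that three bits cover all 8 orientations, here is a sketch that maps an output coordinate back to a source coordinate. The third bit is modeled as a diagonal swap of x and y, which combined with one flip gives a 90° rotation; that modeling choice and the names are assumptions.

```python
def source_coord(x, y, size, hflip, vflip, swap_xy):
    """Map an output pixel position to the position to read in a square tile."""
    if swap_xy:
        x, y = y, x
    if hflip:
        x = size - 1 - x
    if vflip:
        y = size - 1 - y
    return x, y


def render(tile, hflip=False, vflip=False, swap_xy=False):
    """Return the tile as drawn with the given flag combination."""
    size = len(tile)
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            sx, sy = source_coord(x, y, size, hflip, vflip, swap_xy)
            row.append(tile[sy][sx])
        rows.append(tuple(row))
    return tuple(rows)


if __name__ == "__main__":
    tile = ((1, 2), (3, 4))
    flags = [(h, v, s) for h in (False, True) for v in (False, True) for s in (False, True)]
    print(len({render(tile, h, v, s) for h, v, s in flags}), "distinct orientations")
```

The swap turns row reads into column reads, which fits the note that rotation is limited to SRAM, where scattered accesses are cheap.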

16-bit sprite image selection from base pointer, based on bpp & size. Flip bits might be at MSBs of the word. 3 flip bits means 8192 sprite images.

Sprite sheet mode (kinda like square for tilesets), with a declared stride.

Color selection, direct for 1bpp, starting palette offset for 2/4bpp. Note that both of these could be expressed by OR'ing a mask on top of the color.

Need to figure out something for 8bpp color, to have palette swaps for limited ranges. The layer could have a color range/mask where all colors less than that use the sprite color. But this can probably be done with just an OR mask (or XOR for fun?). 8bpp sprites would normally have a 0 'color', palette swaps would leave a range of bits open and use the individual sprite color to set that, but we don't want that done on all bits, only those that are in "palette range", hence the layer setting. Maybe also only when the color is nonzero? or non-FF? Or if the color is 0-16, then it has 16 color selections as a 4bpp color? Colors 17-255 would be normal. Use the top bit of the color byte to select this coloring mode.

8×8, 16×16, 32×32, 64×64 sizes (2 bit selection, forget 24×24) TG16 had 64×64 and did interesting pseudo-layer and huge enemy stuff with it.

1,2,4,8 bpp (2 bit selection)

(Or should bpp & size be for the layer? That might make for a simpler implementation, but varying sprite sizes are probably good. Bpp might still be a consideration for layer config.)

Overdraw avoidance eliminates pulling in unseen pixels, which prevents hardware collision detection, which is fine.

I do like the Genesis idea of having a linked list of active sprites, as it can save time scanning the active sprites and finishing early. Question is should there be multiple sprite layers, each with their own list head, or one sprite list with each sprite having its depth independently set with its priority bits?
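
A sketch of the list walk, with each sprite's link field held in a plain list. The `END_OF_LIST` sentinel value and the loop guard are assumptions; with multiple sprite layers the same walk would simply be started from each layer's own head.

```python
END_OF_LIST = 0xFF   # hypothetical sentinel link value


def walk_sprite_list(links, head, max_sprites=256):
    """Follow link fields from head and return sprite indices in draw order."""
    order = []
    index = head
    # The length check guards against a list that accidentally loops back on itself.
    while index != END_OF_LIST and len(order) < max_sprites:
        order.append(index)
        index = links[index]
    return order


if __name__ == "__main__":
    links = [2, END_OF_LIST, 1, END_OF_LIST]   # 0 -> 2 -> 1, sprite 3 is never visited
    print(walk_sprite_list(links, head=0))
```

Only three of the four sprites are touched here, which is the time saving: inactive sprites are never scanned.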

For transparency, we probably need 2 sprite linebuffers, one for the transparent part and one for the solid part. Each holds its own layer in the upper bits. 18-bit entries = 16-bit 4444 RGBA, plus 2 bits of layer selection. Now, there's nothing for enabling/disabling the pixel, though. The transparency layer could be enabled by the alpha channel, and it requires the actual full color value because transparent pixels blend when stacked. The solid layer could do clut + 8bpp + enable bit, for 13 bits total, or 12-bit color plus 2 bits of layer plus 1 enable bit, for 15 bits total, since we know that's not transparent. We probably want a layer or global flag for whether sprite transparency stacks or not, since enabling it blends deeper transparent sprite pixels into the topmost transparent one instead of leaving them obscured.

The sprite registers should probably be a structure of arrays, so you can wholesale copy X/Y locations faster than copying literally the entire sprite definition block every frame. Individual sprite updates for color, frame, etc. can be made during animation handling by writing the sprite register directly, instead of that needing to be part of the update loop. Probably have an 8-bit register access mode (high bytes & low bytes in different areas, for 256 sprites) vs. a 16-bit register access mode (for 65816 16-bit copies with indices from 0-510).
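As a sketch of the two access modes, here is how offsets might be computed within each mode's own register window. The field list and its ordering are purely hypothetical; only the 256-sprite count, the split low/high byte areas, and the 0-510 word indices come from the paragraph above.

```python
NUM_SPRITES = 256
FIELDS = ("x", "y", "tile", "attr")   # hypothetical field set and order

def byte_mode_offsets(field: str, sprite: int):
    """8-bit mode: all low bytes of a field, followed by all high bytes.

    Returns (low_byte_offset, high_byte_offset), indexable with an
    8-bit index register.
    """
    base = FIELDS.index(field) * NUM_SPRITES * 2
    return base + sprite, base + NUM_SPRITES + sprite

def word_mode_offset(field: str, sprite: int) -> int:
    """16-bit mode: one word per sprite, so the index is sprite * 2 (0-510)."""
    base = FIELDS.index(field) * NUM_SPRITES * 2
    return base + sprite * 2
```

Either way, all X values are contiguous, so a frame update of positions is one block copy per field rather than a strided walk over whole sprite records.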

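{| class="wikitable"
|+Sprite definitions of various platforms
!
!VRAM
!Num
!Bytes
!Word fields (bit 15 to bit 0)
|-
| rowspan="3" |'''SNES'''
| rowspan="3" |64kB
| rowspan="3" |128
| rowspan="3" |
|Y pos (0-255); X Pos (0-255)
|-
|VFlip; HFlip; Priority; Palette (0-7); Tile (0-511)
|-
|(packed with other sprite x msb & size); X Pos MSB; Size
|-
| rowspan="4" |'''Genesis'''
| rowspan="4" |64kB
| rowspan="4" |80
| rowspan="4" |8
|Y Pos (0-1023) -256
|-
|Width (1-4)×8; Height (1-4)×8; Link (next sprite in draw order)
|-
|Priority; Palette (0-3); VFlip; HFlip; Tile (0-2047)
|-
|X Pos (0-511) -128
|-
| rowspan="4" |'''TG16'''
| rowspan="4" |64kB
| rowspan="4" |64
| rowspan="4" |8
|Y Pos (0-1023) relative
|-
|X Pos (0-1023) relative
|-
|Tile (0-2047), address = tile×32 regardless of sprite size
|-
|VFlip; Height (16,32,64); HFlip; Width (16,32); Priority; Palette (0-15)
|-
| rowspan="4" |'''F256'''
| rowspan="4" |2MB
| rowspan="4" |64
| rowspan="4" |6
|Tile Address MSB; Size; Priority; Palette (0-3); Enable
|-
|Tile Address LSB
|-
|X Pos (0-65535)
|-
|Y Pos (0-65535)
|-
| rowspan="4" |'''CX16'''
| rowspan="4" |128kB
| rowspan="4" |128
| rowspan="4" |8
|8bpp; Tile (0-4095), address = tile×32 regardless of sprite size
|-
|X Pos (0-511)
|-
|Y Pos (0-511)
|-
|Height (8,16,32,64); Width (8,16,32,64); Palette (0-15); Collision mask; Priority; VFlip; HFlip
|-
| rowspan="5" |'''Neo Geo'''
| rowspan="5" |ROM
| rowspan="5" |380
| rowspan="5" |134
|Tile LSB
|-
|Palette (0-255); Tile MSB (0-1,048,575); Auto-anim 3b; Auto-anim 2b; VFlip; HFlip
|-
|HShrink (15-0); VShrink (255-0)
|-
|Y Pos (0-511); Sticky; Height in tiles (0-32, 33 = special wrap mode)
|-
|X Pos (0-511)
|-
| rowspan="2" |'''Commando'''
| rowspan="2" |ROM
| rowspan="2" |128
| rowspan="2" |4
|Tile MSB (0-2047); X Pos MSB; HFlip; Palette (0-7); Tile LSB
|-
|X Pos (0-255); Y Pos (0-255)
|-
| rowspan="4" |'''CPS-1'''
| rowspan="4" |ROM
| rowspan="4" |256
| rowspan="4" |8
|X Pos (0-65535?)
|-
|Y Pos (0-65535?)
|-
|Tile (0-65535)
|-
|Height (in sprites); Width (in sprites); YFlip; XFlip; Palette (0-31)
|}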

SNES has 2 sprite sizes globally selectable, and the per-sprite bit sets which to use.

One Neo Geo sprite is a tower of up to 32 tiles (first 2 words above), which makes the attribute size that large: 2 words × 32 tiles, plus 3 other attribute/position words. The sticky bit places the next sprite immediately to the right of the current one.

"ROM" is listed for "VRAM" when there's no RAM for pixel data and it is stored in ROM instead. VRAM tends to describe register tables and the char matrix in these platforms.

CPS-1 sprite tiles are all 16×16 pixels.

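{| class="wikitable"
|+Per Line Sprite Limits
!
!Sprites
!Pixels
|-
|'''SNES'''
|32
|34×8 = 272
|-
|'''Genesis'''
|20
|20×16 = 320
|-
|'''TG16'''
|16 single-wide
|256
|-
|'''F256'''
|64 or 128
|(no limit)
|-
|'''CX16'''
|6-57
|≤512 [1]
|-
|'''Neo Geo'''
|96
|1536
|-
|'''Commando'''
|?
|
|-
|'''CPS-1'''
| colspan="2" |none, sprite frame buffer
|}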

Since transparency doesn't work well with a separate sprite line buffer, I wonder how many parallel sprite units could wait to be polled per layer, for proper stacking of transparency. These would be similar to shift registers, but basically just a word cache that can render a pixel for whatever X coordinate is asked of it. It would act as a priority queue: the topmost unit that covers that X coordinate presents its pixel, and interest trickles down to the units below, which vie for a fetch from the SRAM bus.
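A software model of that polling idea, as a sketch only: each unit caches one line of RGBA 4444 pixels and answers for a given X, units are polled topmost first, and the walk stops at the first opaque pixel. The `SpriteUnit` class, the `resolve` function and the blend formula are assumptions, not a description of planned hardware.

```python
from dataclasses import dataclass

@dataclass
class SpriteUnit:
    x: int          # left edge of the sprite on the current line
    pixels: list    # cached (r, g, b, a) tuples, 4 bits per channel

    def pixel_at(self, x):
        """Return this unit's pixel at screen X, or None if not covered."""
        i = x - self.x
        return self.pixels[i] if 0 <= i < len(self.pixels) else None

def resolve(units, x, background):
    """Blend all visible sprite pixels at X; `units` is ordered topmost first."""
    stack = []
    for unit in units:
        p = unit.pixel_at(x)
        if p is None or p[3] == 0:
            continue              # nothing to show from this unit
        stack.append(p)
        if p[3] == 15:
            break                 # opaque: units below cannot be seen
    color = background
    for pr, pg, pb, a in reversed(stack):   # blend from the bottom up
        color = tuple((s * a + d * (15 - a)) // 15
                      for s, d in zip((pr, pg, pb), color))
    return color
```

The early break is the interesting part for bandwidth: units below an opaque pixel never need their fetch for that X.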

Rougher ideas

Are all sprites independently defined and layered? Are they all from the same base pointer? Would need multiple layer definitions to give multiple base pointers, and that might be a good idea? Each layer involved could take 64 sprites out of 256, max 4 sprite "layers". Maybe that's not something that should be layer-based, but global to the sprite system. 4 base pointers, 4 groups of 64 sprites, 4 hardware elements scanning the sprite registers in parallel.

Unlimited-height sprites? Fixed width.

8bpp color register could be used to bank a subset of colors, maybe a color range (0-7) can be cycled while others are fixed. Think of Age of Empires 1 recoloring for instance. Or just use that to select a CLUT, overriding the layer's selection.

Select the color to be transparent? If using a fixed smaller palette, like DB16/32, then each sprite could pick a different one. Pico-8 has a 16-bit mask for which colors to include or not, which is interesting.

Cut-out sprites wouldn't display, but would clear any pixel from sprites above it, allowing sprites below to show through. Or, it could skip the sprite immediately below if it has a pixel, masking a single sprite, which might be easier to implement.

Figure out sprite zooming. No rotation, just scaling, not inverting with this? Can grow or shrink independently in X & Y. Maybe Bresenham? Probably want sub-pixel accuracy: 16-bit fixed point? Or full 32-bit fixed? Maybe an x1/x2/y1/y2 dest rectangle?
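One of the options above, sketched for a single row: derive a source step from the destination width and accumulate it in fixed point, taking the integer part as the source index. The 16.16 format is an assumption picked for the example (one of the candidates listed), and `scale_row` is a hypothetical name.

```python
FP_SHIFT = 16   # assumed 16.16 fixed point

def scale_row(src, dest_width):
    """Nearest-neighbor scale of one sprite row to dest_width pixels."""
    if dest_width <= 0:
        return []
    # Source pixels advanced per destination pixel, in 16.16 fixed point.
    step = (len(src) << FP_SHIFT) // dest_width
    pos = 0
    out = []
    for _ in range(dest_width):
        out.append(src[pos >> FP_SHIFT])   # integer part selects the texel
        pos += step
    return out
```

The same stepping applied to the line counter gives independent Y scaling, so grow/shrink in X and Y would only need two step registers per sprite.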

RLE

Layer has a height (width is undefined).

The base pointer points to an array of height × 16-bit offsets, one for each line, indexed from the base pointer. Each line is a non-demarcated concatenation of tags:

0lllllll cccccccc: length + color pair, color 0 is typically transparent

1lllllll c...: length + literal pixels

Length is 1-128, there is no zero-length span.
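A decoder for one line of this format might look like the sketch below. Several details are assumptions since the notes don't pin them down: offsets are little-endian and relative to the base pointer, the 7-bit length field stores length minus 1, and because lines carry no end marker the decoder simply stops once it has produced `width` pixels. `decode_rle_line` is a hypothetical name.

```python
def decode_rle_line(data: bytes, base: int, line: int, width: int):
    """Decode one RLE line into a list of `width` color indices."""
    # Look up this line's 16-bit offset in the table at the base pointer.
    entry = base + 2 * line
    offset = data[entry] | (data[entry + 1] << 8)
    pos = base + offset
    out = []
    while len(out) < width:
        tag = data[pos]
        length = (tag & 0x7F) + 1          # 1-128, no zero-length span
        pos += 1
        if tag & 0x80:                     # 1lllllll: literal pixels follow
            out.extend(data[pos:pos + length])
            pos += length
        else:                              # 0lllllll: one color repeated
            out.extend([data[pos]] * length)
            pos += 1
    return out[:width]                     # clip a span that overruns
```

For example, a line stored as `02 07 81 01 02` decodes to three pixels of color 7 followed by the literals 1 and 2.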

Uses:

  • Wipes
  • Solid color borders
  • Raster bars without interrupts
  • SNES-style explosion or light ray effects (esp with transparency)
  • Compressed cel style images or animations
  • Polygon filling, especially with multiple layers instead of merging spans into 1 layer
  • Cheap enough to run in hi-res?