Nice, I'd be interested in seeing the final product.
I've managed to do some optimizations on mine and now get a pretty stable 20fps. While debugging performance issues (buffer/array access is expensive), I came to suspect that the very small CPU caches cause most of the pain here, since the CPU chews through the actual maths without much trouble.
Wow, 40fps, that's impressive! I'd be interested in any tips/tricks.
Here's what I'm already doing:
Packing 4 vars into a single u32 (2 colours, 2 altitudes), since fitting more data into each cache line seemed to improve fps, even though it requires more shifts and bit masks. The Playdate does 32-bit loads/stores anyway, if you look at the generated assembly.
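A minimal sketch of the packing idea, assuming each colour and altitude fits in 8 bits and that the two cells sharing a word are horizontally adjacent map cells. The field order and the names `Cell`, `pack`, `colour`, `altitude` and `sample` are hypothetical choices for illustration, not the original code.

```rust
/// Hypothetical packed layout (a guess, not the original author's):
/// bits 0..8 = colour of cell 0, 8..16 = colour of cell 1,
/// 16..24 = altitude of cell 0, 24..32 = altitude of cell 1.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Cell {
    colour: u8,
    altitude: u8,
}

/// Packs two map cells into one u32 so a single 32-bit load fetches both.
fn pack(a: Cell, b: Cell) -> u32 {
    (a.colour as u32)
        | ((b.colour as u32) << 8)
        | ((a.altitude as u32) << 16)
        | ((b.altitude as u32) << 24)
}

/// Extracts the colour of cell `i` (0 or 1); `as u8` keeps only the low byte.
#[inline(always)]
fn colour(word: u32, i: u32) -> u8 {
    (word >> (8 * i)) as u8
}

/// Extracts the altitude of cell `i` (0 or 1).
#[inline(always)]
fn altitude(word: u32, i: u32) -> u8 {
    (word >> (16 + 8 * i)) as u8
}

/// Looks up map cell (x, y): one u32 load returns both altitude and colour.
/// `map_w` is the map width in cells and must be even (two cells per word).
fn sample(map: &[u32], map_w: usize, x: usize, y: usize) -> (u8, u8) {
    let word = map[y * (map_w / 2) + x / 2];
    let i = (x & 1) as u32;
    (altitude(word, i), colour(word, i))
}
```

The trade-off matches the one described: every lookup pays a couple of shifts, but altitude and colour arrive in the same load instead of coming from two separate arrays.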
I was already rendering front to back with an occlusion (y) buffer, but since the CPU seems to be fine at maths and slow at data loads/stores, I inverted my loops: the outer loop now runs over screen columns and the inner loop steps through z for each column, rather than vice versa. This increases the floating-point maths quite a bit, but it let me remove the y-buffer, which helped fps, because each column's occlusion height fits in a single variable. (Credit for the idea goes to a 2021 Hacker News discussion: "Voxel Space: Comanche's terrain rendering in less than 20 lines of code (2020)".)
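The loop inversion described above can be sketched as a column-major, front-to-back renderer where occlusion is one `ymax` variable per column. This is an illustration under assumptions, not the poster's code: the camera looks along -y with a 90° field of view, maps are square with a power-of-two size, `Camera`, `render` and `SKY` are hypothetical names, and the `dz += 0.01` schedule is an arbitrary placeholder.

```rust
/// Hypothetical camera parameters (names are illustrative).
struct Camera {
    x: f32,
    y: f32,
    height: f32,  // camera altitude
    horizon: f32, // screen row of the horizon
    scale: f32,   // vertical projection scale
    max_z: f32,   // draw distance
}

const SKY: u8 = 0;

/// Outer loop over screen columns, inner loop over depth. Because each column
/// is finished before the next starts, the "highest drawn row" is a single
/// local `ymax` instead of an array indexed by column.
/// Maps are row-major `map_size * map_size`; `map_size` must be a power of two.
fn render(heights: &[u8], colours: &[u8], map_size: usize, cam: &Camera,
          w: usize, h: usize, frame: &mut [u8]) {
    let mask = map_size - 1;
    for col in 0..w {
        // Ray direction for this column: looking along -y, 90° FOV.
        let dx = col as f32 / w as f32 * 2.0 - 1.0;
        let dy = -1.0;
        let mut ymax = h as i32; // first row (from the top) already drawn
        let (mut z, mut dz) = (1.0f32, 1.0f32);
        while z < cam.max_z && ymax > 0 {
            // Wrap map coordinates with a mask (negative values wrap too).
            let mx = ((cam.x + dx * z).floor() as i32 as usize) & mask;
            let my = ((cam.y + dy * z).floor() as i32 as usize) & mask;
            let idx = my * map_size + mx;
            // Perspective projection of the terrain height at this depth.
            let sy = ((cam.height - heights[idx] as f32) / z * cam.scale + cam.horizon) as i32;
            let top = sy.max(0);
            if top < ymax {
                // Visible: fill from the projected top down to the old ymax.
                for row in top..ymax {
                    frame[row as usize * w + col] = colours[idx];
                }
                ymax = top;
            }
            z += dz;
            dz += 0.01; // placeholder step growth
        }
        // Whatever is still above ymax is sky.
        for row in 0..ymax {
            frame[row as usize * w + col] = SKY;
        }
    }
}
```

The per-sample cost goes up (the projection is evaluated per column per step), but the only occlusion state touched in the inner loop is a register-sized local, which fits the "maths is cheap, memory is expensive" observation.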
I render at half resolution but do the dithering at the full Playdate screen resolution, as it looks better that way.
Removed all intermediate buffers: I used to have a framebuffer that was then passed to the dither-and-write-to-screen method, but now I dither and write directly to the screen in the main loop.
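Half-resolution rendering with full-resolution dithering, written straight into the screen buffer, might look roughly like this. Assumptions: a 4x4 Bayer ordered dither (the post doesn't say which dither is used), a 1-bit framebuffer with 52-byte rows (the Playdate SDK's `LCD_ROWSIZE`), set bit = white, and the most significant bit as the leftmost pixel. `plot_half_res` and `BAYER4` are hypothetical names.

```rust
/// Row stride of the 1-bit screen buffer in bytes (400 px = 50 bytes, padded to 52).
const ROW_BYTES: usize = 52;

/// Standard 4x4 Bayer matrix (values 0..=15).
const BAYER4: [[u8; 4]; 4] = [
    [0, 8, 2, 10],
    [12, 4, 14, 6],
    [3, 11, 1, 9],
    [15, 7, 13, 5],
];

/// Writes one half-resolution sample `lum` (0 = black, 255 = white) at half-res
/// coordinates (hx, hy) as a 2x2 block of screen pixels. Each screen pixel is
/// compared against its own full-resolution threshold, so the dither pattern
/// stays at full resolution even though the scene is rendered at half.
#[inline(always)]
fn plot_half_res(screen: &mut [u8], hx: usize, hy: usize, lum: u8) {
    for dy in 0..2 {
        let y = hy * 2 + dy;
        for dx in 0..2 {
            let x = hx * 2 + dx;
            // Scale 0..=15 to thresholds 8..=248 (no u8 overflow).
            let threshold = BAYER4[y & 3][x & 3] * 16 + 8;
            let byte = &mut screen[y * ROW_BYTES + x / 8];
            let bit = 0x80u8 >> (x & 7);
            if lum >= threshold { *byte |= bit } else { *byte &= !bit }
        }
    }
}
```

In the render loop this would be called wherever a pixel would otherwise be stored into an intermediate buffer, so the only full-size buffer touched is the screen itself.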
Rust-specific: used unsafe (unchecked) array access almost everywhere to avoid bounds checks.
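One way to keep unchecked access sound is to validate the invariant once, when the map is built, and then rely on it in the hot path. This is a sketch under assumptions: a square map with a power-of-two size and wrapped coordinates, and `WrappedMap` is a hypothetical type, not something from the post. `get_unchecked` is the standard slice method.

```rust
/// Square map whose size is a power of two, so masked coordinates are always
/// in bounds. The check happens once in `new`, not on every sample.
struct WrappedMap {
    data: Vec<u8>,
    size: usize,
}

impl WrappedMap {
    fn new(data: Vec<u8>, size: usize) -> Self {
        assert!(size.is_power_of_two() && data.len() == size * size);
        WrappedMap { data, size }
    }

    /// Samples with wrap-around; negative coordinates wrap via the mask.
    #[inline(always)]
    fn sample(&self, x: i32, y: i32) -> u8 {
        let mask = self.size - 1;
        let idx = (y as usize & mask) * self.size + (x as usize & mask);
        // SAFETY: idx <= mask * size + mask = size * size - 1 < data.len(),
        // which `new` guarantees.
        unsafe { *self.data.get_unchecked(idx) }
    }
}
```

Compared with sprinkling `get_unchecked` everywhere, this keeps the `unsafe` in one place with a written justification.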
Increased the z delta a tiny bit (set it too high and things quickly look broken). There's probably a lot of tweaking left to do here, as I don't think a linearly increasing z delta is optimal.
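To compare z-step schedules, it helps to count how many samples each column takes. A minimal sketch: `samples_linear_dz` models a z delta that grows linearly (so z itself grows roughly quadratically), and `samples_proportional_dz` is a hypothetical alternative, not something from the post, where the step is proportional to distance with a lower bound. Both start at z = 1, and the function names and parameters are illustrative.

```rust
/// Linear schedule: dz grows by a constant `dz_inc` each step.
/// Returns how many map samples one column takes before reaching `max_z`.
fn samples_linear_dz(max_z: f32, dz0: f32, dz_inc: f32) -> u32 {
    let (mut z, mut dz, mut n) = (1.0f32, dz0, 0u32);
    while z < max_z {
        n += 1; // one map sample at this z
        z += dz;
        dz += dz_inc;
    }
    n
}

/// Hypothetical alternative: step proportional to distance (a map cell covers
/// fewer screen pixels the further away it is), never smaller than `dz_min`.
fn samples_proportional_dz(max_z: f32, dz_min: f32, ratio: f32) -> u32 {
    let (mut z, mut n) = (1.0f32, 0u32);
    while z < max_z {
        n += 1;
        z += dz_min.max(z * ratio);
    }
    n
}
```

With a constant step of 1 and a draw distance of 1000, a column takes 999 samples; any positive `dz_inc` or `ratio` cuts that down, and the visual cost shows up as distant terrain being sampled more coarsely.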
I hope to be able to open-source the code soon; once I do, I'd appreciate feedback.