I bet there's a raft of optimisations you could make, starting with lowering the resolution.
Got excited to see if I can try and get some decent performance on it.
The first iteration renders a full RGB scene then converts it using Floyd-Steinberg dithering, but given the ray casting rendering algo I think I can just skip that and simply use 1-bit textures.
So I made a build that renders the walls directly to the Playdate's 1-bit buffer, skipping all the RGB stuff. It doesn't look quite as fancy, but I was hoping this one might run decently.
It's sitting here: rayCasterv8_fasterNoDit.zip - Google Drive
I've bumped the version to 1.8.
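For anyone following along, here is a rough sketch of what "rendering walls directly to the 1-bit buffer" can look like. It is not the code from the build. It assumes the usual Playdate frame layout (400x240, 52 bytes per row, leftmost pixel in the most significant bit, 1 = white), and the helper names `fb_set_pixel`, `tex_get_texel` and `draw_wall_column` are made up for illustration. Textures are assumed to be square, 1-bit, with a side length that is a multiple of 8.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed frame layout: 400x240, 1 bit per pixel, 52 bytes per row,
   most significant bit = leftmost pixel, 1 = white. */
enum { FB_WIDTH = 400, FB_HEIGHT = 240, FB_ROWBYTES = 52 };

/* Set one pixel in the packed 1-bit buffer (hypothetical helper). */
static void fb_set_pixel(uint8_t *fb, int x, int y, int white)
{
    assert(x >= 0 && x < FB_WIDTH && y >= 0 && y < FB_HEIGHT);
    uint8_t *byte = &fb[y * FB_ROWBYTES + (x >> 3)];
    uint8_t mask = (uint8_t)(0x80u >> (x & 7));
    if (white)
        *byte |= mask;
    else
        *byte &= (uint8_t)~mask;
}

/* Read one texel from a packed 1-bit texture (hypothetical helper). */
static int tex_get_texel(const uint8_t *tex, int tex_rowbytes, int u, int v)
{
    return (tex[v * tex_rowbytes + (u >> 3)] >> (7 - (u & 7))) & 1;
}

/* Draw one textured wall stripe at screen column x, covering rows
   [top, bottom), sampling texture column u. No RGB, no dithering pass. */
static void draw_wall_column(uint8_t *fb, int x, int top, int bottom,
                             const uint8_t *tex, int tex_size, int u)
{
    int height = bottom - top;
    if (height <= 0)
        return;
    int y0 = top < 0 ? 0 : top;                      /* clip to the screen */
    int y1 = bottom > FB_HEIGHT ? FB_HEIGHT : bottom;
    for (int y = y0; y < y1; ++y) {
        int v = (y - top) * tex_size / height;       /* nearest texel row */
        fb_set_pixel(fb, x, y, tex_get_texel(tex, tex_size / 8, u, v));
    }
}
```

Any shading would have to be baked into the 1-bit textures themselves, which is probably why it looks less fancy than the dithered RGB version.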
I've also got a version that renders floors sitting here:
Version is set to 1.9
I'd be very curious to see if this makes it run decently. If so, it might be a strategy I could use to continue building the game out on. I know you already tested a lot of stuff here, so I totally understand if you don't wanna spend more time on this, but if you're curious as well then please try it out and let me know how it runs.
I posted a build that should perform a lot better. Or so I hope. If you feel like trying it out I would again be very grateful.
Hey Jimmie, FYI turns out it's the build number you have to increment to update on sideload properly...
Just testing this out now...
The frame rate seems significantly impacted when the screen is occupied by wall. It runs at an interactive frame rate when the wall is far away.
Thanks a million for helping out and testing it!
I would imagine that the hit is a result of cache misses: the raycaster renders the walls as vertical stripes, but the buffer of the Playdate's bitmap as well as the textures are laid out in memory as rows of X. So each time the raycaster writes a stripe, nearly every pixel it touches causes a cache miss. I think this could be fixed by rendering it all out into some temporary structure and then copying it row by row instead; that way it should be much more cache friendly. I'll try and have a go at it.
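To make the stride problem concrete, here is a small sketch that walks every pixel of a 400x240, 52-bytes-per-row 1-bit buffer in both orders and counts how often two consecutive accesses fall into different cache lines. The 32-byte line size is an assumption, and a real cache is more complicated than this (associativity, texture reads, write behaviour), so treat it as an illustration of the access pattern rather than a performance model. The function names are made up.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout and an assumed 32-byte cache line. */
enum { FB_WIDTH = 400, FB_HEIGHT = 240, FB_ROWBYTES = 52, CACHE_LINE = 32 };

/* Byte offset of the byte holding pixel (x, y). */
static size_t pixel_byte_offset(int x, int y)
{
    assert(x >= 0 && x < FB_WIDTH && y >= 0 && y < FB_HEIGHT);
    return (size_t)y * FB_ROWBYTES + (size_t)(x >> 3);
}

/* Visit every pixel once, column by column (column_major != 0) or row by
   row, and count how many consecutive accesses change cache line. */
static long count_line_switches(int column_major)
{
    long switches = 0;
    size_t prev_line = 0;
    int first = 1;
    int outer = column_major ? FB_WIDTH : FB_HEIGHT;
    int inner = column_major ? FB_HEIGHT : FB_WIDTH;
    for (int a = 0; a < outer; ++a) {
        for (int b = 0; b < inner; ++b) {
            int x = column_major ? a : b;
            int y = column_major ? b : a;
            size_t line = pixel_byte_offset(x, y) / CACHE_LINE;
            if (!first && line != prev_line)
                ++switches;
            prev_line = line;
            first = 0;
        }
    }
    return switches;
}
```

Stepping down a column moves 52 bytes per pixel, which is more than one assumed cache line, so the column order changes line on every single access, while the row order mostly stays inside the line it already has.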
Really appreciate you taking the time and helping out! I do hope my eBay buy comes through, then I can stop pestering people on the forums to help me like this.
There have been a couple of raycasters in the past, I'm trying hard to recall who made them. I do recall you're right that performance was limited by cache misses, though I can't remember how one restructured to work around it. Iirc it achieved 40fps afterwards.
@matt - Yeah, I think it's being capped by fetching from memory, as the pixels it wants to touch are always out of cache due to the way the buffer is laid out in memory. I think I can either rewrite the raycaster so it stores and then reuses calculated values, working its way from x = 0 to x = 399 when drawing, rather than diving down the Y-axis at every pixel X it stops at. If that's too tricky, I can probably store it temporarily in a layout that allows me to just memcpy all of it later on.
Happy to hear that there are raycasters before that have achieved high frame rates, sounds likely that I should be able to do it as well.
@AlexMay - Seems like my Playdate has been shipped from eBay (yeah I caved in, I've been waiting a year already... ) so once it arrives I'll be able to do testing myself.
I'd like to try and make a better performing one though, so I might post a new version here. Hope you're not too tired of testing it out.
@AlexMay - Posted a new build which I hope should make the wall rendering faster, it's sitting here
I've bumped the version (1.10) and I've set the buildNumber to 2 in the pdxinfo file (I never set the buildNumber before so I'm kind of assuming it would default to 1 then?).
This one pre-calculates the ray probing sent out from X = 0 to X = 399 and stores all the data needed to draw all the horizontal wall stripes on the heap. After doing that it is possible to fill out the Playdate's draw buffer from X = 0 to X = 399 (in a linear manner) rather than jumping around in the draw buffer for each pixel it renders. Hopefully the renderer has small enough stacks everywhere to be able to do the data pre-fetching to get some gains (I would imagine if you fill up your stack with lots of fluff in the rendering loop, there won't be enough space in the L1/L2 caches of the CPU to do any prefetching anyway... Bit hard to say without diving deeper).
TL;DR - it should be faster, but it's theoretical and could very well not be, because it's kinda hard to estimate.
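A minimal sketch of the two-pass idea described above, assuming the ray casting pass has already filled one small record per screen column. The `WallSpan` struct and `fill_rows` are invented for this example, walls are drawn as solid white to keep it short, and the frame layout (400x240, 52 bytes per row, MSB = leftmost pixel) is assumed. The actual build stores more per column (texture coordinates and so on).

```c
#include <assert.h>
#include <stdint.h>

enum { FB_WIDTH = 400, FB_HEIGHT = 240, FB_ROWBYTES = 52 };

/* Hypothetical per-column result of the ray casting pass:
   the wall covers rows [top, bottom) of that screen column. */
typedef struct {
    int16_t top;
    int16_t bottom;
} WallSpan;

/* Second pass: walk the frame buffer strictly row by row, left to right,
   and build each output byte from the 8 precomputed column records. */
static void fill_rows(uint8_t *fb, const WallSpan *spans)
{
    assert(fb != 0 && spans != 0);
    for (int y = 0; y < FB_HEIGHT; ++y) {
        uint8_t *row = fb + y * FB_ROWBYTES;
        for (int xb = 0; xb < FB_WIDTH / 8; ++xb) {
            uint8_t bits = 0;
            for (int bit = 0; bit < 8; ++bit) {
                const WallSpan *s = &spans[xb * 8 + bit];
                if (y >= s->top && y < s->bottom)
                    bits |= (uint8_t)(0x80u >> bit);
            }
            row[xb] = bits; /* one sequential write per byte */
        }
    }
}
```

The frame buffer is now written once, in address order. The 400 span records get re-read for every row, but at 4 bytes each in this sketch they are small enough that they could plausibly stay cached.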
I'd love to see if this makes it run any better, especially when there are a lot of walls filling up the screen.
Thanks again for taking the time and helping me
Ok finally made time to try this, sorry for the delay! It's still pretty slow, arguably a bit better when walls fill the screen, but maybe slower when they're far off? Overall I'd say better though.
Thanks for taking the time and trying it out
It probably is quite tricky to nail the last part here, optimizing for caching can be quite hard.
So I'll probably need to tweak and test a bit more with this.
My device should show up before the weekend though, so perhaps it can be fun to dig into this again, take a little break from fighting the demons in Diablo IV.
Again, thanks for all the help - truly appreciate it!
So, finally got the physical device and I've managed to build something that seems to run at a decent speed now. It's not perfect yet, but it's starting to look viable!
The secret sauce so far has been scaling. This runs at about half resolution plus a few tweaks to render in a more cache friendly manner (but mostly the gains are from the secret sauce - scaling hehe).
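Roughly what the scaling trick can look like, as a sketch and not the code from the build: render into a 200x120 1-bit buffer, then blow it up 2x into the 400x240 frame. The exact low resolution is an assumption (the post only says "about half"), as is the frame layout, and the names `double_bits` and `upscale_2x` are made up.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FB_WIDTH = 400, FB_HEIGHT = 240, FB_ROWBYTES = 52 };
enum { LO_WIDTH = 200, LO_HEIGHT = 120, LO_ROWBYTES = 25 }; /* assumed */

/* Turn 8 pixels into 16 by repeating every bit twice (MSB first). */
static uint16_t double_bits(uint8_t b)
{
    uint16_t out = 0;
    for (int i = 0; i < 8; ++i)
        if (b & (0x80u >> i))
            out |= (uint16_t)(0xC000u >> (2 * i));
    return out;
}

/* Expand the low-res buffer into the full frame: each source byte becomes
   two destination bytes, and each source row is written twice. */
static void upscale_2x(const uint8_t *lo, uint8_t *fb)
{
    assert(lo != 0 && fb != 0);
    for (int y = 0; y < LO_HEIGHT; ++y) {
        uint8_t *row0 = fb + (2 * y) * FB_ROWBYTES;
        uint8_t *row1 = row0 + FB_ROWBYTES;
        for (int xb = 0; xb < LO_ROWBYTES; ++xb) {
            uint16_t wide = double_bits(lo[y * LO_ROWBYTES + xb]);
            row0[2 * xb] = (uint8_t)(wide >> 8);
            row0[2 * xb + 1] = (uint8_t)(wide & 0xFF);
        }
        memcpy(row1, row0, FB_WIDTH / 8); /* duplicate the scanline */
    }
}
```

The raycaster then only has to cast 200 rays and shade a quarter of the pixels. In a real build `double_bits` would more likely be a 256-entry lookup table than a loop.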
I just thought I'd share the build here since you've been so kind in helping me try it out, and perhaps you'd like to see the next iteration.
It's sitting at ray.pdx.zip - Google Drive
@dave - Ofc a big thank you to you as well, I'd never have found that stack overflow without your help!
Two things you could try (sorry if you already did and I missed it):
Doing the raycasting in a rotated image so that you're drawing each column as a row will be a lot friendlier to the cache, then you can do a final rotated draw to the framebuffer. Playdate has an optimized implementation for 90-degree drawing, takes just a few ms to draw a full-screen image rotated 90 degrees. Rotating your source textures should also help, if you haven't done that already.
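A plain C sketch of the rotated-image idea, without the SDK: each screen column is drawn as a row of a 240x400 buffer (so a wall stripe touches consecutive bytes), and a final pass rotates the whole thing into the 400x240 frame. The naive `rotate_cw_to_fb` below only stands in for the SDK's optimized rotated draw. The clockwise mapping, the 30-byte row stride of the rotated buffer, and all function names are assumptions made for this example, so check the real orientation on device.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FB_WIDTH = 400, FB_HEIGHT = 240, FB_ROWBYTES = 52 };
enum { ROT_WIDTH = 240, ROT_HEIGHT = 400, ROT_ROWBYTES = 30 }; /* assumed */

/* Set pixels [x0, x1) of one row in the rotated buffer. */
static void rot_fill_run(uint8_t *rot, int row, int x0, int x1)
{
    assert(row >= 0 && row < ROT_HEIGHT && x0 >= 0 && x1 <= ROT_WIDTH);
    uint8_t *p = rot + row * ROT_ROWBYTES;
    for (int x = x0; x < x1; ++x)
        p[x >> 3] |= (uint8_t)(0x80u >> (x & 7));
}

/* Draw screen column screen_x, rows [top, bottom), as a horizontal run.
   With a clockwise rotation, screen column x lives in rotated row 399 - x. */
static void draw_screen_column(uint8_t *rot, int screen_x, int top, int bottom)
{
    rot_fill_run(rot, ROT_HEIGHT - 1 - screen_x, top, bottom);
}

/* Naive 90-degree clockwise rotation into the frame buffer:
   source pixel (sx, sy) lands at (ROT_HEIGHT - 1 - sy, sx). */
static void rotate_cw_to_fb(const uint8_t *rot, uint8_t *fb)
{
    memset(fb, 0, (size_t)FB_ROWBYTES * FB_HEIGHT);
    for (int sy = 0; sy < ROT_HEIGHT; ++sy) {
        for (int sx = 0; sx < ROT_WIDTH; ++sx) {
            if (rot[sy * ROT_ROWBYTES + (sx >> 3)] & (0x80u >> (sx & 7))) {
                int dx = ROT_HEIGHT - 1 - sy;
                int dy = sx;
                fb[dy * FB_ROWBYTES + (dx >> 3)] |= (uint8_t)(0x80u >> (dx & 7));
            }
        }
    }
}
```

The stripe drawing now writes at most 30 consecutive bytes per column. Storing the wall textures rotated the same way makes the texture reads sequential too.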
Also, I remember seeing a lot of floating point in your pdex.bin. I don't know what kind of improvement you'd see using fixed point math instead, but I got around a 2x boost when I did that on my FM synth demo. Though it's a lot easier to do audio stuff in fixed point than graphics, I know..
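For reference, a minimal 16.16 fixed-point sketch of the kind of math a raycaster leans on (multiply, divide, projected wall height). The type and function names are invented here, the 16.16 split is just one common choice, and whether it actually beats floats on the device is something to measure rather than assume.

```c
#include <assert.h>
#include <stdint.h>

/* 16.16 fixed point: 16 integer bits, 16 fractional bits (assumed split). */
typedef int32_t fix16;
#define FIX_ONE 65536

static fix16 fix_from_int(int v)
{
    return (fix16)(v * FIX_ONE);
}

/* Integer part, truncated toward zero. */
static int fix_to_int(fix16 a)
{
    return (int)(a / FIX_ONE);
}

/* Multiply in 64 bits so the intermediate product cannot overflow. */
static fix16 fix_mul(fix16 a, fix16 b)
{
    return (fix16)(((int64_t)a * b) / FIX_ONE);
}

/* Divide: pre-scale the numerator to keep the fractional bits. */
static fix16 fix_div(fix16 a, fix16 b)
{
    assert(b != 0);
    return (fix16)(((int64_t)a * FIX_ONE) / b);
}

/* Classic raycaster projection: wall height in pixels is the screen
   height divided by the perpendicular distance to the wall. */
static int wall_height_px(fix16 distance, int screen_height)
{
    assert(distance > 0);
    return fix_to_int(fix_div(fix_from_int(screen_height), distance));
}
```

The range is limited to roughly +/-32768 with 1/65536 precision, so very small distances need clamping before the divide.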
Thanks for the tips and all the error searching
I do have an optimized version that is more cache friendly, but I think a rotated renderer would be even more efficient.
I do plan on taking a chunk at the bottom away for a GUI anyways though so I think I can get pretty good performance with what I got now.
Going to turn my eyes to implementing some mip-mapping to help with the artifacts of the scaling code. Should be straightforward enough, as I've got the distance to each stripe I'm rendering in the raycaster and can just choose an appropriate texture depending on distance. Should make it look a lot nicer, I would think.
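Picking the texture by distance could be as simple as the sketch below. The rule (one mip level per doubling of distance beyond a base distance) and the name `mip_level_for_distance` are assumptions for illustration. The thresholds would need tuning by eye, and generating good-looking 1-bit mip levels is a separate problem not covered here.

```c
#include <assert.h>

/* Choose a mip level from the distance to the wall stripe.
   Level 0 is the full-size texture; each doubling of distance beyond
   base_distance selects the next smaller level, clamped to the last one. */
static int mip_level_for_distance(float distance, float base_distance,
                                  int level_count)
{
    assert(level_count > 0 && base_distance > 0.0f);
    int level = 0;
    float threshold = base_distance * 2.0f;
    while (level < level_count - 1 && distance >= threshold) {
        threshold *= 2.0f;
        ++level;
    }
    return level;
}
```

Since the level is chosen once per stripe, the cost is a handful of comparisons per screen column.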
Just tried the "rotated textures" approach, it made a significant improvement as well on performance.
So thanks for that advice
Just wanted to say a big thanks again for helping out error searching my raycaster and coming up with advice. I turned that thing into a game and I took the liberty of putting all of you in the credits section
The game is sitting here: Red Terror (Playdate) by therussianbeargame
If you've got a link to a webpage or something you want mentioned next to your name, let me know and I'll see if I can try and squeeze it in.
Again, a million thanks for all the help!
Awesome work! Congrats on the launch!
Super kind with the credit, all good by me.
Can you be a bit more specific as to how to actually do this? Is there an API (":drawRotated") I'm missing?
Or is it just this? It's probably just this: void playdate->graphics->drawRotatedBitmap(LCDBitmap* bitmap, int x, int y, float degrees, float centerx, float centery, float xscale, float yscale);
That's the one! If bitmap is 240x400, degrees is 90 or 270, xscale and yscale are 1, and (x,y) adjusted for (centerx,centery) winds up at (0,0) (i.e. x-centerx*bitmap->width == 0 and y-centery*bitmap->height == 0), then it uses a very fast algorithm for rotating the image.
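If it helps, the conditions listed above can be written down as a small check to run on your own arguments before calling drawRotatedBitmap. This is a hypothetical helper, not part of the SDK. It only restates the conditions from the post, takes the bitmap size as plain ints so it compiles without the SDK headers, and assumes xscale is meant alongside yscale.

```c
#include <assert.h>

/* Returns 1 if the arguments match the fast-path conditions described
   above: 240x400 bitmap, 90 or 270 degrees, both scales 1, and the
   position adjusted for the center landing exactly at (0,0). */
static int matches_fast_rotate_conditions(int width, int height,
                                          float degrees, int x, int y,
                                          float centerx, float centery,
                                          float xscale, float yscale)
{
    assert(width > 0 && height > 0);
    return width == 240 && height == 400
        && (degrees == 90.0f || degrees == 270.0f)
        && xscale == 1.0f && yscale == 1.0f
        && (float)x - centerx * (float)width == 0.0f
        && (float)y - centery * (float)height == 0.0f;
}
```

For example, x = 0, y = 0 with centerx = centery = 0 passes, and so does x = 120, y = 200 with centerx = centery = 0.5.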