I’m getting a crash when running my game on device, but not in the simulator. I looked up how to get a crash log and symbolicate it, but I’m just getting question marks out the other end. I also saw a method to look up the symbols directly, but it references a file named symbols.db, and the only copies I have come with the Playdate Simulator. I gather those don’t contain the right symbol table for symbolicating a crash log from an actual device.
When I execute python3 /path/to/playdate/PlaydateSDK/bin/firmware_symbolizer.py /Volumes/PLAYDATE/crashlog.txt /path/to/MyGame_DEVICE.elf I get
?? ??:0
?? ??:0
Can anyone assist? I’m not sure what to do next! It looks like the crash is happening in my own code, but before I start pulling the entire thing apart bit by bit, I could really use something pointing to where the crash is happening. Is there anything else I can do to decipher the crash log, or get some other kind of diagnostics information?
If you’re getting ?? that means the crash is in an address range not covered by your ELF, i.e. it crashed in the system. Here’s the crash:
mALLOc at /local/builds/ahdEsyZ4Q/0/playdate/PlayDate/Device/Firmware/Core/dlmalloc.c:3474
dlmalloc at /local/builds/ahdEsyZ4Q/0/playdate/PlayDate/Device/Firmware/Core/dlmalloc.c:1605
I’ve done some deeper debugging with print statements to narrow down where the crash is happening. I’ve found that it’s happening in a call to playdate->file->listfiles, but it’s not clear why.
I’m passing it arguments that I’ve passed it earlier in my game’s execution, so the arguments are fine. And it’s crashing before it even gets to executing the callback function I pass in for the second parameter.
So… this might be an SDK bug! Is there anything else I can do to confirm whether this is in the SDK or somehow my own bug?
(I’ve edited the thread title to reflect this new information about the crash.)
I'm also experiencing this. I haven't yet gotten so deep as to know when it’s happening in my code. I'll say it doesn't always happen, and I'm not using playdate->file->listfiles. I'll reply if I figure anything else out.
I can try to isolate this further for you but I may need to test with the actual repo and I’m not sure if you are comfortable with that. ‘^^ Let me know if you want me to have a look, though. I’m also on the PD discord with the same handle.
In my case, I'm working on a game engine which is on GitHub with an MIT license, so access to the repo is no problem: GitHub - invisiblesloth/roxy-engine-project-template: A quick start template for Roxy Engine projects. I thought that the issue was related to a test project which has some very large tilemaps and music and other assets, but I actually just got the project template to crash, and it doesn't have any of that.
DISCLAIMER: I wouldn't use the game engine yet to make games as I'm still changing the API and breaking things regularly. Also, I have used generative AI on code for the game engine. I need to add a note about that in the README.md. I actually need to add a lot to the README. The installation instructions are inadequate to say the least.
Feel free to take a look. Any help or feedback is welcome. Thank you!
That being said, I'd like to understand the problem and figure it out on my own if I'm able (I may not be able, I'm honestly out of my depth on this one). How would you go about tracking this down?
Steps to reproduce:
Build, run, and upload to the Device
Open the sampler and toggle to Device > Lua and start sampling
Then just keep switching back and forth between sampling Device > Lua and Device > C … it'll crash after a few times back and forth.
I've also had it crash if I just leave it sampling Lua for 2-5 mins.
Normally I would stop sampling before I toggle between Lua and C, but I haven't been, and it seems to crash faster when I don't stop sampling.
Here is what I've done so far:
I hardened all my C code, making sure there is no way to divide by zero, although I'm pretty confident that wasn't happening (it never crashes when just playing the game on the Playdate)
I also normalized indexing
Added more validation (sizes, counts, etc.)
Added early outs for when images/imagetables might be missing
Now I'm trying to wrap the Playdate SDK allocator everywhere, so that I can figure out where the issue is, but so far it's crashing without triggering my canaries.
I will say that the fixes I've added do seem to have helped, as it’s not crashing as much or as quickly. At first it would crash on the first attempt to sample from the hardware.
If it would be helpful for the Playdate devs I can try to make a shareable project that reproduces the bug. Of course there could be a complex interaction going on here that triggers it, so divorcing it from my game and engine might prevent me from being able to trigger it.
divorcing it from my game and engine might prevent me from being able to trigger it
This was my reason for offering to take a direct look. It could be something about particular filenames or a specific state things get into in your game. =[
Re: Invisible Sloth, I’ll check out your project this evening and see if I find anything. If I do find the issue, I’ll let you know.
Glad to hear you’ve got a reproducible crash. However, are we sure that the crash I’m getting and this one are the same crash? My project is using the C API and specifically occurs with a call to listfiles.
Yours is almost certainly a different crash, and this other thing probably should have been a separate thread. Sorry for the noise. If you do have a minimal repro setup to give the staff, it would be useful to them I'm sure.
I'm still down to have a look myself if you want me to. I haven't used the listfiles API call in my own PD projects so far but I've done a lot of C dev and can maybe isolate the crash + help make a minimal repro for it.
I'm happy to help too. That's my fault for the noise. I'm sorry. I guess I was so consumed by the problem I had that I thought you were experiencing the same issue. I misread your post and thought you were crashing while sampling like I was.
Here are some of the things I learned this week trying to figure out my crash. I tried to map it to using listfiles. You may know all of this, but maybe there is something here that helps.
Since you're only crashing on the device, it could be related to a hidden file like .DS_Store. That shouldn't actually cause a crash, but I'm trying to think why you'd crash on device but not in the simulator.
I saw here that the stack might be ≈61 KB; trying to use more than that might crash. Not certain on that number though.
Frequent malloc/free cycles can fragment the heap, so you can run out of usable heap even when the total free space looks sufficient.
Mixing standard malloc/free and pd->system->realloc/free creates 2 separate heaps. Freeing with the wrong allocator corrupts.
Avoid file system API calls, including listfiles, from inside the callback, since it’s running while the system is iterating the directory. Could be that the simulator runs faster and finishes iterating before the device does.
Don't store the pointer for use after the callback returns. If you need it later, copy const char* filename to your own buffer. You can do this in the callback. If you are doing this, it could be where the corruption is happening (see next bullets).
Trimming strings might accidentally trim off the \0 resulting in a non-terminated one (Relates to storing the filenames)
Watch out for off-by-one when appending filenames … the / on directories can throw off the count
Avoid non-ASCII or very long names. I didn't find anything that said this was a problem, but it could be a difference between the computer and the device maybe? Sounds like a best practice.
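To make the points above about copying filenames, keeping the terminating \0, and the trailing / on directories more concrete, here is a minimal sketch in plain C. The callback has the same shape as the one listfiles takes (a const char* filename plus a void* userdata), but FileList, collect_file, join_path, MAX_FILES, and MAX_NAME are hypothetical names made up for this example, not part of the SDK, and the sizes are arbitrary assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FILES 32   /* arbitrary capacity for this sketch */
#define MAX_NAME  64   /* per-name buffer size, including the '\0' */

typedef struct {
    char names[MAX_FILES][MAX_NAME];
    int  count;    /* names stored so far */
    int  skipped;  /* names dropped because the list was full or the name too long */
} FileList;

/* Callback with the (filename, userdata) shape. It copies the name into
 * storage we own, so nothing points at the caller's string afterwards,
 * and it does no allocation or file system work. */
static void collect_file(const char* filename, void* userdata)
{
    FileList* list = userdata;
    size_t len = strlen(filename);

    if (list->count >= MAX_FILES || len >= MAX_NAME) {
        list->skipped++;
        return;
    }
    /* len + 1 copies the terminating '\0' along with the characters. */
    memcpy(list->names[list->count], filename, len + 1);
    list->count++;
}

/* Joins dir and name into out, adding a '/' only when dir does not
 * already end in one. Returns 0 on success, -1 if out is too small. */
static int join_path(char* out, size_t out_size, const char* dir, const char* name)
{
    size_t dir_len = strlen(dir);
    size_t name_len = strlen(name);
    int need_slash = dir_len > 0 && dir[dir_len - 1] != '/';
    size_t total = dir_len + (need_slash ? 1 : 0) + name_len;

    if (total + 1 > out_size) {  /* + 1 for the terminating '\0' */
        return -1;
    }
    memcpy(out, dir, dir_len);
    if (need_slash) {
        out[dir_len++] = '/';
    }
    memcpy(out + dir_len, name, name_len + 1);
    return 0;
}
```

On device, the idea would be to pass collect_file and a pointer to a FileList as the callback and userdata arguments of listfiles, then do any parsing or further file access only after the call has returned. Counting skipped entries instead of truncating silently makes it easier to notice when a long name or a full list is involved.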
Things you can do to debug…
Strip back your callback then re-introduce one piece at a time. Probably hard to do as this will probably break your game. Maybe you could pull out each piece into a new function? Goal is to get to a callback that doesn't allocate or parse anything and see if it still breaks or if the break moves somewhere else.
If you are allocating in the callback, wrap with guards to detect overruns or double frees. (This is what I was starting to do although I wasn't using listfiles.) Maybe you could detect overruns or double frees with the simulator Malloc Log, but I don't know how, seems like too much going on there to find the needle in the haystack.
Use playdate->file->stat to confirm it’s a directory before calling listfiles.
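For the guard idea above, this is roughly what I mean by wrapping allocations, as a minimal sketch rather than a finished tool. It pads each block with known bytes on both sides and checks them when the block is freed. Everything here (guard_alloc, guard_check, guard_free, g_realloc, the guard size and fill byte) is a hypothetical name or value chosen for the example. host_realloc is only a stand-in so the sketch runs on a computer; the assumption is that on device you would point g_realloc at a function that forwards to pd->system->realloc, so every allocation goes through one allocator. It does not detect double frees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* realloc-style allocator: ptr == NULL allocates, size == 0 frees. */
typedef void* (*ReallocFn)(void* ptr, size_t size);

/* Stand-in allocator for running this sketch off device (assumption). */
static void* host_realloc(void* ptr, size_t size)
{
    if (size == 0) {
        free(ptr);
        return NULL;
    }
    return realloc(ptr, size);
}

static ReallocFn g_realloc = host_realloc;

#define GUARD_BYTES  16    /* size of each guard region */
#define HEADER_BYTES 16    /* holds the user size, padded to keep alignment */
#define GUARD_FILL   0xA5  /* byte pattern written into the guards */

enum { GUARD_OK = 0, GUARD_UNDERRUN = 1, GUARD_OVERRUN = 2 };

/* Block layout: [header: size][front guard][user data][back guard] */
static void* guard_alloc(size_t size)
{
    const size_t overhead = HEADER_BYTES + 2 * GUARD_BYTES;
    if (size > SIZE_MAX - overhead) {
        return NULL;
    }
    unsigned char* raw = g_realloc(NULL, overhead + size);
    if (raw == NULL) {
        return NULL;
    }
    memcpy(raw, &size, sizeof size);
    memset(raw + HEADER_BYTES, GUARD_FILL, GUARD_BYTES);
    unsigned char* user = raw + HEADER_BYTES + GUARD_BYTES;
    memset(user + size, GUARD_FILL, GUARD_BYTES);
    return user;
}

/* Returns GUARD_OK, or a bitmask of GUARD_UNDERRUN / GUARD_OVERRUN.
 * A write that runs back past the front guard into the header makes
 * the overrun result unreliable, because the stored size is damaged. */
static int guard_check(const void* ptr)
{
    const unsigned char* user = ptr;
    const unsigned char* front = user - GUARD_BYTES;
    size_t size;
    memcpy(&size, front - HEADER_BYTES, sizeof size);
    const unsigned char* back = user + size;

    int status = GUARD_OK;
    for (size_t i = 0; i < GUARD_BYTES; i++) {
        if (front[i] != GUARD_FILL) status |= GUARD_UNDERRUN;
        if (back[i] != GUARD_FILL)  status |= GUARD_OVERRUN;
    }
    return status;
}

/* Checks the guards, frees the whole block, and reports what it found. */
static int guard_free(void* ptr)
{
    if (ptr == NULL) {
        return GUARD_OK;
    }
    int status = guard_check(ptr);
    g_realloc((unsigned char*)ptr - GUARD_BYTES - HEADER_BYTES, 0);
    return status;
}
```

If a buffer filled inside the callback comes back from guard_free with GUARD_OVERRUN, that points at the copy or append code for that buffer rather than at the system. You can also call guard_check at intervals (for example right after the callback returns) to narrow down when the damage happens.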
Edit: I meant to reply to @bribri's post just above, not this one.
Well, it turns out the crash has just disappeared on its own!
I don’t have an explanation for why it went away. I did try updating my Playdate + the SDK to see if that made a difference, but I’m near certain that I was able to reproduce the crash afterwards.
Nonetheless, it is now gone. One of my personal adages about programming is that any error that mysteriously disappears can just as mysteriously reappear at any moment, so I suspect this isn’t the end of it. But for now I have nothing to reproduce!