C-based game crashing only on device

Today was a tremendous day. I received my Playdate device!

Unfortunately, my C-based game is crashing the device. I get a message to press A to restart. I've confirmed that I can build and run the C API examples on device. This same C code runs without issue in the simulator (with the limited malloc pool), as well as on the Android platform (with a few platform-specific differences, of course). This occurs when building with SDK 1.10.0 on OSX and Windows.

What would be the best approach to debug this? I've been commenting out large pieces of the engine, but iteration time is very slow without any other clues. Logging to console seems very unreliable before a crash.

I'd greatly appreciate any suggestions. Thanks!

Could it be that you're overflowing the stack? It's quite small.

Does your code use recursion anywhere?

I suppose that's possible, but I don't have any recursion, and I don't think I have any large allocations on the stack. I'll double check. Thanks for the suggestion.

The best approach in this case is good old printf() debugging (well, logToConsole() in this case).

When the Playdate is plugged into the PC, the simulator still shows the device's debug output in its console (in blue).

After a hard crash we log all of the registers we can to /crashlog.txt; e.g.:

--- crash at 2022/05/02 19:18:59---
build:655189161362_dirty-dave_custom-synth-generators-and-signals-dev.20220502135538-dave
   r0:00000002    r1:20030ff0     r2:00000001    r3: 00000000
  r12:006d3000    lr:0802eabb     pc:0802eabe   psr: 61000000
 cfsr:00000000  hfsr:40000000  mmfar:00000000  bfar: 00000000
rcccsr:00000000
heap allocated: 131264
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

The first thing you want to look at is pc and lr there: pc should be the address where the crash was (or maybe the address after), and lr points to the calling function, unless the current function is using it for something else. If you load your pdex.elf file into gdb and do info line *0x<address>, it'll tell you what the functions are, if they're in the scope of the elf file.
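For anyone doing this lookup repeatedly, here is a minimal sketch in C that pulls pc and lr out of crashlog text and formats the matching gdb command. The helper names crashlog_reg() and format_gdb_lookup() are made up for illustration and are not part of the SDK; the sketch only assumes the "name:hexvalue" layout shown in the crashlog above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not an SDK function): find "<name>:" in crashlog
 * text and parse the hex value that follows. Returns 1 on success, 0 if
 * the register name is not present. */
static int crashlog_reg(const char *text, const char *name, unsigned long *out)
{
    assert(text != NULL && name != NULL && out != NULL);

    size_t n = strlen(name);
    if (n == 0)
        return 0;

    const char *p = text;
    while ((p = strstr(p, name)) != NULL) {
        /* Require a word boundary so "sr" does not match inside "psr". */
        int at_word_start = (p == text) || !isalnum((unsigned char)p[-1]);
        if (at_word_start && p[n] == ':') {
            char *end;
            /* strtoul skips the optional space seen in "r3: 00000000". */
            unsigned long v = strtoul(p + n + 1, &end, 16);
            if (end != p + n + 1) {
                *out = v;
                return 1;
            }
        }
        p += 1;
    }
    return 0;
}

/* Format the gdb command from the post above for a given address. */
static int format_gdb_lookup(char *buf, size_t len, unsigned long addr)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "info line *0x%08lx", addr);
}
```

Feed it the contents of /crashlog.txt, look up "pc" and "lr", and paste the formatted commands into a gdb session that has pdex.elf loaded.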

Sorry for the dumb question, but how do I build a pdex.elf? I tried running gdb on my pdex.bin, but it complained "pdex.bin: not in executable format: File format not recognized."

Thanks again for your help.

The Makefiles in the example projects leave a pdex.elf behind in the build folder and then use objcopy to generate the pdex.bin files. If you're using plain make, look for it there. I haven't found a way to get an elf file from our cmake setup, unfortunately. (I'm sure it's possible, I just don't know the first thing about cmake.)

@james Any thoughts here?

Can you tell us more about how you're building your game?

We should probably provide a symbolification script with the SDK. We already have one built for internal use.

Previously I was using cmake, but I just whipped up a quick Makefile so I could build the elf.

As a side note, I took "Linux/Make" in the documentation to mean "We only support make on Linux," which is why I originally went with cmake.

No luck, unfortunately.

--- crash at 2022/05/06 04:10:48---
build:7a75bff14545-1.10.0-release.135263-buildbot
   r0:00000001    r1:00000001     r2:00045400    r3: 40012c00
  r12:00003fe0    lr:0802f343     pc:0802f302   psr: 21000000
 cfsr:4891374d  hfsr:3309234f  mmfar:5d7077de  bfar: cce1188a
rcccsr:00000000
heap allocated: -1062292400
Lua totalbytes=-715342010 GCdebt=-863916382 GCestimate=-414005513 stacksize=-1847634557
Reading symbols from pdex.elf...done.
(gdb) info line *0x0802f302
No line number information available for address 0x802f302
(gdb) info line *0x0802f343
No line number information available for address 0x802f343

I added -g to my UDEFS, just in case. Do I need to make any other configuration changes to access this info?

The values of pc and lr seem reasonable compared to the crashlog you shared, but all the negative values for heap and Lua are disconcerting. I would expect my Lua numbers to be zero, as I have no Lua code. Any clues here?

I should have mentioned before: if the address is in the 0x08000000-0x08100000 range, that's firmware code. Your game will be running in the 0x60000000-0x61000000 range. There's a symbols.db sqlite database that has symbols for the public side of the API, which we use for the profiler. I don't think we have a tool for looking up addresses in there, but if you're familiar with sqlite you could look around in it. You still wouldn't find these addresses, though, because they're in the low-level driver code. This crash log says it's crashing in the eMMC (flash storage) driver, and that address isn't doing anything unusual. It's dereferencing $r3, but the value there is the device instance, so it should be fine. 🤔 And the $cfsr value looks like random bits to me; it doesn't look like it logged the crash correctly. (The Lua stuff is also uninitialized data, so don't worry about those values. 🙂)
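As a quick sanity check before reaching for gdb, the two address ranges quoted above can be encoded in a few lines of C. This is only a sketch: the names classify_address() and addr_region are invented here, and treating each range as half-open (start included, end excluded) is an assumption, since the post does not say whether the upper bounds are inclusive.

```c
#include <assert.h>
#include <stdint.h>

/* Ranges as quoted in the thread. Treating them as half-open
 * [start, end) is an assumption made for this sketch. */
#define FIRMWARE_START 0x08000000u
#define FIRMWARE_END   0x08100000u
#define GAME_START     0x60000000u
#define GAME_END       0x61000000u

typedef enum { ADDR_FIRMWARE, ADDR_GAME, ADDR_OTHER } addr_region;

/* Hypothetical helper: say which region a pc or lr value falls in. */
static addr_region classify_address(uint32_t addr)
{
    if (addr >= FIRMWARE_START && addr < FIRMWARE_END)
        return ADDR_FIRMWARE;
    if (addr >= GAME_START && addr < GAME_END)
        return ADDR_GAME;
    return ADDR_OTHER;
}

/* Human-readable label for log output. */
static const char *region_name(addr_region r)
{
    switch (r) {
    case ADDR_FIRMWARE: return "firmware";
    case ADDR_GAME:     return "game";
    case ADDR_OTHER:    return "other";
    }
    assert(0 && "unknown region");
    return "other";
}
```

With the crash log above, classify_address(0x0802f302) reports firmware, which is why gdb finds no line information for it in pdex.elf.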

One last thing to try: If you send up the crash report I can try and find it in Memfault and see what information they've got. In Settings, go to the bottom of the first menu to System, then go to the bottom of that menu to "Send Crash Report". If you give me your serial number (DM if you don't want to share in public) I'll look it up and let you know what it says.

Ah, thanks for the insights. I noticed my functions were in 0x60xxxxxx according to readelf as well.

I just ran the crash again and submitted the crash report on device PDU1-Y013240.

The plot thickens.

Just to confirm that I could take an address from crashlog and look it up successfully, I put a null pointer dereference in the Hello World example. Here is the resulting crashlog:

--- crash at 2022/05/07 00:19:16---
build:7a75bff14545-1.10.0-release.135263-buildbot
   r0:00000001    r1:00000001     r2:00045400    r3: 40012c00
  r12:00003fe0    lr:0802f343     pc:0802f302   psr: 21000000
 cfsr:00000082  hfsr:00000000  mmfar:00000000  bfar: 00000000
rcccsr:00000000
heap allocated: 64448
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

lr and pc still appear to be in firmware address space, and even match the values I saw with my game's crash.
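The cfsr field can also be read bit by bit. The sketch below assumes the crashlog's cfsr is the raw ARMv7-M Configurable Fault Status Register and uses the architecture's standard bit names; describe_cfsr() itself is a made-up helper, not something from the SDK. Under that assumption, the 0x00000082 in the Hello World log decodes to DACCVIOL plus MMARVALID, a data access violation with a valid fault address in mmfar, which at least fits a null pointer dereference.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef struct { uint32_t mask; const char *name; } fault_bit;

/* ARMv7-M CFSR bit names (MemManage, BusFault, UsageFault). */
static const fault_bit CFSR_BITS[] = {
    { 1u << 0,  "IACCVIOL" },   { 1u << 1,  "DACCVIOL" },
    { 1u << 3,  "MUNSTKERR" },  { 1u << 4,  "MSTKERR" },
    { 1u << 5,  "MLSPERR" },    { 1u << 7,  "MMARVALID" },
    { 1u << 8,  "IBUSERR" },    { 1u << 9,  "PRECISERR" },
    { 1u << 10, "IMPRECISERR" },{ 1u << 11, "UNSTKERR" },
    { 1u << 12, "STKERR" },     { 1u << 13, "LSPERR" },
    { 1u << 15, "BFARVALID" },  { 1u << 16, "UNDEFINSTR" },
    { 1u << 17, "INVSTATE" },   { 1u << 18, "INVPC" },
    { 1u << 19, "NOCP" },       { 1u << 24, "UNALIGNED" },
    { 1u << 25, "DIVBYZERO" },
};

/* Hypothetical helper: write the names of the set bits into buf,
 * separated by spaces, and return how many names were written in full. */
static int describe_cfsr(uint32_t cfsr, char *buf, size_t len)
{
    size_t used = 0;
    int count = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';

    for (size_t i = 0; i < sizeof CFSR_BITS / sizeof CFSR_BITS[0]; i++) {
        if ((cfsr & CFSR_BITS[i].mask) == 0)
            continue;
        int n = snprintf(buf + used, len - used, "%s%s",
                         count ? " " : "", CFSR_BITS[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break; /* out of room: stop here, buf stays NUL-terminated */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

A value with many unrelated bits set, like the 4891374d from the earlier log, is a hint that the register was not captured correctly rather than a real combination of faults.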

Bah. Okay, so it's definitely not logging correctly in this case. 🙁 I'll see if I can reproduce that and figure out what's going on here.

Is logging on the device buffered or streamed I/O? Buffered may not get to flush before the bug happens.

@dave This sounds like great information and super useful. Is there any progress on providing more information in the event of a crash? All of my crashes appear in the firmware. Further, my logToConsole() calls must be getting buffered, as I'm unable to see even the message immediately before the crashing code. Being able to look up symbols in a public map file, or get a stack trace from my own code, would be helpful. It would be ideal if I could get a remote debug session running. Thanks.

Has anything further developed in this space? I too am getting crashes in the firmware address space.

--- crash at 2023/09/22 14:23:27---
build:1fd086bf5715-2.0.3-release.158184-buildbot
   r0:00000003    r1:ffffffff     r2:e000ed00    r3: 000002c0
  r12:006d3000    lr:08019c59     pc:08019c82   psr: 81000000
 cfsr:00000082  hfsr:00000000  mmfar:00000000  bfar: 00000000
rcccsr:00000000
heap allocated: 2715584
Lua totalbytes=127753145 GCdebt=-127574600 GCestimate=176692 stacksize=80

Alternatively, is there a way to hook up a debugger to the device (or the simulator) for C code so I can step through it?

One thing that's coming in 2.1 is that the values in crashlog.txt are going to be accurate again. Right now they're getting overwritten before we have a chance to write them out to disk. 😕

Once that's in place you should be able to use the firmware_symbolizer.py script in the SDK to find out where the crash was, if it was in your game code. I'll file a feature request to have that query symbols.db instead if the crash location is in the firmware.
