Need help determining cause of on-device crash in C Game, PC at ~0805cc14


Hiya. Have been debugging for several hours trying to isolate a crash that happens pretty reliably on device. Log below. The most common location is the one at 0805cc14. Unfortunately I can't tell exactly when this happens, but I think it might have to do with reading and writing files, possibly in close succession.

Any pointers?

'C.

--- crash at 2024/04/30 19:48:15---
build:5cd9814a-2.4.2-release.166897-buildbot
r0:20039f84 r1:20039fa8 r2:00000030 r3: 600f5b88
r12:6013a6a8 lr:00000000 pc:0805cc14 psr: 41070a00
cfsr:00000082 hfsr:00000000 mmfar:3f800008 bfar: 3f800008
rcccsr:00000000
heap allocated: 1287872
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

--- crash at 2024/04/30 20:34:05---
build:5cd9814a-2.4.2-release.166897-buildbot
r0:20039f84 r1:20039fa8 r2:00000030 r3: 6013a658
r12:6013a6c8 lr:00000000 pc:0805cc10 psr: 410f0600
cfsr:00000082 hfsr:00000000 mmfar:0000000c bfar: 0000000c
rcccsr:00000000
heap allocated: 1326272
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

--- watchdog reset at 2024/04/30 18:33:25---
build:5cd9814a-2.4.2-release.166897-buildbot
r0:00007ec7 r1:20031340 r2:00000001 r3: 20025998
r12:006d3000 lr:0801960f pc:08033f7a psr: 8900002c
rcccsr:00000000
heap allocated: 1286944
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

Have you tried running this through the firmware_symbolizer.py script in the /bin folder? It may help shed some light on the crash.

Did do, but given the addresses are lr:0 and pc:08nnn, there were no useful symbols found. Interestingly the script still extracted a func name+file+line number for the 0x08 addresses, but they didn't actually correspond to anything meaningful, e.g. "memset" as the function, on a line outside of any function, in a file that doesn't call that directly at all.

The first two crashes are in malloc_consolidate(), down in the memory manager. The mmfar value is the address it was trying to access when the memory access fault hit. My guess is the problem is memory corruption from writing past array bounds, overwriting the malloc bookkeeping data between allocation blocks.
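For anyone reading these logs later, here is a rough sketch of how the cfsr and mmfar fields can be read together. It assumes the cfsr value in the log is the raw Cortex-M CFSR register, whose low byte holds the MemManage status bits as laid out in the ARMv7-M architecture. The function names are made up for illustration and are not part of any SDK.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* MemManage status bits in the low byte of CFSR (ARMv7-M layout). */
#define CFSR_IACCVIOL  (1u << 0)  /* instruction access violation */
#define CFSR_DACCVIOL  (1u << 1)  /* data access violation */
#define CFSR_MUNSTKERR (1u << 3)  /* fault while unstacking on exception return */
#define CFSR_MSTKERR   (1u << 4)  /* fault while stacking on exception entry */
#define CFSR_MMARVALID (1u << 7)  /* MMFAR holds the faulting address */

/* Hypothetical helper: 1 if cfsr reports a data access violation
 * and MMFAR is flagged as valid, 0 otherwise. */
int is_data_access_fault(uint32_t cfsr)
{
    return (cfsr & CFSR_DACCVIOL) != 0 && (cfsr & CFSR_MMARVALID) != 0;
}

/* Hypothetical helper: writes a one-line description into out. */
int describe_memfault(uint32_t cfsr, uint32_t mmfar, char *out, size_t n)
{
    if (is_data_access_fault(cfsr))
        return snprintf(out, n, "data access violation at 0x%08lx",
                        (unsigned long)mmfar);
    if (cfsr & CFSR_IACCVIOL)
        return snprintf(out, n, "instruction access violation");
    return snprintf(out, n, "no MemManage access fault decoded (cfsr=0x%08lx)",
                    (unsigned long)cfsr);
}
```

With the values from the first log (cfsr 00000082, mmfar 3f800008), both DACCVIOL and MMARVALID are set, so the mmfar value can be trusted as the address of the bad access.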

Have you tried running this in the simulator with the memory pool disabled? If you can get it to trigger a crash there as well, we might be able to use OS tools to pinpoint it--Address Sanitizer on macOS and Linux should help, no idea about Windows tho. What platform are you on?

Thanks! It took a bit because I didn't have a clean repro, but Address Sanitizer eventually found it for me. Flew too close to the sun: was using negative numbers as error codes in a return value that unwittingly made its way into a pointer offset, and boom, wrote a zero in the byte before an allocation. My bad!
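In case it helps anyone picture this class of bug, here is a minimal sketch of the pattern. It is not the actual game code: find_dot, strip_extension_buggy, strip_extension and NOT_FOUND are all hypothetical names invented for the example.

```c
#include <stddef.h>
#include <string.h>

#define NOT_FOUND (-1)  /* hypothetical negative error code */

/* Hypothetical helper: index of the first '.' in name, or NOT_FOUND. */
int find_dot(const char *name)
{
    const char *p = strchr(name, '.');
    return p ? (int)(p - name) : NOT_FOUND;
}

/* Buggy: the return value goes straight into an index. When there is
 * no '.', this writes a zero to buf[-1], the byte before the buffer.
 * If buf came from malloc, that byte belongs to the allocator. */
void strip_extension_buggy(char *buf)
{
    buf[find_dot(buf)] = '\0';
}

/* Fixed: check for the error code before using the value as an offset. */
int strip_extension(char *buf)
{
    int i = find_dot(buf);
    if (i < 0)
        return i;  /* pass the error up instead of indexing with it */
    buf[i] = '\0';
    return 0;
}
```

Calling strip_extension_buggy on a malloc'd string with no '.' in it is the kind of one-byte underflow that goes unnoticed until the allocator next walks its block list, which would fit a crash showing up later inside malloc_consolidate().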

Anyone else who's here because you googled any of the terms on this page along with Playdate, please try Address Sanitizer too! As Dave mentions, be sure to turn off the memory pool. In hindsight it's obvious why sanitizer won't work with the pool, but that took me several minutes to realize too.

For the record, just switched back to the Mac last month. Still getting back into Xcode!
