I'm getting the following crash on device randomly.
--- crash at 2024/11/13 20:36:56---
build:bcbf4fed-2.6.0-release.176236-buildbot
r0:00000000 r1:00000000 r2:fffffff8 r3: 00000000
r12:00000003 lr:90026be9 pc:90037236 psr: 61000000
cfsr:00000082 hfsr:00000000 mmfar:fffffff8 bfar: fffffff8
rcccsr:00000000
heap allocated: 5591232
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0
lr: 0x90026be9 -> 158697
?? at /build/arm-none-eabi-gcc/src/gcc-14.1.0/libgcc/config/arm/ieee754-df.S:637
?? at /build/arm-none-eabi-gcc/src/gcc-14.1.0/libgcc/config/arm/ieee754-df.S:637
The weird part is that it doesn't seem to crash when compiled or run on my partner's device/computer.
I want to understand how to track down this kind of crash, so here is what I think I understand from reading "How to debug a HardFault on an ARM Cortex-M MCU | Interrupt" and the devforum:
It seems the Configurable Fault Status Register (CFSR) holds a non-zero value, so the crash information should be in one of its status sub-registers.
If I convert the CFSR value to binary, the low 16 bits are 0000 0000 1000 0010.
Two bits are set in the lowest 8 bits, which would indicate that the reason for the crash is specified in the MemManage Fault Status Register (MMFSR).
Based on this image it is a DACCVIOL error, and the address that triggered the MemManage fault should be in the MMFAR register.
The MMFAR register holds the value fffffff8, but that doesn't seem right?
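The bit reading above can be checked with a few lines of Python. This is a minimal sketch, assuming the standard ARMv7-M layout in which MMFSR is the low byte of CFSR; the helper names `decode_mmfsr` and `as_signed32` are made up for illustration. Besides DACCVIOL (bit 1), the value 0x82 also has bit 7 (MMARVALID) set, which the architecture defines as "MMFAR holds a valid address". Read as a signed 32-bit number, fffffff8 is -8, so one possible interpretation (not a confirmed diagnosis) is an access at a small negative offset from a null pointer.

```python
# MemManage Fault Status Register bits (low byte of CFSR) on ARMv7-M.
MMFSR_BITS = {
    0: "IACCVIOL",   # instruction access violation
    1: "DACCVIOL",   # data access violation
    3: "MUNSTKERR",  # fault while unstacking on exception return
    4: "MSTKERR",    # fault while stacking on exception entry
    5: "MLSPERR",    # fault during lazy floating-point state preservation
    7: "MMARVALID",  # MMFAR holds a valid fault address
}


def decode_mmfsr(cfsr):
    """Return the names of the MMFSR flags set in a CFSR value."""
    mmfsr = cfsr & 0xFF  # MMFSR is bits 0-7 of CFSR
    return [name for bit, name in sorted(MMFSR_BITS.items()) if mmfsr & (1 << bit)]


def as_signed32(value):
    """Interpret an unsigned 32-bit value as a two's-complement signed integer."""
    return value - (1 << 32) if value & 0x80000000 else value


# Values copied from the crash log in this post.
print(decode_mmfsr(0x00000082))
print(as_signed32(0xFFFFFFF8))
```

If MMARVALID were clear, the MMFAR value could not be trusted at all; here it is set, so the odd-looking address is at least what the hardware recorded.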
Also, if I understand correctly, the LR and PC registers should hold the addresses, within the executable, of the code that caused the crash,
but I get this:
?? at /build/arm-none-eabi-gcc/src/gcc-14.1.0/libgcc/config/arm/ieee754-df.S:637
?? at /build/arm-none-eabi-gcc/src/gcc-14.1.0/libgcc/config/arm/ieee754-df.S:637
That is the output of arm-none-eabi-addr2line. I already disabled optimizations, but that doesn't seem to help track down which function is causing the crash, so I'm a bit lost on how to continue investigating.
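For reference, the address arithmetic that happens before calling arm-none-eabi-addr2line can be sketched on its own. This is a rough illustration, not a verified recipe: the 0x0FFFFFFF mask is taken from the symbolizer script in this post, clearing bit 0 assumes the stacked LR carries the Thumb state bit (as Cortex-M return addresses do), and the helper names `to_elf_address` and `addr2line_command` are hypothetical. Note that `hex()` already yields a string starting with `0x`, so the result is passed to addr2line as-is.

```python
def to_elf_address(reg_hex, mask=0x0FFFFFFF):
    """Convert a register value from the crash log into an address string for addr2line."""
    value = int(reg_hex, 16)
    value &= mask  # same masking as the symbolizer script in this post (assumption)
    value &= ~1    # clear the Thumb state bit, which is not part of the code address
    return hex(value)


def addr2line_command(elf, *addresses):
    """Build the argument list for arm-none-eabi-addr2line (does not run it)."""
    return ["arm-none-eabi-addr2line", "-f", "-i", "-p", "-e", elf, *addresses]


# PC and LR values copied from the crash log in this post.
pc = to_elf_address("90037236")
lr = to_elf_address("90026be9")
print(pc, lr)
print(" ".join(addr2line_command("game.elf", pc, lr)))
```

Printing the command before running it makes it easy to confirm that each address has exactly one `0x` prefix and falls inside the address range of the ELF.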
Any guidance is welcome!
PS:
I found this tool: pd-symbolize-crashlog that outputs a friendlier analysis of the crash log.
Crash at 2024/11/13 20:36:56
BUILD: bcbf4fed-2.6.0-release.176236-buildbot
HEAP: 5591232
General-purpose registers (stack)
0: 0x000000: ??
1: 0x000000: ??
2: 0xfffffff8: ??
3: 0x000000: ??
12: 0x000003: ??
14 LR: 0x90026be9: ??
15 PC: 0x90037236: ??
Special registers
MMFAR: 0xfffffff8: ??
BFAR: 0xfffffff8: ??
rcccsr: 0x000000: ??
CFSR: 0b000000000000000000000010000010 (Configurable Fault Status)
MMFSR: 0b000000000000000000000010000010 (MemManage Fault Status Register)
DACCVIOL: Data access violation.
The processor attempted a load or store at a location that does not permit the operation.
Faulting instruction: see `PC`.
Address of the attempted access: see `MMFAR`.
I'm also using this modified version of the symbolizer from @superfunc:
"""
SETUP:
1. pip3 install click
2. make sure arm-none-eabi-addr2line is in your $PATH
USAGE:
python3 firmware_symbolizer.py crashlog.txt game.elf
"""
import re
import subprocess

import click


@click.command()
@click.argument("crashlog", type=click.Path(exists=True))
@click.argument("elf", type=click.Path(exists=True))
def symbolize(crashlog, elf):
    print("symbolizer")
    with open(crashlog, "r") as f:
        cl_contents = f.read()
    # Crash reports in the log are separated by blank lines.
    cl_blocks = re.split(r"\n\n", cl_contents)
    for block in cl_blocks:
        matches = re.search(r"lr:([0-9a-f]{8})\s+pc:([0-9a-f]{8})", block)
        if matches:
            print(block, "\n")
            lr = matches.group(1)
            pc = matches.group(2)
            # Drop the top nibble of each address before the lookup.
            lr_num = int(lr, 16) & 0x0FFFFFFF
            pc_num = int(pc, 16) & 0x0FFFFFFF
            print("lr: {} -> {}".format(hex(int(lr, 16)), lr_num))
            # hex() already returns a "0x"-prefixed string, so no extra
            # "0x" is prepended (that would produce "0x0x...").
            cmd = ["arm-none-eabi-addr2line", "-f", "-i", "-p", "-e", elf,
                   hex(pc_num), hex(lr_num)]
            stack = subprocess.check_output(cmd).decode("ASCII")
            print(stack)


if __name__ == "__main__":
    symbolize()