Random crash on C project


I'm randomly getting the following crash on device.

--- crash at 2024/11/13 20:36:56---
build:bcbf4fed-2.6.0-release.176236-buildbot
   r0:00000000    r1:00000000     r2:fffffff8    r3: 00000000
  r12:00000003    lr:90026be9     pc:90037236   psr: 61000000
 cfsr:00000082  hfsr:00000000  mmfar:fffffff8  bfar: fffffff8
rcccsr:00000000
heap allocated: 5591232
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

lr: 0x90026be9 -> 158697
?? at /build/arm-none-eabi-gcc/src/gcc-14.1.0/libgcc/config/arm/ieee754-df.S:637
?? at /build/arm-none-eabi-gcc/src/gcc-14.1.0/libgcc/config/arm/ieee754-df.S:637

The weird part is that it doesn't seem to crash when compiled or run on my partner's device/computer.

I want to understand how to track down this kind of crash, so here is what I think I understand from reading "How to debug a HardFault on an ARM Cortex-M MCU | Interrupt" and the devforum:

The Configurable Fault Status Register (CFSR) holds a non-zero value, so the crash info should be in one of the status registers it contains.

If I convert the CFSR value to binary (0000 0000 1000 0010), there are two bits set in the lowest 8 bits, which would indicate that the reason for the crash is reported in the MemManage Fault Status Register (MMFSR).

Based on this image, it is a DACCVIOL error, and the address that triggered the MemManage fault should be in the MMFAR register.
[image]

The MMFAR register holds the value fffffff8, but that doesn't seem right?
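To double-check the manual bit reading above, here is a minimal sketch that decodes a CFSR value into flag names. The bit positions follow the ARMv7-M CFSR layout (MMFSR in bits 0-7, BFSR in bits 8-15, UFSR in bits 16-31). The helper names `decode_cfsr` and `as_signed32` are made up for this post and are not part of any SDK tool.

```python
# Flag names by bit position in the 32-bit CFSR (ARMv7-M layout).
CFSR_BITS = {
    # MMFSR, bits 0-7
    0: "IACCVIOL", 1: "DACCVIOL", 3: "MUNSTKERR", 4: "MSTKERR",
    5: "MLSPERR", 7: "MMARVALID",
    # BFSR, bits 8-15
    8: "IBUSERR", 9: "PRECISERR", 10: "IMPRECISERR", 11: "UNSTKERR",
    12: "STKERR", 13: "LSPERR", 15: "BFARVALID",
    # UFSR, bits 16-31
    16: "UNDEFINSTR", 17: "INVSTATE", 18: "INVPC", 19: "NOCP",
    24: "UNALIGNED", 25: "DIVBYZERO",
}


def decode_cfsr(cfsr):
    """Return the names of the flags set in a CFSR value (hypothetical helper)."""
    return [name for bit, name in sorted(CFSR_BITS.items()) if cfsr & (1 << bit)]


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one (hypothetical helper)."""
    return value - (1 << 32) if value & 0x80000000 else value


# Values taken from the crash log above.
print(decode_cfsr(0x00000082))   # ['DACCVIOL', 'MMARVALID']
print(as_signed32(0xFFFFFFF8))   # -8
```

Besides DACCVIOL, the second set bit is MMARVALID, which means MMFAR holds the actual faulting address, so fffffff8 is probably genuine. Read as a signed number it is -8, which might point at an access 8 bytes before a null pointer, but that part is only a guess.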

Also, if I understand correctly, the LR and PC registers should hold the addresses within the executable of the code that caused the crash.

But this is what I get when running arm-none-eabi-addr2line:

?? at /build/arm-none-eabi-gcc/src/gcc-14.1.0/libgcc/config/arm/ieee754-df.S:637
?? at /build/arm-none-eabi-gcc/src/gcc-14.1.0/libgcc/config/arm/ieee754-df.S:637

I have already disabled optimizations, but that doesn't seem to help track down which function is causing the crash, so I'm a bit lost on how to continue the search.

Any guidance is welcome!

PS:

I found this tool, pd-symbolize-crashlog, which outputs a friendlier analysis of the crash log:

Crash at 2024/11/13 20:36:56
  BUILD:   bcbf4fed-2.6.0-release.176236-buildbot
  HEAP:    5591232
  General-purpose registers (stack)
  0:       0x000000: ??
  1:       0x000000: ??
  2:       0xfffffff8: ??
  3:       0x000000: ??
  12:      0x000003: ??
  14 LR:   0x90026be9: ??
  15 PC:   0x90037236: ??
  Special registers
  MMFAR:   0xfffffff8: ??
  BFAR:    0xfffffff8: ??
  rcccsr:  0x000000: ??
  CFSR:     0b000000000000000000000010000010 (Configurable Fault Status)
    MMFSR:  0b000000000000000000000010000010 (MemManage Fault Status Register)
       DACCVIOL: Data access violation.
                 The processor attempted a load or store at a location that does not permit the operation.
                 Faulting instruction: see `PC`.
                 Address of the attempted access: see `MMFAR`.

And I'm using this modified version of the symbolizer from @superfunc:

import re
import subprocess
import click

"""
    SETUP:
        1. pip3 install click
        2. make sure arm-none-eabi-addr2line is in your $PATH

    USAGE:
        python3 firmware_symbolizer.py crashlog.txt game.elf

"""


@click.command()
@click.argument("crashlog", type=click.Path(exists=True))
@click.argument("elf", type=click.Path(exists=True))
def symbolize(crashlog, elf):
    print("symbolizer")
    cl_contents = open(crashlog, "r").read()

    cl_blocks = re.split(r"\n\n", cl_contents)

    for block in cl_blocks:
        matches = re.search(r"lr:([0-9a-f]{8})\s+pc:([0-9a-f]{8})", block)

        if matches:
            print(block, "\n")

            lr = matches.group(1)
            pc = matches.group(2)

            lr_num = int(lr, 16)
            pc_num = int(pc, 16)

            lr_num = lr_num & 0x0FFFFFFF
            pc_num = pc_num & 0x0FFFFFFF

            print("lr: {} -> {}".format(hex(int(lr, 16)), lr_num))

            lr = hex(lr_num)
            pc = hex(pc_num)

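            # BUG: pc and lr already carry the "0x" prefix from hex(), so this
            # passes "0x0x..." to addr2line.
            cmd = f"arm-none-eabi-addr2line -f -i -p -e {elf} 0x{pc} 0x{lr}"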
            stack = subprocess.check_output(cmd, shell=True).decode("ASCII")
            print(stack)


if __name__ == "__main__":
    symbolize()

Reading this: Scala Native on the Playdate - #11 by dave

aha! I forgot that in the elf we have it compiled to 0x0 and then we relocate to either 0x6xxx or 0x9xxx at load time. So the correct lookup there is info line *0x1aa4b:
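As a small sketch of what that relocation means for the addresses in my crash log: the `to_elf_offset` helper below is a hypothetical name, and it assumes the load base is exactly 0x60000000 or 0x90000000, so that dropping the top nibble gives the address inside the ELF.

```python
def to_elf_offset(runtime_addr):
    """Strip the load-region nibble (0x6... or 0x9...) from a runtime address.

    Hypothetical helper; assumes the ELF is linked at 0x0 and relocated to a
    base whose only non-zero hex digit is the top one.
    """
    return runtime_addr & 0x0FFFFFFF


# pc and lr from the crash log above.
pc = to_elf_offset(0x90037236)
lr = to_elf_offset(0x90026BE9)

# hex() already includes the "0x" prefix, so these strings can be passed to
# arm-none-eabi-addr2line as-is.
print(hex(pc), hex(lr))   # 0x37236 0x26be9
```

The lr result, 0x26be9, is 158697 in decimal, which matches the "lr: 0x90026be9 -> 158697" line printed by the symbolizer.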

It turns out the symbolizer from the SDK does not take this into account, and the one I was using had a bug when passing the arguments to arm-none-eabi-addr2line. Here is a modified version of the symbolizer:

import re
import subprocess
import click

"""
    SETUP:
        1. pip3 install click
        2. make sure arm-none-eabi-addr2line is in your $PATH

    USAGE:
        python3 firmware_symbolizer.py crashlog.txt game.elf

"""


@click.command()
@click.argument("crashlog", type=click.Path(exists=True))
@click.argument("elf", type=click.Path(exists=True))
def symbolize(crashlog, elf):
    # Read the whole crash log; the context manager closes the file for us.
    with open(crashlog, "r") as f:
        cl_contents = f.read()

    cl_blocks = re.split(r"\n\n", cl_contents)

    for block in cl_blocks:
        matches = re.search(r"lr:([0-9a-f]{8})\s+pc:([0-9a-f]{8})", block)

        if matches:
            print(block, "\n")

            lr = matches.group(1)
            pc = matches.group(2)

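            # Parse the hex strings captured from the crash log.
            lr_num = int(lr, 16)
            pc_num = int(pc, 16)

            # The ELF is linked at 0x0 and relocated to 0x6xxx or 0x9xxx at
            # load time, so drop the top nibble to get the ELF-relative address.
            lr_num = lr_num & 0x0FFFFFFF
            pc_num = pc_num & 0x0FFFFFFF

            print("lr: {} -> {}".format(hex(int(lr, 16)), lr_num))

            # hex() already returns a "0x"-prefixed string, so no extra prefix
            # is added when building the command.
            lr = hex(lr_num)
            pc = hex(pc_num)

            cmd = f"arm-none-eabi-addr2line -f -i -p -e {elf} {pc} {lr}"
            print(cmd)
            stack = subprocess.check_output(cmd, shell=True).decode("ASCII")
            print(stack)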


if __name__ == "__main__":
    symbolize()