Scala Native on the Playdate

Also, I should mention this part: the demo game with the flying square, when written using my functional DSL, ran for 10 seconds at 50fps with Immix. It ran for 5 minutes with no GC.

5 minutes at 50fps is about 15k frames, so it was allocating roughly 978 bytes per frame - sounds about right.
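
The estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes the whole heap was 14 MiB (14680064 bytes, the memory size mentioned further down the thread) and that the game died exactly when that heap was exhausted. The names are made up for illustration.

```scala
// Back-of-the-envelope check of the per-frame allocation rate.
// Assumption: a 14 MiB heap that is fully used up after 5 minutes without GC.
val fps = 50
val secondsUntilCrash = 5 * 60
val heapBytes = 14L * 1024 * 1024 // 14680064 bytes

val frames = fps * secondsUntilCrash   // 15000 frames
val bytesPerFrame = heapBytes / frames // integer division, 978 bytes per frame
```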

While this score clearly wasn't satisfactory - I want the games to run essentially forever without crashing - it proved that the overhead of whatever Scala does did not push the game below the maximum framerate supported by the device. Although the game is almost trivial and far from CPU-intensive, it gives me hope that Scala, and a truly functional style, can indeed be used to write games for this platform. And I haven't even started optimizing yet :slight_smile:

The interesting thing is that the Immix crashes in this instance resulted in the console displaying "stack overflow in task gameTask" when I pressed B for more details. This could be an artifact of some of my mistakes in the SN fork.

Update: I saw a printf that I hadn't replaced with a custom log function, and fixing it made some progress on the Immix side: Playdate support by kubukoz · Pull Request #1 · kubukoz/scala-native · GitHub

I can now run 16k allocations 10 times - the last one crashes. 32k allocations still crash the first time though.

Here's the current crashlog:

--- crash at 2024/02/04 23:31:58---
build:9c92a2f1-2.2.0-release.163717-buildbot
   r0:90300d00    r1:00000000     r2:90300d00    r3: 00004000
  r12:90300d00    lr:9001aa4b     pc:00000000   psr: 200b0000
 cfsr:00000001  hfsr:00000000  mmfar:00000000  bfar: 00000000
rcccsr:00000000
heap allocated: 16466400
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

This might be useful:

if the address is in the 0x08000000-0x08100000 that's firmware code. Your game will be running in the 0x60000000-0x61000000 range.

from C-based game crashing only on device - #12 by dave.

FTR 0x60000000-0x61000000 was on the rev 1 devices. On rev 2 external memory is mapped to 0x90000000-0x91000000. If the $lr register is correct there (and I think it should be, since the IMPRECISERR flag isn't set in the cfsr register) then that's the calling address, and it's crashing because it's jumping to address 0 (in the $pc register). If you load the pdex.elf file in gdb and do info line *0x9001aa4b it will tell you what source line that's happening at.

ah, I was trying that without the asterisk - thank you very much! However...

No line number information available for address 0x9001aa4b

:smiling_face_with_tear:

heap allocated: 16627360

this actually sounds hella suspicious because it's just over 146kb less than the full 16MB. Also much more than I'd expect given I've set the memory size to 14MB... Gonna decrease that one and see what happens.

update: decreased memory size to 8MB, "heap allocated" now says 10022496. I now suspect the OOM handling doesn't do the right thing when I use Immix, and instead of a clean error message I just get a crash.

Here's the binary, in case you @dave (or anyone else) are able to have a look. Pressing A performs a batch of allocations; at the moment it crashes on the 4th attempt.
HelloWorld.elf.zip (547.3 KB)

New trivia: I added some prints to see what addresses Immix is getting when it calls malloc.

[t=325] Trying to map 3360 bytes of memory
[t=334] Mapped 3360 bytes of memory to 0x900763a0
[t=342] Trying to map 53760 bytes of memory
[t=351] Mapped 53760 bytes of memory to 0x900770d0
[t=358] Trying to map 917516 bytes of memory
[t=366] Mapped 917516 bytes of memory to 0x900842e0
[t=374] Trying to map 14680064 bytes of memory
[t=382] Mapped 14680064 bytes of memory to 0x901642f0

Here's the most recent crashlog with that:

build:9c92a2f1-2.2.0-release.163717-buildbot
   r0:01010101    r1:0000000b     r2:90f64c80    r3: 0000033c
  r12:00084000    lr:900292b3     pc:900292b8   psr: 210d0000
 cfsr:00000082  hfsr:00000000  mmfar:01010105  bfar: 01010105
rcccsr:00000000
heap allocated: 16469056
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

At least this confirms pc isn't in any of these ranges I allocated.

aha! I forgot that in the elf we have it compiled to 0x0 and then we relocate to either 0x6xxx or 0x9xxx at load time. So the correct lookup there is info line *0x1aa4b:

(gdb) info line *0x1aa4b
No line number information available for address 
  0x1aa4b <_SM20scala.DummyImplicit$G4load+18>

And for that one right above you've got $pc at 0x292b8:

(gdb) info line *0x292b8
Line 33 of "dependencies/nativelib_native0.5.0-SNAPSHOT_3-0/scala-native/gc/immix/Marker.c" starts at address 0x292b2 <Marker_Mark+242>
   and ends at 0x292bc <Marker_Mark+252>.
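
The lookup above boils down to subtracting the load base from the address in the crash log. A minimal sketch, assuming the game image is relocated to exactly the start of the external memory range quoted in this thread; `elfOffset` and `hex` are hypothetical helper names, not part of any SDK.

```scala
// Load bases mentioned in this thread (rev 1 vs rev 2 hardware).
val Rev1Base = 0x60000000L
val Rev2Base = 0x90000000L

// Translate a runtime address from a crash log into an offset usable with
// `info line *0x...` against an ELF that was linked at 0x0.
def elfOffset(runtimeAddress: Long, loadBase: Long): Long =
  runtimeAddress - loadBase

// Format an address the way gdb expects it.
def hex(value: Long): String = "0x" + value.toHexString
```

For the crash log above, `hex(elfOffset(0x900292b8L, Rev2Base))` gives `0x292b8`, the address used in the gdb session.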

ah, so I just skip the 9 prefix when looking at symbols?

This is actually so useful - I've been seeing Marker_Mark start but not complete (according to logs I was able to write) but I wasn't sure if it's actually the culprit.

That should definitely unblock me for some time. Thank you!

Narrowed down to this assertion:

assert(blockMeta == Block_GetBlockMeta(heap->blockMetaStart,
                                               heap->heapStart, lastWord));

in Scala Native's Object.c for immix - it's failing. Example values:

LHS: 0x90075ee8
RHS: 0x90075f08

That's a difference of 0x20 (32). The size of BlockMeta seems to be 8 bytes, so... off-by-4 error? :sweat_smile:
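
Spelled out, the "off-by-4" guess is just the pointer difference divided by the entry size. The 8-byte BlockMeta size is the estimate from this post, not a verified figure, and the variable names are illustrative only.

```scala
// The two pointers compared by the failing assertion.
val lhs = 0x90075ee8L
val rhs = 0x90075f08L

// Assumption from the post: one BlockMeta entry occupies 8 bytes.
val blockMetaSize = 8

val byteDiff = rhs - lhs                    // 0x20 = 32 bytes
val entriesApart = byteDiff / blockMetaSize // 4 BlockMeta entries
```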

Somehow changing how I log things made us go back to 9 valid "allocation rounds" and the tenth blowing up...

--- crash at 2024/02/06 00:25:31---
build:9c92a2f1-2.2.0-release.163717-buildbot
   r0:90300d00    r1:00000000     r2:90300d00    r3: 00004000
  r12:90300d00    lr:9001aa15     pc:00000000   psr: 200f0000
 cfsr:00000001  hfsr:00000000  mmfar:00000000  bfar: 00000000
rcccsr:00000000
heap allocated: 16465792
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

It looks like the same thing as before. Initially I didn't realize this so I looked for some docs.

docs: Documentation – Arm Developer

cfsr is 00000001, so here's what the lowest bit (bit 0) does:

IACCVIOL
Instruction access violation flag:

0
No instruction access violation fault.

1
The processor attempted an instruction fetch from a location that does not permit execution.
This fault occurs on any access to an XN region, even when the MPU is disabled or not present.
When this bit is 1, the PC value stacked for the exception return points to the faulting instruction. The processor has not written a fault address to the MMAR.

so that checks out. Feels good to finally find some hints in the documentation and not in trial-and-error...
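
To avoid looking up each bit by hand, the cfsr value can be decoded with a small table. This is a sketch based on the Armv7-M Configurable Fault Status Register layout as I understand it (MemManage flags in bits 0-7, BusFault in 8-15, UsageFault in 16-31); `decodeCfsr` is a hypothetical helper, and the bit table should be double-checked against the Arm documentation before relying on it.

```scala
// Bit position -> flag name, following the Armv7-M CFSR layout.
val cfsrFlags: List[(Int, String)] = List(
  // MemManage Fault Status Register (bits 0-7)
  0 -> "IACCVIOL", 1 -> "DACCVIOL", 3 -> "MUNSTKERR",
  4 -> "MSTKERR", 5 -> "MLSPERR", 7 -> "MMARVALID",
  // BusFault Status Register (bits 8-15)
  8 -> "IBUSERR", 9 -> "PRECISERR", 10 -> "IMPRECISERR",
  11 -> "UNSTKERR", 12 -> "STKERR", 13 -> "LSPERR", 15 -> "BFARVALID",
  // UsageFault Status Register (bits 16-31)
  16 -> "UNDEFINSTR", 17 -> "INVSTATE", 18 -> "INVPC",
  19 -> "NOCP", 24 -> "UNALIGNED", 25 -> "DIVBYZERO"
)

// Names of all flags that are set in the given cfsr value, lowest bit first.
def decodeCfsr(cfsr: Long): List[String] =
  cfsrFlags.collect { case (bit, name) if (cfsr & (1L << bit)) != 0 => name }
```

`decodeCfsr(0x00000001L)` yields only `IACCVIOL`, matching the reading above, while the earlier crash with cfsr 00000082 decodes to `DACCVIOL` and `MMARVALID`, which would mean the mmfar value in that log is a valid fault address.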

Now here's lr:

Link Register
The Link Register (LR) is register R14. It stores the return information for subroutines, function calls, and exceptions. On reset, the processor sets the LR value to 0xFFFFFFFF.

And that'd be our calling code. Here it's 9001aa15, which I'll infer to be 1aa15 in my compiled code. gdb doesn't know much:

No line number information available for address 0x1aa15 <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+76>

but if we demangle this... it's the method my Scala code is calling: Array.fill. I have it in my objdump:

0001a9c8 <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO>:
   1a9c8: 2d e9 f0 43   push.w  {r4, r5, r6, r7, r8, r9, lr}
   1a9cc: 81 b0         sub     sp, #4
   1a9ce: 00 28         cmp     r0, #0
   1a9d0: 3d d0         beq     0x1aa4e <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x86> @ imm = #122
   1a9d2: 18 46         mov     r0, r3
   1a9d4: 88 46         mov     r8, r1
   1a9d6: 00 29         cmp     r1, #0
   1a9d8: 29 dd         ble     0x1aa2e <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x66> @ imm = #82
   1a9da: c0 b3         cbz     r0, 0x1aa4e <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x86> @ imm = #112
   1a9dc: 14 46         mov     r4, r2
   1a9de: 01 68         ldr     r1, [r0]
   1a9e0: 1d 4a         ldr     r2, [pc, #116]          @ 0x1aa58 <$d.5+0x4>
   1a9e2: 7a 44         add     r2, pc
   1a9e4: 89 68         ldr     r1, [r1, #8]
   1a9e6: d2 f8 00 90   ldr.w   r9, [r2]
   1a9ea: 43 f6 6c 22   movw    r2, #14956
   1a9ee: 09 eb 81 01   add.w   r1, r9, r1, lsl #2
   1a9f2: 8a 58         ldr     r2, [r1, r2]
   1a9f4: 41 46         mov     r1, r8
   1a9f6: 90 47         blx     r2
   1a9f8: 4c b3         cbz     r4, 0x1aa4e <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x86> @ imm = #82
   1a9fa: 06 46         mov     r6, r0
   1a9fc: 17 48         ldr     r0, [pc, #92]           @ 0x1aa5c <$d.5+0x8>
   1a9fe: 78 44         add     r0, pc
   1aa00: 00 27         movs    r7, #0
   1aa02: 05 68         ldr     r5, [r0]
   1aa04: 20 68         ldr     r0, [r4]
   1aa06: 80 68         ldr     r0, [r0, #8]
   1aa08: 09 eb 80 00   add.w   r0, r9, r0, lsl #2
   1aa0c: d0 f8 2c 1b   ldr.w   r1, [r0, #2860]
   1aa10: 20 46         mov     r0, r4
   1aa12: 88 47         blx     r1
   1aa14: 03 46         mov     r3, r0
   1aa16: 28 46         mov     r0, r5
   1aa18: 31 46         mov     r1, r6
   1aa1a: 3a 46         mov     r2, r7
   1aa1c: e7 f7 a2 ff   bl      0x2964 <_SM27scala.runtime.ScalaRunTime$D12array_updateL16java.lang.ObjectiL16java.lang.ObjectuEO> @ imm = #-98492
   1aa20: 01 37         adds    r7, #1
   1aa22: b8 45         cmp     r8, r7
   1aa24: ee d1         bne     0x1aa04 <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x3c> @ imm = #-36
   1aa26: 30 46         mov     r0, r6
   1aa28: 01 b0         add     sp, #4
   1aa2a: bd e8 f0 83   pop.w   {r4, r5, r6, r7, r8, r9, pc}
   1aa2e: 70 b1         cbz     r0, 0x1aa4e <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x86> @ imm = #28
   1aa30: 01 68         ldr     r1, [r0]
   1aa32: 08 4a         ldr     r2, [pc, #32]           @ 0x1aa54 <$d.5>
   1aa34: 7a 44         add     r2, pc
   1aa36: 89 68         ldr     r1, [r1, #8]
   1aa38: 12 68         ldr     r2, [r2]
   1aa3a: 02 eb 81 01   add.w   r1, r2, r1, lsl #2
   1aa3e: 43 f6 6c 22   movw    r2, #14956
   1aa42: 8a 58         ldr     r2, [r1, r2]
   1aa44: 00 21         movs    r1, #0
   1aa46: 01 b0         add     sp, #4
   1aa48: bd e8 f0 43   pop.w   {r4, r5, r6, r7, r8, r9, lr}
   1aa4c: 10 47         bx      r2
   1aa4e: 00 20         movs    r0, #0
   1aa50: f6 f7 84 fb   bl      0x1115c <_SM34scala.scalanative.runtime.package$D16throwNullPointernEO> @ imm = #-39160

quick zoom at the area around 1aa15:

   1aa10: 20 46         mov     r0, r4
   1aa12: 88 47         blx     r1
   1aa14: 03 46         mov     r3, r0
   1aa16: 28 46         mov     r0, r5
   1aa18: 31 46         mov     r1, r6
   1aa1a: 3a 46         mov     r2, r7
   1aa1c: e7 f7 a2 ff   bl      0x2964 <_SM27scala.runtime.ScalaRunTime$D12array_updateL16java.lang.ObjectiL16java.lang.ObjectuEO> @ imm = #-98492

Thing is, 1aa15 is in the middle of an instruction: the byte 46. If I understand anything, you can't just split an instruction like that because 03 46 is just the code for a copy between these particular registers. And how is that even performing a jump to 0x00?
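
One possible reading, not confirmed anywhere in this thread: Cortex-M cores execute Thumb code, and return addresses stored in lr carry the Thumb state in bit 0, so the odd address is expected. Clearing that bit gives 0x1aa14, the instruction right after blx r1 at 0x1aa12, and with r1 at 00000000 in the crash log that indirect call would be consistent with the jump to address 0. A sketch of the adjustment, with hypothetical helper names and the rev 2 load base assumed:

```scala
// Assumed load base for rev 2 hardware, as quoted earlier in the thread.
val Rev2Base = 0x90000000L

// Clear bit 0 (the Thumb state bit) to get the real return address.
def returnAddress(lr: Long): Long = lr & ~1L

// Offset of the return address inside an ELF linked at 0x0.
def elfOffsetOfReturn(lr: Long, loadBase: Long): Long =
  returnAddress(lr) - loadBase
```

With the lr from the crash log, `elfOffsetOfReturn(0x9001aa15L, Rev2Base)` is 0x1aa14, which lands on an instruction boundary in the objdump above.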

End of day summary: I'm in a state where nothing really crashes, but I have asserts failing in the GC code. This is kinda good news, because I'm in direct contact with the SN maintainer who knows that code :sweat_smile:

It appears that the GC's object-marking code is failing an assertion that objectSize < blockSize. Block size is 8192, and objectSize is whatever I have the array set to... and it sounds like the array shouldn't even be getting to that place.

This is fantastic! I absolutely adore Scala and would love to be able to use it on the Playdate. I'm definitely keeping my eye on this thread :slight_smile:

Last week I met with Wojciech Mazur (the maintainer of Scala Native). We tried a couple of changes in SN, which increased the number of allocations I'm able to do with GC on without breaking the game.

One important note was that we should preallocate a heap ahead of time, and not allow it to grow. Because we're only able to use malloc and not mmap, and the heap segments are supposed to be located next to each other in memory (at least from the program's POV), growing the heap is just asking for trouble: the GC ends up reading some arbitrary memory instead of what was allocated.
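
A rough illustration of why non-contiguous growth is dangerous, assuming block metadata is located purely by an address's distance from the heap start (which is what the Block_GetBlockMeta call in the assertion above suggests). `HeapLayout` and its methods are hypothetical names for this sketch, not Scala Native's API; the 8192-byte block size is the one mentioned earlier.

```scala
// Hypothetical model of a preallocated, fixed-size heap split into blocks.
final case class HeapLayout(heapStart: Long, blockCount: Int, blockSize: Int = 8192) {

  // Index of the block containing `address`, derived only from its offset.
  def blockIndex(address: Long): Long = (address - heapStart) / blockSize

  // Only addresses inside the preallocated range have metadata behind them.
  def hasMetadata(address: Long): Boolean =
    address >= heapStart && blockIndex(address) < blockCount
}
```

Taking the 14680064-byte mapping at 0x901642f0 from the log as the heap (an assumption), `HeapLayout(0x901642f0L, 1792)` covers block indices 0 to 1791. An address from a separate malloc call 20 MiB further along would map to index 2560, so a metadata lookup for it would read past the end of the table.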

So, we're now allocating around 10MB if my memory serves me. This worked for 20 rounds of 128k allocations or so.

Fast forward to today - I saw that Wojciech got a couple of GC-related changes into the main branch, including fix: Try to stablize GC by WojciechMazur · Pull Request #3767 · scala-native/scala-native · GitHub. I tried them out and... it seems to just... work like a charm?

I can do rounds of 256k allocations as often as I want (one round takes several seconds, I assume because the hardware isn't that fast), and I see that only ~87 blocks (out of the 283 allocated) are getting used. 512k allocations hang the game loop for over 10 seconds so the game crashes - but I don't think I'd need that many anyways :wink:

Going back to lower numbers like 16k allocations seems to clear up all the blocks. At this point I think it's safe to say that the GC is working much much better than before.

I would love to get stack traces to work (with libunwind), but given the time constraints (T minus 34 days for the conference talk), I think the bigger priority is the actual game code. Perhaps I'll try the bindings once more.

Scala DSL going well so far:

object MainGame {
  val ratWidth = 32
  val ratHeight = 32
  val ratMarginX = 20
  val ratMarginY = 20

  def config: GameConfig = GameConfig(fps = 50)

  def init(ctx: GameContext): Resource[GameState] = Assets.bitmap("arrow.png").map { arrow =>
    GameState(
      rat = Rat(
        y = ctx.screen.height / 2 - ratHeight / 2,
        rotation = Radians(0),
      ),
      assets = Assets(
        arrow = arrow
      ),
    )
  }

  def update(ctx: GameContext): GameState => GameState = {

    val rotateRat: GameState => GameState =
      state => {
        val newRotation =
          (
            state.rat.rotation + Radians.fromDegrees(ctx.crank.change)
          )
            .clamp(
              min = Radians.fromDegrees(-60),
              max = Radians.fromDegrees(60),
            )
        state.copy(rat = state.rat.copy(rotation = newRotation))
      }

    val moveRat: GameState => GameState =
      state => {
        val newY = (state.rat.y + Math.sin(state.rat.rotation.value) * ctx.delta * 300)
          .clamp(20, ctx.screen.height - ratHeight - ratMarginY)

        state.copy(rat = state.rat.copy(y = newY.toFloat))
      }

    val equalizeRat: GameState => GameState =
      state => {
        val newRotation =
          if state.rat.y == ratMarginY ||
            state.rat.y == ctx.screen.height - ratHeight - ratMarginY
          then state.rat.rotation * 0.9
          else state.rat.rotation

        state.copy(rat = state.rat.copy(rotation = newRotation))
      }

    Function.chain(
      List(
        rotateRat,
        moveRat,
        equalizeRat,
      )
    )
  }

  def render(state: GameState): Render = {
    import Render._

    val rat = Render.Bitmap(
      x = ratMarginX + ratWidth / 2,
      y = state.rat.y.toInt + ratHeight / 2,
      bitmap = state.assets.arrow,
      rotation = state.rat.rotation,
      centerX = 0.5,
      centerY = 0.5,
      xscale = 1.0,
      yscale = 1.0,
    )
    // .rotated(state.szczur.rotation)

    val debug = Render.Text(
      x = 10,
      y = 10,
      s"Rotation: ${state.rat.rotation.value}, y: ${state.rat.y}",
    )

    Clear(Color.White) |+|
      FPS(0, 0) |+|
      rat |+|
      debug
  }

}
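
For readers unfamiliar with Function.chain from the Scala standard library, which the update function above relies on: it composes a list of same-typed functions and applies them left to right. A tiny standalone sketch, where `Counter` and the two steps are made up for illustration and have nothing to do with the game's real state:

```scala
// Stand-in for a game state.
final case class Counter(value: Int)

val increment: Counter => Counter = c => c.copy(value = c.value + 1)
val twice: Counter => Counter = c => c.copy(value = c.value * 2)

// Function.chain applies the functions in list order: increment first, then twice.
val step: Counter => Counter = Function.chain(List(increment, twice))

val result = step(Counter(1)) // (1 + 1) * 2, so Counter(4)
```

The order matters here in the same way it does for rotateRat, moveRat and equalizeRat: each step sees the state produced by the previous one.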

Game development now happening here: GitHub - kubukoz/demos at wroclaw-rat-game - it's a branch made from the starting point of the allocation demo from earlier.

and here's a demo of the current state. There's sound when you score, but I couldn't have that in a gif.

(GIF: szczur-rec)

I've made a similar project, but for Java 1.6.
It works perfectly on the Simulator and is good for prototyping any kind of game (for me, of course).
The next step is getting Java working on the device (I'm currently waiting for the shipment).
Thanks for explaining your Scala Native modifications; I suppose they will help me in the future.

Interesting. How did you get a JVM to run in a dylib? Something like this? Calling Java From C | John's Blog

Yeah, indeed.
JNI is the easiest way to link C and Java.