Scala Native on the Playdate

Hi! I've been working on this in my spare time since mid-December and thought I'd write down some of my findings.

Why & what

My background is mostly functional programming in Scala. I've been using the language for over 8 years now, and I feel very confident and productive with it: all the nice features like ADTs, lambdas, a rich collections library, and the convenience of a garbage collector, make it a pleasant experience. I could go on for hours.

When I saw the Playdate, I knew I wanted one - and I was also interested in possibly making games for it - but the SDK choice (C, Lua) didn't satisfy me:

  • I find C to be pretty hard to maintain, and very easy to make mistakes with: unions, manual memory management, relatively weak typing, and so on.
  • Lua doesn't have static typing (which might be solved by some of its dialects, or type comments), and also I know literally nothing about it (I do know a bit about C from my short time at university). I intend to learn it one day, but I'm hoping this is not the day! (you already know I really like Scala)

Naturally, I wanted to see if I can run Scala on the PD. Well, can I?

Hard facts

Scala's de-facto-default runtime is the JVM. There's no way I can run a whole JVM on a Playdate - and still have cycles left for an actual game - so that's simply out of the question. However, there's this thing called Scala Native that has been in development for almost a decade now, and it's become pretty stable on desktop platforms (x86, aarch64).

So, let's look at today's state of Scala Native and compare it to what the Playdate offers:

SN:

  • builds via LLVM/clang
  • Supports 64-bit platforms out of the box (Mac, Linux, Windows)
  • 32-bit support available in the 0.5 series, which is unstable
  • 4 garbage collection modes:
    • no GC: pre-allocate a chunk of memory and exit when it's full
    • immix (default)
    • commix
    • boehm (using a dynamically linked libgc)
  • requires some C++ libraries, e.g. for exception handling

Playdate:

  • 32-bit ARM Cortex-M7
  • 16MB memory (might not all be available to games)
  • single-threaded game runtime
  • limited access to libc (as far as I understand, some things are missing from the toolchain)
  • build toolchain is based on GCC. For the simulator, on Mac, I can use clang instead (which makes things easier).

Disclaimer: in case this isn't clear yet, I lived most of my life on the JVM and I'm a native noob. Anything I say about non-JVM things may be complete BS.

The plan

I went through a number of things that failed, but if I were to retroactively write down the plan it would be this:

  1. Generate API bindings
  2. Run a Scala game in the simulator
  3. Run a Scala game on the device, with no GC
  4. Add GC
  5. Build a functional API/engine to make it all worth it

1. Generate API bindings

This was pretty easy: I used sn-bindgen to generate the initial bindings, then copy-pasted them to my project. There were two unnamed structs/unions that I had to give names to, but in the end it was a pretty trouble-free process.

We'll get back to this though.

2. Run a Scala game in the simulator

Surprisingly, this was a relatively trouble-free part as well. The code is here.

Some important things worth mentioning at this step:

  1. build target

To build a game for the PD, you have to emit a dynamic library that implements a common interface (the event handler). This means that I can't package my Scala code as an executable - it has to be a library instead.

Thing is, when Scala Native runs as a library, you have to call a special method - ScalaNativeInit() - before making any other Scala calls. That means I still have to wrap my game in some C, even if just to run that function and then proxy to the rest.

So, do I build as a static or dynamic library? In the simulator, we can use shared (dynamically linked) libraries, because it's simply running on our computer's hardware natively (hence it being a simulator and not an emulator). However, the device doesn't currently have a dynamic loader, so I wanted to build static even for the simulator - just to be prepared and prove it'll work.

Thankfully, this is pretty easy with Scala Native:

// in build.sbt
nativeConfig ~= (
  _.withBuildTarget(BuildTarget.libraryStatic)
)

I ended up with this C code:

#include "pd_api.h"
#include "demo.h"

int eventHandler(PlaydateAPI *pd, PDSystemEvent event, uint32_t arg)
{
	if (event == kEventInit)
	{
		ScalaNativeInit();
	}

	return sn_event(pd, event, arg);
}

which calls my exported sn_event function: (full code)

// skipped imports for brevity
object Main {

  @exported("sn_event")
  def event(
    pd: Ptr[PlaydateAPI],
    event: PDSystemEvent,
    arg: UInt,
  ): Int = {

    val f: CFuncPtr1[Ptr[Byte], CInt] = update

    val ptr: Ptr[PDCallbackFunction] = CFuncPtr.toPtr(f).asInstanceOf[Ptr[PDCallbackFunction]]

    if (event == kEventInit)
      (!(!pd).system).setUpdateCallback(ptr, pd.asInstanceOf[Ptr[Byte]])

    0
  }

  def update(
    arg: Ptr[Byte]
  ): Int = {
    val pd = arg.asInstanceOf[Ptr[PlaydateAPI]]

    //the usage is actually pretty boring...

    1
  }

}

And it works!

For the C part of this build, I'm using a custom script - when I was at this stage I didn't grasp CMake/Make that well.

#!/bin/bash

set -euo pipefail

mkdir -p build
mkdir -p build/dep

PLAYDATE_SDK="/Users/kubukoz/Developer/PlaydateSDK"

BASEDIR=$(dirname "$0")
clang -g -dynamiclib -rdynamic \
  -lm \
  -DTARGET_SIMULATOR=1 \
  -DTARGET_EXTENSION=1 \
  -I . \
  -I $PLAYDATE_SDK/C_API \
  -I "$BASEDIR/../lib" \
  "$BASEDIR/../app/.native/target/scala-3.3.1/libdemo-out.a" \
  -Wl,--no-demangle \
  -l c++ \
  -o "$BASEDIR/build/pdex.dylib" \
  "$BASEDIR/src/main.c" \
  "$PLAYDATE_SDK/C_API/buildsupport/setup.c"

cp "$BASEDIR/build/pdex.dylib" Source
$PLAYDATE_SDK/bin/pdc "$BASEDIR/Source" "$BASEDIR/HelloWorld.pdx"

open "$BASEDIR/HelloWorld.pdx"

Needless to say, this isn't portable unless you change the PLAYDATE_SDK path to whatever it is for you.

3. Run a Scala game on the device, with no GC

This was much more work. I had to fork scala-native and make quite a few changes.

Because there's C++ stuff involved in running Scala Native, I figured I would start with playdate-cpp - it was definitely a useful asset, and to this day I'm relying on it. Hopefully, in the future I can make my build standalone, without having to depend on this - I don't need the full power of C++, most likely just a bunch of stubs will suffice.

Here's a non-exhaustive list of what I've done to make it compile, link and run, in the SN fork itself (diff at the time of writing):

  • get rid of some atomic numeric operations (no need for atomics in single-threaded environments)
  • add some dummy implementations for things that were needed at link time, but weren't called: this was mostly done by piggybacking on what was done for Windows (based on #ifdef TARGET_PLAYDATE)
  • get rid of pwd/cwd, and some more system stuff needed to implement java.lang.System
  • hardcode a single thread identifier in the Thread class
  • make a NativeThread implementation, so that the main thread can be instantiated. Most methods here are stubbed with System.exit with various codes, so that I'd see which one I need to implement if it happens to be called
  • disable delimited continuation support (making them work was way above my skill level and it's just an experimental SN feature I didn't need)
  • replace any fprintf(stderr, ...) call with a call to PD's logging function
  • hardcode the total memory size (relevant for both GC and non-GC work). 16MB was crashing, 14MB seems fine for the time being.
  • disable the usage of libunwind (again, way over my skill level at the moment, and I'm fine not having stack traces for now) in functions like StackTrace_PrintStackTrace
  • replace mmap (used for memory allocation in both GC and non-GC) with a malloc. This sounds illegal, but seems to work
  • replace the uncaught exception handler with something that only prints its message
  • hardcode clock_gettime to 0

Note: I'm pretty sure some of these things are wrong or unnecessary, but this is not the time to start cleaning up. Well, I did already clean things up a bit, but this is as far as I want to get before the functionality matches my expectations.
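
To make the mmap replacement from the list above a bit more concrete, here is a minimal sketch of the idea - not the code from the fork. The names memory_map_compat and memory_unmap_compat are hypothetical, and plain calloc stands in for whatever allocator the device build actually uses. One difference worth handling: anonymous mmap pages start out zeroed while malloc'd memory does not, so the sketch zeroes the block.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an anonymous mmap call (not the fork's code).
 * Returns a zero-filled block of `size` bytes, or NULL when the allocator
 * cannot satisfy the request. */
void *memory_map_compat(size_t size) {
    if (size == 0) {
        return NULL;
    }
    /* calloc zeroes the block, matching what fresh anonymous mmap pages look like */
    return calloc(1, size);
}

/* Counterpart to munmap: a malloc-backed block can only be released whole. */
void memory_unmap_compat(void *address) {
    free(address);
}
```

Callers still have to check for NULL, because unlike mmap there is no MAP_FAILED sentinel here.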

More trouble: bindings

Remember how I said bindings worked fine in the simulator? Well, the simulator was running on a 64-bit CPU, and the device is 32-bit. Turns out sn-bindgen doesn't support 32-bit targets yet, and it sure as hell doesn't support generating bindings for a platform other than the build platform.

As a workaround, I fixed the relevant places myself (with a bit of help from a fork of sn-bindgen hardcoded to 32 bits, although that wasn't enough). Still, the API seems to crash the game in some ways, so I'm only using some numeric/enum types, and the rest is getting proxied via C.

Back to the hacks

Here's the nativeConfig for building the device-compatible static library:

nativeConfig ~= (
  _.withBuildTarget(BuildTarget.libraryStatic)
    .withTargetTriple("arm-none-eabi")
    .withGC(GC.none)
    .withCompileOptions(
      Seq(
        "-g3",
        "-mthumb",
        "-mcpu=cortex-m7",
        "-mfloat-abi=hard",
        "-mfpu=fpv5-sp-d16",
        "-D__FPU_USED=1",
        "-O2",
        "-falign-functions=16",
        "-fomit-frame-pointer",
        "-gdwarf-2",
        "-fverbose-asm",
        "-Wdouble-promotion",
        "-fno-common",
        "-ffunction-sections",
        "-fdata-sections",
        "-DTARGET_PLAYDATE=1",
        "-DTARGET_EXTENSION=1",
        "-DDEBUG_PRINT=1",
        "-D_LIBCPP_HAS_THREAD_API_PTHREAD=1",
        "-MD",
        "-MP",
        s"-I${playdateSdk / "C_API"}",
        "-march=armv7-m",
        "-m32",
        // "-v",
      )
    )
    .withMultithreadingSupport(false)
)

Again, some of this might be redundant, but it works - with no GC.

Fast forward: current state

Current state of the game

Current state of the SN fork

In mid-January, I was able to run a game with the logic written fully in Scala - with the C proxies I mentioned above, for the Playdate API calls. You can see a demo video here: Jakub Kozłowski 🐀: "It runs! Scala Native on the Playdate! #playdated…" - Mastodon Party

After that, I was trying to get GC to work. Immix had some trouble, and I considered switching to Boehm (I was told by SN devs that it's simpler), but I didn't have the time to invest into building it for the target platform - and as a statically linked library, no less.

Next steps

I want to go back to some basics: instead of implementing the whole game in Scala, I'll actually minimize that part, and focus on getting Immix to not crash the first time it tries to garbage collect.

In the future, I'd also love to have exception support - but this is not something I urgently need.

I'm also giving a talk about this whole process, as well as the functional game engine, at Scalar (Warsaw, Poland, March 21-22 2024).


Like I said, I scaled down the Scala part for now:

@exported("foo")
def foo(n: Int): Int = {
  Array.ofDim[Int](n)
  42
}

This is my closest approximation of a controllable amount of allocation.

The update loop in C now calls that with the current number (configurable with the arrow keys):

if (pressed & kButtonA)
{
    pd->system->logToConsole("Calling Scala with %d allocations", allocationCount);
    foo(allocationCount);
}

The results with GC=none, max memory set to 14M in getMemorySize(), and allocation count = 256K are:

[t=312] Trying to map 2097152 bytes of memory
Calling Scala with 262144 allocations
Calling Scala with 262144 allocations
[t=5738] Trying to map 2097152 bytes of memory
Calling Scala with 262144 allocations
[t=6728] Trying to map 2097152 bytes of memory
Calling Scala with 262144 allocations
[t=7554] Trying to map 2097152 bytes of memory
Calling Scala with 262144 allocations
[t=8345] Trying to map 2097152 bytes of memory
Calling Scala with 262144 allocations
[t=9204] Trying to map 2097152 bytes of memory
Calling Scala with 262144 allocations
[t=10061] Trying to map 2097152 bytes of memory
Calling Scala with 262144 allocations
[t=10854] Trying to map 2097152 bytes of memory
[t=10860] Failed to map memory
[t=10865] Out of heap space
exited with code 1.

The [t=...] thing is my custom logging called from SN's allocation code.

So: we ran 262144*8 (2M) allocations and crashed. SN grabs 2MB for starters (DEFAULT_CHUNK_SIZE set by me to "2M", makes sense) and grows its heap by 2 more megs, 7 times - in the last one, the malloc returns null so it exits with code 1. (apparently fprintf to stderr actually works??)

That seems reasonable. Looks like an array of 256k ints takes 2MB of memory, 8 bytes per item on average - strange, but... okay, predictable. Actually, I'm not sure that the getMemorySize() setting actually matters here, it appears that the malloc gets called regardless of that, until it eventually fails as we reach close to 16MB.

What I find strange is that it breaks down the first time I try to allocate 512k elements:

[t=312] Trying to map 2097152 bytes of memory
Calling Scala with 524288 allocations
[t=3361] Trying to map 2097152 bytes of memory
[t=3367] Trying to map 2097152 bytes of memory
[t=3372] Trying to map 2097152 bytes of memory
[t=3377] Trying to map 2097152 bytes of memory
[t=3383] Trying to map 2097152 bytes of memory
[t=3388] Trying to map 2097152 bytes of memory
[t=3393] Trying to map 2097152 bytes of memory
[t=3398] Failed to map memory
[t=3404] Out of heap space
exited with code 1.

So let me get this straight: 256k elements take 2MB every time and we can do it 7 times safely, but twice the usual amount crashes the first time we try? I have no clue.
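
One possible explanation, sketched under assumptions I haven't verified against the fork: if GC=none hands out memory by bumping a pointer inside fixed 2MB chunks, and maps a fresh chunk whenever a request doesn't fit in what's left, then a 256k-int array (1MB plus a header) fits only once per chunk, and a 512k-int array (2MB plus a header) never fits in any chunk at all. The header size and the 14MB budget below are hypothetical stand-ins.

```c
#include <stddef.h>

#define CHUNK_SIZE  (2u * 1024u * 1024u) /* DEFAULT_CHUNK_SIZE from the post */
#define HEADER_SIZE 16u                  /* hypothetical array header size */

typedef struct {
    size_t budget;      /* bytes the underlying malloc can still hand out */
    size_t chunk_free;  /* bytes left in the current chunk */
} BumpSim;

/* Returns 1 if the allocation fits, 0 on "Out of heap space". */
int sim_alloc(BumpSim *sim, size_t bytes) {
    while (sim->chunk_free < bytes) {
        if (sim->budget < CHUNK_SIZE) {
            return 0; /* "Failed to map memory" */
        }
        /* map a fresh chunk; whatever was left in the old one is abandoned */
        sim->budget -= CHUNK_SIZE;
        sim->chunk_free = CHUNK_SIZE;
    }
    sim->chunk_free -= bytes;
    return 1;
}

/* How many int arrays of `elements` items fit before running out of memory. */
unsigned rounds_until_oom(size_t elements, size_t budget) {
    BumpSim sim = { budget, 0 };
    unsigned rounds = 0;
    while (sim_alloc(&sim, HEADER_SIZE + elements * 4u)) {
        rounds++;
    }
    return rounds;
}
```

Under these assumptions, a 14MB budget gives 7 successful rounds at 256k elements and none at 512k, which is the shape of the two logs above.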

Let's see Paul Allen's Immix's logs. Max memory still set to 14M, all the other settings are left unchanged IIRC. Starting with allocation count = 1M (256K took forever to crash):

[t=324] Trying to map 3360 bytes of memory
[t=330] Trying to map 53760 bytes of memory
[t=335] Trying to map 917516 bytes of memory
[t=341] Trying to map 14680064 bytes of memory
[t=346] mark_time_ns,nullify_time_ns,sweep_time_ns
Calling Scala with 1048576 allocations
[t=4186] 
Collect
[t=4193] 
Block count: 30
[t=4198] Unavailable: 15
[t=4203] Free: 15
[t=4209] Recycled: 0
[t=4214] Growing heap by 1048560 bytes, to 2097136 bytes
End collect
[t=4221] Growing heap by 8947712 bytes, to 11044848 bytes
Calling Scala with 1048576 allocations
[t=10879] 
Collect
[t=10885] 
Block count: 316
[t=10891] Unavailable: 15
[t=10896] Free: 301
[t=10901] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=11446] 
Collect
[t=11459] 
Block count: 316
[t=11465] Unavailable: 15
[t=11470] Free: 301
[t=11475] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=11984] 
Collect
[t=11991] 
Block count: 316
[t=11996] Unavailable: 15
[t=12001] Free: 301
[t=12006] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=12550] 
Collect
[t=12556] 
Block count: 316
[t=12562] Unavailable: 15
[t=12567] Free: 301
[t=12573] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=12921] 
Collect
[t=12927] 
Block count: 316
[t=12933] Unavailable: 15
[t=12939] Free: 301
[t=12944] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=13157] 
Collect
[t=13164] 
Block count: 316
[t=13169] Unavailable: 15
[t=13174] Free: 301
[t=13180] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=13722] 
Collect
[t=13729] 
Block count: 316
[t=13737] Unavailable: 15
[t=13742] Free: 301
[t=13747] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=14028] 
Collect
[t=14034] 
Block count: 316
[t=14040] Unavailable: 15
[t=14046] Free: 301
[t=14051] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=14635] 
Collect
[t=14643] 
Block count: 316
[t=14648] Unavailable: 15
[t=14653] Free: 301
[t=14659] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=14937] 
Collect
[t=14944] 
Block count: 316
[t=14950] Unavailable: 15
[t=14954] Free: 301
[t=14960] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=15215] 
Collect
[t=15222] 
Block count: 316
[t=15227] Unavailable: 15
[t=15232] Free: 301
[t=15237] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=15459] 
Collect
[t=15466] 
Block count: 316
[t=15472] Unavailable: 15
[t=15477] Free: 301
[t=15483] Recycled: 0
End collect
Calling Scala with 1048576 allocations
[t=15826] 
Collect
[t=15833] 
Block count: 316
[t=15839] Unavailable: 15
[t=15849] Free: 301
[t=15854] Recycled: 0
End collect

And it just... keeps working. Looks like it grew its heap all the way up to over 10.5MB, and just stays there. Apparently, large allocations (1M Ints being around 4MB of payload) are being recycled well. That sounds promising...

Let's try something more challenging - smaller arrays, but with actual objects in them.

class Foo(i: Int)

@exported("foo")
def foo(n: Int): Int = {
  Array.fill[Foo](n)(new Foo(n))
  42
}

We'll be gentle this time - 16k allocations at a time. Almost as if I know what's going to happen if we use more.

[t=316] Trying to map 3360 bytes of memory
[t=323] Trying to map 53760 bytes of memory
[t=328] Trying to map 917516 bytes of memory
[t=334] Trying to map 14680064 bytes of memory
[t=340] mark_time_ns,nullify_time_ns,sweep_time_ns
Calling Scala with 16384 allocations
Calling Scala with 16384 allocations
[t=4984] 
Collect
[t=5042] 
Block count: 30
[t=5049] Unavailable: 15
[t=5054] Free: 15
[t=5060] Recycled: 0
[t=5065] Growing heap by 1048560 bytes, to 2097136 bytes
End collect
Calling Scala with 16384 allocations
Calling Scala with 16384 allocations
[t=8395] 
Collect
<crash with e0, no useful logs>

What's interesting is that going with 32k fails immediately:

[t=325] Trying to map 3360 bytes of memory
[t=333] Trying to map 53760 bytes of memory
[t=339] Trying to map 917516 bytes of memory
[t=344] Trying to map 14680064 bytes of memory
[t=349] mark_time_ns,nullify_time_ns,sweep_time_ns
Calling Scala with 32768 allocations
[t=2736] 
Collect

Kind of like in the non-GC case of int arrays. Meanwhile, GC=none used with classes, 256k allocations at a time, works 3 times and fails on the fourth (exit code 1):

Calling Scala with 262144 allocations
[t=122722] Trying to map 2097152 bytes of memory
Calling Scala with 262144 allocations
[t=123271] Trying to map 2097152 bytes of memory
[t=123715] Trying to map 2097152 bytes of memory
Calling Scala with 262144 allocations
[t=124160] Trying to map 2097152 bytes of memory
[t=124602] Trying to map 2097152 bytes of memory
Calling Scala with 262144 allocations
[t=125047] Trying to map 2097152 bytes of memory
[t=125493] Trying to map 2097152 bytes of memory
[t=125499] Failed to map memory
[t=125504] Out of heap space
exited with code 1.

So, it appears that our memory usage per allocation has doubled - we need two chunks (4MB in total) to handle 256k items, averaging at 16 bytes per item.


I just remembered - João Costa (JD557) has previously told me about a bug that affects Immix in its handling of small objects, it might apply here: GC not freeing short living variables under some circumstances · Issue #2436 · scala-native/scala-native · GitHub

I'll try to process all this and figure out the next step.

Another idea he suggested was to set the min and max heap to the same value, ensuring it never grows - I don't have any apps competing for this memory, so might as well claim it ahead of time. Might play with this idea later on.


Also, I should mention this part: the demo game with the flying square, when written using my functional DSL, ran for 10 seconds at 50fps with Immix. It ran for 5 minutes with no GC.

5 minutes at 50fps is about 15k frames, so with the 14MB heap it was allocating roughly 978 bytes per frame - sounds about right.
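
The per-frame figure can be reproduced with a few lines of arithmetic. This sketch assumes the whole 14MB heap was consumed over the 5 minutes; the function name is made up for illustration.

```c
#include <stdint.h>

/* Average bytes allocated per frame if `heap_bytes` lasted `seconds` at `fps`. */
uint32_t bytes_per_frame(uint32_t heap_bytes, uint32_t seconds, uint32_t fps) {
    uint32_t frames = seconds * fps;
    return frames == 0 ? 0 : heap_bytes / frames;
}
```

With 14MB, 300 seconds and 50fps this comes out at 978 (integer division of 14680064 by 15000).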

While this score clearly wasn't satisfactory - I want the games to run essentially forever without crashing - it proved that the overhead of whatever Scala does, did not make the game go below the maximum framerate supported by the device. Although the game is almost trivial and far from CPU-intensive, it gives me hope that Scala, and truly functional style, can indeed be used to write games for this platform. And I haven't even started optimizing yet :slight_smile:

The interesting thing is that the Immix crashes in this instance resulted in the console displaying stack overflow in task gameTask when I pressed B for more details. This could be an artifact of some of my mistakes in the SN fork.


Update: I saw a printf that I hadn't replaced with a custom log function, and fixing it made some progress on the Immix side: Playdate support by kubukoz · Pull Request #1 · kubukoz/scala-native · GitHub

I can now run 16k allocations 10 times - the last one crashes. 32k allocations still crash the first time though.


Here's the current crashlog:

--- crash at 2024/02/04 23:31:58---
build:9c92a2f1-2.2.0-release.163717-buildbot
   r0:90300d00    r1:00000000     r2:90300d00    r3: 00004000
  r12:90300d00    lr:9001aa4b     pc:00000000   psr: 200b0000
 cfsr:00000001  hfsr:00000000  mmfar:00000000  bfar: 00000000
rcccsr:00000000
heap allocated: 16466400
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

This might be useful:

if the address is in the 0x08000000-0x08100000 that's firmware code. Your game will be running in the 0x60000000-0x61000000 range.

from C-based game crashing only on device - #12 by dave.


FTR 0x60000000-0x61000000 was on the rev 1 devices. On rev 2 external memory is mapped to 0x90000000-0x91000000. If the $lr register is correct there (and I think it should be, since the IMPRECISERR flag isn't set in the cfsr register) then that's the calling address, and it's crashing because it's jumping to address 0 (in the $pc register). If you load the pdex.elf file in gdb and do info line *0x9001aa4b it will tell you what source line that's happening at.


ah, I was trying that without the asterisk - thank you very much! However...

No line number information available for address 0x9001aa4b

:smiling_face_with_tear:

heap allocated: 16627360

this actually sounds hella suspicious because it's just over 146kb less than the full 16MB. Also much more than I'd expect given I've set the memory size to 14MB... Gonna decrease that one and see what happens.

update: decreased memory size to 8MB, "heap allocated" now says 10022496. I now suspect the OOM handling doesn't do the right thing when I use Immix, and instead of a clean error message I just get a crash.

Here's the binary, in case you @dave (or anyone else) are able to have a look. Pressing A performs a batch of allocations; at the moment it crashes on the 4th attempt.
HelloWorld.elf.zip (547.3 KB)

New trivia: I added some prints to see what addresses Immix is getting when it calls malloc.

[t=325] Trying to map 3360 bytes of memory
[t=334] Mapped 3360 bytes of memory to 0x900763a0
[t=342] Trying to map 53760 bytes of memory
[t=351] Mapped 53760 bytes of memory to 0x900770d0
[t=358] Trying to map 917516 bytes of memory
[t=366] Mapped 917516 bytes of memory to 0x900842e0
[t=374] Trying to map 14680064 bytes of memory
[t=382] Mapped 14680064 bytes of memory to 0x901642f0
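
Adding up the four startup mappings above shows how little headroom is left. This is a small sketch that assumes the full 16MB would be available to the game, which may not be the case on the device.

```c
#include <stddef.h>
#include <stdint.h>

/* Sizes of the four startup mappings, copied from the log above. */
static const uint32_t kStartupMappings[] = {3360u, 53760u, 917516u, 14680064u};

/* Total bytes requested at startup. */
uint32_t startup_mapped_bytes(void) {
    uint32_t total = 0;
    size_t count = sizeof kStartupMappings / sizeof kStartupMappings[0];
    for (size_t i = 0; i < count; i++) {
        total += kStartupMappings[i];
    }
    return total;
}

/* Bytes left for everything else, assuming a full 16MB is available. */
uint32_t remaining_of_16mb(void) {
    return 16u * 1024u * 1024u - startup_mapped_bytes();
}
```

That is 15654700 bytes requested up front, leaving about 1.07MB for the stack, the C side and the system.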

Here's the most recent crashlog with that:

build:9c92a2f1-2.2.0-release.163717-buildbot
   r0:01010101    r1:0000000b     r2:90f64c80    r3: 0000033c
  r12:00084000    lr:900292b3     pc:900292b8   psr: 210d0000
 cfsr:00000082  hfsr:00000000  mmfar:01010105  bfar: 01010105
rcccsr:00000000
heap allocated: 16469056
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

At least this confirms pc isn't in any of these ranges I allocated.

aha! I forgot that in the elf we have it compiled to 0x0 and then we relocate to either 0x6xxx or 0x9xxx at load time. So the correct lookup there is info line *0x1aa4b:

(gdb) info line *0x1aa4b
No line number information available for address 
  0x1aa4b <_SM20scala.DummyImplicit$G4load+18>

And for that one right above you've got $pc at 0x292b8:

(gdb) info line *0x292b8
Line 33 of "dependencies/nativelib_native0.5.0-SNAPSHOT_3-0/scala-native/gc/immix/Marker.c" starts at address 0x292b2 <Marker_Mark+242>
   and ends at 0x292bc <Marker_Mark+252>.
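
The address translation described here can be written down as a tiny helper. It's a sketch that assumes the code is loaded at the very start of the mapped range (0x60000000 on rev 1, 0x90000000 on rev 2), which matches the two lookups above; the names are made up.

```c
#include <stdint.h>

#define LOAD_BASE_REV1 0x60000000u
#define LOAD_BASE_REV2 0x90000000u

/* Translate a runtime address from a crash log into an offset usable with
 * `info line *<offset>` on the ELF. Returns 0 when the address lies outside
 * the 16MB window that starts at `load_base`. */
uint32_t elf_offset(uint32_t runtime_address, uint32_t load_base) {
    /* unsigned wrap-around makes addresses below the base fail the range check */
    uint32_t offset = runtime_address - load_base;
    return offset < 0x01000000u ? offset : 0;
}
```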

ah, so I just skip the 9 prefix when looking at symbols?

This is actually so useful - I've been seeing Marker_Mark start but not complete (according to logs I was able to write) but I wasn't sure if it's actually the culprit.

That should definitely unblock me for some time. Thank you!

Narrowed down to this assertion:

assert(blockMeta == Block_GetBlockMeta(heap->blockMetaStart,
                                               heap->heapStart, lastWord));

in Scala Native's Object.c for immix - it's failing. Example values:

LHS: 0x90075ee8
RHS: 0x90075f08

That's 0x20 (32) of a difference. The size of BlockMeta seems to be 8 bytes, so... off-by-4 error? :sweat_smile:
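
Spelled out as code, using the 8-byte BlockMeta estimate from above and assuming an 8192-byte Immix block (both constants are assumptions on my side, not values read from the fork):

```c
#include <stdint.h>

#define BLOCK_META_SIZE 8u     /* size of BlockMeta as estimated above */
#define BLOCK_SIZE      8192u  /* assumed Immix block size in bytes */

/* Number of BlockMeta entries between two metadata addresses. */
int32_t meta_entries_apart(uint32_t lhs, uint32_t rhs) {
    return (int32_t)(rhs - lhs) / (int32_t)BLOCK_META_SIZE;
}

/* The same distance expressed in heap bytes (one entry per block). */
int32_t heap_bytes_apart(uint32_t lhs, uint32_t rhs) {
    return meta_entries_apart(lhs, rhs) * (int32_t)BLOCK_SIZE;
}
```

If those constants hold, lastWord sits four blocks (32KB) past the block the LHS metadata describes, which might simply mean the object spans several blocks rather than there being an arithmetic slip.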

Somehow changing how I log things made us go back to 9 valid "allocation rounds" and the tenth blowing up...

--- crash at 2024/02/06 00:25:31---
build:9c92a2f1-2.2.0-release.163717-buildbot
   r0:90300d00    r1:00000000     r2:90300d00    r3: 00004000
  r12:90300d00    lr:9001aa15     pc:00000000   psr: 200f0000
 cfsr:00000001  hfsr:00000000  mmfar:00000000  bfar: 00000000
rcccsr:00000000
heap allocated: 16465792
Lua totalbytes=0 GCdebt=0 GCestimate=0 stacksize=0

It looks like the same thing as before. Initially I didn't realize this so I looked for some docs.

docs: Documentation – Arm Developer

cfsr is 00000001, so only the lowest bit (bit 0) is set. Here's what it does:

IACCVIOL
Instruction access violation flag:

0
No instruction access violation fault.

1
The processor attempted an instruction fetch from a location that does not permit execution.
This fault occurs on any access to an XN region, even when the MPU is disabled or not present.
When this bit is 1, the PC value stacked for the exception return points to the faulting instruction. The processor has not written a fault address to the MMAR.

so that checks out. Feels good to finally find some hints in the documentation and not in trial-and-error...
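
For future crash logs, a small decoder saves a trip to the docs. The bit positions below are the ARMv7-M CFSR ones as I understand them; the enum and function names are made up for this sketch, and it only covers the flags that came up in this thread.

```c
#include <stdint.h>

/* A few CFSR bits (ARMv7-M). */
#define CFSR_IACCVIOL    (1u << 0)  /* instruction fetch from a non-executable location */
#define CFSR_DACCVIOL    (1u << 1)  /* data access violation */
#define CFSR_MMARVALID   (1u << 7)  /* mmfar holds the faulting address */
#define CFSR_PRECISERR   (1u << 9)  /* precise data bus error */
#define CFSR_IMPRECISERR (1u << 10) /* imprecise data bus error */
#define CFSR_BFARVALID   (1u << 15) /* bfar holds the faulting address */

typedef enum {
    FAULT_OTHER,
    FAULT_INSTRUCTION_ACCESS,
    FAULT_DATA_ACCESS,
    FAULT_PRECISE_BUS,
    FAULT_IMPRECISE_BUS
} FaultKind;

/* True when `flag` is set in `cfsr`. */
int cfsr_has(uint32_t cfsr, uint32_t flag) {
    return (cfsr & flag) != 0;
}

/* Picks the first matching cause, checked in the order listed above. */
FaultKind classify_cfsr(uint32_t cfsr) {
    if (cfsr_has(cfsr, CFSR_IACCVIOL))    return FAULT_INSTRUCTION_ACCESS;
    if (cfsr_has(cfsr, CFSR_DACCVIOL))    return FAULT_DATA_ACCESS;
    if (cfsr_has(cfsr, CFSR_PRECISERR))   return FAULT_PRECISE_BUS;
    if (cfsr_has(cfsr, CFSR_IMPRECISERR)) return FAULT_IMPRECISE_BUS;
    return FAULT_OTHER;
}
```

The 00000001 value here classifies as an instruction access fault, while the earlier 00000082 is a data access fault with MMARVALID set, so the mmfar value in that log (01010105) should be the address that was touched.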

Now here's lr:

Link Register
The Link Register (LR) is register R14. It stores the return information for subroutines, function calls, and exceptions. On reset, the processor sets the LR value to 0xFFFFFFFF.

And that'd be our calling code. Here it's 9001aa15, which I'll infer to be 1aa15 in my compiled code. gdb doesn't know much:

No line number information available for address 0x1aa15 <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+76>

but if we demangle this... it's the method my Scala is calling: Array.fill. I have it in my objdump:

0001a9c8 <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO>:
   1a9c8: 2d e9 f0 43   push.w  {r4, r5, r6, r7, r8, r9, lr}
   1a9cc: 81 b0         sub     sp, #4
   1a9ce: 00 28         cmp     r0, #0
   1a9d0: 3d d0         beq     0x1aa4e <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x86> @ imm = #122
   1a9d2: 18 46         mov     r0, r3
   1a9d4: 88 46         mov     r8, r1
   1a9d6: 00 29         cmp     r1, #0
   1a9d8: 29 dd         ble     0x1aa2e <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x66> @ imm = #82
   1a9da: c0 b3         cbz     r0, 0x1aa4e <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x86> @ imm = #112
   1a9dc: 14 46         mov     r4, r2
   1a9de: 01 68         ldr     r1, [r0]
   1a9e0: 1d 4a         ldr     r2, [pc, #116]          @ 0x1aa58 <$d.5+0x4>
   1a9e2: 7a 44         add     r2, pc
   1a9e4: 89 68         ldr     r1, [r1, #8]
   1a9e6: d2 f8 00 90   ldr.w   r9, [r2]
   1a9ea: 43 f6 6c 22   movw    r2, #14956
   1a9ee: 09 eb 81 01   add.w   r1, r9, r1, lsl #2
   1a9f2: 8a 58         ldr     r2, [r1, r2]
   1a9f4: 41 46         mov     r1, r8
   1a9f6: 90 47         blx     r2
   1a9f8: 4c b3         cbz     r4, 0x1aa4e <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x86> @ imm = #82
   1a9fa: 06 46         mov     r6, r0
   1a9fc: 17 48         ldr     r0, [pc, #92]           @ 0x1aa5c <$d.5+0x8>
   1a9fe: 78 44         add     r0, pc
   1aa00: 00 27         movs    r7, #0
   1aa02: 05 68         ldr     r5, [r0]
   1aa04: 20 68         ldr     r0, [r4]
   1aa06: 80 68         ldr     r0, [r0, #8]
   1aa08: 09 eb 80 00   add.w   r0, r9, r0, lsl #2
   1aa0c: d0 f8 2c 1b   ldr.w   r1, [r0, #2860]
   1aa10: 20 46         mov     r0, r4
   1aa12: 88 47         blx     r1
   1aa14: 03 46         mov     r3, r0
   1aa16: 28 46         mov     r0, r5
   1aa18: 31 46         mov     r1, r6
   1aa1a: 3a 46         mov     r2, r7
   1aa1c: e7 f7 a2 ff   bl      0x2964 <_SM27scala.runtime.ScalaRunTime$D12array_updateL16java.lang.ObjectiL16java.lang.ObjectuEO> @ imm = #-98492
   1aa20: 01 37         adds    r7, #1
   1aa22: b8 45         cmp     r8, r7
   1aa24: ee d1         bne     0x1aa04 <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x3c> @ imm = #-36
   1aa26: 30 46         mov     r0, r6
   1aa28: 01 b0         add     sp, #4
   1aa2a: bd e8 f0 83   pop.w   {r4, r5, r6, r7, r8, r9, pc}
   1aa2e: 70 b1         cbz     r0, 0x1aa4e <_SM12scala.Array$D4filliL15scala.Function0L22scala.reflect.ClassTagL16java.lang.ObjectEO+0x86> @ imm = #28
   1aa30: 01 68         ldr     r1, [r0]
   1aa32: 08 4a         ldr     r2, [pc, #32]           @ 0x1aa54 <$d.5>
   1aa34: 7a 44         add     r2, pc
   1aa36: 89 68         ldr     r1, [r1, #8]
   1aa38: 12 68         ldr     r2, [r2]
   1aa3a: 02 eb 81 01   add.w   r1, r2, r1, lsl #2
   1aa3e: 43 f6 6c 22   movw    r2, #14956
   1aa42: 8a 58         ldr     r2, [r1, r2]
   1aa44: 00 21         movs    r1, #0
   1aa46: 01 b0         add     sp, #4
   1aa48: bd e8 f0 43   pop.w   {r4, r5, r6, r7, r8, r9, lr}
   1aa4c: 10 47         bx      r2
   1aa4e: 00 20         movs    r0, #0
   1aa50: f6 f7 84 fb   bl      0x1115c <_SM34scala.scalanative.runtime.package$D16throwNullPointernEO> @ imm = #-39160

quick zoom at the area around 1aa15:

   1aa10: 20 46         mov     r0, r4
   1aa12: 88 47         blx     r1
   1aa14: 03 46         mov     r3, r0
   1aa16: 28 46         mov     r0, r5
   1aa18: 31 46         mov     r1, r6
   1aa1a: 3a 46         mov     r2, r7
   1aa1c: e7 f7 a2 ff   bl      0x2964 <_SM27scala.runtime.ScalaRunTime$D12array_updateL16java.lang.ObjectiL16java.lang.ObjectuEO> @ imm = #-98492

Thing is, 1aa15 is in the middle of an instruction: the byte 46. If I understand anything, you can't just split an instruction like that because 03 46 is just the code for a copy between these particular registers. And how is that even performing a jump to 0x00?
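
One reading that would explain both puzzles, offered as a sketch rather than a confirmed diagnosis: in Thumb code, bit 0 of a return address in lr marks Thumb state and is not part of the address. So 1aa15 would mean "return to 1aa14", the instruction right after blx r1 at 1aa12. The crash log shows r1 = 00000000, and a blx through a register holding 0 is one way to end up with pc at 0. The helper names below are made up.

```c
#include <stdint.h>

/* Bit 0 of a code address marks Thumb state; it is not part of the address. */
int is_thumb(uint32_t code_address) {
    return (code_address & 1u) != 0;
}

/* The actual instruction address that `code_address` refers to. */
uint32_t strip_thumb_bit(uint32_t code_address) {
    return code_address & ~1u;
}

/* Address of the call instruction that produced `lr`, assuming a 2-byte
 * encoding such as `blx r1` (a `bl` with an immediate is 4 bytes). */
uint32_t call_site_16bit(uint32_t lr) {
    return strip_thumb_bit(lr) - 2u;
}
```

If that reading is right, the interesting question becomes why the function pointer loaded into r1 just before the call was null.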

End of day summary: I'm in a state where nothing really crashes, but I have asserts failing in the GC code. This is kinda good news, because I have a direct contact to the SN maintainer who knows that code :sweat_smile:

It appears that the GC's object-marking code is failing an assertion that objectSize < blockSize. Block size is 8192, and objectSize is whatever I have the array set to... and it sounds like the array shouldn't even be getting to that place.


This is fantastic! I absolutely adore Scala and would love to be able to use it on the Playdate. I'm definitely keeping my eye on this thread :slight_smile:

Last week I met with Wojciech Mazur (the maintainer of Scala Native), we tried a couple changes in SN, which increased the amount of allocations I'm able to do with GC on, without breaking the game.

One important note was that we should preallocate the heap ahead of time, and not allow it to grow: we're only able to use malloc and not mmap, and the heap segments are supposed to be located next to each other in memory (at least from the program's POV), so growing the heap is just asking for trouble - the GC ends up reading some arbitrary memory instead of what was allocated.
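
A rough illustration of why contiguity matters, with a hypothetical HeapLayout type and an assumed 8192-byte block size (this is not Scala Native's actual code): if the GC finds a block by measuring the offset from the heap start, a second malloc'd segment somewhere else in memory maps to an index far outside the block table.

```c
#include <stdint.h>

#define BLOCK_SIZE 8192u /* assumed Immix block size in bytes */

/* Hypothetical description of a single contiguous heap. */
typedef struct {
    uintptr_t heap_start;
    uint32_t block_count;
} HeapLayout;

/* Index of the block containing `address`: offset from the heap start
 * divided by the block size. Only meaningful for a contiguous heap. */
uint32_t block_index(const HeapLayout *heap, uintptr_t address) {
    return (uint32_t)((address - heap->heap_start) / BLOCK_SIZE);
}

/* True when `address` falls inside the blocks the heap actually owns. */
int index_is_valid(const HeapLayout *heap, uintptr_t address) {
    return address >= heap->heap_start &&
           block_index(heap, address) < heap->block_count;
}
```

With a 30-block heap, an address from an unrelated allocation a few megabytes away yields an index in the thousands, so any metadata lookup based on it reads memory that was never part of the table.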

So, we're now allocating around 10MB if my memory serves me. This worked for 20 rounds of 128k allocations or so.

Fast forward to today - I saw that Wojciech got a couple of GC-related changes into the main branch, including "fix: Try to stablize GC" (Pull Request #3767 in scala-native/scala-native on GitHub). I tried them out and... it seems to just... work like a charm?

I can do rounds of 256k allocations as often as I want (one round takes several seconds, I assume because the hardware isn't that fast), and I see that only ~87 blocks (out of the 283 allocated) are getting used. 512k allocations hang the game loop for over 10 seconds, so the game crashes - but I don't think I'd need that many anyway :wink:
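
For a rough sense of scale, here is the block arithmetic, assuming the 8192-byte block size from the failing assertion earlier still applies (that is an assumption; the block size may have changed with the later GC fixes):

```scala
object BlockUsage {
  val BlockSizeBytes: Long = 8192L // assumed, taken from the earlier assertion
  val AllocatedBlocks: Long = 283L // from the post
  val UsedBlocks: Long = 87L       // from the post (approximate)

  val allocatedBytes: Long = AllocatedBlocks * BlockSizeBytes
  val usedBytes: Long = UsedBlocks * BlockSizeBytes
  val usedPercent: Double = UsedBlocks * 100.0 / AllocatedBlocks
}
```

Under that assumption, about 0.7 MB of roughly 2.3 MB worth of blocks is in use, or around 31%.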

Going back to lower numbers like 16k allocations seems to clear up all the blocks. At this point I think it's safe to say that the GC is working much much better than before.

I would love to get stack traces to work (with libunwind), but given the time constraints (T minus 34 days for the conference talk), I think the bigger priority is the actual game code. Perhaps I'll try the bindings once more.

Scala DSL going well so far:

object MainGame {
  val ratWidth = 32
  val ratHeight = 32
  val ratMarginX = 20
  val ratMarginY = 20

  def config: GameConfig = GameConfig(fps = 50)

  def init(ctx: GameContext): Resource[GameState] = Assets.bitmap("arrow.png").map { arrow =>
    GameState(
      rat = Rat(
        y = ctx.screen.height / 2 - ratHeight / 2,
        rotation = Radians(0),
      ),
      assets = Assets(
        arrow = arrow
      ),
    )
  }

  def update(ctx: GameContext): GameState => GameState = {

    val rotateRat: GameState => GameState =
      state => {
        val newRotation =
          (
            state.rat.rotation + Radians.fromDegrees(ctx.crank.change)
          )
            .clamp(
              min = Radians.fromDegrees(-60),
              max = Radians.fromDegrees(60),
            )
        state.copy(rat = state.rat.copy(rotation = newRotation))
      }

    val moveRat: GameState => GameState =
      state => {
        // Keep the rat within the vertical margins (ratMarginY instead of a magic 20).
        val newY = (state.rat.y + Math.sin(state.rat.rotation.value) * ctx.delta * 300)
          .clamp(ratMarginY, ctx.screen.height - ratHeight - ratMarginY)

        state.copy(rat = state.rat.copy(y = newY.toFloat))
      }

    val equalizeRat: GameState => GameState =
      state => {
        val newRotation =
          if state.rat.y == ratMarginY ||
            state.rat.y == ctx.screen.height - ratHeight - ratMarginY
          then state.rat.rotation * 0.9
          else state.rat.rotation

        state.copy(rat = state.rat.copy(rotation = newRotation))
      }

    Function.chain(
      List(
        rotateRat,
        moveRat,
        equalizeRat,
      )
    )
  }

  def render(state: GameState): Render = {
    import Render._

    val rat = Render.Bitmap(
      x = ratMarginX + ratWidth / 2,
      y = state.rat.y.toInt + ratHeight / 2,
      bitmap = state.assets.arrow,
      rotation = state.rat.rotation,
      centerX = 0.5,
      centerY = 0.5,
      xscale = 1.0,
      yscale = 1.0,
    )
    // .rotated(state.szczur.rotation)

    val debug = Render.Text(
      x = 10,
      y = 10,
      s"Rotation: ${state.rat.rotation.value}, y: ${state.rat.y}",
    )

    Clear(Color.White) |+|
      FPS(0, 0) |+|
      rat |+|
      debug
  }

}
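
The update step above relies on types that aren't shown in this post. As a self-contained sketch of the same pattern (small pure functions composed with `Function.chain` from the Scala standard library), here is a cut-down version. `Radians`, `Rat` and the helper methods below are stand-ins written for this example, not the actual definitions from the game, and it uses brace syntax so it compiles on both Scala 2 and 3.

```scala
object ChainSketch {
  // Hypothetical stand-ins for the game's types; the real ones are not shown in the post.
  final case class Radians(value: Double) {
    def +(other: Radians): Radians = Radians(value + other.value)
    def *(factor: Double): Radians = Radians(value * factor)
    def clamp(min: Radians, max: Radians): Radians =
      Radians(math.max(min.value, math.min(max.value, value)))
  }

  object Radians {
    def fromDegrees(degrees: Double): Radians = Radians(math.toRadians(degrees))
  }

  final case class Rat(y: Float, rotation: Radians)

  // Each step is a pure Rat => Rat function, like the GameState => GameState steps above.
  def rotate(crankChangeDegrees: Double): Rat => Rat =
    rat =>
      rat.copy(rotation =
        (rat.rotation + Radians.fromDegrees(crankChangeDegrees))
          .clamp(Radians.fromDegrees(-60), Radians.fromDegrees(60))
      )

  def move(deltaSeconds: Double): Rat => Rat =
    rat => rat.copy(y = (rat.y + math.sin(rat.rotation.value) * deltaSeconds * 300).toFloat)

  // Function.chain applies the steps left to right: rotate first, then move.
  def update(crankChangeDegrees: Double, deltaSeconds: Double): Rat => Rat =
    Function.chain(List(rotate(crankChangeDegrees), move(deltaSeconds)))
}
```

Calling `ChainSketch.update(90.0, 0.02)(rat)` first clamps the rotation to 60 degrees and then moves the rat using the already-updated rotation, so the order of the list matters.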

Game development is now happening here: kubukoz/demos at wroclaw-rat-game (GitHub). It's a branch made from the starting point of the allocation demo from earlier.

And here's a demo of the current state. There's sound when you score, but I couldn't capture that in a GIF.

(GIF: szczur-rec)

I've made a similar project, but for Java 1.6.
It works perfectly on the Simulator and it's good for prototyping any kind of game (for me, of course).
The next step is getting Java working on the device (I'm currently waiting for the shipment).
Thanks for explaining your Scala Native modifications; I suppose that will help me in the future.