Gameboy Emulation - Performance Improvement Ideas

We have a lot of gameboy emulation projects, and it would be wise to pool all our brainstorming in one place. Most projects seem to use peanut-gb as a base and branch off from it to add Playdate support.

Here are the gameboy emulation projects I am aware of (all based on peanut-gb):

It seems these projects all hit a performance wall around 30-40 fps (the gameboy's native refresh rate is 59.7 Hz).

Discussion on the bottleneck seems to fall into three camps.

  • Interpreter, i.e. CPU emulation. Compilers are notoriously bad at optimizing interpreter loops, so it's believable, but on the other hand, the playdate's CPU is around 150x faster than the Gameboy's. Perhaps the cache is to blame?
  • Rendering (emulated). The 160x144 LCD buffer is updated 1 scanline at a time. After every gb CPU instruction, the emulator checks the cycle count to determine whether it's time to render a scanline (or process an interrupt). That's a lot of overhead! Scanlines themselves don't seem to be terribly expensive though.
  • Rendering (playdate lcd). Any lines that have changed must be redrawn: expand the 2-bit gameboy pixels to 2x1 or 2x2 1-bit playdate pixels, then mark the LCD rows as updated. We know this is slow -- full screen refresh is 50 Hz max. @rpdev claims this only accounts for 4 of 23 ms, i.e. roughly 17% of processor time. The state of the art here is to emulate 2 frames at once and only draw the second one, which at least makes full speed emulation plausible.
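
To make the pixel-expansion step in the last bullet concrete, here is a minimal sketch in C of turning one gameboy scanline into two rows of 1-bit pixels (the 2x2 case). The dither patterns, the function name and the buffer layout are my own assumptions for illustration, not taken from peanut-gb or any of the projects above. I'm also assuming 1 = white and that the leftmost pixel is the most significant bit; check that against the real framebuffer before relying on it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { GB_WIDTH = 160, OUT_ROW_BYTES = 40 };  /* 320 output pixels / 8 */

/* Hypothetical 2x2 dither patterns, indexed by shade (0 = lightest).
   Each entry is the 2-bit pattern for one output row; 1 = white (assumed). */
static const uint8_t TOP_BITS[4]    = { 0x3, 0x3, 0x2, 0x0 };
static const uint8_t BOTTOM_BITS[4] = { 0x3, 0x1, 0x1, 0x0 };

/* Expand 160 shades (one per byte, values 0..3) into two 40-byte rows. */
void expand_scanline_2x2(const uint8_t shades[GB_WIDTH],
                         uint8_t top[OUT_ROW_BYTES],
                         uint8_t bottom[OUT_ROW_BYTES])
{
    assert(shades != NULL && top != NULL && bottom != NULL);
    for (int byte = 0; byte < OUT_ROW_BYTES; byte++) {
        uint8_t t = 0, b = 0;
        /* Four gameboy pixels fill one output byte (4 px * 2 bits each). */
        for (int i = 0; i < 4; i++) {
            uint8_t shade = shades[byte * 4 + i] & 3;
            t = (uint8_t)((t << 2) | TOP_BITS[shade]);
            b = (uint8_t)((b << 2) | BOTTOM_BITS[shade]);
        }
        top[byte] = t;
        bottom[byte] = b;
    }
}
```

A real implementation would write straight into the display framebuffer at the right row offset and probably use a lookup table over whole bytes instead of the inner loop, but the bit layout is the same idea.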

It's hard to profile these, but I invite you to try. As far as I can tell, @dustin blames rendering, while @timhei blames CPU emulation.

Ideas to improve CPU emulation

Ideas to improve Rendering

  • cache background planes. I tried this, but it slowed emulation by around 10%.
  • cache sprites
  • interlacing
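
For the interlacing idea, one possible reading (my assumption, nobody above has spelled it out) is to render only half of the scanlines on each frame and alternate which half. A tiny sketch in C, with hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

enum { GB_HEIGHT = 144 };

/* True when scanline `line` should be rendered on frame number `frame`.
   Even frames draw even lines, odd frames draw odd lines. */
bool interlace_should_draw(uint32_t frame, uint8_t line)
{
    return ((frame ^ line) & 1u) == 0;
}

/* Helper for the example: how many scanlines a given frame renders. */
unsigned interlace_lines_drawn(uint32_t frame)
{
    unsigned n = 0;
    for (uint8_t line = 0; line < GB_HEIGHT; line++) {
        if (interlace_should_draw(frame, line)) n++;
    }
    return n;
}
```

Each frame then touches 72 of the 144 lines, and every line is refreshed once per pair of frames. Whether the combing artifacts are tolerable on the Playdate screen is something I can't predict.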

Other Ideas

  • compile with armclang (doesn't seem to help)

Please contribute if you have ideas, or if you'd like to try your hand at profiling!


Regarding JIT ("just in time" compilation), the idea is to translate the gameboy SM83 instructions into ARMv7-M assembly, so that the Playdate CPU runs the code natively -- blazingly fast by gameboy standards. What's more, we can statically optimize out most flag updates.
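
To illustrate what "statically optimize out most flag updates" can mean, here is a minimal sketch in C of a backward liveness pass over one basic block. It is not code from my repository; the Insn struct and the function name are made up for this example. The idea is that an instruction only needs to compute a flag if some later instruction reads it before it gets overwritten.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SM83 flag bits as they sit in the F register. */
enum { FLAG_Z = 0x80, FLAG_N = 0x40, FLAG_H = 0x20, FLAG_C = 0x10,
       FLAG_ALL = 0xF0 };

/* Hypothetical decoded-instruction record for this sketch. */
typedef struct {
    uint8_t reads;   /* flags the instruction consumes */
    uint8_t writes;  /* flags the instruction always overwrites */
    uint8_t emit;    /* output: flags the translated code must compute */
} Insn;

/* Walk the block backwards, tracking which flags are still "live". */
void mark_needed_flags(Insn *block, size_t count)
{
    assert(block != NULL || count == 0);
    /* Conservative: whatever runs after this block may read any flag. */
    uint8_t live = FLAG_ALL;
    for (size_t i = count; i-- > 0; ) {
        block[i].emit = (uint8_t)(block[i].writes & live);
        live = (uint8_t)((live & ~block[i].writes) | block[i].reads);
    }
}
```

For the sequence ADD A,B / INC C / ADC A,D, this marks ADD as only needing to produce the carry flag (ADC reads it, and INC leaves carry alone), INC as needing no flags at all, and ADC as needing all four because the block ends there.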

I have a working demo of SM83-to-arm translation, but it needs more test cases to catch all the bugs. It's on GitHub: nstbayless/jit-playdate-gameboy (JIT for running gameboy instructions on playdate).

Current progress: 100% of instructions implemented! But hardly any testing done at all.

It's really tough to test correctness on hardware because of a lack of debugging ability, so I am also emulating arm on my laptop to debug this; see "How to Emulate Playdate (Arm) with QEMU".


Thinking out loud: could it be a useful/fun bit of corner-cutting to aim for just 30 fps by rendering only half the frames, so play speed remains correct at the expense of smoothness?

(Other work would be done for ALL frames, even the hidden ones, so I don't know how much processing this would save. But I know I've enjoyed certain games in the past that never even hit 30! For some titles it matters more than others.)

That's already what some implementations do. There's a disagreement about semantics though (is that 30 or 60 fps?). The actual implementation also differs -- it can be handled as two gameboy frames in one playdate update, or one gameboy frame per update with only the odd-numbered frames flushing to the display. In the former case, playdate reports 30 fps maximum; in the latter, 60 (and the updates presumably take different lengths of time depending on parity).
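
The two variants can be sketched side by side in C. Emu and emulate_frame are hypothetical stand-ins for a real emulator core (they only count frames here), so treat this as an illustration of the control flow rather than anyone's actual update callback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for an emulator core; a real one would run the
   CPU and PPU for one gameboy frame inside emulate_frame. */
typedef struct {
    uint32_t frames_emulated;
    uint32_t frames_drawn;
} Emu;

static void emulate_frame(Emu *emu, bool draw)
{
    assert(emu != NULL);
    emu->frames_emulated++;
    if (draw) emu->frames_drawn++;
}

/* Variant A: two gameboy frames per playdate update, draw only the second. */
void update_two_per_call(Emu *emu)
{
    emulate_frame(emu, false);
    emulate_frame(emu, true);
}

/* Variant B: one gameboy frame per update; only odd-numbered frames
   (counting from 0) are flushed to the display. */
void update_one_per_call(Emu *emu)
{
    bool odd = (emu->frames_emulated & 1u) != 0;
    emulate_frame(emu, odd);
}
```

Thirty calls of variant A and sixty calls of variant B both emulate 60 gameboy frames and draw 30 of them; the difference is only in how many update callbacks the playdate counts.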

Very impressive work in this thread! Sadly I can't test, but I'm watching closely. Well done so far!

Very exciting!

Just to clarify: efficiency around memory usage is everything on Playdate. The amount of work most emulators do per pixel just doesn't seem to fly on device. This is why I've been thinking about rendering and caching all backgrounds and sprites ahead of time, and only updating them when we see new sprite or bg data loaded. This may break a few games but would work on most.
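
Roughly what I have in mind, as a minimal sketch in C: decode each 8x8 tile once, and only decode it again after a write to its bytes in VRAM. The TileCache struct and the function names are made up for this example (this is not Gamekid or peanut-gb code), and it caches raw colour indices, so palette changes would still have to be applied at draw time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { TILE_COUNT = 384, TILE_BYTES = 16 };

typedef struct {
    uint8_t vram[TILE_COUNT * TILE_BYTES];  /* raw tile data */
    uint8_t decoded[TILE_COUNT][64];        /* cached colour index per pixel */
    bool dirty[TILE_COUNT];
    unsigned decode_count;                  /* instrumentation for the example */
} TileCache;

void tile_cache_init(TileCache *c)
{
    assert(c != NULL);
    memset(c, 0, sizeof *c);
    for (int i = 0; i < TILE_COUNT; i++) c->dirty[i] = true;
}

/* Called on every write into tile data; offset is relative to its start. */
void tile_cache_write(TileCache *c, size_t offset, uint8_t value)
{
    assert(offset < sizeof c->vram);
    if (c->vram[offset] == value) return;   /* unchanged data keeps the cache */
    c->vram[offset] = value;
    c->dirty[offset / TILE_BYTES] = true;
}

/* Returns 64 colour indices (row-major), decoding only if the tile is dirty. */
const uint8_t *tile_cache_get(TileCache *c, unsigned tile)
{
    assert(tile < TILE_COUNT);
    if (c->dirty[tile]) {
        const uint8_t *src = &c->vram[tile * TILE_BYTES];
        for (int y = 0; y < 8; y++) {
            uint8_t lo = src[y * 2], hi = src[y * 2 + 1];  /* two bitplanes */
            for (int x = 0; x < 8; x++) {
                int bit = 7 - x;                           /* leftmost = bit 7 */
                c->decoded[tile][y * 8 + x] =
                    (uint8_t)((((hi >> bit) & 1) << 1) | ((lo >> bit) & 1));
            }
        }
        c->dirty[tile] = false;
        c->decode_count++;
    }
    return c->decoded[tile];
}
```

The games this would break are presumably the ones that rewrite tile data every frame or mid-frame, since they would pay for the decode plus the bookkeeping each time.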

That said! I never considered building JIT based emulation! Honestly don't think I have the skills to do so. I hope this works! I will happily take Gamekid offline and point players your way if you get this working.

I had considered writing a little Mac/Windows app that would actually recompile GB roms as PDX files to remove emulation entirely, but... that's as close as I got to thinking about something like this.

Best of luck!

Also! Not really sure what other emulators are doing, but I switched from my own emulation layer to PeanutGB early on, as I just didn't feel it necessary to add yet another GB emulator code base into the world when Peanut was pretty good and being actively worked on with goals that aligned with mine.

I do regret it sometimes however as I am a lot less likely to try new things in the emulator.

But hey, I didn't have to write the audio emulation... so that's nice? :P