Performance with `drawRotated`


I'm on SDK 2.0.0.

I'm working on a game that draws rotated bitmaps. I ran into some performance bottlenecks, so I wanted to profile `drawRotated` and `drawWithTransform`. Both methods are pretty slow on hardware, and I'm hoping for any insight into how I can speed things up.

First, I noticed there's a particular pattern to the framerate and CPU that I found interesting.
Here's my Playdate's CPU and framerate when running the following program:

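import "CoreLibs/graphics"
import "CoreLibs/sprites"

local gfx <const> = playdate.graphics

local angle = 1
-- Transform kept around for the drawWithTransform comparison
local xform = playdate.geometry.affineTransform.new()
xform:rotate(angle, 200, 120)

-- Full-screen image with a 40x40 square drawn in the middle
local img = gfx.image.new(400, 240, gfx.kColorClear)
gfx.pushContext(img)
gfx.fillRect(180, 100, 40, 40)
gfx.popContext()

local sprite = gfx.sprite.new()
sprite:setBounds(0, 0, 400, 240)
sprite.draw = function(x, y, w, h)
    gfx.drawText(angle, 20, 20)
    img:drawRotated(200, 120, angle)
end
sprite:add()

playdate.update = function()
    angle = (angle + 1) % 360
    sprite:markDirty() -- force a redraw every frame
    gfx.sprite.update()
end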

The CPU bottoms out (and frame rate tops out) at 180° and 360°/0° of rotation. I'm guessing there's some optimization being done when the rotation is exactly 180° or 0°, but I'm curious why the CPU spikes so badly around 260° and 70°. If there's something I can learn from this to optimize my drawing I'd certainly like to try it!

My game doesn't really lend itself to prebaked rotations, unfortunately, but I might have to figure out something like that as a last resort!

Also, I noticed that a smaller bitmap draws rotated much more quickly. I can maintain 50 fps in the above program when the image is only a 40x40 bitmap. If anyone has insight into why this is, I'd like to try to apply it to my game.

Thanks as always for all of your help!

It's all about the L1 cache: if the data you're processing fits in the cache then after the initial load you don't have to touch external memory until you're finished and writing it back out. (Gross oversimplification there, but generally true.) The larger the image is, the more cache misses and stalls on memory I/O you get.

Rotation is also a factor: for small angles of rotation the source data for one output line is in a smaller span of memory. If you're rotating close to 90 degrees then you have to walk nearly the entire source image row by row to generate each output row. In this case the cache overhead hurts us, so I did a bunch of profiling on image sizes and angles and have a heuristic in the code that disables the data cache where we expect it'll slow us down.

And, finally, you're right, there's an optimization for 90-degree multiples. There I'm processing the image in square chunks instead of row-by-row so that the inner loop fits in the cache. I don't remember the exact number, but that made rotating a full screen image something like 50x faster. :smiley:
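
To put rough numbers on the "span of memory" point, here is a back-of-the-envelope sketch in plain Lua. It is not SDK code and says nothing about how the real implementation is written: `sourceRowsPerOutputRow` is a made-up helper that assumes each step along an output row moves `|sin(angle)|` rows in the source bitmap, which is a simplification.

```lua
-- Rough estimate of how many source rows one output row touches when a
-- bitmap is rotated by angleDegrees. Illustrative model only (an assumption
-- for this sketch), not how the SDK actually walks memory.
local function sourceRowsPerOutputRow(width, height, angleDegrees)
    local s = math.abs(math.sin(math.rad(angleDegrees)))
    -- Walking `width` pixels along an output row moves about width * |sin|
    -- rows in the source; it can never touch more rows than the source has.
    local rows = math.floor(width * s) + 1
    return math.min(rows, height)
end

-- Compare a full-screen image with a 40x40 one at a few angles.
for _, angle in ipairs({0, 10, 45, 80, 90}) do
    print(angle,
          sourceRowsPerOutputRow(400, 240, angle),
          sourceRowsPerOutputRow(40, 40, angle))
end
```

Under this model a full-screen image near 90 degrees touches every source row for each output row, while a 40x40 image never spans more than 40 short rows, which is consistent with the smaller bitmap staying cache-friendly.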


Have you considered pre-rotating the images?

See: CleanRotation

The performance benefits are large, the rotations are IMO at least as good as those obtained via API, and the image tables produced aren't hideously big :slight_smile: .


Oops, that didn't register at first. Never mind! :slight_smile:

Edit: will leave the CleanRotation reply in here anyway, just in case you need to fall back to pre-rendering.

No, that's super helpful! Thank you!
My game's bitmaps change frequently so I was thinking I wouldn't be able to pre-compute at compile time, but this looks like a great solution for runtime.
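
For anyone landing here later, a minimal sketch of what runtime pre-rotation could look like. This is not CleanRotation's code; `buildRotationCache` and `cachedFrame` are hypothetical helper names, and the only SDK call assumed is `image:rotatedImage(angle)`. It also assumes `stepDegrees` divides 360 evenly.

```lua
-- Build a table of pre-rotated copies of `img`, one every `stepDegrees`.
-- Assumes `img` is a playdate.graphics.image (relies on image:rotatedImage)
-- and that stepDegrees divides 360 evenly.
local function buildRotationCache(img, stepDegrees)
    local cache = {}
    for angle = 0, 360 - stepDegrees, stepDegrees do
        cache[#cache + 1] = img:rotatedImage(angle)
    end
    return cache
end

-- Return the cached frame nearest to `angle` (degrees, any sign).
local function cachedFrame(cache, angle)
    local step = 360 / #cache
    -- Round to the nearest step, wrapping 360 back around to the first frame.
    local index = math.floor((angle % 360) / step + 0.5) % #cache + 1
    return cache[index]
end
```

In the draw callback this would replace the `drawRotated` call with something like `cachedFrame(cache, angle):drawCentered(200, 120)`, and the cache would be rebuilt whenever the source bitmap changes. The trade-off is memory and a one-time rotation cost per frame in the cache, so a coarser step (say 15 degrees) may be the practical choice for large bitmaps.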
