Is there a C version of this benchmark? I seem to remember some discussion about porting it to the C API, but I'm not sure it was ever done.
Has anyone done a Rev A vs Rev B comparison using this benchmark?
Comparing to Benchmarks & Optimisations - #27 by matt
I'm getting much better results on rev A hardware, firmware 2.1.1.
Any explanation for this?
Has the testing time per function been increased maybe?
These are the timing values in the script I used, which come down to 100 ms per test (5000 / 50).
local FPS = 50
local frameMS = 5000/FPS
playdate.display.setRefreshRate(FPS)
#, BENCH, CALL
nil 14854
drawLine - Diagonal 387
drawLine - Horizontal 1846
drawLine - Vertical 454
drawLine - Random Diagonal 687
drawLine - fillRect 4537
drawLine - drawRect 935
math.random 6736
math.random - local 6398
math.sin 5814
math.sin - random 2848
math.cos 5818
math.cos - random 2598
math.floor - local 6259
image:sample 1680
drawText - local 444
drawTextInRect 55
drawRect 454
fillRect 849
drawCircleAtPoint 221
fillCircleAtPoint 280
drawCircleInRect 379
fillCircleInRect 479
sprite:moveTo - static 1206
sprite:moveTo - random 795
sprite:setImage 1750
sprite:setCenter - static 1477
sprite:setCenter - toggle 1353
sprite:setCenter - random 1125
sprite:setZIndex 3054
image:draw 1451
image:draw - locked 782
image:draw - locked local 746
image:draw - pushcontext local 816
Not sure it's related to these tests, but Dave said the new hardware (Rev B units) is more sensitive to cache misses. That has caused performance issues before with drawBlurred and with rotating bitmaps 90 degrees: Performance of graphics.image:drawBlurred method on Rev B hardware - #3 by dave
Edit: I just ran the benchmark on my Rev A unit, compiling the source code from the first post (Lua) with the latest pdc.exe from the latest SDK version, and got the results below.
They seem more in line with the values posted for other firmware versions, although some are lower now, while your run seems to have made more calls?
DEVICE (5 RUN AVE)
#, BENCH, CALL
01, 1590, nil
02, 73, drawLine - Diagonal
03, 301, drawLine - Horizontal
04, 85, drawLine - Vertical
05, 124, drawLine - Random Diagonal
06, 584, drawLine - fillRect
07, 161, drawLine - drawRect
08, 791, math.random
09, 749, math.random - local
10, 655, math.sin
11, 428, math.sin - random
12, 822, math.cos
13, 547, math.cos - random
14, 1345, math.floor - local
15, 258, image:sample
16, 84, drawText - local
17, 11, drawTextInRect
18, 85, drawRect
19, 152, fillRect
20, 41, drawCircleAtPoint
21, 52, fillCircleAtPoint
22, 69, drawCircleInRect
23, 85, fillCircleInRect
24, 181, sprite:moveTo - static
25, 130, sprite:moveTo - random
26, 260, sprite:setImage
27, 222, sprite:setCenter - static
28, 221, sprite:setCenter - toggle
29, 175, sprite:setCenter - random
30, 399, sprite:setZIndex
31, 220, image:draw
32, 137, image:draw - locked
33, 131, image:draw - locked local
34, 128, image:draw - pushcontext local
END
Edit 2: I see a difference in the timings. The original sources I just downloaded use a shorter frame time,
while you use 5000 / FPS.
So the benchmark runs 5 times longer per test for you and can make more calls per benchmark because the time window is different. In theory, the results you posted above need to be divided by 5 to be comparable.
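To put the two sets of numbers on the same footing, the call counts can be scaled by the ratio of the time windows. A minimal sketch, assuming the factor of 5 mentioned above; the normalize helper and the table layout are made up for illustration, and the sample values are a few of the Rev A numbers posted earlier in the thread:

```lua
-- Scale call counts measured over a longer window down to the shorter one.
-- WINDOW_RATIO = 5 is the factor from the post above (an assumption here,
-- not read from either script).
local WINDOW_RATIO = 5

-- Hypothetical helper: divides every count in `results` by `ratio`.
local function normalize(results, ratio)
    local scaled = {}
    for name, calls in pairs(results) do
        scaled[name] = calls / ratio
    end
    return scaled
end

-- A few values from the longer-window run posted above.
local longRun = {
    ["nil"] = 14854,
    ["drawLine - Diagonal"] = 387,
    ["fillRect"] = 849,
}

local scaled = normalize(longRun, WINDOW_RATIO)
for name, calls in pairs(scaled) do
    print(string.format("%-22s %8.1f", name, calls))
end
```

Scaled this way, the 387 diagonal lines become about 77, close to the 73 from the original script on Rev A. The empty "nil" test does not line up as neatly (about 2971 against 1590), so the division is only a rough correction.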
Edit 3: it seems you downloaded Dave's version, while this is the original that the earlier numbers were based on.
For reference, here are the numbers from my Rev A as well (I think Nino's are Rev A too).
echo off
time and date set
#, BENCH, CALL
nil 14831
drawLine - Diagonal 390
drawLine - Horizontal 1833
drawLine - Vertical 449
drawLine - Random Diagonal 683
drawLine - fillRect 4611
drawLine - drawRect 954
math.random 6028
math.random - local 7295
math.sin 7293
math.sin - random 3169
math.cos 7167
math.cos - random 3049
math.floor - local 8664
image:sample 1948
drawText - local 455
drawTextInRect 55
drawRect 456
fillRect 848
drawCircleAtPoint 219
fillCircleAtPoint 262
drawCircleInRect 361
fillCircleInRect 467
sprite:moveTo - static 1259
sprite:moveTo - random 782
sprite:setImage 1802
sprite:setCenter - static 1750
sprite:setCenter - toggle 1444
sprite:setCenter - random 1212
sprite:setZIndex 2879
image:draw 1381
image:draw - locked 745
image:draw - locked local 752
image:draw - pushcontext local 759
END
@joyrider3774 asked if I could run these benchmarks on my Rev B Playdate, and I'm happy to, especially as I'm also trying to figure out Rev A vs Rev B performance differences in my music player app. I'll post about that in a separate thread soon.
Results from original version and Dave's version
Original (with 5 second delay, crank enabled)
DEVICE (5 RUN AVE)
#, BENCH, CALL
01, 1299, nil
02, 70, drawLine - Diagonal
03, 337, drawLine - Horizontal
04, 83, drawLine - Vertical
05, 118, drawLine - Random Diagonal
06, 783, drawLine - fillRect
07, 177, drawLine - drawRect
08, 849, math.random
09, 798, math.random - local
10, 906, math.sin
11, 470, math.sin - random
12, 892, math.cos
13, 461, math.cos - random
14, 860, math.floor - local
15, 449, image:sample
16, 110, drawText - local
17, 14, drawTextInRect
18, 89, drawRect
19, 159, fillRect
20, 46, drawCircleAtPoint
21, 56, fillCircleAtPoint
22, 73, drawCircleInRect
23, 101, fillCircleInRect
24, 375, sprite:moveTo - static
25, 215, sprite:moveTo - random
26, 489, sprite:setImage
27, 457, sprite:setCenter - static
28, 470, sprite:setCenter - toggle
29, 309, sprite:setCenter - random
30, 686, sprite:setZIndex
31, 354, image:draw
32, 184, image:draw - locked
33, 179, image:draw - locked local
34, 185, image:draw - pushcontext local
END
Dave version (with 5 second delay, crank enabled)
#, BENCH, CALL
nil 7730
drawLine - Diagonal 368
drawLine - Horizontal 1791
drawLine - Vertical 437
drawLine - Random Diagonal 619
drawLine - fillRect 4492
drawLine - drawRect 945
math.random 4838
math.random - local 5080
math.sin 5478
math.sin - random 2594
math.cos 5606
math.cos - random 2604
math.floor - local 5659
image:sample 2525
drawText - local 577
drawTextInRect 74
drawRect 451
fillRect 773
drawCircleAtPoint 238
fillCircleAtPoint 297
drawCircleInRect 418
fillCircleInRect 550
sprite:moveTo - static 1938
sprite:moveTo - random 1149
sprite:setImage 2177
sprite:setCenter - static 2456
sprite:setCenter - toggle 2421
sprite:setCenter - random 1688
sprite:setZIndex 3774
image:draw 1952
image:draw - locked 962
image:draw - locked local 957
image:draw - pushcontext local 963
END
While testing my app, I noticed that connecting my Playdate to the macOS simulator and clicking 'Control Device with Simulator' improved performance by a significant amount (iirc around 15%, enough to make Opus audio playback real time). The responsible serial command is disablecrank. I added a 5 second delay before starting the benchmark, so I could disable the crank via the Simulator option.
Results from both versions with crank disabled
Original (with 5 second delay, crank disabled)
DEVICE (5 RUN AVE)
#, BENCH, CALL
01, 1298, nil
02, 78, drawLine - Diagonal
03, 357, drawLine - Horizontal
04, 92, drawLine - Vertical
05, 129, drawLine - Random Diagonal
06, 835, drawLine - fillRect
07, 180, drawLine - drawRect
08, 893, math.random
09, 880, math.random - local
10, 969, math.sin
11, 496, math.sin - random
12, 973, math.cos
13, 513, math.cos - random
14, 946, math.floor - local
15, 462, image:sample
16, 127, drawText - local
17, 15, drawTextInRect
18, 96, drawRect
19, 172, fillRect
20, 51, drawCircleAtPoint
21, 62, fillCircleAtPoint
22, 87, drawCircleInRect
23, 112, fillCircleInRect
24, 384, sprite:moveTo - static
25, 238, sprite:moveTo - random
26, 500, sprite:setImage
27, 468, sprite:setCenter - static
28, 494, sprite:setCenter - toggle
29, 353, sprite:setCenter - random
30, 684, sprite:setZIndex
31, 384, image:draw
32, 190, image:draw - locked
33, 197, image:draw - locked local
34, 201, image:draw - pushcontext local
END
Dave version (with 5 second delay, crank disabled)
#, BENCH, CALL
nil 7836
drawLine - Diagonal 401
drawLine - Horizontal 1826
drawLine - Vertical 477
drawLine - Random Diagonal 666
drawLine - fillRect 4958
drawLine - drawRect 1017
math.random 5109
math.random - local 5574
math.sin 6054
math.sin - random 2645
math.cos 6038
math.cos - random 2639
math.floor - local 6284
image:sample 2757
drawText - local 664
drawTextInRect 79
drawRect 493
fillRect 895
drawCircleAtPoint 261
fillCircleAtPoint 324
drawCircleInRect 451
fillCircleInRect 591
sprite:moveTo - static 2090
sprite:moveTo - random 1260
sprite:setImage 2691
sprite:setCenter - static 2654
sprite:setCenter - toggle 2634
sprite:setCenter - random 1845
sprite:setZIndex 3824
image:draw 2112
image:draw - locked 1038
image:draw - locked local 1058
image:draw - pushcontext local 1055
END
@dave, do you happen to know why the option is making such a difference in performance?
I've put the Rev B results with and without the crank in an Excel sheet and compared the values to my Rev A results. It's indeed weird that disabling the crank on a Rev B seems to have an impact on the original version. Overall, Rev B also seems faster than Rev A, except on some math stuff.
Original version (https://1drv.ms/x/s!AmOKrDXp7rpjlOZdcHmWJTxAuSoBYw?e=vOa1Zk)
Dave's version ( https://1drv.ms/x/s!AmOKrDXp7rpjlOZbB1yiegFswN3yXQ?e=4OtrfA )
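The comparison in those sheets boils down to a percentage change per benchmark. A minimal sketch of that calculation, using three of the Rev B numbers from the original script posted above (crank enabled vs crank disabled); the percentChange helper and the table layout are made up for illustration:

```lua
-- Percentage change from a baseline count to a new count.
local function percentChange(baseline, value)
    return (value - baseline) / baseline * 100
end

-- Rev B, original script: { name, crank enabled, crank disabled }.
-- Values copied from the results posted earlier in the thread.
local revB = {
    { "drawLine - fillRect",    783, 835 },
    { "math.sin",               906, 969 },
    { "sprite:moveTo - random", 215, 238 },
}

for _, row in ipairs(revB) do
    local name, enabled, disabled = row[1], row[2], row[3]
    print(string.format("%-24s %+.1f%%", name, percentChange(enabled, disabled)))
end
```

A positive value means more calls fit in the same time window with the crank disabled.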
Holy crap, you're right. For some reason on the rev B board the crank sampling is taking way longer than it does on rev A. I'm looking into this now, not sure whether we can get a fix in for 2.2, but if not it'll be soon after.
Here's an interesting Lua thing I discovered whilst looking into constant folding and propagation during compilation, which is when a compiler replaces constant expressions with their values if it sees that they won't change at runtime (there are exceptions). I don't pretend to know how this applies to the Lua bytecode compilation in the Playdate SDK, or to running that bytecode in the interpreter. Regardless...
So, let's say in your game you do some maths and stuff across a bunch of variables, some of which are based on constants defined earlier in the program flow. I do this all the time.
I also have values written out as maths, like gfx.drawText("hello", 20+10+2-6, 120+10), arrived at as I adjust the position of things on screen (so I can remove the most recent addition/subtraction to get back to the previous value) and which I never get around to shortening. Good to know they get optimised out during compilation!
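One way to check the folding yourself is to compile the written-out expression and the plain literal and compare the resulting bytecode. A minimal sketch, assuming a desktop Lua 5.4 interpreter (load and string.dump may not be available on the device, and the Playdate compiler may behave differently); the names folded and literal are just for illustration:

```lua
-- Runs on desktop Lua 5.4, not on the Playdate runtime.
local folded  = assert(load("return 20+10+2-6"))
local literal = assert(load("return 26"))

-- Both chunks produce the same value...
print(folded(), literal())

-- ...and if the compiler folded the arithmetic at compile time, the stripped
-- bytecode (no debug info) of the two chunks should match as well.
local same = string.dump(folded, true) == string.dump(literal, true)
print("identical bytecode:", same)
```

If the second line prints true, the arithmetic cost nothing at runtime in that interpreter.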
Anyway, consider these three functionally identical (almost exactly identical) snippets:
-- math
local a = 30
local b = 9 - (a / 5)
local c
local d
c = b * 4
if (c > 10) then
    c = c - 10
end
d = 60 / a
-- a = 30, c = 2, res = 4
local function fMathOrderA()
    res = c * (60 / a)
    return "math orderA"
end
local function fMathOrderB()
    res = (60 / a) * c
    return "math orderB"
end
local function fMathOrderC()
    res = d * c
    return "math orderC"
end
- Device (rev A): fMathOrderB is ~10% slower
- Device (rev A): fMathOrderC is ~2.5% slower
Lesson: the order of execution of your maths seems to matter more than expected
Could it be summarised as "put your constants first (towards the left)"? Not sure.
Or could all this just be caching symptoms?
I tested the above code with
which = 0
function playdate.update()
    local start = playdate.getCurrentTimeMilliseconds()
    if which == 0 then
        for i=1,100000 do fMathOrderA() end
    elseif which == 1 then
        for i=1,100000 do fMathOrderB() end
    else
        for i=1,100000 do fMathOrderC() end
    end
    print("test "..which.." elapsed: "..(playdate.getCurrentTimeMilliseconds()-start).." ms")
    which = (which+1)%3
end
and I get C 17% faster than A (makes sense) and B 5% faster. On an H7 unit that's 10% and 4%, respectively.
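Since run order and caching are suspected above, one way to reduce their influence is to interleave the variants over several rounds and keep the best time for each. A minimal desktop sketch, assuming plain Lua with os.clock rather than the Playdate timer; ROUNDS, ITERATIONS and the tests table are illustrative choices, and desktop timings will not match device timings:

```lua
-- Desktop sketch: plain Lua, os.clock (CPU seconds) instead of the Playdate timer.
local a = 30
local c = 2
local d = 60 / a
-- `res` is deliberately left global, as in the posted snippets.

local tests = {
    { name = "A", fn = function() res = c * (60 / a) end },
    { name = "B", fn = function() res = (60 / a) * c end },
    { name = "C", fn = function() res = d * c end },
}

local ITERATIONS = 100000  -- same loop count as the test above
local ROUNDS = 5           -- arbitrary; more rounds give steadier numbers

-- Keep the best (lowest) time seen for each variant across interleaved rounds.
local best = {}
for round = 1, ROUNDS do
    for _, t in ipairs(tests) do
        local start = os.clock()
        for i = 1, ITERATIONS do t.fn() end
        local elapsed = os.clock() - start
        if not best[t.name] or elapsed < best[t.name] then
            best[t.name] = elapsed
        end
    end
end

for _, t in ipairs(tests) do
    print(string.format("test %s best: %.3f ms", t.name, best[t.name] * 1000))
end
```

Taking the minimum rather than the average discards rounds that were disturbed by something else running at the same time.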
Here's what I get from echo "res = c * (60 / a)" | luac -l -p -:
main <stdin:0,0> (10 instructions at 0x600000d0c080)
0+ params, 3 slots, 1 upvalue, 0 locals, 3 constants, 0 functions
1 [1] VARARGPREP 0
2 [1] GETTABUP 0 0 1 ; _ENV "c"
3 [1] GETTABUP 1 0 2 ; _ENV "a"
4 [1] LOADI 2 60
5 [1] DIV 1 2 1
6 [1] MMBIN 2 1 11 ; __div
7 [1] MUL 0 0 1
8 [1] MMBIN 0 1 8 ; __mul
9 [1] SETTABUP 0 0 0 ; _ENV "res"
10 [1] RETURN 0 1 1 ; 0 out
and here's echo "res = (60 / a) * c" | luac -l -p -:
main <stdin:0,0> (10 instructions at 0x600003194080)
0+ params, 2 slots, 1 upvalue, 0 locals, 3 constants, 0 functions
1 [1] VARARGPREP 0
2 [1] GETTABUP 0 0 1 ; _ENV "a"
3 [1] LOADI 1 60
4 [1] DIV 0 1 0
5 [1] MMBIN 1 0 11 ; __div
6 [1] GETTABUP 1 0 2 ; _ENV "c"
7 [1] MUL 0 0 1
8 [1] MMBIN 0 1 8 ; __mul
9 [1] SETTABUP 0 0 0 ; _ENV "res"
10 [1] RETURN 0 1 1 ; 0 out
The code is pretty much the same, but the first one does use one more stack slot. Maybe that's the difference? Not sure what's going on in your test, though..
Thanks Dave, particularly for the decompilation dumps.
At this point, me neither