Benchmarks & Optimisations

Here are some numbers showing variance (std. dev. over average) over five runs of a modified bench.pdx (2.4 KB) and how both GC and the random hash seed affect it:

The point here is that if you're using a single run to compare different SDK versions, your results may not be very reliable. Turning off GC during the test helps a lot. Using a fixed seed helps even more, but there's a problem there: the fixed seed may cause different behavior between the SDK versions and give you a misleading picture of the performance changes. When I get back home I'll try different fixed seeds on 1.11.1, 1.12.3, and 1.13.4 to see if I'm wrong about that.
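For anyone who wants to compute the same "std. dev. over average" figure for their own runs, here is a minimal sketch in plain Lua. The five run values are made up for illustration, and using the sample standard deviation (dividing by n - 1) is an assumption on my part; the original numbers may have been computed differently.

```lua
-- Relative spread of a set of benchmark runs: standard deviation / mean.
local function mean(values)
    local sum = 0
    for _, v in ipairs(values) do sum = sum + v end
    return sum / #values
end

local function relativeStdDev(values)
    local avg = mean(values)
    local sumSq = 0
    for _, v in ipairs(values) do
        sumSq = sumSq + (v - avg) ^ 2
    end
    -- Sample standard deviation (divides by n - 1); an assumption, see above.
    local stdDev = math.sqrt(sumSq / (#values - 1))
    return stdDev / avg
end

-- Hypothetical call counts from five runs of one test (not measured data).
local runs = { 100, 104, 98, 101, 97 }
print(string.format("mean = %.1f, relative std. dev. = %.2f%%",
    mean(runs), relativeStdDev(runs) * 100))
```

A lower percentage means the runs agree more closely, which is the number you would hope to see shrink when GC is off or the seed is fixed.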

Here we go: bench-nogc.csv.zip (2.4 KB)

This is running the above version of bench.pdx on 1.11.1, 1.12.3, and 1.13.4 using five different hash seeds, average of five runs for each configuration (in case there's still some property that affects the results and changes between runs). What I was looking for here was whether we see the same change in performance between versions for any fixed hash seed, and that's pretty clearly not the case—here's a plot of the results from a few functions:

If the squiggles did follow the same shape then we could just use one fixed seed when running the benchmark and be confident that the performance deltas we get would be the same no matter what seed we used. But no such luck: The only way to get an accurate measure of performance is to average over a bunch of separate runs with different hash seeds.

Okay, one last chart. Here's the deltas between the versions, averaging the results for each test on each SDK version, a rough approximation of the more accurate test I described in the last paragraph.
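As a rough sketch of the averaging described here, the snippet below compares the seed-averaged delta between two versions with the delta you would get from any single seed. All scores are hypothetical placeholders, not values from bench-nogc.csv, and the function names are mine.

```lua
-- Hypothetical scores for one test: results[version] lists the calls counted
-- with five different hash seeds. None of these numbers are measurements.
local results = {
    ["1.12.3"] = { 220, 180, 200, 210, 190 },
    ["1.13.4"] = { 170, 190, 150, 160, 180 },
}

local function mean(values)
    local sum = 0
    for _, v in ipairs(values) do sum = sum + v end
    return sum / #values
end

-- Percentage change of the seed-averaged score between two versions.
local function averagedDelta(from, to)
    local before, after = mean(results[from]), mean(results[to])
    return (after - before) / before * 100
end

-- Percentage change for a single seed index, for comparison.
local function singleSeedDelta(from, to, seedIndex)
    local before, after = results[from][seedIndex], results[to][seedIndex]
    return (after - before) / before * 100
end

print(string.format("averaged: %.1f%%", averagedDelta("1.12.3", "1.13.4")))
for i = 1, #results["1.12.3"] do
    print(string.format("seed %d: %.1f%%", i, singleSeedDelta("1.12.3", "1.13.4", i)))
end
```

With these made-up numbers one seed even shows an improvement while the average shows a drop, which is the kind of disagreement that makes a single fixed seed untrustworthy.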

The one thing that stands out to me is that image drawing clearly took a sizeable hit after 1.12.3 so I need to go back and try and figure out what happened and see if it's fixable. I think that might be where I did some refactoring to support pattern stencils, but I'd swear I profiled that and didn't see a significant difference. :thinking:

Anyway, we're looking at adding this kind of check to the automated tests to alert when a number gets out of whack, just need to make sure it's giving us accurate information.
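A check like that could look roughly like the sketch below. This is not the actual automated test; the function name, the scores, and the 10% threshold are all assumptions for illustration. Only the test names come from the benchmark.

```lua
-- Flags tests whose score dropped by more than `threshold` percent
-- compared to a baseline. Hypothetical helper, not part of the SDK.
local function findRegressions(baseline, current, threshold)
    local flagged = {}
    for name, base in pairs(baseline) do
        local now = current[name]
        if now ~= nil then
            local change = (now - base) / base * 100
            if change < -threshold then
                flagged[#flagged + 1] = { name = name, change = change }
            end
        end
    end
    -- Worst regression first.
    table.sort(flagged, function(a, b) return a.change < b.change end)
    return flagged
end

-- Made-up scores (ideally each one is already averaged over several seeds).
local baseline = { ["image:draw"] = 200, ["fillRect"] = 150, ["math.sin"] = 900 }
local current  = { ["image:draw"] = 160, ["fillRect"] = 148, ["math.sin"] = 905 }

for _, r in ipairs(findRegressions(baseline, current, 10)) do
    print(string.format("%s changed by %.1f%%", r.name, r.change))
end
```

The threshold would need to sit above the run-to-run noise discussed earlier, otherwise the alert fires on variance rather than on real regressions.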

Thank you for doing this. I'm making use of a lot of image:draw, but also drawPolygon and fillPolygon, and I noticed a drop in performance in recent updates. Since I didn't change the code I was wondering what was going on, and I guess this could be part of the issue?
Would it be difficult to add the polygon functions to the benchmark as a sanity check?

Is there a C version of this benchmark? I seem to remember there was some discussion on porting it to the C API, but I'm not sure it has ever been done.

Has anyone done a Rev A vs Rev B comparison using this benchmark?

Comparing to Benchmarks & Optimisations - #27 by matt

I'm getting much better results on rev A hardware, firmware 2.1.1.

Any explanation for this?
Has the testing time per function been increased maybe?

These are the timing values I see in the script I used, which come down to 100 ms per test.

local FPS = 50
local frameMS = 5000/FPS
playdate.display.setRefreshRate(FPS)
#,	 BENCH,	CALL
nil	 14854
drawLine - Diagonal	   387
drawLine - Horizontal	  1846
drawLine - Vertical	   454
drawLine - Random Diagonal	   687
drawLine - fillRect	  4537
drawLine - drawRect	   935
math.random	  6736
math.random - local	  6398
math.sin	  5814
math.sin - random	  2848
math.cos	  5818
math.cos - random	  2598
math.floor - local	  6259
image:sample	  1680
drawText - local	   444
drawTextInRect	    55
drawRect	   454
fillRect	   849
drawCircleAtPoint	   221
fillCircleAtPoint	   280
drawCircleInRect	   379
fillCircleInRect	   479
sprite:moveTo - static	  1206
sprite:moveTo - random	   795
sprite:setImage	  1750
sprite:setCenter - static	  1477
sprite:setCenter - toggle	  1353
sprite:setCenter - random	  1125
sprite:setZIndex	  3054
image:draw	  1451
image:draw - locked	   782
image:draw - locked local	   746
image:draw - pushcontext local	   816

Not sure it's related to these tests, but Dave said the new hardware (Rev B units) is more sensitive to cache misses. It has caused performance issues before with drawBlurred and rotating bitmaps 90 degrees: Performance of graphics.image:drawBlurred method on Rev B hardware - #3 by dave

edit: I just ran the benchmark on my Rev A unit by compiling the source code from the 1st post (Lua) with the latest pdc.exe from the latest SDK version, and these were my results:

They seem to be more in line with the values posted for other firmwares, although some seem lower now, while with you it seems it actually ran more calls?

DEVICE (5 RUN AVE)
#,	 BENCH,	CALL
01,	  1590,	nil
02,	    73,	drawLine - Diagonal
03,	   301,	drawLine - Horizontal
04,	    85,	drawLine - Vertical
05,	   124,	drawLine - Random Diagonal
06,	   584,	drawLine - fillRect
07,	   161,	drawLine - drawRect
08,	   791,	math.random
09,	   749,	math.random - local
10,	   655,	math.sin
11,	   428,	math.sin - random
12,	   822,	math.cos
13,	   547,	math.cos - random
14,	  1345,	math.floor - local
15,	   258,	image:sample
16,	    84,	drawText - local
17,	    11,	drawTextInRect
18,	    85,	drawRect
19,	   152,	fillRect
20,	    41,	drawCircleAtPoint
21,	    52,	fillCircleAtPoint
22,	    69,	drawCircleInRect
23,	    85,	fillCircleInRect
24,	   181,	sprite:moveTo - static
25,	   130,	sprite:moveTo - random
26,	   260,	sprite:setImage
27,	   222,	sprite:setCenter - static
28,	   221,	sprite:setCenter - toggle
29,	   175,	sprite:setCenter - random
30,	   399,	sprite:setZIndex
31,	   220,	image:draw
32,	   137,	image:draw - locked
33,	   131,	image:draw - locked local
34,	   128,	image:draw - pushcontext local
END

edit 2: I see a difference in timings. The original sources I just downloaded showed this:
[image]
while you do 5000 / FPS

So the benchmark runs 5 times longer for you and can do more calls per test because the time frame is different, so the results you posted above in theory need to be divided by 5.
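To make the two sets of numbers comparable, the counts can be rescaled to a common time window. A small sketch in plain Lua: the 1000 / FPS window for the original script is an assumption inferred from the "5 times longer" remark, and the function name is mine. The two call counts are taken from the 5000 / FPS run posted earlier.

```lua
local FPS = 50
local originalWindowMS = 1000 / FPS  -- assumed: implied by the 5x difference
local longWindowMS = 5000 / FPS      -- as quoted from the modified script

-- Rescale a call count measured over one window to another window length.
local function normalise(calls, fromWindowMS, toWindowMS)
    return calls * toWindowMS / fromWindowMS
end

-- "nil" and "image:draw" counts from the 5000 / FPS run.
print(string.format("nil:        %.1f", normalise(14854, longWindowMS, originalWindowMS)))
print(string.format("image:draw: %.1f", normalise(1451, longWindowMS, originalWindowMS)))
```

After rescaling, the long-window results are still higher than the original-version Rev A numbers, so the window length may not be the only difference between the two scripts.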

edit 3: It would seem you downloaded Dave's version, while this is the original that the numbers were previously based on.
For reference, here are the numbers from my Rev A as well (I think Nino's are Rev A also)

echo off
time and date set
#,	 BENCH,	CALL
nil	 14831
drawLine - Diagonal	   390
drawLine - Horizontal	  1833
drawLine - Vertical	   449
drawLine - Random Diagonal	   683
drawLine - fillRect	  4611
drawLine - drawRect	   954
math.random	  6028
math.random - local	  7295
math.sin	  7293
math.sin - random	  3169
math.cos	  7167
math.cos - random	  3049
math.floor - local	  8664
image:sample	  1948
drawText - local	   455
drawTextInRect	    55
drawRect	   456
fillRect	   848
drawCircleAtPoint	   219
fillCircleAtPoint	   262
drawCircleInRect	   361
fillCircleInRect	   467
sprite:moveTo - static	  1259
sprite:moveTo - random	   782
sprite:setImage	  1802
sprite:setCenter - static	  1750
sprite:setCenter - toggle	  1444
sprite:setCenter - random	  1212
sprite:setZIndex	  2879
image:draw	  1381
image:draw - locked	   745
image:draw - locked local	   752
image:draw - pushcontext local	   759
END

@joyrider3774 asked if I could run these benchmarks on my Rev B Playdate, and I'm happy to, especially as I'm also trying to figure out Rev A vs Rev B performance differences in my music player app. I'll post about that in a separate thread soon.

Results from original version and Dave's version
Original (with 5 second delay, crank enabled)
DEVICE (5 RUN AVE)
#,	 BENCH,	CALL
01,	  1299,	nil
02,	    70,	drawLine - Diagonal
03,	   337,	drawLine - Horizontal
04,	    83,	drawLine - Vertical
05,	   118,	drawLine - Random Diagonal
06,	   783,	drawLine - fillRect
07,	   177,	drawLine - drawRect
08,	   849,	math.random
09,	   798,	math.random - local
10,	   906,	math.sin
11,	   470,	math.sin - random
12,	   892,	math.cos
13,	   461,	math.cos - random
14,	   860,	math.floor - local
15,	   449,	image:sample
16,	   110,	drawText - local
17,	    14,	drawTextInRect
18,	    89,	drawRect
19,	   159,	fillRect
20,	    46,	drawCircleAtPoint
21,	    56,	fillCircleAtPoint
22,	    73,	drawCircleInRect
23,	   101,	fillCircleInRect
24,	   375,	sprite:moveTo - static
25,	   215,	sprite:moveTo - random
26,	   489,	sprite:setImage
27,	   457,	sprite:setCenter - static
28,	   470,	sprite:setCenter - toggle
29,	   309,	sprite:setCenter - random
30,	   686,	sprite:setZIndex
31,	   354,	image:draw
32,	   184,	image:draw - locked
33,	   179,	image:draw - locked local
34,	   185,	image:draw - pushcontext local
END
Dave version (with 5 second delay, crank enabled)
#,	 BENCH,	CALL
nil	  7730
drawLine - Diagonal	   368
drawLine - Horizontal	  1791
drawLine - Vertical	   437
drawLine - Random Diagonal	   619
drawLine - fillRect	  4492
drawLine - drawRect	   945
math.random	  4838
math.random - local	  5080
math.sin	  5478
math.sin - random	  2594
math.cos	  5606
math.cos - random	  2604
math.floor - local	  5659
image:sample	  2525
drawText - local	   577
drawTextInRect	    74
drawRect	   451
fillRect	   773
drawCircleAtPoint	   238
fillCircleAtPoint	   297
drawCircleInRect	   418
fillCircleInRect	   550
sprite:moveTo - static	  1938
sprite:moveTo - random	  1149
sprite:setImage	  2177
sprite:setCenter - static	  2456
sprite:setCenter - toggle	  2421
sprite:setCenter - random	  1688
sprite:setZIndex	  3774
image:draw	  1952
image:draw - locked	   962
image:draw - locked local	   957
image:draw - pushcontext local	   963
END

While testing my app, I noticed that connecting my Playdate to the macOS simulator and clicking 'Control Device with Simulator' improved performance by a significant amount (iirc around 15%, enough to make Opus audio playback real time). The responsible serial command is disablecrank. I added a 5 second delay before starting the benchmark, so I could disable the crank via the Simulator option.

Results from both versions with crank disabled
Original (with 5 second delay, crank disabled)
DEVICE (5 RUN AVE)
#,	 BENCH,	CALL
01,	  1298,	nil
02,	    78,	drawLine - Diagonal
03,	   357,	drawLine - Horizontal
04,	    92,	drawLine - Vertical
05,	   129,	drawLine - Random Diagonal
06,	   835,	drawLine - fillRect
07,	   180,	drawLine - drawRect
08,	   893,	math.random
09,	   880,	math.random - local
10,	   969,	math.sin
11,	   496,	math.sin - random
12,	   973,	math.cos
13,	   513,	math.cos - random
14,	   946,	math.floor - local
15,	   462,	image:sample
16,	   127,	drawText - local
17,	    15,	drawTextInRect
18,	    96,	drawRect
19,	   172,	fillRect
20,	    51,	drawCircleAtPoint
21,	    62,	fillCircleAtPoint
22,	    87,	drawCircleInRect
23,	   112,	fillCircleInRect
24,	   384,	sprite:moveTo - static
25,	   238,	sprite:moveTo - random
26,	   500,	sprite:setImage
27,	   468,	sprite:setCenter - static
28,	   494,	sprite:setCenter - toggle
29,	   353,	sprite:setCenter - random
30,	   684,	sprite:setZIndex
31,	   384,	image:draw
32,	   190,	image:draw - locked
33,	   197,	image:draw - locked local
34,	   201,	image:draw - pushcontext local
END
Dave version (with 5 second delay, crank disabled)
#,	 BENCH,	CALL
nil	  7836
drawLine - Diagonal	   401
drawLine - Horizontal	  1826
drawLine - Vertical	   477
drawLine - Random Diagonal	   666
drawLine - fillRect	  4958
drawLine - drawRect	  1017
math.random	  5109
math.random - local	  5574
math.sin	  6054
math.sin - random	  2645
math.cos	  6038
math.cos - random	  2639
math.floor - local	  6284
image:sample	  2757
drawText - local	   664
drawTextInRect	    79
drawRect	   493
fillRect	   895
drawCircleAtPoint	   261
fillCircleAtPoint	   324
drawCircleInRect	   451
fillCircleInRect	   591
sprite:moveTo - static	  2090
sprite:moveTo - random	  1260
sprite:setImage	  2691
sprite:setCenter - static	  2654
sprite:setCenter - toggle	  2634
sprite:setCenter - random	  1845
sprite:setZIndex	  3824
image:draw	  2112
image:draw - locked	  1038
image:draw - locked local	  1058
image:draw - pushcontext local	  1055
END
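To put a number on the crank effect, here is a quick sketch that computes the percentage gain for a few rows of the Dave-version tables above (Rev B, crank enabled vs. disabled). The helper name is mine; the counts are copied from those tables.

```lua
-- { test name, calls with crank enabled, calls with crank disabled }
local rows = {
    { "nil",              7730, 7836 },
    { "drawText - local",  577,  664 },
    { "sprite:setImage",  2177, 2691 },
    { "image:draw",       1952, 2112 },
}

-- Percentage increase in calls when the crank is disabled.
local function percentGain(enabled, disabled)
    return (disabled - enabled) / enabled * 100
end

for _, row in ipairs(rows) do
    print(string.format("%-18s %+.1f%%", row[1], percentGain(row[2], row[3])))
end
```

The gain is far from uniform: the empty "nil" loop barely moves while the sprite and drawing calls improve noticeably more.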

@dave, do you happen to know why the option is making such a difference in performance?

I've put the Rev B results with crank and without crank in an Excel sheet and compared the values to my Rev A results. It's indeed weird that disabling the crank on a Rev B seems to have an impact (original version). Overall, Rev B also seems faster than Rev A, except on some math stuff.

Original version (https://1drv.ms/x/s!AmOKrDXp7rpjlOZdcHmWJTxAuSoBYw?e=vOa1Zk)

Dave's version ( https://1drv.ms/x/s!AmOKrDXp7rpjlOZbB1yiegFswN3yXQ?e=4OtrfA )

Holy crap, you're right. For some reason on the rev B board the crank sampling is taking way longer than it does on rev A. :anguished: I'm looking into this now, not sure whether we can get a fix in for 2.2, but if not it'll be soon after.

Here's an interesting Lua thing I discovered whilst looking into constant folding and propagation during compilation, which is when a compiler replaces constants with their values if it sees that they won't change at runtime (there are exceptions). I don't pretend to know how this applies to the Lua bytecode compilation of the Playdate SDK, or to running that bytecode in the interpreter. Regardless...

So, let's say in your game you do some maths and stuff across a bunch of variables, some of which are based on constants defined earlier in the program flow. I do this all the time.

I also have values written out as maths, like gfx.drawText("hello", 20+10+2-6, 120+10), which I arrive at as I adjust the position of things on screen (so I can remove the most recent addition/subtraction to get back to the previous value) and which I never get around to shortening. Good to know they get optimised out during compilation!

Anyway, consider these two functionally identical (almost exactly identical) snippets:

-- math
local a = 30
local b = 9 - (a / 5)
local c
local d

c = b * 4
if (c > 10) then
    c = c - 10
end
d = 60 / a
-- a = 30, c = 2, res = 4
local function fMathOrderA()
    res = c * (60 / a)
    return "math orderA"
end
local function fMathOrderB()
    res = (60 / a) * c
    return "math orderB"
end
local function fMathOrderC()
    res = d * c
    return "math orderC"
end
  • Device (rev A): fMathOrderB is ~10% slower
  • Device (rev A): fMathOrderC is ~2.5% slower

Lesson: the order of execution of your maths seems to matter more than expected

Could it be summarised as "put your constants first (towards the left)"? Not sure.

Or could all this just be caching symptoms?

I tested the above code with

which = 0

function playdate.update()
	
	local start = playdate.getCurrentTimeMilliseconds()
	
	if which == 0 then
		for i=1,100000 do fMathOrderA() end
	elseif which == 1 then
		for i=1,100000 do fMathOrderB() end
	else
		for i=1,100000 do fMathOrderC() end
	end
	
	print("test "..which.." elapsed: "..(playdate.getCurrentTimeMilliseconds()-start).." ms")
	
	which = (which+1)%3
end

and I get C 17% faster than A (makes sense) and B 5% faster (:person_shrugging:). On an H7 unit that's 10% and 4%, respectively.

Here's what I get from echo "res = c * (60 / a)" | luac -l -p -:

main <stdin:0,0> (10 instructions at 0x600000d0c080)
0+ params, 3 slots, 1 upvalue, 0 locals, 3 constants, 0 functions
	1	[1]	VARARGPREP	0
	2	[1]	GETTABUP 	0 0 1	; _ENV "c"
	3	[1]	GETTABUP 	1 0 2	; _ENV "a"
	4	[1]	LOADI    	2 60
	5	[1]	DIV      	1 2 1
	6	[1]	MMBIN    	2 1 11	; __div
	7	[1]	MUL      	0 0 1
	8	[1]	MMBIN    	0 1 8	; __mul
	9	[1]	SETTABUP 	0 0 0	; _ENV "res"
	10	[1]	RETURN   	0 1 1	; 0 out

and here's echo "res = (60 / a) * c" | luac -l -p -:

main <stdin:0,0> (10 instructions at 0x600003194080)
0+ params, 2 slots, 1 upvalue, 0 locals, 3 constants, 0 functions
	1	[1]	VARARGPREP	0
	2	[1]	GETTABUP 	0 0 1	; _ENV "a"
	3	[1]	LOADI    	1 60
	4	[1]	DIV      	0 1 0
	5	[1]	MMBIN    	1 0 11	; __div
	6	[1]	GETTABUP 	1 0 2	; _ENV "c"
	7	[1]	MUL      	0 0 1
	8	[1]	MMBIN    	0 1 8	; __mul
	9	[1]	SETTABUP 	0 0 0	; _ENV "res"
	10	[1]	RETURN   	0 1 1	; 0 out

The code is pretty much the same, but the first one does use one more stack slot. Maybe that's the difference? Not sure what's going on in your test, though..

Thanks Dave, particularly for the decompilation dumps.

At this point, me neither :crazy_face:

Nim Bindings vs Lua

I've been trying Nim Bindings lately. Was very interested in the performance comparison of Nim vs Lua. If we assume Nim to be very close to C performance, you can also see this as a rough indication of Lua vs. native performance.

For a fair comparison, I ported Matt's benchmark as posted in this thread to Nim.
As noted before, there is some variability between runs. This also holds for the Nim results. For Nim, I would expect timing accuracy to account for the variability. I'd say the variability is lower for Nim than for Lua.

I'm using the latest published Nim bindings. @samdze showed me some improved results using experimental compiler optimizations, which widen the performance gap between Lua and Nim. His results indicate a performance increase of 7.5x. This is a combination of (unexplained) worse Lua performance on his device compared to mine, plus a 1.6x performance increase going from the Nim bindings I used to his experimental LTO branch. The results by @samdze can be viewed here

Source (on a branch for another project, sorry for the messy organisation): use C init event for bench · ninovanhooff/Nim-Snake-Playdate@0247678 · GitHub

Here are the results for the current Nim bindings version.

revA_nim_211_name, revA_nim_211_#, perf. increase (Nim / Lua), revA_lua_#, revA_lua_211_name
nil, 11187, 3.755287009, 2979, nil
drawDiagonal, 70, 0.9210526316, 76, drawLine - Diagonal
drawHorizontal, 380, 1.035422343, 367, drawLine - Horizontal
drawVertical, 82, 0.9111111111, 90, drawLine - Vertical
drawRandomDiagonal, 156, 1.130434783, 138, drawLine - Random Diagonal
drawLineFillRect, 1567, 1.748883929, 896, drawLine - fillRect
drawLineDrawRect, 193, 1.09039548, 177, drawLine - drawRect
mathRandomSugar, 1879, 1.378576669, 1363, math.random
mathRandomProc, 1668, 1.319620253, 1264, math.random - local
mathSin, 10951, 9.734222222, 1125, math.sin
mathSinRandom, 1816, 3.472275335, 523, math.sin - random
mathCos, 11307, 9.988515901, 1132, math.cos
mathCosRandom, 1587, 2.994339623, 530, math.cos - random
mathFloor, 11476, 9.453047776, 1214, math.floor - local
imageSample - Fast, 10697, 34.39549839, 311, image:sample
drawText, 101, 1.188235294, 85, drawText - local
drawTextInRect not in C API, -, -, 11, drawTextInRect
drawRect, 101, 1.16091954, 87, drawRect
fillRect, 167, 1.024539877, 163, fillRect
drawEllipse, 81, 1.88372093, 43, drawCircleAtPoint
fillEllipse, 112, 2, 56, fillCircleAtPoint
drawEllipse, 82, 1.138888889, 72, drawCircleInRect
fillEllipse, 112, 1.191489362, 94, fillCircleInRect
spriteMoveToStatic, 332, 1.509090909, 220, sprite:moveTo - static
spriteMoveToRandom, 199, 1.309210526, 152, sprite:moveTo - random
spriteSetImage, 5958, 16.87818697, 353, sprite:setImage
spriteSetCenterStatic - center not implemented in Nim, -, -, 272, sprite:setCenter - static
spriteSetCenterToggle - center not implemented in Nim, -, -, 254, sprite:setCenter - toggle
spriteSetCenterRandom - center not implemented in Nim, -, -, 212, sprite:setCenter - random
spriteSetZIndex, 9578, 16.3447099, 586, sprite:setZIndex
draw, 1178, 4.252707581, 277, image:draw
drawLockedLocal - lockFocus not implemented in C, -, -, 156, image:draw - locked
drawLockedLocal - local is a lua-concept, -, -, 151, image:draw - locked local
drawPushContext, 1156, 7.09202454, 163, image:draw - pushcontext local
imageSample - Slow, 163, 0.5241157556, 311, image:sample
average perf increase, 4.856087018

Observations:

  • Are we comparing apples to apples? Looking at the original lua benchmark; Joyrider, Samdze and my results differ significantly. My results, which are used in this comparison, are most favourable to Lua
  • All functions except drawLine-vertical are faster for Nim than for Lua. This function is roughly equally slow on both systems; the differences are not consistent between runs. ImageSample - Slow can be ignored because...
  • It seems that the C implementation for imageSample is so slow that there are already some optimisations done in Lua. This will have something to do with the generation of BitmapData for every invocation. Still, when working directly with BitmapData, a dramatic increase compared to Lua can still be achieved (34x). If you are making an image editing app, (Playmaker etc.) I would strongly consider C or Nim
  • Over all functions, the performance increase averages out to Nim being 5x faster than Lua when ignoring the unoptimized image:sample function
  • The performance increase differs greatly per function. For a performance benefit for your project, look at the functions that are used most heavily in your project for every frame. Math is about 10x faster in Nim. When you are using the sprite system for drawing, the performance increase might not be so great (not directly measured by benchmark) when compared to drawing directly to screen (4.3x). If you do a lot of drawing to off-screen images, the performance increase is even more significant, at 7.3x
  • While the Lua SDK is mature and its performance has stabilized, the performance gap will likely widen because Nim can be tweaked further to be more performant. (5x -> 7.5x as shown by preliminary results)
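For reference, the "perf increase" column is simply the Nim call count divided by the Lua call count, and the bottom line is the mean of those ratios. A minimal sketch using three rows from the Nim bindings table (the helper names are mine, and averaging only three rows is just for illustration):

```lua
-- { name, Nim calls, Lua calls } for three rows of the results table.
local rows = {
    { "mathSin",  10951, 1125 },
    { "draw",      1178,  277 },
    { "fillRect",   167,  163 },
}

-- How many times more calls Nim completed than Lua in the same window.
local function perfIncrease(nimCalls, luaCalls)
    return nimCalls / luaCalls
end

-- Arithmetic mean of the per-test ratios.
local function averageIncrease(list)
    local sum = 0
    for _, row in ipairs(list) do
        sum = sum + perfIncrease(row[2], row[3])
    end
    return sum / #list
end

for _, row in ipairs(rows) do
    print(string.format("%-10s x%.3f", row[1], perfIncrease(row[2], row[3])))
end
print(string.format("average    x%.3f", averageIncrease(rows)))
```

Note that an arithmetic mean of ratios is dominated by the few very large values (image sampling, sprite calls), so the headline figure says little about any individual function.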

Here are my results on my rev A device, using the experimental optimized version of the Nim bindings.

I'd also want to point out that the comparison done here is almost purely in terms of how fast a language can interact with the SDK (and it is often the SDK itself that bottlenecks the tests).
So it is not a general speed comparison and standard algorithms/pure logic implemented in Nim vs. Lua would show much more difference.

That said, here you go:

function, revA 2.1.1 Nim, revA 2.1.1 Lua, perf. increase
nil, 12778, 2077, x6.152142513
drawDiagonal, 70, 71, x0.985915493
drawHorizontal, 411, 289, x1.422145329
drawVertical, 84, 84, x1
drawRandomDiagonal, 168, 122, x1.37704918
drawLineFillRect, 1744, 601, x2.901830283
drawLineDrawRect, 200, 165, x1.212121212
mathRandomSugar, 7800, 984, x7.926829268
mathRandomProc, 9663, 1104, x8.752717391
mathSin, 13458, 939, x14.33226837
mathSinRandom, 5117, 428, x11.95560748
mathCos, 13156, 1071, x12.28384687
mathCosRandom, 11077, 568, x19.50176056
mathFloor, 13018, 1298, x10.02927581
imageSample - fast, 13411, 290, x46.24482759
drawText, 102, 83, x1.228915663
drawTextInRect, 29, 11, x2.636363636
drawRect, 100, 87, x1.149425287
fillRect, 180, 156, x1.153846154
drawEllipse, 87, 43, x2.023255814
fillEllipse, 126, 52, x2.423076923
drawEllipse, 86, 70, x1.228571429
fillEllipse, 120, 87, x1.379310345
spriteMoveToStatic, 3242, 171, x18.95906433
spriteMoveToRandom, 658, 128, x5.140625
spriteSetImage, 2956, 274, x10.78832117
spriteSetCenterStatic - still to test in Nim, -, 235, -
spriteSetCenterToggle - still to test in Nim, -, 247, -
spriteSetCenterRandom - still to test in Nim, -, 183, -
spriteSetZIndex, 7927, 413, x19.1937046
draw, 1125, 224, x5.022321429
drawLockedLocal - not implemented in C, -, 135, -
drawLockedLocal - local is a lua-concept, -, 135, -
drawPushContext, 1091, 128, x8.5234375
imageSample - slow (can be ignored), 199, 290, x0.6862068966
average perf increase, x7.587159451
ignoring imageSample slow, x7.825123332

i checked the branch as i'm trying to port the NIM Code to C but i noticed this:


is that normal ? i mean is the playdate case sensitive when it comes to directory names or not ? because if it is it might have failed to load the background image in certain cases

edit: come to think of it it's probably not case sensitive as it runs from a fat file system if i'm not mistaken

The C port is at https://github.com/joyrider3774/benchmark_c_playdate and I'm not entirely sure about certain functions, like the random ones (randomsugar / proc) and the values used for sin and cos random; it would be great if someone verified these. I also added a setcenter(0,0) on the background image, otherwise I would not see the "bench" text on the screen. I'm also not sure if I'm supposed to see the benchmark actually running. I think with the Lua version I did, but here it almost seems as if it's being run in the background somehow.

So here are the results from running on my Playdate Rev A:

#, BENCH, CALL
0,nil,12732
1,drawDiagonal,71
2,drawHorizontal,415
3,drawVertical,84
4,drawRandomDiagonal,209
5,drawLineFillRect,2005
6,drawLineDrawRect,206
7,mathRandomSugar,13000
8,mathRandomProc,12422
9,mathSin,11731
10,mathSinRandom,13213
11,mathCos,12809
12,mathCosRandom,12727
13,mathFloor,11630
14,imageSample - Fast,12011
15,drawText,103
16,drawTextInRect not in C API,0
17,drawRect,104
18,fillRect,176
19,drawEllipse,87
20,fillEllipse,119
21,drawEllipse,87
22,fillEllipse,119
23,spriteMoveToStatic,2771
24,spriteMoveToRandom,2415
25,spriteSetImage,6130
26,spriteSetCenterStatic - center not implemented in C,0
27,spriteSetCenterToggle - center not implemented in C,0
28,spriteSetCenterRandom - center not implemented in C,0
29,spriteSetZIndex,8921
30,draw,1279
31,drawLockedLocal - lockFocus not implemented in C,0
32,drawLockedLocal - local is a lua-concept,0
33,drawPushContext,1221
34,imageSample - Slow,4806

Edit: on GitHub (source code) I have now also added the SpriteSetCenter* functions, as C does have these.

I wrote a benchmark for Opus audio decoding, as I found that Rev A was too slow for real time playback, while Rev B was just barely fast enough, and I wanted precise performance numbers. I posted more details along with source code and download links for the benchmark in a new thread, but here are the results:

Rev A
Rev B with crank sampling
Rev B without crank sampling

(Rev A: 0.84x, Rev B w/ crank sampling: 1.13x, Rev B w/o crank sampling: 1.24x)

Rev B without crank sampling is nearly 50% faster than Rev A. Rev A and Rev B are supposed to have roughly equivalent performance, but that's clearly not the case with all workloads.
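The "nearly 50%" figure follows directly from the real-time factors quoted above. A small sketch of the arithmetic in Lua (the helper name is mine; a factor of 1.0 means decoding just keeps up with playback):

```lua
-- Real-time factors from the Opus decoding benchmark results above.
local revA, revBCrank, revBNoCrank = 0.84, 1.13, 1.24

-- How much faster (in percent) the second factor is than the first.
local function percentFaster(slower, faster)
    return (faster / slower - 1) * 100
end

print(string.format("Rev B (no crank) vs Rev A:      %.1f%%", percentFaster(revA, revBNoCrank)))
print(string.format("Rev B (crank) vs Rev A:         %.1f%%", percentFaster(revA, revBCrank)))
print(string.format("Rev B no crank vs Rev B crank:  %.1f%%", percentFaster(revBCrank, revBNoCrank)))
```

So a good part of the Rev B advantage is there even with crank sampling on, and disabling it adds roughly another tenth on top.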

From the thread (rev A was 84% real time and runs at 168 MHz):

Any update on this? Disabling crank sampling still results in a speed increase on 2.3.0.

Sorry, this got buried in the dozens of open dev forum tabs. Yeah, seems pretty important. I've targeted it for 2.5. I looked into it earlier and didn't see any obvious reason it was taking longer, but hopefully I'll have better luck this time.
