One thing I notice with bench.pdx is that two runs right after each other, same pdx, same hardware, can vary by > 10% on a test, so comparing single runs between firmware versions isn't going to give very conclusive results. Still, there do seem to be some consistent trends. Here's what I get averaging five runs each on some different configurations: 1.12.3 firmware running bench.pdx compiled with 1.12.3 pdc and 1.13.0 pdc (with pdxversion changed to 11200 in the latter case), and 1.13.0 firmware running those two and also the 1.13.0 build without the pdxversion number changed:
Percentages are relative to the 1.12.3 build on 1.12.3 firmware. I'll bump the test times up and run this again later tonight, see if the numbers come out similar.
Earlier today I was working on the serial port driver, adding a ring buffer so the device can receive data faster than 64 KB/s. At one point it was testing at 250 KB/s, the next time I tested that configuration it dropped down to 200, then later it was back up to 250 again.
So I increased the test time by 10x and it didn't change the variance at all, still around 10% max, which suggests that whatever is causing it stays the same through the run and changes between runs. Which reminds me that Lua sets a random seed for its hash function at init time. I changed that to a hardcoded seed and the runs are pretty much identical every time.
I'll port bench.pdx to C and see how it fares between 1.12.3 and 1.13.0.
Worth adding some sound API usage in there too, since @matt's awesome test thing doesn't cover that part.
Could the sound change be the overhead of mixing the left and right channels for the device's mono speaker? (Previously it was outputting only the left channel.)
It might be worth adding the square-root function to the maths tests. It's used in a lot of linear algebra functions, e.g.: euclidean distance.
Seems unlikely. That'll be a few extra ops per sample, let's say ~5 ops/sample * 44,100 samples/second = ~200K ops/s. The CPU runs at 160 MHz (or maybe a bit less now, I think we knocked it down a bit after discovering we were running just out of spec for the core voltage) so that's on the order of 0.1%.
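For anyone who wants to redo that back-of-envelope estimate with different numbers, here is a minimal sketch. It only uses the figures quoted above (about 5 extra ops per sample, 44,100 samples/second, a 160 MHz clock) and assumes, purely for illustration, that one op costs one CPU cycle. The function and constant names are made up for this example.

```python
# Rough estimate of the CPU cost of mixing left and right into mono.
# Assumption (not from the SDK): one "op" costs one CPU cycle.
OPS_PER_SAMPLE = 5            # "a few extra ops per sample", as guessed above
SAMPLE_RATE_HZ = 44_100       # audio samples per second
CPU_CLOCK_HZ = 160_000_000    # clock figure quoted above; may be a bit lower now


def mixing_overhead_fraction(ops_per_sample=OPS_PER_SAMPLE,
                             sample_rate_hz=SAMPLE_RATE_HZ,
                             cpu_clock_hz=CPU_CLOCK_HZ):
    """Return the fraction of CPU time spent on the extra mixing ops."""
    extra_ops_per_second = ops_per_sample * sample_rate_hz
    return extra_ops_per_second / cpu_clock_hz


if __name__ == "__main__":
    print(f"extra ops/s: {OPS_PER_SAMPLE * SAMPLE_RATE_HZ:,}")
    print(f"overhead:    {mixing_overhead_fraction():.3%}")
```

Even if the per-sample cost were several times higher than guessed, the result would stay well under 1%, which is far too small to explain the differences being discussed.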
The benchmark itself already averages 5 runs, so your results are the average of 5x5 runs.
I also have a superstition that I only post the second set of results I get
Also interesting is that 1.12.3 was already somewhat slower than 1.11.1 in key areas.
I'd like to see your result from the C benchmark for 1.11.1 recovery vs 1.13.3 final?
If I had to summarise, 1.13.3 compared to 1.11.1:
- there is ~10% less free time per update
- math functions are all slower by up to ~25%
- image sample is slower by ~20%
- draw text is slower by ~10%
- draw text in rect is slower by ~20%
- image draw is slower by ~15%
I do see the increases in performance in some functions, thanks again, so I am only calling out the ones that are still slow.
My situation is that I worked really hard a couple of years ago on performance (award winning). The game binary that I test with has not changed since then, yet its performance has slowly declined over time.
1.13.3 vs 1.12.3 vs 1.11.1
1.13.3 vs 1.11.1
This is a concern. The dramatically reduced performance of image:draw and drawTextInRect is particularly worrying for me, since I use them all over my game. What could possibly make the SDK performance decline like this over time on fixed hardware?
I also worked hard at optimizing so generally it's quite problematic, because it's nearly impossible to optimize if the goal post is constantly moving away.
What could be the workaround? Is there a way to write my own image:draw to sort of lock it in time? Sorry if this is a dumb question. I don't have much programming experience. I imagine replacing the image:draw with something I would write in C would also fix the issue? But I have no idea how to integrate C routines in the code.
It's been suggested that this will be done for us at SDK level.
Wow that would be amazing if some of the core graphic features were to be done in C at SDK level! That would improve performance for a lot of people who don't have the technical skills to get to that level. It's super exciting.
I believe it already is for a lot of stuff. The drawTextInRect thing is that bug fixes and edge-case changes have slowed it down in Lua, so it will be moved to C.
I see, thanks. So you think that they’ll do the same for image:draw? 20% slower seems like there’s a good case for it, especially considering how indispensable the function is.
image:draw() is already calling into the underlying DrawBitmap C function if I remember correctly
Oh okay. I wonder why Matt’s benchmarks show such a decline in performance. I can’t quite understand what could cause almost every part of the SDK to fluctuate in performance so much with every minor update of the SDK. I can’t imagine that they massively rewrite anything at this point?
Part of it is the Lua runtime, which uses randomized hashing for tables, so the performance of each launch can vary by up to 10%. This is done for security purposes to try to prevent people hacking games, as table contents will be in different places each launch. I don't know enough to argue whether or not it is needed on an embedded device. But, as a test, Dave tried removing this feature and indeed the performance penalty went away.
Over on Discord there have been more discoveries about the other performance differences, but I need to investigate so it's too early to talk about those. You can read it there for now.
What's this? Denuvo for playdate? Don't enforce these DRM performance penalties on developers, make it an option, I'd say.
I mean a 10% performance variation at runtime is something I can at least predict and work with. But a 20% hit on a graphics function after an update makes the platform hard to rely on over time. There should be a baseline that doesn’t change, I would think.
Here are some numbers showing variance (std. dev. over average) over five runs of a modified bench.pdx (2.4 KB) and how both GC and the random hash seed affect it:
The point here is that if you're using a single run to compare different SDK versions your results may not be very reliable. Turning off GC during the test helps a lot. Using a fixed seed even more--but there's a problem there: the fixed seed may cause different behavior between the SDK versions and give you a misleading picture of the performance changes. When I get back home I'll try different fixed seeds between 1.11.1, 1.12.3, and 1.13.4, see if I'm wrong about that.
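As a rough illustration of the "std. dev. over average" figure mentioned above, here is a minimal sketch of how it could be computed from a handful of run results. The timings are made-up placeholder values, not measurements, and using the sample standard deviation (rather than the population one) is an assumption on my part.

```python
from statistics import mean, stdev


def coefficient_of_variation(samples):
    """Sample standard deviation divided by the mean (a unitless spread)."""
    if len(samples) < 2:
        raise ValueError("need at least two runs to estimate variance")
    return stdev(samples) / mean(samples)


if __name__ == "__main__":
    # Hypothetical scores for one test across five separate launches.
    runs = [100.0, 104.0, 97.0, 109.0, 95.0]
    print(f"variance (std. dev. / average): {coefficient_of_variation(runs):.1%}")
```

Comparing this number with GC on and off, or with a random versus fixed hash seed, gives a quick sense of how much run-to-run noise each setting removes.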
Here we go: bench-nogc.csv.zip (2.4 KB)
This is running the above version of bench.pdx on 1.11.1, 1.12.3, and 1.13.4 using five different hash seeds, average of five runs for each configuration (in case there's still some property that affects the results and changes between runs). What I was looking for here was whether we see the same change in performance between versions for any fixed hash seed, and that's pretty clearly not the case—here's a plot of the results from a few functions:
If the squiggles did follow the same shape then we could just use one fixed seed when running the benchmark and be confident that the performance deltas we get would be the same no matter what seed we used. But no such luck: The only way to get an accurate measure of performance is to average over a bunch of separate runs with different hash seeds.
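To make the point about averaging over seeds concrete, here is a small sketch. The scores are invented so that one seed taken alone suggests the newer version is faster while the average over seeds says the opposite. The data layout and function names are hypothetical, not anything bench.pdx produces.

```python
from statistics import mean


def mean_over_seeds(results):
    """Map {version: {seed: score}} to {version: mean score across seeds}."""
    return {version: mean(by_seed.values())
            for version, by_seed in results.items()}


def percent_delta(baseline, candidate):
    """Percentage change of candidate relative to baseline."""
    return (candidate - baseline) / baseline * 100.0


# Hypothetical scores (higher is better) for one test, three hash seeds.
RESULTS = {
    "old": {1: 100.0, 2: 120.0, 3: 110.0},
    "new": {1: 105.0, 2: 100.0, 3: 95.0},
}

if __name__ == "__main__":
    single = percent_delta(RESULTS["old"][1], RESULTS["new"][1])
    averaged = mean_over_seeds(RESULTS)
    overall = percent_delta(averaged["old"], averaged["new"])
    print(f"seed 1 only:       {single:+.1f}%")
    print(f"mean of all seeds: {overall:+.1f}%")
```

With real data you would want many more seeds than three, but the shape of the calculation stays the same.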
Okay, one last chart. Here's the deltas between the versions, averaging the results for each test on each SDK version, a rough approximation of the more accurate test I described in the last paragraph.
The one thing that stands out to me is that image drawing clearly took a sizeable hit after 1.12.3 so I need to go back and try and figure out what happened and see if it's fixable. I think that might be where I did some refactoring to support pattern stencils, but I'd swear I profiled that and didn't see a significant difference.
Anyway, we're looking at adding this kind of check to the automated tests to alert when a number gets out of whack, just need to make sure it's giving us accurate information.
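A check like that could look something like the sketch below. This is only an illustration of the idea, not how the actual automated tests work: the test names, the 5% tolerance, and the assumption that each value is a mean time in milliseconds (lower is better) averaged over several runs and seeds are all hypothetical.

```python
def find_regressions(baseline, candidate, tolerance_pct=5.0):
    """Return {test: percent change} for tests that got slower than allowed.

    Both arguments map test name -> mean time (lower is better). Tests that
    are missing from either side are skipped rather than flagged.
    """
    flagged = {}
    for name, old_time in baseline.items():
        new_time = candidate.get(name)
        if new_time is None or old_time <= 0:
            continue
        change_pct = (new_time - old_time) / old_time * 100.0
        if change_pct > tolerance_pct:
            flagged[name] = change_pct
    return flagged


if __name__ == "__main__":
    # Placeholder numbers, not real measurements.
    previous = {"image draw": 10.0, "draw text": 20.0, "math": 5.0}
    current = {"image draw": 11.8, "draw text": 20.4, "math": 4.9}
    for test, pct in find_regressions(previous, current).items():
        print(f"{test}: {pct:+.1f}% slower than baseline")
```

The tolerance needs to sit above the remaining run-to-run noise, otherwise the check would raise alerts for changes that are just seed or GC variation.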
Thank you for doing this. I’m making use of a lot of image:draw but also drawPolygon and fillPolygon, and I was noticing a drop in performance in recent updates. Since I didn’t change the code I was wondering what was going on, and I guess this could be part of the issue?
Would it be difficult to add the polygon functions to the benchmark as a sanity check?