Benchmarks & Optimisations

Sorry, I see now you've been editing previous posts with updates.

I doubt my local version is much different, I think this is up-to-date.

Matt, would you mind sending me your historical data?

We really appreciate this work you've done!

Sure, I'll send it when I next get to my Mac.

Have just emailed you 2x CSV files @james

Also slightly updated code above, one new benchmark (sprite:setVisible) had been added locally by me since October.

2 years later...

Comparing 1.13.0 vs 1.12.3 vs 1.11.1 (recovery firmware)

Wondering if there have been compilation settings changes?

On 1.13.0 I'm seeing a drop in performance almost right across the board: about 6% slower on average.

1.12.3 also had some changes: maths was slower but sprites were faster, so on average it was about as fast as 1.11.1 (~1% slower).

Since 1.12.3

  • "empty" update loop is ~12% slower
  • math functions are 10–30% slower
  • image draw ~20% slower
  • draw text is ~10% slower

Since 1.13.0

  • image sample is ~35% slower
  • sprite set visible is ~30% slower
  • sprite set image is ~25% slower
  • sprite set properties is ~10–15% slower
  • sprite set zindex ~20% slower
  • draw text is ~10% slower

but

  • drawing circles ~10% faster (since 1.12.3)
  • sprite move ~10% faster (since 1.13.0)
  • math random ~20% faster (since 1.13.0)

Thanks for doing this Matt!

Very curious that maths would be so much slower. You would think there is not much code to implement or change for those.

fwiw, I see a regression on the C side as well (running the same binary, not recompiled, on 1.12.3 vs 1.13)

notably, audio seems to have gotten even more expensive, going from ~8% up to ~11% on average

1.12.3

1.13

On a more personal note, this stuff can be somewhat demoralizing when I work so hard on nailing down performance. I know it's hard to track everything, but I feel perf testing should gate releases.

To not end on a downer, the frame rate seems to have ticked up a half frame, and become a bit more stable, thank you!

I've added results for 1.12.3 to my table, which in hindsight is to blame for some of the stuff I'm seeing.

I also hadn't bothered to run the benchmark for some time.

Fingers crossed it's easy to spot the problem(s).

I know how hard the 1.13.0 release has been for the team, and you have my utmost respect!

Agree, you can't have it all, and priorities are a fact of life.

I'm glad the Catalog SDK release is out the door; that should hopefully bring release frequency back to a bi-monthly schedule.

It would be great if performance would be the focus for one of the upcoming releases.

One thing I notice with bench.pdx is that two runs right after each other, same pdx, same hardware, can vary by > 10% on a test, so comparing single runs between firmware versions isn't going to give very conclusive results. Still, there do seem to be some consistent trends. Here's what I get averaging five runs each on some different configurations: 1.12.3 firmware running bench.pdx compiled with 1.12.3 pdc and 1.13.0 pdc (with pdxversion changed to 11200 in the latter case), and 1.13.0 firmware running those two and also the 1.13.0 build without the pdxversion number changed:

Percentages are relative to the 1.12.3 build on 1.12.3 firmware. I'll bump the test times up and run this again later tonight, see if the numbers come out similar.

Earlier today I was working on the serial port driver, adding a ring buffer so the device can receive data faster than 64 KB/s. At one point it was testing at 250 KB/s, the next time I tested that configuration it dropped down to 200, then later it was back up to 250 again. :person_shrugging:

So I increased the test time by 10x and it didn't change the variance at all, still around 10% max, which suggests that whatever is causing it stays the same through a run and changes between runs. Which reminds me that Lua sets a random seed for its hash function at init time. :person_facepalming: I changed that to a hardcoded seed and the runs are pretty much identical every time.

I'll port bench.pdx to C and see how it fares between 1.12.3 and 1.13.0.

Worth putting some sound API usage in there as well, since @matt's awesome test thing doesn't cover that part.

Could the sound change be the overhead of mixing the left and right channels for the mono device speaker? (Previously it was outputting only the left channel.)

It might be worth adding the square-root function to the maths tests. It's used in a lot of linear algebra functions, e.g. Euclidean distance.

Seems unlikely. That'll be a few extra ops per sample, let's say ~5 ops/sample * 44,100 samples/second = ~200K ops/s. The CPU runs at 160 MHz (or maybe a bit less now; I think we knocked it down a bit after discovering we were running just out of spec for the core voltage), so that's on the order of 0.1%.

The benchmark already in itself averages 5 runs. So your results are the average of 5x5 runs :slight_smile:
I also have a superstition that I only post the second set of results I get :crystal_ball:

Also interesting is that 1.12.3 was already somewhat slower than 1.11.1 in key areas.
I'd like to see your results from the C benchmark for 1.11.1 recovery vs 1.13.3 final.

if I had to summarise, 1.13.3 compared to 1.11.1:

  • there is less free time per update ~10%
  • math functions are all slower by up to ~25%
  • image sample is slower by ~20%
  • draw text is slower by ~10%
  • draw text in rect is slower by ~20%
  • image draw is slower by ~15%

I do see the increases in performance in some functions, thanks again, so I am only calling out the ones that are still slow.

My situation is that I worked really hard a couple of years ago on performance (award winning). The game binary that I test with has not changed since then, yet its performance has slowly declined over time.


1.13.3 vs 1.12.3 vs 1.11.1

1.13.3 vs 1.11.1

This is a concern. The dramatically reduced performance of image:draw and drawTextInRect is particularly worrying for me, since I use them all over my game. What could possibly make SDK performance decline like this over time on fixed hardware?
I also worked hard at optimizing, so this is quite problematic: it's nearly impossible to optimize if the goalposts keep moving away.

What could be the workaround? Is there a way to write my own image:draw to sort of lock it in time? Sorry if this is a dumb question. I don't have much programming experience. I imagine replacing the image:draw with something I would write in C would also fix the issue? But I have no idea how to integrate C routines in the code.

It's been suggested that this will be done for us at SDK level.

TBC.

Wow that would be amazing if some of the core graphic features were to be done in C at SDK level! That would improve performance for a lot of people who don't have the technical skills to get to that level. It's super exciting.