(C API) Opus audio decoding much slower on Rev A than Rev B (can't play in real time)

Several months ago, I started work on a music player app supporting the Opus audio codec, using the reference Opusfile library. It works well in the Simulator, but audio decoding is much too slow for real-time playback on Rev A hardware. @joyrider3774 spent hours helping test different compiler options and code changes on his Rev A Playdate, but nothing we tried made enough of an improvement. Much to my surprise, when I received my Rev B Playdate, playback was nearly full speed. Once I discovered that disabling crank sampling boosted performance even further, I had it running in real time.

I asked in the Opus IRC channel whether the Playdate Rev A hardware, a 168 MHz Cortex-M7 CPU (the STM32F746) with 16 MB of RAM, was capable of decoding Opus in real time, and the general consensus seemed to be yes, it should be fast enough. I was linked to the Rockbox wiki's codec performance comparison chart as an example of what Opus decoding performance should look like on ARM. I think the iPod Classic (6th gen?) is a good point of comparison: it measured 128 kbps stereo Opus decoding at 148.53% real time when running at the device's stock frequency of 54 MHz, or 561.29% real time when running at the boosted frequency of 216 MHz.

How does the Playdate compare? I wrote my own benchmark, which measures how long each call to the Opus decoder takes. Rev A measured only 84% real time (0.84x), while Rev B measured 113% real time with crank sampling enabled and 124% with crank sampling disabled.
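For anyone wanting to reproduce figures like these, a "percent real time" number can be derived from the amount of audio decoded and the time spent decoding it. The following is only a minimal sketch and not the benchmark's actual code: `realtime_percent` is a hypothetical helper, and it assumes the decoder output is 48 kHz (which is what Opusfile produces).

```c
#include <stdint.h>

/* Hypothetical helper (not from the benchmark): percentage of real time
 * achieved by a decoder.
 *
 * frames          - PCM frames decoded (samples per channel)
 * sample_rate_hz  - output sample rate; assumed 48000 for Opusfile
 * elapsed_seconds - total time spent inside the decode calls
 *
 * 100% means decoding exactly keeps up with playback; below 100% the
 * decoder cannot feed the audio output fast enough. */
static double realtime_percent(uint64_t frames,
                               uint32_t sample_rate_hz,
                               double elapsed_seconds)
{
    if (sample_rate_hz == 0 || elapsed_seconds <= 0.0) {
        return 0.0; /* avoid division by zero on bad input */
    }
    double audio_seconds = (double)frames / (double)sample_rate_hz;
    return 100.0 * audio_seconds / elapsed_seconds;
}
```

On device, the elapsed time could come from something like the SDK's `playdate->system->getElapsedTime()`, summed across decode calls, though how the benchmark actually takes its timings is an assumption here.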

There's a substantial difference between Rev A and Rev B, and while I don't know what CPU speed Rev B is running at, Rev B at its best is still beaten by the iPod Classic at 54 MHz.
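To put the gap in rough numbers, the results can be normalized by clock speed. This is a back-of-the-envelope sketch only: `percent_per_mhz` is a hypothetical helper, and comparing different cores, memory systems, and builds per MHz is crude at best. Rev B is left out because its clock speed is not known.

```c
/* Hypothetical helper: real-time percentage achieved per MHz of CPU clock.
 * Only a rough way to compare devices running at different frequencies. */
static double percent_per_mhz(double rt_percent, double clock_mhz)
{
    return clock_mhz > 0.0 ? rt_percent / clock_mhz : 0.0;
}

/* Using the figures quoted in this post:
 *   iPod Classic, 54 MHz:   148.53 / 54  = about 2.75 % per MHz
 *   iPod Classic, 216 MHz:  561.29 / 216 = about 2.60 % per MHz
 *   Playdate Rev A, 168 MHz: 84.0 / 168  = 0.50 % per MHz
 */
```

By this measure Rev A is getting roughly a fifth of the per-clock decoding throughput that the Rockbox numbers show.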

Here's how I have Opus configured:

  • Fixed point mode, as floating point was much slower even after a couple of dirty hacks to avoid double promotion.
  • Pseudostack enabled, as it was crashing with a stack overflow when starting to decode. Reducing buffer sizes within Opus to avoid requiring the pseudostack didn't seem to make a noticeable difference.
  • Opus' CMake builds don't support all of the compatible ARM assembly optimizations, but manually enabling defines and compiling the necessary files didn't seem to make a noticeable difference. The benchmark doesn't have the optimizations enabled, but I may enable them in another branch so I can gather specific performance numbers.
  • Same compiler options as found in buildsupport/playdate.cmake.
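For readers unfamiliar with the double promotion issue mentioned in the first point, here is a generic illustration. It is not taken from Opus or from the hacks described above, and both function names are hypothetical. The concern, assuming the FPU only handles single precision in hardware, is that any expression promoted to `double` falls back to slow software routines.

```c
/* Illustration only: both functions halve a sample, but the first one
 * silently does its arithmetic in double precision. */

/* 0.5 is a double literal, so `sample` is promoted to double, the
 * multiply is done in double precision, and the result is converted
 * back to float on return. */
static float apply_gain_promoted(float sample)
{
    return sample * 0.5;
}

/* The f suffix keeps the literal, and therefore the whole expression,
 * in single precision. */
static float apply_gain_single(float sample)
{
    return sample * 0.5f;
}
```

GCC's `-Wdouble-promotion` warning flags cases like the first function, which can help track down the remaining promotions in a large codebase.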

In the meantime, I think my music player will only support Rev B Playdates, as Rev A just doesn't seem to be fast enough. It's a disappointing outcome, but Opus support is a priority for me, so I don't see myself switching codecs.

Any help, insight, or advice on how I could improve decoding performance would be greatly appreciated. You can find the benchmark source code on GitHub, and I've included a build here: playdate-opusfile-test.pdx.zip (568.2 KB). Please let me know if you have any questions - I'd be happy to clarify.

@dave if you get a chance, could you please clarify what CPU speed each revision runs at? Or do they run at the same frequency? I know the intent was for both revisions to perform the same, but that doesn't seem to be happening in a lot of cases.

(and if you have any optimization ideas, I'd love to hear them!)