Updating the entire LCD in C and performance

The main CPU is an STM32F7, an ARMv7E-M (Cortex-M7) part, which according to the docs does have SIMD and DMA. FreeRTOS is running between the SDK and the hardware, if that makes any difference.

There has been some discussion of low-level details on the Discord, but by now it's buried and only reachable through search. Example: DMA of frame buffer to display.

Thanks for those links Matt, much appreciated.

The "SIMD" instructions available are only 32 bits wide, a few 2x16-bit and 4x8-bit operations. The frame buffer returned by the getFrame() function is in the DTCM tightly-coupled memory area, so you wouldn't gain anything by using a separate working area and blitting your data over.
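
Since the "SIMD" is only 32 bits wide, the practical win is usually just touching the frame buffer a word at a time instead of a byte at a time. A minimal sketch of filling a frame with random static that way, assuming a 240-row frame with a 52-byte row stride (our understanding of the SDK's `LCD_ROWS` and `LCD_ROWSIZE`; check your headers). `fill_static()` and the xorshift generator are hypothetical helpers, not SDK calls:

```c
#include <stdint.h>
#include <string.h>

/* Assumed frame-buffer geometry: 240 rows, 52 bytes per row
 * (400 visible 1-bit pixels plus padding). */
#define FB_ROWS     240
#define FB_ROWBYTES 52
#define FB_BYTES    (FB_ROWS * FB_ROWBYTES)

/* xorshift32 PRNG: yields 32 pixels of noise per call. State must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill the frame with static, one 32-bit store per 32 pixels.
 * A 4-byte memcpy sidesteps alignment/aliasing rules and typically
 * compiles to a single word store. */
void fill_static(uint8_t *frame, uint32_t *seed)
{
    for (size_t off = 0; off < FB_BYTES; off += 4) {
        uint32_t w = xorshift32(seed);
        memcpy(frame + off, &w, sizeof w);
    }
}
```

On the device you would call it on the buffer returned by `getFrame()`.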

But yeah, I'm also seeing 48 fps when drawing random static, even though the profiler says it's spending most of its time idle. :thinking: I'll look into it!

Isn’t it just the system waiting for the DMA that sends data to the screen to finish?

Thanks, Dave, appreciate it. I would love to know how hard I can push, and where to push, to get the best results at 50 fps.

It is, but it should be able to update the screen in less than 1/48 s. I checked it on the scope: the clock is running at the right speed and there's only ~200 µs of idle time on that line, but for some reason it's taking longer than it should. I'll have to get out the DAQ and see what's really going on. :confused:

This reminds me of the issue I saw with Gamekid over a year ago: there was a pause when updating more than a certain number of display rows, but (if I recall correctly) it was fine when updating only a few. Even though I was still emulating and modifying my own display buffer to reflect the changes, I stopped marking display rows as needing updates except for the first few, and saw a significant gain in FPS.
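
Marking only the rows that actually changed is a cheap way to cut display traffic. A minimal sketch of finding the changed range by diffing against a copy of the previous frame, assuming the same 240-row, 52-byte-stride layout; `changed_row_range()` is a hypothetical helper, and the resulting range is what you would pass to the SDK's `playdate->graphics->markUpdatedRows(start, end)`:

```c
#include <stdint.h>
#include <string.h>

#define FB_ROWS     240
#define FB_ROWBYTES 52   /* assumed row stride */

/* Find the smallest contiguous row range that differs between two frames.
 * Returns 0 if nothing changed; otherwise 1, with *first and *last set
 * (both inclusive). */
int changed_row_range(const uint8_t *prev, const uint8_t *cur,
                      int *first, int *last)
{
    int lo = -1, hi = -1;
    for (int r = 0; r < FB_ROWS; r++) {
        const uint8_t *p = prev + r * FB_ROWBYTES;
        const uint8_t *c = cur + r * FB_ROWBYTES;
        if (memcmp(p, c, FB_ROWBYTES) != 0) {
            if (lo < 0)
                lo = r;
            hi = r;
        }
    }
    if (lo < 0)
        return 0;
    *first = lo;
    *last = hi;
    return 1;
}
```

Keeping `prev` costs a 12,480-byte copy of the frame, refreshed after each update. A single range also covers any unchanged rows between two changed ones; tracking a per-row flag instead avoids that at the cost of more bookkeeping.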

Well, I for one am excited that it isn’t a memory-speed limitation and that a solid 50 fps with a full-screen update might still be possible! As for the DTCM tightly-coupled memory, what’s the sustained bandwidth available to it on the Playdate?

I have no idea! I'm pretty sure CPU will be a bottleneck before that's ever an issue. :smiley:

So, I traced on the Saleae and saw that there's a bit of downtime on each row that makes the difference between 50+ and 48 fps. Makes perfect sense in retrospect: we're only DMAing one row at a time, so there's some turnaround for the interrupt handler to set up the next transfer. I switched to a circular buffer so that it's loading the next row while it's sending the current one, and I'm getting just over 54 fps. :tada:
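
Why per-row turnaround matters so much can be estimated with a simple timing model. A sketch, where the per-row transfer time `xfer_us` and setup/turnaround time `setup_us` are hypothetical inputs, not measurements from this thread; once setup of the next row overlaps the current transfer, the slower of the two stages sets the pace:

```c
/* Frame time when each row is sent, and the next row is set up only
 * after the transfer-complete interrupt (no overlap). */
double frame_us_serial(int rows, double xfer_us, double setup_us)
{
    return rows * (setup_us + xfer_us);
}

/* Frame time when row N+1 is set up while row N is being transferred
 * (two buffers). The first setup and the last transfer can't overlap
 * anything; every other step costs the slower of the two stages. */
double frame_us_pipelined(int rows, double xfer_us, double setup_us)
{
    double stage = xfer_us > setup_us ? xfer_us : setup_us;
    return setup_us + xfer_us + (rows - 1) * stage;
}

double fps_from_frame_us(double frame_us)
{
    return 1e6 / frame_us;
}
```

With, say, 240 rows, 80 µs per row transfer and 5 µs of turnaround, the serial version takes 20,400 µs per frame (about 49 fps) and the pipelined one 19,205 µs (about 52 fps): the turnaround drops out of every row except the first.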

I feel like I should point out: This doesn't mean that if your game is dragging at 23 fps this will bump it up to 29! You have to be able to provide 54 frames a second to the display driver. But with this change the display will be able to keep up with you if you can go that fast.

I guess another nice side effect of this is that, regardless of how fast your game is running, we'll be refreshing the display around 10% faster. :zap:

The CPU is 180 MHz, 32-bit. I come from a world where the CPU was a 1 MHz 8-bit part, so if the CPU is the bottleneck, and not the memory, I’m going to be very happy.
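
To put 180 MHz in perspective, here is the back-of-the-envelope cycle budget at 50 fps. A sketch that treats the whole clock as yours, which it isn't (the OS and the display driver take their share), so read the results as an upper bound; the helper names are just for illustration:

```c
/* Clock cycles available per frame at a given frame rate. */
double cycles_per_frame(double clock_hz, double fps)
{
    return clock_hz / fps;
}

/* The same budget spread over every pixel of a full-screen update. */
double cycles_per_pixel(double clock_hz, double fps, int width, int height)
{
    return cycles_per_frame(clock_hz, fps) / ((double)width * height);
}
```

`cycles_per_frame(180e6, 50)` gives 3,600,000 cycles, and `cycles_per_pixel(180e6, 50, 400, 240)` gives 37.5 cycles per pixel for a full 400 × 240 redraw.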

Thanks, this is great news, Dave! You’ve made my day, and no doubt everyone else’s, by getting us all a bonus 10% frame-rate bump!

So you use an IRQ to handle each line transfer to the LCD via DMA, one row at a time. Presumably there’s a good reason not to just blast out the entire run of rows marked as needing an update?

Relatedly, how much frame time would be left for us to work with at a requested 50fps with a full screen update using your updated circular buffer code?

I hope you don’t mind me asking such basic questions, but without being able to see the code, I’d rather not make assumptions. Thanks for getting to the bottom of this, really appreciated.

Hi Dave, perhaps I missed it in the announcements, but has this change made it into the latest SDK, or will it take a little longer?

It's my understanding it'll be in the 1.3 release.

I've read the release notes for 1.3 and don't see a note on this at all. Just wondering if this has gone in without an explanatory note, or if it is now scheduled for a later release?

You're right, it didn't make it into the 1.3 release. There's one little glitch to work out, I think because I'm not stopping the circular DMA fast enough. I might need to switch to normal DMA but use the half-complete interrupt to do the same kind of pipelining.

Regarding your previous questions: why not prepare the entire update and send it out at once? Mostly, we're being stingy with memory. We'd need two copies of the command buffer, since we'd want to build the next frame while the current one is still being sent, and that would take an extra ~28 KB of the 320 KB of SRAM. As for overhead in the display driver, I'm not actually sure! In one sense it's the same as before; the difference is that the circular buffer lets us set up the command buffer while DMA is running instead of bouncing back and forth between the two. We're not doing any less work, we're just doing it in parallel. But if you want to run at more than 48 fps once this patch is out and the hardware supports it, you'll have to finish your work faster: about 10% faster to run at 54 fps.
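
The "finish your work faster" figure follows directly from the frame periods. A small sketch of that arithmetic (the helper names are just for illustration):

```c
/* Time available per frame, in milliseconds. */
double frame_budget_ms(double fps)
{
    return 1000.0 / fps;
}

/* Fraction by which per-frame work must shrink to go from one frame
 * rate to a higher one: 1 - budget(to) / budget(from) = 1 - from / to. */
double required_reduction(double from_fps, double to_fps)
{
    return 1.0 - from_fps / to_fps;
}
```

At 48 fps you have about 20.8 ms per frame and at 54 fps about 18.5 ms, so `required_reduction(48, 54)` is about 0.11, i.e. closer to 11%. Likewise, the extra ~28 KB of command buffer would be close to 9% of the 320 KB of SRAM.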

Did this make it into 1.4?

(I'm guessing not after running a tiny test)

Still not fixed, sorry. :frowning: We've been dealing with a lot of big structural stuff the last month or two, getting ready for season testing. I can't wait until I can go back to knocking off bugs and getting those easy dopamine hits.

No worries, just checking!

Keep On Truckin'

The problem was that I couldn't get the circular DMA working exactly right. :confused: I think we can't stop it right at the end of a row, so a bit leaks through and puts an extra garbage line on the screen. I changed it back to normal line-by-line transfers, but I'm keeping the double buffer and now pipelining it properly (fire off the transfer, then, while that's going, prepare the next line to send once it's done), and that gets us to an even 50 fps. Gonna call it there. :slight_smile:
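
The final scheme (start a row's transfer, prepare the next row in the other buffer while it runs, then start that one once the first completes) can be sketched as a host-side simulation. This is not the actual driver: `dma_start()` and `dma_wait()` are hypothetical stand-ins for the SPI/DMA hardware and its completion interrupt, and the line format (an address byte, the row's pixel bytes, a trailing byte) is an assumption for illustration:

```c
#include <stdint.h>
#include <string.h>

#define FB_ROWS     240
#define FB_ROWBYTES 52                 /* assumed frame-buffer row stride */
#define LINE_LEN    (FB_ROWBYTES + 2)  /* hypothetical: addr + data + trailer */

/* --- Hypothetical DMA stand-ins (a real driver would program the
 *     DMA controller and get a transfer-complete interrupt) --- */
static const uint8_t *dma_src;
static int dma_busy;
static uint8_t sent[FB_ROWS][LINE_LEN];  /* what the "display" received */
static int sent_count;

static void dma_start(const uint8_t *src)
{
    dma_src = src;
    dma_busy = 1;
}

/* Block until the in-flight transfer is done. The simulation copies the
 * buffer only now, so writing into a buffer that is still in flight
 * would show up as corrupted output. */
static void dma_wait(void)
{
    if (dma_busy) {
        memcpy(sent[sent_count++], dma_src, LINE_LEN);
        dma_busy = 0;
    }
}

/* Build one line's command bytes from the frame buffer. */
static void prepare_line(uint8_t *dst, int row, const uint8_t *frame)
{
    dst[0] = (uint8_t)(row + 1);                 /* 1-based row address */
    memcpy(dst + 1, frame + row * FB_ROWBYTES, FB_ROWBYTES);
    dst[LINE_LEN - 1] = 0;                       /* trailer byte */
}

/* Send a full frame, overlapping preparation of row N+1 with the
 * transfer of row N using two line buffers. */
void send_frame(const uint8_t *frame)
{
    static uint8_t buf[2][LINE_LEN];
    int cur = 0;

    prepare_line(buf[cur], 0, frame);
    for (int row = 0; row < FB_ROWS; row++) {
        dma_wait();               /* row-1 is done, so buf[cur ^ 1] is free */
        dma_start(buf[cur]);      /* fire off this row */
        if (row + 1 < FB_ROWS)    /* ...and prepare the next one meanwhile */
            prepare_line(buf[cur ^ 1], row + 1, frame);
        cur ^= 1;
    }
    dma_wait();                   /* let the last row finish */
}
```

Only two line buffers are needed (2 × 54 bytes here) rather than two full-frame command buffers. On real hardware, the `dma_wait()` at the top of the loop is where the CPU would sit whenever preparing a line is faster than sending one.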
