Updating the entire LCD in C and performance

I'm getting a pointer to the background screen buffer with:

uint8_t* playdate->graphics->getFrame(void);

I'm writing random 32-bit values to the entire frame (3120 words: 240 rows × 52 bytes per row ÷ 4, which accounts for stride and alignment).

I'm then calling:

playdate->system->drawFPS(0,0);

Finally, I'm calling:

playdate->graphics->markUpdatedRows(0, SCREEN_HEIGHT - 1);

I'm seeing 48 FPS on the device. I'm presuming that, since the markUpdatedRows call happens at the end of my update, I don't need to manually update/flush the frame, as I've been warned against that in the C docs?
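
For reference, the fill step described above could look roughly like the following host-side sketch. It is not the poster's actual code: the constant names, the xorshift32 generator and the `fill_frame_random` helper are all assumptions made for illustration, and the buffer is a plain array so the snippet compiles without the SDK. On the device the pointer would come from `playdate->graphics->getFrame()`, followed by the `drawFPS` and `markUpdatedRows` calls shown in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROWS        240                     /* LCD height in rows */
#define ROW_BYTES   52                      /* 400 px / 8 = 50 bytes, padded to 52 */
#define FRAME_WORDS (ROWS * ROW_BYTES / 4)  /* 3120 32-bit words */

/* Small xorshift32 generator (illustrative choice, seed must be non-zero). */
uint32_t xorshift32(uint32_t *state) {
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Fill every word of the frame, padding bytes included, with random bits. */
void fill_frame_random(uint8_t *frame, uint32_t *seed) {
    assert(frame != NULL && seed != NULL && *seed != 0);
    for (size_t i = 0; i < FRAME_WORDS; i++) {
        uint32_t word = xorshift32(seed);
        memcpy(frame + i * 4, &word, sizeof word);  /* avoids alignment assumptions */
    }
}
```

Writing whole words covers the two padding bytes at the end of each row as well, which is why the count is 3120 rather than 3000.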

What's the penalty for updating the entire display? I'm presuming that's the bottleneck? Are there any deeper dive tech docs available on how fast the memory is, how fast the LCD can accept an entire screen update, and so on? If I wanted to do full screen 3D rendering, what would be the gotchas?

I'm using C, and C only.

I've also found that's the limit of the screen hardware for a full screen update.

I get 48fps updating the whole screen, and I'm using Lua.

However, it's worth knowing that the screen accepts whole rows, so updating only 1 pixel on each row is as expensive as a full screen update.

Conversely, you can get higher FPS by updating fewer rows.

  • 60fps @ 177 rows, but not at 178 rows
  • 200fps @ 75 rows, but not at 76 rows
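
These measurements can be turned into a rough per-row time budget. The helper below is only a back-of-the-envelope sketch (the function name is made up for illustration): it divides the frame period by the number of rows pushed, which gives an upper bound on the average time spent per row.

```c
#include <assert.h>

/* Upper bound on the average time available per updated row, in
   microseconds, when `rows` rows are sent `fps` times per second. */
double row_budget_us(double fps, int rows) {
    assert(fps > 0.0 && rows > 0);
    return 1e6 / (fps * (double)rows);
}
```

With the figures reported in this thread, that works out to about 86.8 µs per row for 48 fps at 240 rows, 94.2 µs for 60 fps at 177 rows, and 66.7 µs for 200 fps at 75 rows. The three values do not line up on a single constant, so they are better read as rough bounds than as a hardware specification.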

200fps seems to be the upper limit, though I've not done extensive testing. It's difficult to tell as the FPS display only goes to 99. I wrote my own frame-limiting code to be able to target higher refresh rates than 50Hz.
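
The frame-limiting code mentioned above was written in Lua and is not shown in the thread. As a hedged illustration of the idea only, a fixed-rate limiter in C might look like this. The `FrameLimiter` type and both functions are hypothetical, and timestamps are passed in by the caller so that no particular SDK timing call is assumed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed-rate limiter: remembers when the next frame is due. */
typedef struct {
    uint64_t period_us;    /* frame period, 1e6 / target fps */
    uint64_t next_due_us;  /* earliest time the next frame may start */
} FrameLimiter;

void limiter_init(FrameLimiter *fl, uint32_t target_fps, uint64_t now_us) {
    assert(fl != 0 && target_fps > 0);
    fl->period_us = 1000000u / target_fps;
    fl->next_due_us = now_us + fl->period_us;
}

/* Returns 1 if a frame should be rendered now, 0 if it is still too early. */
int limiter_should_run(FrameLimiter *fl, uint64_t now_us) {
    if (now_us < fl->next_due_us) {
        return 0;
    }
    fl->next_due_us += fl->period_us;
    if (fl->next_due_us <= now_us) {
        /* Fell more than one period behind: resynchronise instead of
           trying to catch up with a burst of frames. */
        fl->next_due_us = now_us + fl->period_us;
    }
    return 1;
}
```

Advancing the deadline by a whole period, rather than resetting it to "now plus one period" every frame, keeps the average rate on target even when individual frames start slightly late.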

Documentation for the screen, with C code samples, is at: 2016_SDE_App_Note_for_Memory_LCD_programming_V1.3.pdf

I don't know if it's possible to optimise a game's output to only touch a specific number of rows, or to create some way of splitting/interlacing the output so you can update rows across frames. Something to think about for sure!

My game targets 60fps, and I take advantage of partial frame draws to only update a small number of rows each frame.
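
One simple way to keep the number of updated rows small is to track the span of rows touched during a frame and mark only that span. The sketch below is an assumption about how this could be done, not the approach used in the game mentioned above; the `DirtyRows` type and its helpers are hypothetical names.

```c
#include <assert.h>

#define ROWS 240  /* LCD height in rows */

/* Inclusive range of rows drawn to this frame; last < first means empty. */
typedef struct {
    int first;
    int last;
} DirtyRows;

void dirty_reset(DirtyRows *d) {
    d->first = ROWS;
    d->last = -1;
}

/* Grow the range so it also covers rows top..bottom (inclusive). */
void dirty_add(DirtyRows *d, int top, int bottom) {
    assert(0 <= top && top <= bottom && bottom < ROWS);
    if (top < d->first) d->first = top;
    if (bottom > d->last) d->last = bottom;
}

/* Number of rows that would be sent to the display. */
int dirty_count(const DirtyRows *d) {
    return d->last < d->first ? 0 : d->last - d->first + 1;
}
```

At the end of a frame, if `dirty_count` is non-zero, the range could be passed to `markUpdatedRows(d.first, d.last)` and then reset. A single range also covers any untouched rows between two distant regions, so a game that draws at both the top and the bottom of the screen would need several ranges to stay under the roughly 177-row budget reported for 60 fps.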

Thanks Matt. Yes, I’m aware of the ability to limit the number of rows updated, I use the call in my sample. As mentioned, I would really like to know the bandwidth of the display, and it appears from your experiments and my initial very simple test that 48fps is the limit for a full screen update. This is a shame.

As an old school assembler programmer, I’m familiar with partial screen updates, it was the only way one could achieve satisfactory performance on old school 8-bit hardware, but much of that hardware (including relevantly, the Gameboy) had character mapped displays with fine scroll offsets as well as sprites, allowing high frame rates.

Your test for 177 rows is useful. If writing to the LCD is the bottleneck, but writing to our background buffer isn’t, then 50fps with a few more than 177 rows should be achievable, even if there is a fair amount of overdraw in the background buffer.

What would be really useful is to know just how fast normal memory is.

I appreciate your reply, and thanks for sharing the Sharp doc.

Please do report back with any further findings!

Will do. Meanwhile, if someone from the team can offer further insight into low level details, that would be much appreciated.

One thing that might help with performance is knowing whether there is any SIMD, DMA or blit hardware at all, whether it is exposed in any way and, if not, whether it is used by the system so that we can access it in a supported manner. If I were to guess, there’s possibly SIMD on the CPU, but no DMA transfer to the LCD and no blit hardware.

The main CPU is an STM32F7, armv7-M (Cortex-M7) which does have SIMD and DMA according to the docs. FreeRTOS is running between the SDK and the hardware, if that makes any difference.

There has been some discussion of low level details on the Discord, but at this point it is buried behind search. Example: DMA of frame buffer to display.

Thanks for those links Matt, much appreciated.

The "SIMD" instructions available are only 32 bits wide, a few 2x16-bit and 4x8-bit operations. The frame buffer returned by the getFrame() function is in the DTCM tightly-coupled memory area, so you wouldn't gain anything by using a separate working area and blitting your data over.

But yeah, I'm also seeing 48 fps drawing random static but the profiler says it's spending most of its time idle. :thinking: I'll look into it!

Isn’t it just the system waiting for the DMA that sends data to the screen to finish?

Thanks Dave, appreciate it. I would love to know how hard I can push, and where to push to get the best results for 50fps.

It is, but it should be able to update the screen in less than 1/48 s. I checked it on the scope: the clock is running at the right speed and there's only ~200 µs of idle time on that line, but for some reason it's taking longer than it should. I'll have to get out the DAQ and see what's really going on. :confused:

This reminds me of the issue I was seeing with Gamekid over a year back, where I would see this pause when updating a certain number of display rows, but it was fine (if I recall) when updating just a few rows. Even though I was still emulating and modifying my own display buffer to reflect the changes, I stopped marking display rows as needing updates except for the first few and saw a significant gain in FPS.

Well, I for one am excited that it isn’t a memory speed limitation and that a solid 50fps with a full screen update might still be possible! As for the DTCM tightly-coupled memory, what’s the sustained bandwidth available to that on the Playdate?

I have no idea! I'm pretty sure CPU will be a bottleneck before that's ever an issue. :smiley:

So, I traced on the Saleae and saw that there's a bit of down time each row that makes the difference between 50+ and 48 fps. Makes perfect sense in retrospect: we're only DMAing one row at a time, so there's some turnaround for the interrupt handler to set up the next transfer. I switched to a circular buffer so that it's loading the next row while it's sending the current one and I'm getting just over 54 fps. :tada:

I feel like I should point out: This doesn't mean that if your game is dragging at 23 fps this will bump it up to 29! You have to be able to provide 54 frames a second to the display driver. But with this change the display will be able to keep up with you if you can go that fast.

I guess another nice side effect of this is regardless of how fast your game is running, we'll be refreshing the display around 10% faster. :zap:
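
The firmware change itself is not shown in the thread, so the following is only a host-side sketch of the general idea: with two staging buffers, the next row can be prepared while the current one is still being transferred. Everything here is an assumption made for illustration, including the packet layout (one row-index byte plus the row data), the callback types and the `FakeDma` stand-in that simply counts calls.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ROWS         240
#define ROW_BYTES    52
#define PACKET_BYTES (1 + ROW_BYTES)  /* hypothetical: row index + pixel data */

typedef void (*StartFn)(const uint8_t *packet, size_t len, void *ctx); /* begin transfer */
typedef void (*WaitFn)(void *ctx);                                     /* block until done */

void prepare_packet(uint8_t *packet, const uint8_t *frame, int row) {
    packet[0] = (uint8_t)row;
    memcpy(packet + 1, frame + (size_t)row * ROW_BYTES, ROW_BYTES);
}

/* Send rows first..last, preparing row n+1 while row n is in flight. */
int send_rows_pingpong(const uint8_t *frame, int first, int last,
                       StartFn start, WaitFn wait, void *ctx) {
    assert(frame && start && wait && 0 <= first && first <= last && last < ROWS);
    uint8_t staging[2][PACKET_BYTES];
    int cur = 0;
    prepare_packet(staging[cur], frame, first);
    for (int row = first; row <= last; row++) {
        start(staging[cur], PACKET_BYTES, ctx);      /* transfer runs in background */
        if (row < last) {
            prepare_packet(staging[cur ^ 1], frame, row + 1);  /* overlaps transfer */
        }
        wait(ctx);                                   /* current row has gone out */
        cur ^= 1;                                    /* swap buffers */
    }
    return last - first + 1;
}

/* Stand-in for the DMA engine: records what it was asked to do. */
typedef struct { int started; int waited; int last_row; } FakeDma;

void fake_start(const uint8_t *packet, size_t len, void *ctx) {
    FakeDma *d = ctx;
    (void)len;
    d->started++;
    d->last_row = packet[0];
}

void fake_wait(void *ctx) {
    FakeDma *d = ctx;
    d->waited++;
}
```

In this model the per-row setup work is hidden behind the transfer of the previous row, which is the kind of turnaround saving described above. On real hardware the start and wait steps would be driven by the DMA controller and its interrupt rather than by plain function calls.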

CPU is 180MHz, 32-bit. I come from a world where the CPU was 1MHz 8-bit, so if the CPU is the bottleneck, and not the memory, I’m going to be very happy.

Thanks, this is great news Dave, you have made my day and no doubt everyone else’s day for getting us all a bonus 10% frame rate bump!

So you set up an IRQ to handle every line transfer to the LCD via DMA, but you do one row at a time. Presumably there’s a good reason not to just blast the entire series of rows marked as requiring an update?

Relatedly, how much frame time would be left for us to work with at a requested 50fps with a full screen update using your updated circular buffer code?

I hope you don’t mind me asking such basic questions, but without being able to see the code, I can’t make any assumptions. Thanks for getting to the bottom of this, really appreciated.

Hi Dave, perhaps I missed it on the announcements, but has this change made it into the latest SDK, or will it take a little longer?

It's my understanding it'll be in the 1.3 release.

I've read the release notes for 1.3 and don't see a note on this at all. Just wondering if this has gone in without an explanatory note, or if it is now scheduled for a later release?