Add a way to (temporarily?) extend the 'run loop stalled' watchdog beyond 10 seconds

Tengu · May 26, 2023, 12:56am

Referencing this post regarding a run loop watchdog longer than 10 seconds.

I'm in the process of doing pre-Beta testing, and I'm running up against this as well.

Saving a game state to a save slot takes 12-15 seconds on actual hardware. I do prompt the user that 'this will take up to 15 seconds, please wait', but on hardware I get the infamous crash.

I suppose I could try to break up the save into chunks (it's already multiple JSON files, so that's possible) and kludge around it by juggling semaphores etc., but it seems like a lot of trouble to go through when I know that the game 'isn't hung' but is just going to take >10 seconds.

As games get larger on Playdate (hint: once we get out of Beta), IMO there should be some way to (temporarily?) flag that a long I/O operation is about to happen and that the Playdate runtime shouldn't freak out.

So, putting in a feature request - please add some way to indicate to the SDK that:

<PeeWee Herman>I meant to do that</PeeWee Herman>

when the time goes over 10 seconds (maybe with a second parameter: 'if it goes beyond X seconds, then I really didn't mean it, go ahead and bail out').

Edit: this particular episode is a classic example of why end-to-end testing on actual hardware is critical before shipping anything (and not just on the Playdate console).

matt · May 26, 2023, 9:57am

What exactly are you saving that takes more than 10 seconds?

Have you tried disabling pretty-printing of the saved output (passing false as the pretty-print argument)? In my experience that reduces save/load time, simply because there are fewer bytes.
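A minimal sketch of what that looks like, assuming the SDK's `playdate.datastore.write(table, [filename], [pretty-print])` signature. The stub at the top is hypothetical and only exists so the snippet runs in plain Lua outside the Playdate runtime; `saveState` and `"slot1_player"` are illustrative names.

```lua
-- Stub for running outside the Playdate SDK (hypothetical). On the device
-- the runtime provides playdate.datastore, so this branch is skipped.
playdate = playdate or {}
playdate.datastore = playdate.datastore or {
  lastWrite = nil,
  write = function(tbl, filename, prettyPrint)
    -- Record the call instead of touching the filesystem.
    playdate.datastore.lastWrite = { tbl = tbl, filename = filename, pretty = prettyPrint }
  end,
}

-- Hypothetical helper: write one chunk of save state as compact JSON.
local function saveState(name, state)
  -- Third argument false: no line breaks or indentation, so fewer bytes to write.
  playdate.datastore.write(state, name, false)
end

saveState("slot1_player", { hp = 12, gold = 40 })
```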

Tengu · May 26, 2023, 11:53am

What exactly are you saving that takes more than 10 seconds?

10+ levels of dungeon/monster/player state (Has that chest been looted or not? Has that door been unlocked or not? Has that door been opened or is it still closed? Has that pickup been found/acquired or is it still lying there? Is that monster still there or was it beaten? Of all the monsters that there are, has the player encountered them yet [think 'Pokedex']? Player inventory, player history [notebook/journal], lots of other stuff).

Yes, I tried turning pretty-printing off (I'd done that earlier to try to get better performance on game load/exit); it didn't help with this issue.

Fortunately, it's not all one big honkin' JSON but a couple of large-ish ones and a whole bunch of smaller ones... I'll see if I can break the save/restore-from-slot process into discrete chunks (like one per update cycle, instead of everything in one update cycle). That might actually present an opportunity to put in a 'progress bar' (instead of the current 'this will take X seconds, please wait' message).

Tengu · May 26, 2023, 1:14pm

<Homer>D'oh!</Homer>

Ok, it turned out I (once again) coded something that was quick/easy/lazy that (once again) worked fine on Simulator but went 'over the line' on actual hardware.

In several places I do something like this (level serialization for all levels; '<delete|read|write>' just means one of those is done depending on need, and is not literally in the code):

  -- datastore file-name prefixes; the level number is appended (e.g. "chests_3")
  local gamesName = "games_" -- not used in this loop
  local chestsName = "chests_"
  local fogName = "fog_"
  local messagesName = "messages_"
  local mobsName = "mobs_"
  local pickupsName = "pickups_"
  local staticName = "static_"
  local triggersName = "triggers_"

  local indexName = ""

  -- one datastore operation per state chunk, for every level up to MaxLevels
  for index = 1, MaxLevels, 1 do
    indexName = tostring(index)
    playdate.datastore.<delete|read|write>(chestsName .. indexName)
    playdate.datastore.<delete|read|write>(fogName .. indexName)
    playdate.datastore.<delete|read|write>(messagesName .. indexName)
    playdate.datastore.<delete|read|write>(mobsName .. indexName)
    playdate.datastore.<delete|read|write>(pickupsName .. indexName)
    playdate.datastore.<delete|read|write>(staticName .. indexName)
    playdate.datastore.<delete|read|write>(triggersName .. indexName)
  end

So, waaaay back in time I didn't know how many levels we'd wind up with, so I set MaxLevels to something 'big' (256). That still works fine (on Simulator) with the lazy code above, 'cause if there's an attempted datastore operation for a chunk-o-state for a level that doesn't actually exist (like 'pickups_22' when there are only 10 levels), then that datastore op will 'just fail', no harm no foul.

And that 'works' on actual hardware, too... until it doesn't: the time wasted on 240-ish × 7 (roughly 1,700) failed datastore ops on hardware (much slower I/O than Simulator) turned out to be what was pushing things past the watchdog timer's edge.

I went back and set MaxLevels to the actual number of levels (10 or so), and now game slot save/restore takes just under 8 seconds on actual hardware, no longer hits the watchdog timer, and works fine.
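The fix amounts to bounding the loop by the number of levels that actually exist. A minimal sketch of the delete pass, restructured to loop over a table of prefixes; `LevelCount`, `statePrefixes` and `deleteAllLevelState` are hypothetical names, and the stub at the top (which treats every delete as a no-op) only exists so this runs in plain Lua outside the SDK. The read and write passes would follow the same shape.

```lua
-- Stub for running outside the Playdate SDK (hypothetical): pretend no file exists.
playdate = playdate or {}
playdate.datastore = playdate.datastore or {
  delete = function(filename) return false end,
}

-- Per-level state chunks; the level number is appended ("chests_3").
local statePrefixes = { "chests_", "fog_", "messages_", "mobs_", "pickups_", "static_", "triggers_" }

-- Hypothetical: the number of levels that actually exist, instead of 256.
local LevelCount = 10

-- Delete every per-level chunk; returns how many datastore calls were made.
local function deleteAllLevelState(levelCount)
  local calls = 0
  for index = 1, levelCount do
    local suffix = tostring(index)
    for _, prefix in ipairs(statePrefixes) do
      playdate.datastore.delete(prefix .. suffix)
      calls = calls + 1
    end
  end
  return calls
end

print(deleteAllLevelState(LevelCount)) -- 70 (10 levels x 7 chunks)
print(deleteAllLevelState(256))        -- 1792 with the old MaxLevels
```

Counting the calls makes the difference plain: 70 operations for 10 levels versus 1,792 for the old `MaxLevels` of 256, most of which were failing on files that never existed.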

I'd still like to see a way to adjust that hard 10-second watchdog in future, though; there may be circumstances where it isn't practical to work around it.

jan.martinek · May 26, 2023, 1:24pm

Not sure if I just don’t see the reason behind it, but wouldn’t saving just the current level make it much faster? This seems to me like a very brute-force solution.

matt · May 26, 2023, 1:28pm

Other ideas are to save only what can't be recalculated after loading, or to save in a more compact format than you currently use. That could be as simple as Jan's idea of saving only the current level, or saving only part of the structure for all levels.
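A minimal sketch of 'save only what can't be recalculated', using hypothetical chest data: positions and contents come from the level definition anyway, so only the IDs of looted chests need to be written. `chests`, `compactChestState` and `applyChestState` are illustrative names, not from the original game.

```lua
-- Hypothetical level data: static definitions (recomputable from the level
-- file) mixed with the one bit of runtime state that matters (looted).
local chests = {
  { id = 1, x = 40,  y = 80,  contents = "key",    looted = true  },
  { id = 2, x = 120, y = 32,  contents = "potion", looted = false },
  { id = 3, x = 200, y = 150, contents = "gold",   looted = true  },
}

-- Save only the runtime state: the IDs of chests that have been looted.
local function compactChestState(chestList)
  local looted = {}
  for _, chest in ipairs(chestList) do
    if chest.looted then
      looted[#looted + 1] = chest.id
    end
  end
  return looted
end

-- On load, rebuild the flags from the level definition plus the saved IDs.
local function applyChestState(chestList, lootedIds)
  local isLooted = {}
  for _, id in ipairs(lootedIds) do isLooted[id] = true end
  for _, chest in ipairs(chestList) do
    chest.looted = isLooted[chest.id] == true
  end
end

local saved = compactChestState(chests) -- { 1, 3 }
```

The saved table is a short list of numbers rather than full chest records, which shrinks the JSON and the time spent writing it.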

In Sparrow Solitaire I know that Mac @madvogel restructured our data format to be more compact to make saving/loading more performant.

We also do certain things across multiple updates rather than trying to do it all in one update. It's only doing too much in a single frame/update that will stall the run loop.

Tengu · May 26, 2023, 1:33pm

This isn't the code that saves 'current level' state during normal play (that's pretty fast, a fraction of a second); this is the code that creates/saves/loads the entire game state for all levels to/from a slot (think 'Bob and Sue both play the game, they each have their own slot for progress').

matt · May 26, 2023, 1:34pm

Assuming you have to save the entire game state (I find that hard to believe, but you know more about your game than I do), the question I'd ask would not be "how can I extend the run loop stall time?" but rather "how can I spread this time-consuming thing across multiple frames/updates?"

Instead of using a for loop to save them all in a single update, you could write some code to save each level a while after the previous one. If you know each level takes 250ms to save, schedule them 300ms apart. Or you can use a Lua coroutine to save them in a separate "thread". There are many options.
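A minimal sketch of spreading the save across updates, using a simple per-frame state machine (one level per `playdate.update()` call) rather than the timer or coroutine options mentioned above. `LevelCount`, `saveLevel`, `beginSlotSave` and `savedLevels` are hypothetical; `saveLevel` just records the index here instead of calling `playdate.datastore.write`.

```lua
playdate = playdate or {} -- present on the device; created here for plain Lua

local LevelCount = 10        -- hypothetical: number of levels to save
local nextLevelToSave = nil  -- nil means no save in progress
local savedLevels = {}       -- stands in for the real per-level writes

-- Hypothetical per-level save; a game would call playdate.datastore.write here.
local function saveLevel(index)
  savedLevels[#savedLevels + 1] = index
end

-- Start a slot save; the work is then done incrementally by playdate.update().
local function beginSlotSave()
  nextLevelToSave = 1
end

function playdate.update()
  if nextLevelToSave then
    -- One level's worth of work per frame, so no single update runs long.
    saveLevel(nextLevelToSave)
    if nextLevelToSave == LevelCount then
      nextLevelToSave = nil  -- finished
    else
      nextLevelToSave = nextLevelToSave + 1
    end
  end
  -- ...normal per-frame work (drawing, input, a progress bar) continues here...
end
```

Each frame only pays for one level's write, and the rest of `update()` keeps running; progress is simply `(nextLevelToSave - 1) / LevelCount` while a save is underway.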

Edit: this thread reminds me of the XY problem.

Tengu · May 26, 2023, 1:41pm

Yes, I really need to save the entire state (in this particular case).

And yes, fortunately (in this case, at least) there are multiple ways to break things up to spread across updates. I was in the process of doing just that when I realized how my 'lazy' approach was pounding the poor hardware I/O with effectively nothing to show for it but several wasted seconds.

Tengu · May 26, 2023, 1:47pm

Yes, I agree, and it's embarrassing for me to be on the other side of that conversation for a change (it's usually me pointing it out to someone asking for a change to one of my SDKs).

matt · May 26, 2023, 1:58pm

No worries!

The benefit of spreading these things out over multiple updates is that you'll be able to do other things, like keep running your game, an animation, a progress bar, etc.

Tengu · May 26, 2023, 4:36pm

Final solution, inspired by The Fine Manual:

It's even simpler than I thought... playdate.update() runs as a coroutine. So, in this case, the bulk of the work is already done for me: I just call coroutine.yield() at various steps of serialization (which keeps the watchdog happy), and at the same time calculate progress and use gfx calls in the serialization function to draw a nice progress bar. Works a treat.
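A minimal sketch of that approach. It relies on the behavior described in the post (the runtime resumes `playdate.update()` as a coroutine, so a `coroutine.yield()` inside it pauses until the next frame); `steps`, `drawProgress` and `saveToSlot` are hypothetical, and the driver loop at the bottom only simulates the runtime so the snippet runs in plain Lua.

```lua
-- Hypothetical list of save steps; a real game would write one chunk per
-- step (e.g. with playdate.datastore.write).
local steps = { "player", "inventory", "journal", "level_1", "level_2", "level_3" }
local progressLog = {}

-- Hypothetical stand-in for drawing the progress bar; in the game this
-- would use playdate.graphics calls such as fillRect.
local function drawProgress(fraction)
  progressLog[#progressLog + 1] = fraction
end

-- Called from within playdate.update(); yields after each step so that no
-- single stretch of work runs long enough to trip the watchdog.
local function saveToSlot()
  for i, name in ipairs(steps) do
    -- ... serialize and write the chunk called `name` here ...
    drawProgress(i / #steps)
    coroutine.yield()
  end
end

-- Outside the SDK only: simulate the runtime resuming once per frame.
local co = coroutine.create(saveToSlot)
local frames = 0
while coroutine.status(co) ~= "dead" do
  assert(coroutine.resume(co))
  frames = frames + 1
end
```

Compared with an explicit per-frame state machine, this keeps the serialization code as one straight-line function; the yields mark where a frame boundary is allowed.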

Tengu · May 26, 2023, 5:27pm

The end result:

[Animation: the save/load progress bar in action (utt_save_load_progress_try_2)]

Thanks, everyone!

dave · May 26, 2023, 9:01pm

A while back I got decoding from disk running a lot faster by adding a read buffer up at the API level, so that it doesn't have to call all the way into the filesystem driver to hit the buffer down there. Looks like I forgot (or didn't think) to add that to the write side as well. Oops! I implemented that, and a little test I wrote went from 2.6s to 0.9s to write out a 293KB file.
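The change dave describes is ordinary write buffering: collect small writes in memory and hand them to the lower layer in larger blocks. The sketch below illustrates the concept in Lua only; it is not the SDK's implementation (which lives in the runtime), and `makeBufferedWriter`, the `sink` callback and the 4096-byte threshold are hypothetical.

```lua
-- Wrap an expensive `sink(data)` so small writes are batched into larger ones.
local function makeBufferedWriter(sink, threshold)
  local parts, size = {}, 0
  local writer = {}

  -- Queue data in memory; only call the sink once enough has accumulated.
  function writer.write(data)
    parts[#parts + 1] = data
    size = size + #data
    if size >= threshold then
      writer.flush()
    end
  end

  -- Push whatever is buffered down to the sink in a single call.
  function writer.flush()
    if size > 0 then
      sink(table.concat(parts))
      parts, size = {}, 0
    end
  end

  return writer
end

-- Usage: 1,000 ten-byte writes reach the sink as 3 calls instead of 1,000.
local sinkCalls, bytesWritten = 0, 0
local w = makeBufferedWriter(function(data)
  sinkCalls = sinkCalls + 1
  bytesWritten = bytesWritten + #data
end, 4096)

for _ = 1, 1000 do w.write(string.rep("x", 10)) end
w.flush()
```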

I'm pushing the MR now; look for this in a future update!