Working on a little particle system for a game. Would there be interest if I built it out into a library?

I wanted to have a few basic particle effects in my game. I couldn't find any existing libraries for Playdate so I decided to make my own and well, things got away from me a little. I'm basing it off the Unity particle system, so you start by specifying a sprite, then can set things like:

  • Emission rate
  • Emission force
  • Emission angle, spread, and initial width
  • Particle size and opacity over time
  • Gravity
  • Whether the particle inherits the emitter's velocity

Some examples:
Emission Spread
Emission Force
Velocity

Like I said, I couldn't find anything like it (though definitely let me know if there already is one), so if people are interested, I'd be happy to release it as a library. I'd just need to spend some time polishing it up a little, as well as making it more performant (any help in those regards would definitely be appreciated as well).

Would love to hear your feedback!

Very nice! Unity's particles are pretty great.

I'm into this!

How does it run on hardware?

Yeah so I tried it on device last night and the results… weren’t great. On my computer I can do hundreds of particles at a time, on device it’s more like 20-30. Which is actually enough for some things like a smoke effect, but not enough for sparks etc.

To be honest, I’m not sure what’s the bottleneck at this point. One thought I had is the particles are all sprites, and I could replace them with images. I could also stop adjusting opacity and scale, or bake them into an image table and play an animation instead. I’d love to hear any thoughts people have as to how to improve it, or suggestions for how I could best determine the bottleneck. I’d also be happy to post the source code if it helps!

Sorry :pensive:

Use the Simulator sampler window to see where the code is spending most of its time. Make sure to measure on the device, not in the Simulator.

I think the sprite system does add quite a lot of overhead. You might be able to create your own lighter weight version. I believe Nic took this approach in one game.

Also try simple things like drawing/moving to integer positions, even if you track position in floating point.

Pre-render any faded versions of each particle. The goal is to remove as many unnecessary repeated calculations as possible.
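One way the pre-rendering idea could look, as a rough sketch rather than anything from the actual library: build the faded frames once, then pick one per particle by age. `buildFadeFrames`, `frameFor`, and the `renderFrame` callback are all hypothetical names. On Playdate the callback might wrap something like the SDK's `image:fadedImage(alpha, ditherType)`, but treat that as an assumption and check the SDK docs.

```lua
-- Hypothetical helper: builds `steps` pre-rendered frames once, so the
-- per-frame work is a table lookup instead of a fade calculation.
-- `renderFrame(alpha)` is a stand-in for whatever produces a faded image.
local function buildFadeFrames(steps, renderFrame)
    assert(steps >= 2, "need at least two fade steps")
    local frames = {}
    for i = 1, steps do
        -- alpha runs from 1 (fully opaque) down to 0 (fully faded)
        local alpha = 1 - (i - 1) / (steps - 1)
        frames[i] = renderFrame(alpha)
    end
    return frames
end

-- Picks the nearest pre-rendered frame for a particle's age.
local function frameFor(frames, lifetime, lifespan)
    local t = lifetime / lifespan
    if t < 0 then t = 0 elseif t > 1 then t = 1 end
    local index = math.floor(t * (#frames - 1) + 0.5) + 1
    return frames[index]
end
```

The trade-off is memory for time: more steps give a smoother fade but more images to hold.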

Also, there's an example of how to do particles in C and use them in Lua. SDK/Examples C folder. I found it not configurable enough to be of use, as well as adding complexity to the build process.

That's too bad. But some of the fun of Playdate development is finding creative alternatives when the "real" way is too slow! If you need 300 sparks and can only get away with 30, maybe each spark image is a cluster of 10 dots, and you never let them travel far enough to look off? Etc.

Thank you all for the tips! I switched from sprites to images, and the frame rate may have improved slightly. I also tried rounding positions to integers, though just to double check: it's faster to calculate floating point positions then round before drawing vs. drawing at the floating point position?

And thanks for the tip about using the sampler. Here are the top % items:

Line 256 calls the particle update function, which is:

function Particle:update()
    self.lifetime+=dt
    self.position+=self.velocity*dt
    self.position.x = math.floor(self.position.x+.5)
    self.position.y = math.floor(self.position.y+.5)
end

Do you know what the issue might be here? I guess it gets called on every particle every frame so it could add up. Also, do you know what the metamethod _mul, metamethod _index, and metamethod _add reference?

Thanks!

seems to be a lot of vector maths (C add, mul) and table thrashing (C index)?

It depends, but my understanding is that if you tell a sprite to draw at a float position that differs from the last float but is still the same int, it will be marked dirty. If you told it to draw at an int last time and it's the same int next time, it won't be marked dirty.

try this faster version

local floor <const> = math.floor
function Particle:update()
    self.lifetime+=dt
    self.position+=self.velocity*dt
    self.position.x = floor(self.position.x+.5)
    self.position.y = floor(self.position.y+.5)
end

and then this one

function Particle:update()
    self.lifetime+=dt
    self.position+=self.velocity*dt
    self.position.x = (self.position.x+.5)//1
    self.position.y = (self.position.y+.5)//1
end

not sure about this

function Particle:update()
    self.lifetime+=dt
    self.position+=self.velocity*dt
    self:moveBy(0.5, 0.5)
end

or this, which I'm also not sure about

function Particle:update()
    self.lifetime+=dt
    self.position+=self.velocity*dt
    self:moveTo((self.position.x+.5)//1, (self.position.y+.5)//1)
end

please report your benchmarks!

also see my benchmark thread, "Well… I didn’t expect that! (Benchmarks & Optimisations)"

Nice work! Would love to see something like this make its way to a reusable library.

For performance issues, I found success in pre-rendering my particle interactions. I wrote up a little "rendering engine" that would save the contents of the canvas to a pdi animation. Then I would just load those pdi animations into my game. You could even pre-render several variations and randomly playback if you wanted some variety.

Matt, thank you for the benchmarking tool, it's incredibly helpful. I didn't get a chance to try it with your rounding improvements, but I was curious about the vector math you pointed out and indeed there were some interesting findings:

Adding two vectors is significantly slower than adding their individual components. That is,

vectorA.x += vectorB.x
vectorA.y += vectorB.y

is nearly twice as fast as

vectorA = vectorA + vectorB

Unfortunately, after updating all the vector math, the limit still seems to be 20-30 particles. Checking the profiler now does give some different results, namely:


Weirdly, the main culprits (lines 83 and 66) are:

    self.position.y = math.floor(self.position.y+.5)

and

    self.velocity.x += force.x

which is odd because the x position and y velocity lines aren't mentioned. Also, it's now mostly under metamethod _index, which is interesting.

And professir, that's actually really interesting. I've disabled scaling and opacity temporarily because I know they're rather slow processes, but if I could pre-render it into a series of images it could be a lot faster. Have you released that rendering engine?

The time spent in index seems to be iterating through your vectors. 32% of time spent on vectors. Hmm.

I don't use vectors myself. I'm now not in a rush to do so!

The vector addition performance difference is surprising. I wonder if @dave has any thoughts?

Next try the rounding thing? But this was more for if you were using sprites. Do it only in your final drawing call, not in any movement code.
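A minimal sketch of rounding only at draw time, assuming plain `x`/`y`/`vx`/`vy` number fields and a hypothetical `drawAt(x, y)` callback standing in for the real draw call. There is a second reason to keep rounding out of the movement code: if a particle moves less than half a pixel per frame, rounding its stored position every frame snaps it back to the same integer and it never moves. The long-hand `a = a + b` form is used so the snippet also runs in stock Lua.

```lua
local floor = math.floor

local Particle = {}
Particle.__index = Particle

function Particle.new(x, y, vx, vy)
    return setmetatable({ x = x, y = y, vx = vx, vy = vy }, Particle)
end

-- Movement keeps full floating-point precision, so sub-pixel motion
-- accumulates across frames instead of being rounded away.
function Particle:update(dt)
    self.x = self.x + self.vx * dt
    self.y = self.y + self.vy * dt
end

-- Rounding happens only here. `drawAt` is a hypothetical callback.
function Particle:draw(drawAt)
    drawAt(floor(self.x + 0.5), floor(self.y + 0.5))
end
```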

At least alias the math call; doing so across your project is a huge gain. See the "Lua Performance Tips" sample chapter: https://www.lua.org/gems/sample.pdf

and maybe try the long hand

self.velocity.x = self.velocity.x + force.x

But really I think you'll still see slow vector performance. Maybe try it without vectors!?
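For what "without vectors" could mean in practice, here is one hedged sketch: parallel arrays of plain numbers updated in a single loop, so there are no vector objects and no metamethod calls. The `pool` layout and the `spawn`/`updateAll` names are made up for illustration, and whether this is actually faster on device is untested here.

```lua
-- Parallel arrays of plain numbers: no vector objects, no metamethods.
local pool = { n = 0, x = {}, y = {}, vx = {}, vy = {} }

local function spawn(p, x, y, vx, vy)
    local i = p.n + 1
    p.n = i
    p.x[i], p.y[i], p.vx[i], p.vy[i] = x, y, vx, vy
end

-- Applies a constant force (e.g. gravity) and moves every particle.
local function updateAll(p, forceX, forceY, dt)
    -- Cache the arrays in locals to avoid repeated table lookups.
    local x, y, vx, vy = p.x, p.y, p.vx, p.vy
    for i = 1, p.n do
        vx[i] = vx[i] + forceX * dt
        vy[i] = vy[i] + forceY * dt
        x[i] = x[i] + vx[i] * dt
        y[i] = y[i] + vy[i] * dt
    end
end
```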

Hey Matt, what does the // operator do in Lua? It looks like //1 will floor down to the nearest integer?

Yes, it's integer/floor division. It was added in Lua 5.3.

https://www.lua.org/manual/5.3/manual.html#3.4.1

I use //1 instead of (an alias to) math.floor() as it's quicker and easier, though I can't remember the performance difference off-hand.
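A small comparison of the two rounding idioms, with hypothetical helper names. In stock Lua 5.3+ both round to the same value, but `math.floor` returns an integer-subtype number while `//` on a float operand returns a float. That matters mostly when printing or when something expects an integer.

```lua
-- Round to nearest using math.floor (result has the integer subtype).
local function roundFloor(v)
    return math.floor(v + 0.5)
end

-- Round to nearest using floor division (result stays a float).
local function roundDiv(v)
    return (v + 0.5) // 1
end

print(roundFloor(3.7), math.type(roundFloor(3.7)))
print(roundDiv(3.7), math.type(roundDiv(3.7)))
```

Both floor toward negative infinity, so negative positions behave the same way in either version.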

Ah ok, much appreciated. And thanks for the link, couldn't find it before.

Strange, I can't reproduce that. In a simple test case I'm seeing the opposite, the vector addition is a bit under 3x as fast as adding the components separately, both in the simulator and on the device. I wonder what the difference is?

main.lua.zip (727 Bytes)

Interesting... Here are the results I got from Matt's benchmark tool:

# time name
35, 11242, add vectors: plus equals
36, 11165, add vectors
37, 4634, add vectors piecewise
38, 4656, add vectors piecewise: plus equals
39, 4789, add vectors new
40, 16838, add nonvectors piecewise
41, 14018, add nonvectors piecewise, indexed
42, 17058, add numbers
43, 12457, scale vectors
44, 6780, scale vectors piecewise

I'm assuming the second number is the time to run the function? If so, 35-36 involve adding vectors directly ("plus equals" refers to me testing A+=B vs. A = A+B, a difference which seems negligible), 37-39 involve adding the x and y components separately (which I refer to as "piecewise"), 40-41 involve using a generic Lua table instead of vectors, 42 involves storing the x and y components as individual variables, and 43-44 recreate the initial tests but with scaling instead of addition.

Unless maybe I have the numbers backwards?

...Welp

I just looked at the original thread and yup, I got it backwards. Higher number = faster, so adding vectors is indeed around twice as fast as adding the components. Technically using generic tables or individual variables is ~50% faster than vectors, but the added convenience may be worth it.

Also, I made a change so that the particles' velocity only updates every other frame, and got it up to ~50 particles, which is a fairly decent number. I've got a few more things I want to try, but if anyone has any suggestions I'd love to hear them!
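The post doesn't show how the every-other-frame change was made, so the following is only a guess at one way to do it. It assumes particles are tables with `x`, `y`, `vx`, `vy` fields, and it applies two frames' worth of force on the frames that do update, so the velocity ends up where it would with per-frame updates. The `step` function and the doubling are assumptions.

```lua
local frame = 0

-- `particles` is a list of tables with x, y, vx, vy fields (hypothetical layout).
local function step(particles, forceX, forceY, dt)
    frame = frame + 1
    local updateVelocity = (frame % 2 == 0)
    for i = 1, #particles do
        local p = particles[i]
        if updateVelocity then
            -- Two frames' worth of force, applied in one go.
            p.vx = p.vx + forceX * 2 * dt
            p.vy = p.vy + forceY * 2 * dt
        end
        -- Position still advances every frame.
        p.x = p.x + p.vx * dt
        p.y = p.y + p.vy * dt
    end
end
```

Positions will drift slightly from the per-frame version because the velocity lags by a frame, which is probably invisible for short-lived particles.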

Nope! It's how many iterations were done in the time available. Bigger is better.

Not sure why I labelled that column time, sorry :grimacing:

SsirRender-public 3.zip (113.0 KB)

@manalive Here you go. I sanitized the engine a bit and threw in some comments / printed output into the simulator console to help newcomers. Let me know if you have any questions

EDIT: minor changes to the file

Ooh, better correct that. It will be very confusing and possibly a massive time-waster, since optimization is already a huge time sink.

Suggested: add a bit of math to calculate the time that is spent, or change the label to "iterations" or something like that.
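The "bit of math" could be as small as this sketch. The iteration counts are the ones posted earlier in the thread; the length of the benchmark's time window is not stated anywhere here, so `WINDOW_MS` is a placeholder assumption, and the function names are made up. The speed ratios do not depend on the window length.

```lua
-- Iteration counts reported earlier in the thread (bigger is better).
local results = {
    { name = "add vectors", iterations = 11165 },
    { name = "add vectors piecewise", iterations = 4634 },
    { name = "add nonvectors piecewise", iterations = 16838 },
}

-- ASSUMPTION: the real window length is not given in the thread.
local WINDOW_MS = 1000

-- Average time spent on one iteration, in milliseconds.
local function msPerIteration(iterations, windowMs)
    return windowMs / iterations
end

-- How many times faster a run with `a` iterations is than one with `b`.
local function speedup(a, b)
    return a / b
end

for _, r in ipairs(results) do
    print(string.format("%-26s %6d iterations  %.4f ms each",
        r.name, r.iterations, msPerIteration(r.iterations, WINDOW_MS)))
end
```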

Sorry, just to clarify: I added the time label myself, since I didn’t see that the column was labeled. I think right now it says benchmark or something. And just a quick update: I’ve gotten it up to 100 particles, which I think is decent enough to release. I’ll clean it up a little and post it here soon.
