Working on a little particle system for a game. Would there be interest if I built it out into a library?

I'm into this!

How does it run on hardware?

Yeah so I tried it on device last night and the results… weren’t great. On my computer I can do hundreds of particles at a time, on device it’s more like 20-30. Which is actually enough for some things like a smoke effect, but not enough for sparks etc.

To be honest, I’m not sure what the bottleneck is at this point. One thought I had is that the particles are all sprites, and I could replace them with images. I could also stop adjusting opacity and scale, or bake them into an image table and play an animation instead. I’d love to hear any thoughts people have as to how to improve it, or suggestions for how I could best determine the bottleneck. I’d also be happy to post the source code if it helps!

Sorry :pensive:

Use the Simulator sampler window to see where the code is spending the most time. Make sure to measure on the device, not in the simulator.

I think the sprite system does add quite a lot of overhead. You might be able to create your own lighter weight version. I believe Nic took this approach in one game.

Also simple things like drawing/moving to integer positions even if you track position with floating point.

Pre-render any faded versions of each particle. The goal is to remove as many unnecessary repetitive calculations as possible.

Also, there's an example of how to do particles in C and use them in Lua. SDK/Examples C folder. I found it not configurable enough to be of use, as well as adding complexity to the build process.

That's too bad. But some of the fun of Playdate development is finding creative alternatives when the "real" way is too slow! If you need 300 sparks and can only get away with 30, maybe each spark image is a cluster of 10 dots, and you never let them travel far enough to look off? Etc.

Thank you all for the tips! I switched from sprites to images, and the frame rate may have improved slightly. I also tried rounding positions to integers, though just to double check: it's faster to calculate floating point positions then round before drawing vs. drawing at the floating point position?

And thanks for the tip about using the sampler. Here are the top % items:

Line 256 calls the particle update function, which is:

function Particle:update()
    self.lifetime+=dt
    self.position+=self.velocity*dt
    self.position.x = math.floor(self.position.x+.5)
    self.position.y = math.floor(self.position.y+.5)
end

Do you know what the issue might be here? I guess it gets called on every particle every frame so it could add up. Also, do you know what the metamethod _mul, metamethod _index, and metamethod _add reference?

Thanks!

seems to be a lot of vector maths (C add, mul) and table thrashing (C index)?
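To illustrate what those sampler entries refer to: in Lua, operators such as `+` and `*` on tables or userdata are dispatched through the metamethods `__add` and `__mul`, and field lookups that miss go through `__index`. Here is a minimal sketch in stock Lua. The `Vec` type is a hypothetical stand-in (the SDK's vector type is implemented in C, so its costs will differ), and it uses `a = a + b` rather than `+=` so it runs outside the Playdate toolchain.

```lua
-- Hypothetical stand-in for a 2D vector type, written in plain Lua.
local Vec = {}
Vec.__index = Vec            -- lookups that miss the instance fall through to Vec

local function vec(x, y)
    return setmetatable({ x = x, y = y }, Vec)
end

-- a + b on two Vec values calls this function and allocates a new table.
Vec.__add = function(a, b)
    return vec(a.x + b.x, a.y + b.y)
end

-- v * s (vector times number) calls this function.
Vec.__mul = function(v, s)
    return vec(v.x * s, v.y * s)
end

function Vec:length()
    return math.sqrt(self.x * self.x + self.y * self.y)
end

local position = vec(10, 20)
local velocity = vec(30, -40)
local dt = 0.5

-- One __mul call, one __add call, and two temporary vectors per update.
position = position + velocity * dt
print(position.x, position.y)
print(velocity:length())     -- `length` is found through __index
```

Each of those calls happens once per particle per frame, which is why they can show up prominently in a profile.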

It depends, but my understanding is this: if you tell a sprite to draw at a float that differs from the last float but rounds to the same int, it will still be marked dirty. If you told it to draw at an int last time and pass the same int next time, it won't be marked dirty.

try this faster version

local floor <const> = math.floor
function Particle:update()
    self.lifetime+=dt
    self.position+=self.velocity*dt
    self.position.x = floor(self.position.x+.5)
    self.position.y = floor(self.position.y+.5)
end

and then this one

function Particle:update()
    self.lifetime+=dt
    self.position+=self.velocity*dt
    self.position.x = (self.position.x+.5)//1
    self.position.y = (self.position.y+.5)//1
end

not sure about this

function Particle:update()
    self.lifetime+=dt
    self.position+=self.velocity*dt
    self:moveBy(0.5, 0.5)
end

or this, which I'm also not sure about

function Particle:update()
    self.lifetime+=dt
    self.position+=self.velocity*dt
    self:moveTo((self.position.x+.5)//1, (self.position.y+.5)//1)
end

please report your benchmarks!

also see my benchmark thread Well… I didn’t expect that! (Benchmarks & Optimisations)

Nice work! Would love to see something like this make its way to a reusable library.

For performance issues, I found success in pre-rendering my particle interactions. I wrote up a little "rendering engine" that would save the contents of the canvas to a pdi animation. Then I would just load those pdi animations into my game. You could even pre-render several variations and randomly playback if you wanted some variety.

Matt, thank you for the benchmarking tool, it's incredibly helpful. I didn't get a chance to try it with your rounding improvements, but I was curious about the vector math you pointed out and indeed there were some interesting findings:

Adding two vectors is significantly slower than adding their individual components. That is,

vectorA.x += vectorB.x
vectorA.y += vectorB.y

is nearly twice as fast as

vectorA = vectorA + vectorB

Unfortunately, after updating all the vector math, the limit still seems to be 20-30 particles. Checking the profiler now does give some different results, namely:


Weirdly, the main culprits (lines 83 and 66) are:

    self.position.y = math.floor(self.position.y+.5)

and

    self.velocity.x += force.x

which is odd because the x position and y velocity lines aren't mentioned. Also, it's now mostly under metamethod _index, which is interesting.

And professir, that's actually really interesting. I've disabled scaling and opacity temporarily because I know they're rather slow processes, but if I could pre-render it into a series of images it could be a lot faster. Have you released that rendering engine?

The time spent in index seems to be iterating through your vectors. 32% of time spent on vectors. Hmm.

I don't use vectors myself. I'm now not in a rush to do so!

The vector addition performance difference is surprising. I wonder if @dave has any thoughts?

Next try the rounding thing? But this was more for if you were using sprites. Do it only in your final drawing call, not in any movement code.
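A rough sketch of "round only in the final drawing call", in stock Lua. The field names and the `drawPosition` helper are made up for illustration, and the actual draw call is left out. One reason to keep the stored position in floats: if you round inside the update and a particle moves less than half a pixel per frame, the rounding throws the movement away every frame and the particle never goes anywhere.

```lua
-- Plain-Lua sketch: simulation state stays in floats, rounding happens only for drawing.
local floor = math.floor

local particle = { x = 10.0, y = 20.0, vx = 12.5, vy = -7.25 }

local function updateParticle(p, dt)
    -- Keep full precision here so slow particles still accumulate sub-pixel movement.
    p.x = p.x + p.vx * dt
    p.y = p.y + p.vy * dt
end

local function drawPosition(p)
    -- Round to the nearest pixel; the stored position is left untouched.
    return floor(p.x + 0.5), floor(p.y + 0.5)
end

updateParticle(particle, 1 / 30)
local px, py = drawPosition(particle)
-- On device, px and py would be passed to whatever draw call you use.
print(px, py)
```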

At least alias the math call; doing so across your project is a huge gain. See Lua Performance Tips (sample chapter): https://www.lua.org/gems/sample.pdf

and maybe try the long hand

self.velocity.x = self.velocity.x + force.x

But really I think you'll still see slow vector performance. Maybe try it without vectors!?
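If you want to try it without vectors, something like the following might be a starting point. It is only a sketch: the field names (`x`, `y`, `vx`, `vy`) and the `spawn`/`updateAll` helpers are invented here, and whether it actually beats the vector version is something to measure on device rather than assume.

```lua
-- Sketch of a particle pool that stores plain numbers instead of vector objects.
local particles = {}

local function spawn(x, y, vx, vy)
    particles[#particles + 1] = { x = x, y = y, vx = vx, vy = vy, lifetime = 0 }
end

local function updateAll(dt, forceX, forceY)
    for i = 1, #particles do
        local p = particles[i]          -- one array lookup, then plain field access
        p.lifetime = p.lifetime + dt
        p.vx = p.vx + forceX * dt
        p.vy = p.vy + forceY * dt
        p.x = p.x + p.vx * dt
        p.y = p.y + p.vy * dt
    end
end

spawn(0, 0, 10, 0)
spawn(5, 5, 0, -10)
updateAll(0.5, 0, 4)   -- hypothetical force: 4 units/s^2 downward
print(particles[1].x, particles[1].y)
```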

Hey Matt, what does the // operator do in Lua? It looks like //1 will floor down to the nearest integer?

Yes, it's integer/floor division. Since Lua 5.3

https://www.lua.org/manual/5.3/manual.html#3.4.1

I use //1 instead of (an alias to) math.floor() as it's quicker and easier, though I can't remember the performance difference off-hand.
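A few quick examples of `//` in stock Lua 5.3 or later. The `roundFloor` and `roundDiv` helper names are just for illustration. One difference worth knowing: `math.floor` returns an integer when the result fits, while `//` with a float operand returns a float.

```lua
-- Floor division rounds toward negative infinity, like math.floor.
print(7 // 2)        -- integer operands give an integer result
print(7.5 // 1)      -- a float operand gives a float result
print(-7.5 // 1)

local function roundFloor(x)    -- round to nearest using math.floor
    return math.floor(x + 0.5)
end

local function roundDiv(x)      -- round to nearest using floor division
    return (x + 0.5) // 1
end

print(roundFloor(2.4), roundDiv(2.4))
print(math.type(roundFloor(2.4)), math.type(roundDiv(2.4)))
```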

Ah ok, much appreciated. And thanks for the link, couldn't find it before.

Strange, I can't reproduce that. In a simple test case I'm seeing the opposite, the vector addition is a bit under 3x as fast as adding the components separately, both in the simulator and on the device. I wonder what the difference is?

main.lua.zip (727 Bytes)

Interesting... Here are the results I got from Matt's benchmark tool:

# time name
35, 11242, add vectors: plus equals
36, 11165, add vectors
37, 4634, add vectors piecewise
38, 4656, add vectors piecewise: plus equals
39, 4789, add vectors new
40, 16838, add nonvectors piecewise
41, 14018, add nonvectors piecewise, indexed
42, 17058, add numbers
43, 12457, scale vectors
44, 6780, scale vectors piecewise

I'm assuming the second number is the time to run the function? If so, 35-36 involve adding vectors directly ("plus equals" refers to me testing A+=B vs. A = A+B, where the difference seems negligible), 37-39 involve adding the x and y components separately (which I refer to as "piecewise"), 40-41 involve using a generic Lua table instead of vectors, 42 involves storing the x and y components as individual variables, and 43-44 recreate the initial tests but with scaling instead of addition.

Unless maybe I have the numbers backwards?

...Welp

I just looked at the original thread and yup, I got it backwards. Higher number = faster, so adding vectors is indeed around twice as fast as adding the components. Technically using generic tables or individual variables is ~50% faster than vectors, but the added convenience may be worth it.

Also, I made a change so that the particles' velocity only updates every other frame, and got it up to ~50 particles which is a fairly decent number. I've got a few more things I want to try but if anyone has any suggestions I'd love to hear it!
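For anyone curious what "velocity only updates every other frame" could look like, here is a sketch in stock Lua. The names (`updateDelay`, `frameCount`) are hypothetical, and scaling the force by the number of skipped frames is an assumption made here to keep the overall acceleration roughly the same, not something confirmed by the post above.

```lua
-- Sketch: apply forces to velocity only every `updateDelay` frames.
local updateDelay = 2          -- hypothetical setting: 2 means every other frame
local frameCount = 0

local particle = { x = 0, y = 0, vx = 0, vy = 0 }

local function update(p, dt, gravity)
    frameCount = frameCount + 1
    if frameCount % updateDelay == 0 then
        -- Scale by the skipped time so the overall acceleration stays about the same.
        p.vy = p.vy + gravity * dt * updateDelay
    end
    -- Position still moves every frame, so motion stays smooth.
    p.x = p.x + p.vx * dt
    p.y = p.y + p.vy * dt
end

for _ = 1, 4 do
    update(particle, 0.25, 8)
end
print(particle.vy, particle.y)
```

The trade-off is that the velocity changes in larger, less frequent steps, so trajectories are a bit less accurate.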

Nope! It's how many iterations were done in the time available. Bigger is better.

Not sure why I labelled that column time, sorry
(I didn't)

SsirRender-public 3.zip (113.0 KB)

@manalive Here you go. I sanitized the engine a bit and threw in some comments / printed output into the simulator console to help newcomers. Let me know if you have any questions

EDIT: minor changes to the file

Ooh, better correct that, this will be very confusing and possibly a massive time-waster. Since optimization is already a huge time sink.

Suggested: add a bit of math to calculate the time that is spent, or change the label to "iterations" or something like that.

Sorry just to clarify I added the time label since I didn’t see that it was labeled. I think right now it says benchmark or something. And just a quick update: I’ve gotten it up to 100 particles which I think is decent enough to release. I’ll clean it up a little and post it here soon.

Yes! I did check but forgot to mention here. I describe it as:

number of times that command can be called in a single frame/update
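To make the "bigger is better" reading concrete, here is a minimal sketch of that kind of measurement. It is not the benchmark tool from the thread: `countIterations` is a made-up helper, and it uses `os.clock` so it runs in stock Lua, so on device you would swap in the SDK's timing functions. Checking the clock on every pass adds overhead to each iteration, so treat the counts as relative, not absolute.

```lua
-- Count how many times `fn` can run inside a fixed time budget.
local function countIterations(fn, budgetSeconds)
    local clock = os.clock
    local deadline = clock() + budgetSeconds
    local iterations = 0
    while clock() < deadline do
        fn()
        iterations = iterations + 1
    end
    return iterations
end

local floor = math.floor
local x = 3.7

local budget = 1 / 30   -- one frame at 30 FPS
local viaFloor = countIterations(function() return floor(x + 0.5) end, budget)
local viaDiv = countIterations(function() return (x + 0.5) // 1 end, budget)

-- Higher counts mean the operation is cheaper.
print("math.floor alias:", viaFloor)
print("floor division:  ", viaDiv)
```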

Hey all, thanks so much for all your help and feedback, I finally got it working more or less the way I want it. It's still a little rough but I wanted to share it with you all to hear your thoughts or feedback.

Meet glitterbomb:

The main class is called ParticleEmitter, which you initialize by passing it an image. There's also a class called AnimatedParticleEmitter that does the same thing but with an image table. You can manually position the emitter, or give it a velocity so it moves each frame.

Right now, the default behavior is to emit particles randomly given certain parameters (discussed below) but could easily be changed to whatever pattern you want.

Right now, the settings are:

  • emissionRate - How many particles are created per second
  • emissionForce - Their initial velocity
  • emitterWidth - If greater than 0, will randomize the start position of particles along a line
  • emissionAngle - The direction they are emitted towards (0 degrees is right, 90 is down, 180 is left, etc.)
  • emissionSpread - If greater than 0, will randomize the angle at which particles are emitted within +/- the spread amount
  • particleLifetime - How long particles will exist (in seconds) before being destroyed (note, this along with emissionRate combine to give you the maximum number of particles that will exist at one time. So emitting 100 particles per second with a 1 second lifetime will have the same number of particles as 25 particles per second with a 4 second lifetime)
  • particleUpdateDelay - Rather than update particles' velocity every frame, you can insert a delay to improve performance (at the cost of physical accuracy)
  • inheritVelocity - Boolean. If set to true, particles will have the emitter's velocity added to their own when spawned.
  • gravity - The amount of gravity particles experience (can set to 0 to turn off)
  • worldScale - A helpful variable that lets you convert from your game's scale to the real world (for example, lets you think of particle velocity as m/s, 9.8 as physically accurate gravity, etc.)
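The arithmetic behind a few of these settings can be sketched in stock Lua. This is not glitterbomb's source. The helper names are hypothetical, and it assumes that `emissionForce` is a speed (the magnitude of the initial velocity) and that the spread is sampled uniformly.

```lua
-- Hypothetical helper names; only the setting names come from the list above.
local function maxLiveParticles(emissionRate, particleLifetime)
    return emissionRate * particleLifetime
end

-- Returns vx, vy for a new particle. Angles are in degrees; with screen
-- coordinates (y grows downward), 0 points right and 90 points down.
local function initialVelocity(emissionForce, emissionAngle, emissionSpread, rand)
    rand = rand or math.random
    -- rand() is in [0, 1), so offset lies within +/- emissionSpread.
    local offset = (rand() * 2 - 1) * emissionSpread
    local radians = math.rad(emissionAngle + offset)
    return emissionForce * math.cos(radians), emissionForce * math.sin(radians)
end

-- 100 per second for 1 second and 25 per second for 4 seconds give the same count.
print(maxLiveParticles(100, 1), maxLiveParticles(25, 4))

local vx, vy = initialVelocity(50, 90, 0)   -- straight down, no spread
print(vx, vy)
```

The optional `rand` argument is there so the random source can be swapped out, for example with a fixed function when testing.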

You can either set these variables on initialization by passing them as a table, or can set them manually with functions like setEmissionRate, setEmissionForce, etc. I've included a demo app so you can see some of the functionality and examples of how to use it.

Right now, I've found that I can get over 100 particles on device at 30FPS, and closer to 50 particles if they're animated (note: if anyone has thoughts on how to improve it I would love to hear it!)

I've found that animating scale and opacity was too expensive to do, so instead I recommend baking any size or opacity changes into your animation. To that end, I also created a second mini "app" called glitterglue. With glitterglue, you can create a ParticleEmitter, but rather than drawing onscreen, it draws a particle frame by frame to a sprite sheet so you can import it as an ImageTable into an AnimatedParticleEmitter later (I'm sure there's a more elegant way of going about this but that's what I have so far). Right now, it just animates size and opacity, but I'm sure it could be extended to rotation or other animations.
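One way a baked animation could be played back is to pick the frame from the particle's age. The sketch below is an assumption about how that might work, not a description of AnimatedParticleEmitter: `frameForAge` is a made-up helper that stretches the animation across the particle's whole lifetime.

```lua
-- Pick which baked frame to show for a particle of a given age.
-- frameCount is the number of frames in the baked animation (1-based indexing).
local function frameForAge(lifetime, particleLifetime, frameCount)
    local progress = lifetime / particleLifetime          -- 0 at birth, 1 at death
    local index = math.floor(progress * frameCount) + 1
    if index > frameCount then index = frameCount end     -- clamp at the last frame
    if index < 1 then index = 1 end
    return index
end

print(frameForAge(0, 2, 8), frameForAge(1, 2, 8), frameForAge(2, 2, 8))
```

Because the scale and opacity are already in the frames, the per-particle cost at runtime is one division and one floor.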

Anyways, would love for you all to take a look, try it out, and let me know what you think!
