Image:draw into itself with y offset: acts differently on hardware vs. Simulator (1.12.3)

When I take an image and draw it onto itself shifted downward (a portion of a scroll routine I'm working on) like this...

gfx.pushContext(test)
	test:draw(0,24)
gfx.popContext()

test:draw(0,0)

... the Simulator behaves as I'd hoped: the whole image is shifted down, with the top portion left behind as a duplicate band (where I'll be drawing new content next).

But on actual hardware, that band gets duplicated endlessly downward. That would make sense if I were looping this, but I'm not! (Only a downward shift seems to do this. Shifting up or sideways works fine.)

I'm not 100% sure what behavior is intended, but I figured the Simulator and hardware ought to act the same.

Workaround: drawing from an intermediate copy, at a small cost in resources, makes the hardware and Simulator behave the same. Inside the pushContext block, replace the draw call with:

test:copy():draw(0,24)

Run the attached Lua project on Simulator vs. hardware and you'll see the difference:

Test for draw bug on hardware.zip (256.7 KB)

I'm not surprised by the duplication, since it's basically a feedback loop: you're drawing row 0 into row 24, which then gets copied into row 48, and so on. What's weird is that it doesn't happen in the Simulator. It turns out that because the image data is lined up just right, it's a copy draw, there's no stencil or image mask, and so on, we can skip the complicated drawing function that handles every single edge case and just do a plain old memcpy(). And that's where the two differ: on the device, memcpy() is a simple for loop that doesn't care that it's writing over its own source, but on the Simulator it checks for the overlap and works around it, probably by copying from back to front instead.

We could do that on the device too, but this is only used in the shortcut code. If the image has a mask, the data's not lined up, etc., it'll use the more complex code, which will feed back into itself, and fixing it there too is a whole 'nother can of worms I don't want to mess with.

I'll have the simulator always draw line-by-line to avoid this mismatch, skipping the optimization where it can do the entire blit in a single memcpy.
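To make the feedback concrete, here's a rough plain-Lua model of the two copy orders. It isn't the actual device or Simulator code (none of that is shown in this thread) and it uses no Playdate SDK calls; the names shiftDownForward and shiftDownBackward are made up for illustration, and each string stands in for one row of the image.

```lua
-- Plain-Lua model of an overlapping downward shift. Hypothetical helper
-- names; this is a sketch, not the SDK's real blit code.

-- Copy rows top to bottom, reading from the same table being written.
-- Rows written early are read again later, so the top band repeats.
local function shiftDownForward(rows, dy)
	for y = 1, #rows - dy do
		rows[y + dy] = rows[y]
	end
	return rows
end

-- Copy rows bottom to top, so each source row is read before it is overwritten.
local function shiftDownBackward(rows, dy)
	for y = #rows - dy, 1, -1 do
		rows[y + dy] = rows[y]
	end
	return rows
end

local forward = shiftDownForward({ "A", "B", "C", "D", "E", "F" }, 2)
local backward = shiftDownBackward({ "A", "B", "C", "D", "E", "F" }, 2)

print(table.concat(forward))   -- ABABAB (top band repeated all the way down)
print(table.concat(backward))  -- ABABCD (image shifted down, top band left behind)
```

The same model shows why shifting up is fine with a top-to-bottom copy: each source row sits below its destination, so it is read before anything overwrites it.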

Makes sense! The line-by-line process IS the loop.