I'm writing a game to test some drawing routines and their performance.
I have written my own line-drawing routines in C using the DDA and Bresenham algorithms.
They perform reasonably well and are fairly fast.
But when I profiled them, I noticed that the native graphics->drawLine is at least 5 times faster.
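For reference, here is a minimal sketch of the two textbook algorithms the question mentions. It is not the asker's code: the Bresenham routine is the common all-octant integer formulation, and plot_fn, point_log and log_point are hypothetical helpers that keep the sketch independent of any real framebuffer.

```c
#include <stdlib.h>

/* Hypothetical per-pixel callback (assumption: no real framebuffer API). */
typedef void (*plot_fn)(int x, int y, void *ctx);

/* Hypothetical recording callback: stores up to 256 plotted points. */
typedef struct { int n; int xs[256]; int ys[256]; } point_log;

static void log_point(int x, int y, void *ctx)
{
    point_log *pl = ctx;
    if (pl->n < 256) { pl->xs[pl->n] = x; pl->ys[pl->n] = y; }
    pl->n++;
}

static int round_half_away(double v)
{
    return (int)(v < 0.0 ? v - 0.5 : v + 0.5);
}

/* DDA: take max(|dx|, |dy|) unit steps and add a floating-point
   increment to both coordinates each step, rounding to plot. */
void line_dda(int x0, int y0, int x1, int y1, plot_fn plot, void *ctx)
{
    int dx = x1 - x0, dy = y1 - y0;
    int steps = abs(dx) > abs(dy) ? abs(dx) : abs(dy);
    double xinc = steps ? (double)dx / steps : 0.0;
    double yinc = steps ? (double)dy / steps : 0.0;
    double x = x0, y = y0;
    for (int i = 0; i <= steps; i++) {
        plot(round_half_away(x), round_half_away(y), ctx);
        x += xinc;
        y += yinc;
    }
}

/* Bresenham (integer only, all octants): err is a scaled error term;
   each step moves in x, in y, or in both depending on 2 * err. */
void line_bresenham(int x0, int y0, int x1, int y1, plot_fn plot, void *ctx)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        plot(x0, y0, ctx);
        if (x0 == x1 && y0 == y1)
            break;
        int e2 = 2 * err;
        if (e2 >= dy) { err += dy; x0 += sx; }  /* step in x */
        if (e2 <= dx) { err += dx; y0 += sy; }  /* step in y */
    }
}
```

The callback keeps the sketch self-contained, but an indirect call per pixel is itself a measurable cost; a tuned routine would normally write straight into the pixel buffer.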
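These are minor things the compiler will very likely do for you with optimization enabled, but if you want to be sure (division is slow):
Instead of x0 / 8 do x0 >> 3
Instead of x0 % 8 do x0 & 0x7
These replacements are only equivalent when x0 is non-negative: for a signed int, -1 / 8 is 0 and -1 % 8 is -1, while -1 & 0x7 is 7. Declaring x0 as unsigned guarantees the equivalence and lets the compiler emit the shift and mask on its own.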
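x0 / 8 and x0 % 8 look like the byte index and bit index into a packed 1-bit-per-pixel buffer. That layout is an assumption, since the original code isn't shown. The sketch below uses a hypothetical bitmap1 type with 8 horizontal pixels per byte, most significant bit first, and unsigned coordinates so the shift and mask are exact.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 1-bit-per-pixel bitmap: each row is `stride` bytes,
   each byte holds 8 horizontal pixels, most significant bit first. */
typedef struct {
    uint8_t *bits;
    size_t stride;  /* bytes per row */
} bitmap1;

/* Set pixel (x, y). Because x is unsigned, x >> 3 == x / 8 and
   x & 7 == x % 8 for every value, so no division is needed. */
static void set_pixel(bitmap1 *bm, unsigned x, unsigned y)
{
    bm->bits[y * bm->stride + (x >> 3)] |= (uint8_t)(0x80u >> (x & 7u));
}
```

With a signed x the compiler must preserve C's round-toward-zero semantics for negative values, which is why it adds a small sign correction around the shift instead of emitting a bare shift.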
Also potentially minor (if performance is already fine on the device, don't worry about it), but in hot loops like this you may want to avoid pipeline flushes caused by branch misprediction. The two if tests in the Bresenham step (e2 >= dy and e2 <= dx) can be turned into 0/1 values:
int e2dy = e2 >= dy; // 1 if x should step, else 0
int e2dx = e2 <= dx; // 1 if y should step, else 0
err += (dx * e2dx) + (dy * e2dy);
x0 += sx * e2dy;
y0 += sy * e2dx;
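Now the step has no conditional branches: the comparisons typically compile to flag-setting instructions rather than jumps, so the processor just executes straight-line arithmetic with nothing to mispredict.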
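Putting the branchless step into a complete routine gives something like the sketch below. It is the common all-octant integer Bresenham formulation with the two if statements replaced by 0/1 multipliers. plot_fn, point_log and log_point are hypothetical helpers for illustration. The loop-exit test is still a branch, but it is taken only once per line, so it predicts well.

```c
#include <stdlib.h>

/* Hypothetical per-pixel callback (assumption: no real framebuffer API). */
typedef void (*plot_fn)(int x, int y, void *ctx);

/* Hypothetical recording callback: stores up to 256 plotted points. */
typedef struct { int n; int xs[256]; int ys[256]; } point_log;

static void log_point(int x, int y, void *ctx)
{
    point_log *pl = ctx;
    if (pl->n < 256) { pl->xs[pl->n] = x; pl->ys[pl->n] = y; }
    pl->n++;
}

/* Bresenham with the per-step decisions computed as 0/1 integers. */
void line_branchless(int x0, int y0, int x1, int y1, plot_fn plot, void *ctx)
{
    int dx = abs(x1 - x0), sx = x0 < x1 ? 1 : -1;
    int dy = -abs(y1 - y0), sy = y0 < y1 ? 1 : -1;
    int err = dx + dy;
    for (;;) {
        plot(x0, y0, ctx);
        if (x0 == x1 && y0 == y1)  /* exit test: taken once per line */
            break;
        int e2 = 2 * err;
        int e2dy = e2 >= dy;  /* 1 if x should step, else 0 */
        int e2dx = e2 <= dx;  /* 1 if y should step, else 0 */
        err += (dx * e2dx) + (dy * e2dy);
        x0 += sx * e2dy;
        y0 += sy * e2dx;
    }
}
```

Both comparisons use the same e2, so this visits exactly the same pixels as the version with two if statements. Whether it is actually faster depends on the target and compiler (optimizers sometimes turn the branchy form into conditional moves already), so it is worth profiling both on the device.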