The problem was that I couldn't get the circular DMA working quite right. I think the issue is that we can't stop it exactly at the end of a row, so a bit leaks through and puts an extra garbage line on the screen. I switched back to a normal line-by-line transfer, but I'm keeping the double buffer and now pipelining it properly (fire off the transfer, then, while that's running, prepare the next line to send once it's done), and that gets us to an even 50 fps. Gonna call it there.
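For anyone curious, here's roughly what that pipelined loop looks like. This is a minimal sketch, not the actual code. The post doesn't say which MCU or DMA API is involved, so `dma_start_line`, `dma_wait`, `prepare_line` and the tiny `WIDTH`/`HEIGHT` values are all hypothetical. The "DMA" here is simulated: the copy happens when you wait on it, so if the loop ever scribbled on the buffer that's still in flight, the output would come out wrong. On real hardware `dma_wait` would instead spin on a completion flag set by the DMA interrupt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define WIDTH  8   /* hypothetical: pixels per line */
#define HEIGHT 4   /* hypothetical: lines per frame */

/* Simulated display memory standing in for the real screen. */
static uint16_t screen[HEIGHT][WIDTH];
static int lines_sent = 0;

/* Simulated DMA engine state (hypothetical stand-in for real hardware). */
static bool dma_busy = false;
static const uint16_t *dma_src = NULL;
static int dma_row = -1;

/* Hypothetical: kick off a non-blocking transfer of one line. */
void dma_start_line(int row, const uint16_t *src) {
    assert(!dma_busy);   /* never start while a transfer is still in flight */
    dma_busy = true;
    dma_src = src;
    dma_row = row;
}

/* Hypothetical: block until the current transfer completes. The simulated
   copy happens here, so a source buffer overwritten before this call would
   show up as corrupted output on "screen". */
void dma_wait(void) {
    if (!dma_busy) return;
    memcpy(screen[dma_row], dma_src, sizeof screen[dma_row]);
    lines_sent++;
    dma_busy = false;
}

/* Hypothetical renderer: fill one line of pixels for row y. */
void prepare_line(int y, uint16_t *dst) {
    for (int x = 0; x < WIDTH; x++)
        dst[x] = (uint16_t)(y * 100 + x);
}

/* Double-buffered, pipelined frame: while line y is transferring out of
   one buffer, line y + 1 is being prepared in the other. */
void draw_frame(void) {
    static uint16_t buf[2][WIDTH];
    int cur = 0;

    prepare_line(0, buf[cur]);                     /* prime the pipeline */
    for (int y = 0; y < HEIGHT; y++) {
        dma_start_line(y, buf[cur]);               /* fire off this line */
        if (y + 1 < HEIGHT)
            prepare_line(y + 1, buf[cur ^ 1]);     /* overlap: build next line */
        dma_wait();                                /* wait for the transfer */
        cur ^= 1;                                  /* swap buffers */
    }
}
```

Calling `draw_frame()` once per frame sends exactly `HEIGHT` lines, one transfer per row. Because every transfer is bounded to a single line, there is nothing like the circular-mode overrun that could push an extra row onto the screen.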