So, I traced on the Saleae and saw that there's a bit of downtime on each row that makes the difference between 50+ and 48 fps. Makes perfect sense in retrospect: we're only DMAing one row at a time, so there's some turnaround while the interrupt handler sets up the next transfer. I switched to a circular buffer so that it's loading the next row while it's sending the current one, and now I'm getting just over 54 fps.
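For anyone curious what "loading the next row while sending the current one" looks like, here's a rough host-side sketch of the idea using two row buffers. This is not the actual driver code: `prepare_row`, `dma_send`, `ROWS`, and `ROW_BYTES` are made-up stand-ins, and the "DMA" is just a `memcpy` so the thing can run on a desktop.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical dimensions, chosen only for illustration. */
#define ROWS      240
#define ROW_BYTES 50

static uint8_t framebuffer[ROWS][ROW_BYTES]; /* source image */
static uint8_t wire[ROWS][ROW_BYTES];        /* what the fake "DMA" sent */
static uint8_t rowbuf[2][ROW_BYTES];         /* two-slot circular buffer */

/* Stand-in for building the outgoing data for one row. */
static void prepare_row(int row, uint8_t *dst)
{
    assert(row >= 0 && row < ROWS);
    memcpy(dst, framebuffer[row], ROW_BYTES);
}

/* Stand-in for a DMA transfer. On real hardware this would start the
 * transfer and return immediately; here it just copies synchronously. */
static void dma_send(int row, const uint8_t *src)
{
    memcpy(wire[row], src, ROW_BYTES);
}

/* Sends one frame, returns the number of rows sent. */
int send_frame_pingpong(void)
{
    int sent = 0;

    prepare_row(0, rowbuf[0]); /* prime the first slot */

    for (int row = 0; row < ROWS; row++) {
        const uint8_t *current = rowbuf[row & 1];

        /* Fill the other slot with the next row. On hardware this work
         * would overlap with the in-flight transfer of `current`, which
         * is what removes the per-row turnaround gap. */
        if (row + 1 < ROWS)
            prepare_row(row + 1, rowbuf[(row + 1) & 1]);

        dma_send(row, current);
        sent++;
    }
    return sent;
}
```

The point of the two slots is that the next row is always ready by the time the current transfer finishes, so the interrupt handler only has to kick off the next transfer instead of also building it.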
I feel like I should point out: This doesn't mean that if your game is dragging at 23 fps this will bump it up to 29! You have to be able to provide 54 frames a second to the display driver. But with this change the display will be able to keep up with you if you can go that fast.
I guess another nice side effect is that, regardless of how fast your game is running, we'll be refreshing the display around 10% faster.
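To put the "this won't speed up a slow game" point in code form: the frame rate you actually see is roughly the smaller of what the game produces and what the display can accept. A tiny sketch, where `displayed_fps` is a made-up helper and not anything in the SDK:

```c
#include <assert.h>

/* Hypothetical helper: the visible frame rate is capped by whichever
 * side is slower, the game or the display transfer. */
static double displayed_fps(double game_fps, double display_max_fps)
{
    assert(game_fps >= 0.0 && display_max_fps >= 0.0);
    return game_fps < display_max_fps ? game_fps : display_max_fps;
}
```

So `displayed_fps(23.0, 48.0)` and `displayed_fps(23.0, 54.0)` both come out to 23: raising the display's ceiling only matters once the game itself can run faster than the old ceiling.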