I'm using the Lua APIs, but I've had good luck using animators or timers to automate the volume of FilePlayers for smooth fades between sounds. Each frame, one player's volume drops slightly and the other's rises slightly. Maybe something similar would work for you? Either fade out and then fade in, or fade both in opposite directions for a cross-fade.
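The per-frame ramp behind both options can be sketched like this. It's in Python rather than Lua and is not the Playdate API. It assumes a 30 fps update rate, volumes in the 0.0 to 1.0 range, and a linear ramp. The function names are made up for illustration. Each list entry is the (outgoing, incoming) volume pair you'd apply to the two FilePlayers on that frame.

```python
def fade_steps(duration_s: float, fps: int = 30) -> list[float]:
    """Linear ramp from just above 0 up to exactly 1.0, one value per frame."""
    frames = max(1, round(duration_s * fps))
    return [i / frames for i in range(1, frames + 1)]


def crossfade(duration_s: float, fps: int = 30) -> list[tuple[float, float]]:
    """Both players move at once, in opposite directions."""
    return [(1.0 - t, t) for t in fade_steps(duration_s, fps)]


def fade_out_then_in(duration_s: float, fps: int = 30) -> list[tuple[float, float]]:
    """Outgoing player fades to silence over the first half, then the incoming one fades up."""
    half = duration_s / 2
    fade_out = [(1.0 - t, 0.0) for t in fade_steps(half, fps)]
    fade_in = [(0.0, t) for t in fade_steps(half, fps)]
    return fade_out + fade_in
```

In a game loop you'd pop one pair per frame (e.g. `for old_vol, new_vol in crossfade(1.0): ...`) and set each player's volume from it. A timer or animator that interpolates from 0 to 1 over the same duration gives you the same `t` without building the list up front.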
This is exactly what I do in Ball und Panzer. I have two tracks playing at the same time, in lockstep, and fade their relative volume levels to make dynamic music.
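One way to drive that kind of lockstep mix is sketched below. This is a guess at the general idea, not how Ball und Panzer actually does it. Both tracks are started together and never stopped, and only a single 0 to 1 "mix" value changes. Each frame it moves toward a target set by the game by at most a fixed step. The class name, the 2-second default fade, and the 30 fps rate are all hypothetical.

```python
class DynamicMix:
    """Hypothetical helper: a 0..1 mix between two lockstep layers.

    mix = 0.0 -> only layer A is audible, mix = 1.0 -> only layer B.
    """

    def __init__(self, fade_seconds: float = 2.0, fps: int = 30) -> None:
        self.mix = 0.0
        self.target = 0.0
        # Largest change allowed per frame, so a full swing takes fade_seconds.
        self.step = 1.0 / max(1, round(fade_seconds * fps))

    def set_target(self, target: float) -> None:
        """Called by game logic (e.g. more enemies -> higher intensity); clamped to 0..1."""
        self.target = min(1.0, max(0.0, target))

    def update(self) -> tuple[float, float]:
        """Call once per frame; returns (volume_a, volume_b) to apply to the two players."""
        delta = self.target - self.mix
        if abs(delta) <= self.step:
            self.mix = self.target  # close enough: snap so we land exactly on the target
        elif delta > 0:
            self.mix += self.step
        else:
            self.mix -= self.step
        return (1.0 - self.mix, self.mix)
```

Because the target can change mid-fade, the music follows the game state smoothly instead of restarting a fade each time. And since neither track is ever paused or re-seeked, they stay in sync.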
The trickiest part was creating two ADPCM files that line up exactly at the sample level. My approach was to combine the two mono tracks into a single stereo track and edit them together as a WAV. Only at the final step do I export each channel as a mono WAV and convert it to ADPCM using adpcm-xq.
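If your audio editor can't export the channels separately, the split step can be done with a small script. Below is a rough sketch using Python's standard `wave` module. It assumes an uncompressed PCM stereo WAV, and the function name is hypothetical. Both mono outputs get exactly the same number of frames as the stereo source, which is what keeps them sample-aligned. Each mono file would then go through adpcm-xq as usual.

```python
import wave


def split_stereo_wav(src, left_dst, right_dst) -> int:
    """Write each channel of a stereo PCM WAV to its own mono WAV.

    src, left_dst and right_dst may be paths or binary file objects.
    Returns the frame count, which is identical for both outputs.
    """
    with wave.open(src, "rb") as w:
        if w.getnchannels() != 2:
            raise ValueError("expected a stereo WAV")
        width = w.getsampwidth()  # bytes per sample, e.g. 2 for 16-bit
        rate = w.getframerate()
        nframes = w.getnframes()
        data = w.readframes(nframes)

    frame = 2 * width  # one interleaved frame = left sample, then right sample
    left = b"".join(data[i:i + width] for i in range(0, len(data), frame))
    right = b"".join(data[i + width:i + frame] for i in range(0, len(data), frame))

    for dst, pcm in ((left_dst, left), (right_dst, right)):
        with wave.open(dst, "wb") as out:
            out.setnchannels(1)
            out.setsampwidth(width)
            out.setframerate(rate)
            out.writeframes(pcm)
    return nframes
```

Usage would be something like `split_stereo_wav("music_both.wav", "music_a.wav", "music_b.wav")` (file names are placeholders).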