File API: failed write >~1KB seems to lock the data volume for the rest of the session

AH!

Okay, I captured geterr() at the write call. Here’s the actual output from my console:

cart-save: header write rc=32, geterr='Not found'
cart-save: chip0 write rc=-1, geterr='I/O error'
cart-save: chip0 short-write (-1/524288)
cart-save: post-fail close rc=0, geterr='I/O error'
integration selftest: PASS T5 post-overwrite (chip0 dirty) = 0x0
cart-save: no save file at scratch/selftest_cart.bin
integration selftest: FAIL T5 load_save returned true
integration selftest: FAIL T5 byte restored from save file = 0x0 (expected 0xAA)
integration selftest: done — 28 pass, 2 fail

The immediate failure is "I/O error", not 1414. 1414 only appears on the next path-resolving call!

Okay, that makes sense. Looks like uC-FS sets the volume state to "open" (which I guess means not mounted?) when it runs into an I/O error. I tested the devices I have on hand and found three that show this behavior. I rolled them back to 3.0.0, no difference. :thinking: Another interesting thing is that they're all rev. 1 units. Do you know which hardware version you have? If you plug the device in while the simulator is running, you'll see the device's response to the version serial command show up in the console window. If pcbver is 0x01, that's rev. 1; 0x13 is rev. 2.

Now that I have a unit that reproduces this I can dive down into the eMMC driver where the I/O error is coming from. It's a spooky place, lots of "I have no idea why this works but if you touch it you'll break it" code down there. I'll let you know what I find!


Rev 1!

Dang, so we found two issues but they’ve been around for a while, what in tarnation.

Btw “a spooky place” is the perfect way to describe driver-land hahahaha.

Turns out the problem is that the SD/MMC peripheral is throwing a FIFO TX underrun, no idea why. I'll continue digging, but a possible workaround is to run the startup code in the update callback instead of the kEventInit handler. An easy way to make that change is to move that code into a separate function (if it's not already) and set it as your update callback, then in that function call setUpdateCallback again to switch to the real update function. But hopefully I can find a fix for this in the firmware so the workaround (if it even works) isn't needed.
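
For anyone who wants to try it, here's a rough sketch of that callback swap. It's written against stand-ins so it compiles on its own: `set_update_callback`, `run_frame`, `on_init`, `do_startup`, and the counters are all made-up names for illustration, not SDK calls. In a real project you'd use the SDK's setUpdateCallback from your event handler instead, and there's no guarantee this actually dodges the bug.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for the SDK's update-callback shape (hypothetical; a real
   project would get the actual declarations from the SDK header). */
typedef int (*UpdateFn)(void *userdata);

static UpdateFn current_update = NULL;
static void *current_userdata = NULL;
static int startup_runs = 0;
static int frames_drawn = 0;

/* Stand-in for setUpdateCallback: remembers which function to call. */
static void set_update_callback(UpdateFn fn, void *userdata) {
    current_update = fn;
    current_userdata = userdata;
}

/* The game's normal per-frame update. */
static int real_update(void *userdata) {
    (void)userdata;
    frames_drawn++;
    return 1;
}

/* Startup work (e.g. the save-file write) that used to run in kEventInit. */
static void do_startup(void) {
    startup_runs++;
}

/* One-shot update: run startup, swap in the real update, then run this frame. */
static int first_update(void *userdata) {
    do_startup();
    set_update_callback(real_update, userdata);
    return real_update(userdata);
}

/* What the kEventInit branch would do now: register the one-shot, nothing else. */
static void on_init(void) {
    set_update_callback(first_update, NULL);
}

/* Stand-in for the system calling the registered callback once per frame. */
static int run_frame(void) {
    assert(current_update != NULL);
    return current_update(current_userdata);
}
```

The point of the swap is that the startup code runs exactly once, on the first frame, and every later frame goes straight to the real update with no extra "have I initialized yet" check.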


You are a wizard. Let me know if I can help test with anything! Thank you for what you do.

Okay, I have a fix for 3.1. It's dumb simple and I'm embarrassed it took me this long to find it. I don't know exactly how the fix works, or which gears weren't meshing properly before and are now, but it was just a matter of adding a tiny delay in the retry loop after we get that FIFO error. I saw pretty quickly that delays helped as I was adding some log outputs, but it took a while to figure out the right place to put them.
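
To give a feel for the shape of that kind of change, here's a generic retry loop with a short pause after an underrun. This is not the firmware code: `transfer_with_retry`, the `XferStatus` values, the fake bus, and the delay callback are all invented for illustration, and the attempt count and pause length are placeholders.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { XFER_OK, XFER_FIFO_UNDERRUN, XFER_FAILED } XferStatus;

typedef XferStatus (*XferFn)(void *ctx);   /* attempts one transfer */
typedef void (*DelayFn)(unsigned int us);  /* waits roughly `us` microseconds */

/* Retry a transfer, pausing briefly between attempts when the FIFO underruns.
   Any other result (success or a different error) is returned immediately. */
XferStatus transfer_with_retry(XferFn attempt, void *ctx, DelayFn delay_us,
                               int max_attempts, unsigned int pause_us) {
    assert(attempt != NULL && delay_us != NULL);
    XferStatus status = XFER_FAILED;
    for (int i = 0; i < max_attempts; i++) {
        status = attempt(ctx);
        if (status != XFER_FIFO_UNDERRUN) {
            return status;
        }
        if (i + 1 < max_attempts) {
            delay_us(pause_us);  /* the small pause before the next try */
        }
    }
    return status;  /* still underrunning after the last attempt */
}

/* Fake peripheral for illustration: underruns a set number of times, then succeeds. */
typedef struct { int underruns_left; int attempts; } FakeBus;

static XferStatus fake_attempt(void *ctx) {
    FakeBus *bus = ctx;
    bus->attempts++;
    if (bus->underruns_left > 0) {
        bus->underruns_left--;
        return XFER_FIFO_UNDERRUN;
    }
    return XFER_OK;
}

/* Fake delay that only records how long it was asked to wait. */
static unsigned int total_delay_us = 0;

static void fake_delay(unsigned int us) {
    total_delay_us += us;
}
```

With the fake bus set to underrun twice, `transfer_with_retry(fake_attempt, &bus, fake_delay, 5, 100)` makes three attempts and asks for two pauses before succeeding.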

Anyway! I don't think there's any viable workaround in user space for this because the filesystem is helpfully unmounting the disk as soon as it sees the error. Writing 1KB at a time might work but it'll be slow, maybe slow enough to trigger the 10 s watchdog timer. If you want to test out the fix before 3.1 ships (by the end of the month hopefully?) DM me your serial # and I'll send you a firmware build.
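
If you do want to experiment with the 1KB-at-a-time idea, a minimal sketch could look like the following. The writer callback is an assumed shape (returns bytes written, or -1 on error) so the example stands alone; `write_in_chunks`, `FakeFile`, and `fake_write` are made-up names, and whether 1024-byte writes actually stay under the threshold is untested.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed writer shape: returns bytes written, or -1 on error. */
typedef int (*WriteFn)(void *file, const void *buf, unsigned int len);

/* Write `len` bytes in pieces of at most `chunk` bytes.
   Returns the total written, or -1 as soon as a write fails or makes no progress. */
long write_in_chunks(WriteFn write_fn, void *file, const void *data,
                     size_t len, unsigned int chunk) {
    assert(write_fn != NULL && chunk > 0);
    const unsigned char *p = data;
    size_t done = 0;
    while (done < len) {
        size_t remaining = len - done;
        unsigned int n = remaining < chunk ? (unsigned int)remaining : chunk;
        int rc = write_fn(file, p + done, n);
        if (rc <= 0) {
            return -1;  /* stop at the first error; don't keep hammering the volume */
        }
        done += (size_t)rc;  /* a short write just means another pass */
    }
    return (long)done;
}

/* Fake file for illustration: records call count and the largest request. */
typedef struct { size_t total; unsigned int calls; unsigned int max_len; } FakeFile;

static int fake_write(void *file, const void *buf, unsigned int len) {
    FakeFile *f = file;
    (void)buf;
    f->calls++;
    if (len > f->max_len) {
        f->max_len = len;
    }
    f->total += len;
    return (int)len;
}
```

For a 524288-byte chip image like the one in the log above, a 1024-byte chunk size works out to 512 separate write calls, which is where the slowness would come from.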
