reading errors on JMicron JM20337 USB-SATA
Artur Skawina wrote at: 2009-08-02 09:10:06
Lev A. Melnikovsky wrote:
> I have read through a year old thread on "JMicron JM20337 USB-SATA data
> corruption bugfix" and it seems here's another aspect of the same
> problem. The SATA disk has genuine errors (bad sectors, just in case: I
> am not going to use it but to recover some data from it). Unfortunately
> when a bad block is read no error is returned, instead a caller is
> blocked indefinitely (until the USB cable is removed). The system log is
> filled with repetitive
>
> sd 3:0:0:0: [sdf] Sense Key : 0x0 [current]
> sd 3:0:0:0: [sdf] ASC=0x0 ASCQ=0x0

yes, jmicron bridges do not report errors properly and just stall pretty
much indefinitely; found out the hard way, when a disk started to develop
bad blocks. took a bit of time to figure out as there were no i/o errors
reported at all.

At least all the patches from back then have been merged and the kernel
can better cope w/ the situation (it used to be a lot worse); plus modern
smartctl will let you see the smart attributes (-d usbjmicron), making it
easier to check if the disk really is failing.

What did work for my case was to copy the data from the disk and every
time the process stalled turn off power to the sata drive for a few
seconds (leaving the bridge connected). The bridge in most cases
recovered and a bit more data got off the drive. This was what saved that
controller, because by the time i realized the disk went bad, it was not
possible to even mount the fs using another sata controller due to all
the i/o errors. With the above process i was able to recover ~95% of the
data.

Summary: Wouldn't want to use the bridge for any kind of unattended data
transfer, it's more of a data recovery device...

artur
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
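The copy, stall, power-cycle, continue routine described above depends on
being able to pick the copy up where it stopped. A minimal sketch of that
idea follows, assuming the stalled copy process is killed and restarted by
hand after the drive has been power-cycled. The function name
resume_copy and the chunk size are hypothetical and are not taken from the
thread. The sketch does not detect stalls itself: a read blocked inside the
kernel cannot be timed out from this code.

```python
import os


def resume_copy(src_path, dst_path, chunk_size=64 * 1024):
    """Copy src_path to dst_path, continuing after whatever a previous run saved.

    Returns the number of bytes copied during this run.
    """
    exists = os.path.exists(dst_path)
    done = os.path.getsize(dst_path) if exists else 0
    # Round down to a whole chunk so a partially written chunk is read again.
    done -= done % chunk_size

    copied = 0
    mode = "r+b" if exists else "w+b"
    with open(src_path, "rb", buffering=0) as src, open(dst_path, mode) as dst:
        dst.truncate(done)
        src.seek(done)
        dst.seek(done)
        while True:
            data = src.read(chunk_size)
            if not data:
                break  # end of the source
            dst.write(data)
            # Push every chunk to stable storage so a kill loses at most one chunk.
            dst.flush()
            os.fsync(dst.fileno())
            copied += len(data)
    return copied
```

Each restart after a stall then only costs the chunk that was in flight,
which matches the "a bit more data got off the drive" pattern in the post.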
Lev A. Melnikovsky wrote at: 2009-08-03 02:20:05
On Sun, 2 Aug 2009 at 6:03pm, Artur Skawina wrote:

AS> Lev A. Melnikovsky wrote:
AS>> I have read through a year old thread on "JMicron JM20337 USB-SATA data
AS>> corruption bugfix" and it seems here's another aspect of the same
AS>> problem. The SATA disk has genuine errors (bad sectors, just in case: I
AS>> am not going to use it but to recover some data from it). Unfortunately
AS>> when a bad block is read no error is returned, instead a caller is
AS>> blocked indefinitely (until the USB cable is removed). The system log is
AS>> filled with repetitive
AS>>
AS>> sd 3:0:0:0: [sdf] Sense Key : 0x0 [current]
AS>> sd 3:0:0:0: [sdf] ASC=0x0 ASCQ=0x0
AS>
AS> yes, jmicron bridges do not report errors properly and just stall pretty
AS> much indefinitely; found out the hard way, when a disk started to develop

My interpretation was different - the bridge firmware does not crash but
remains alive (it does not report the error properly but "zis iz probably
perfectly normal behaviour for a Vogon"). This is the Linux kernel that
indefinitely tries to re-read. Am I wrong?

AS> What did work for my case was to copy the data from the disk and every
AS> time the process stalled turn off power to the sata drive for a few
AS> seconds (leaving the bridge connected). The bridge in most cases
AS> recovered and a bit more data got off the drive.

My nerve is too weak to touch ground/power until the data line is
disconnected. Running -rc1 seems not so dangerous...

-L
Alan Stern wrote at: 2009-08-03 09:30:22
On Mon, 3 Aug 2009, Lev A. Melnikovsky wrote:

> AS> yes, jmicron bridges do not report errors properly and just stall pretty
> AS> much indefinitely; found out the hard way, when a disk started to develop
> My interpretation was different - the bridge firmware does not crash but
> remains alive (it does not report the error properly but "zis iz probably
> perfectly normal behaviour for a Vogon"). This is the Linux kernel that
> indefinitely tries to re-read. Am I wrong?

You are correct except for the term "indefinitely". The retries _will_
stop if you wait long enough. Unfortunately, because of all the nested
retry loops in the SCSI drivers and at the application level, you may
have to wait as long as half an hour.

I agree that this should be fixed. But it is a SCSI issue, not a USB
issue. You could try bringing it up on the linux-scsi mailing list.

Alan Stern
Artur Skawina wrote at: 2009-08-03 10:40:10
Alan Stern wrote:
> On Mon, 3 Aug 2009, Lev A. Melnikovsky wrote:
>> AS> yes, jmicron bridges do not report errors properly and just stall pretty
>> AS> much indefinitely; found out the hard way, when a disk started to develop
>> My interpretation was different - the bridge firmware does not crash but
>> remains alive (it does not report the error properly but "zis iz probably
>> perfectly normal behaviour for a Vogon"). This is the Linux kernel that
>> indefinitely tries to re-read. Am I wrong?

No, but that's arguably the right thing to do -- the device didn't report
an error, so why should the kernel fail?..

> You are correct except for the term "indefinitely". The retries _will_
> stop if you wait long enough. Unfortunately, because of all the nested
> retry loops in the SCSI drivers and at the application level, you may
> have to wait as long as half an hour.

iirc, i had stalls _way_ longer than that, probably because the reads
eventually succeeded, only to stall on the next ones.

> I agree that this should be fixed. But it is a SCSI issue, not a USB
> issue. You could try bringing it up on the linux-scsi mailing list.

actually, the number of retries should probably be configurable, but i
wouldn't lower them by default; losing data because of recoverable errors
is bad. In this case the bridge may be at fault (by not passing along the
error), but to make a significant difference you'd have to reduce the
number of retries to something like zero, maybe one at most, and that's
just too low for a default.

artur
Alan Stern wrote at: 2009-08-03 10:50:06
On Mon, 3 Aug 2009, Artur Skawina wrote:

>> You are correct except for the term "indefinitely". The retries _will_
>> stop if you wait long enough. Unfortunately, because of all the nested
>> retry loops in the SCSI drivers and at the application level, you may
>> have to wait as long as half an hour.
>
> iirc, i had stalls _way_ longer than that, probably because the reads
> eventually succeeded, only to stall on the next ones.
>
>> I agree that this should be fixed. But it is a SCSI issue, not a USB
>> issue. You could try bringing it up on the linux-scsi mailing list.
>
> actually, the number of retries should probably be configurable, but i
> wouldn't lower them by default; losing data because of recoverable errors
> is bad. In this case the bridge may be at fault (by not passing along the
> error), but to make a significant difference you'd have to reduce the number
> of retries to something like zero, maybe one at most, and that's just too
> low for a default.

As I understand it, the SCSI and block layers conspire to keep retrying
each command until a timeout expires, not until the number of retries
reaches a limit. But the situation is complicated, because some kinds of
retries reset the timer. And if the application repeats the I/O request
then of course everything starts over again.

Alan Stern
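The way nested retries inflate the wait can be illustrated with a toy
calculation. This is only a sketch: the function worst_case_wait and the
numbers fed to it are illustrative assumptions, not values read from any
kernel, and it ignores the timer resets mentioned above, which can make
the real wait longer still.

```python
def worst_case_wait(timeout_s, retries_per_request, app_retries):
    """Upper bound on the wait, in seconds, if every attempt runs to its timeout.

    timeout_s           -- time allowed for one attempt at the lower layer
    retries_per_request -- extra attempts the lower layer makes per request
    app_retries         -- extra times the application reissues the request
    """
    attempts_per_request = retries_per_request + 1
    requests = app_retries + 1
    return attempts_per_request * requests * timeout_s


# Illustrative numbers only: 30 s per attempt, 5 lower-level retries,
# and an application that reissues the read 9 more times.
seconds = worst_case_wait(30, 5, 9)
print(seconds / 60, "minutes")
```

Because the layers multiply rather than add, modest per-layer settings are
enough to reach the half-hour scale mentioned earlier in the thread.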
Alan Stern wrote at: 2009-08-03 15:00:24
On Mon, 3 Aug 2009, Lev A. Melnikovsky wrote:

> On Mon, 3 Aug 2009 at 6:25pm, Alan Stern wrote:
>
> AS> You are correct except for the term "indefinitely". The retries _will_
> AS> stop if you wait long enough. Unfortunately, because of all the nested
> AS> retry loops in the SCSI drivers and at the application level, you may
> AS> have to wait as long as half an hour.
> It was a simple test, I've plugged the USB cable off after two hours, this
> is apparently not long enough:
>
> [root ~]# time dd if=/dev/sdf of=/dev/null skip=61395120 count=1 bs=512
> dd: reading '/dev/sdf': Input/output error
> 0+0 records in
> 0+0 records out
> 0 bytes (0 B) copied, 7550.12 s, 0.0 kB/s
> dd: closing input file '/dev/sdf': Bad file descriptor
>
> real 125m50.119s
> user 0m0.000s
> sys 0m0.000s

Okay, it looks like I was wrong and this particular kind of error will
indeed cause unending retries.

Either way, like I said before, you should complain about this to the
SCSI people. They are the ones who can fix it. (You can CC: linux-usb
too, just to keep us in the loop.) Tell them that scsi_end_request()
mustn't call scsi_requeue_command() if bytes == 0.

Alan Stern
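The condition Alan names at the end (no requeue when zero bytes were
transferred) can be shown with a toy model. This is not kernel code and
does not reproduce scsi_end_request(); run_request, silent_bad_sector and
the attempt cap are hypothetical names invented for this sketch. It only
models a device that reports neither an error nor any progress.

```python
def run_request(total_bytes, device_attempt, stop_on_zero_progress,
                max_attempts=1000):
    """Drive one request to completion and report how it ended.

    device_attempt(remaining) returns the bytes transferred in one attempt.
    Returns a (status, attempts) tuple.
    """
    remaining = total_bytes
    for attempt in range(1, max_attempts + 1):
        done = device_attempt(remaining)
        if done == 0 and stop_on_zero_progress:
            # No error reported, but no progress either: give up.
            return ("io_error", attempt)
        remaining -= done
        if remaining == 0:
            return ("ok", attempt)
        # Otherwise the unfinished part is queued again.
    # The cap exists only so the model terminates.
    return ("still_retrying", max_attempts)


def silent_bad_sector(remaining):
    """Model of the bridge on a bad block: nothing transferred, no error."""
    return 0


print(run_request(512, silent_bad_sector, stop_on_zero_progress=False))
print(run_request(512, silent_bad_sector, stop_on_zero_progress=True))
```

Without the zero-progress check the model only stops at its artificial
cap, which mirrors the two-hour dd run quoted above; with the check the
caller gets an error on the first attempt.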










