I've done it! This is a Crucial M4 that began reporting errors a while ago. I ran self-tests on it, expecting the drive to simply remap a bad sector and carry on, but the self-test terminated with 90% to go. Remapping and moving on is standard behavior on many of the HDDs I've used.
Symptoms were failing disk operations. I don't recall the details; it's been a while. Another symptom was SMART errors being logged, along with some questionable SMART statistics. And finally, it took a l-o-n-g time for 'smartctl' to read the SMART statistics from the drive. I felt the problem was compounded by faulty drive firmware, which should have handled the problem and continued normal operation.
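For anyone following along, the self-tests and SMART readouts mentioned above map onto a handful of `smartctl` invocations. This is a sketch, not necessarily the exact commands used here: `/dev/sdd` is the device name that appears later in this post, and the `run` wrapper and `DRY_RUN` variable are hypothetical helpers that only print each command unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: print (or, with DRY_RUN=0, execute) common smartctl checks.
# `run` and DRY_RUN are hypothetical helpers, not part of smartmontools.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

smart_checkup() {
    dev=$1
    run smartctl -H "$dev"            # overall health verdict
    run smartctl -A "$dev"            # vendor attribute table
    run smartctl -l error "$dev"      # logged ATA errors
    run smartctl -t long "$dev"       # start an extended self-test
    run smartctl -l selftest "$dev"   # self-test results, once the test has ended
}

# Usage (prints the commands; set DRY_RUN=0 to really run them as root):
#   smart_checkup /dev/sdd
```

The extended self-test runs in the background on the drive, so the self-test log only shows a result after it completes or aborts.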
Then I ran across this page: https://www.smartmontools.org/wiki/BadBlockHowto
There's a lot of detail there about how to figure out just what offsets to pass to 'dd' to rewrite the ailing sector. I didn't bother with that. First, the calculations made my head hurt. Second, I wanted to write the entire drive to reveal any other bad spots. I think I did something like `dd if=/dev/urandom of=/dev/sdd`. It would have been smarter to use /dev/zero as the source. (*) Maybe I'll repeat this. Wouldn't hurt to repeat the check.
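The two approaches above (rewrite one ailing sector versus overwrite everything) can be sketched as a pair of small `dd` wrappers. The function names, the 512-byte logical sector size, and the MiB count argument are assumptions for illustration; both helpers destroy whatever is stored on the target, so try them on a scratch file first.

```shell
#!/bin/sh
# Sketch only: both helpers DESTROY data on the target they are pointed at.
# Function names and arguments are hypothetical, chosen for illustration.

# Overwrite a single sector at a given LBA, assuming 512-byte logical sectors.
zero_sector() {
    target=$1
    lba=$2
    dd if=/dev/zero of="$target" bs=512 seek="$lba" count=1 conv=notrunc 2>/dev/null
}

# Overwrite the first N MiB of the target with zeroes, then flush to disk.
zero_fill() {
    target=$1
    mib=$2
    dd if=/dev/zero of="$target" bs=1M count="$mib" conv=notrunc 2>/dev/null
    sync
}

# Usage on a scratch file:
#   zero_fill /tmp/scratch.img 16
#   zero_sector /tmp/scratch.img 3
```

On a real drive, leaving out `count=` (for example `dd if=/dev/zero of=/dev/sdd bs=1M`) lets `dd` run until it reaches the end of the device, and a large block size like `bs=1M` is usually much faster than the 512-byte default.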
Following this, I created a ZFS filesystem. The reason for this choice is that ZFS checksums all data written to the drive and has a 'scrub' operation that reads all data back to verify integrity. I then filled the disk with files and initiated a scrub. The scrub just finished with no errors reported.
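The create, fill, and scrub sequence above might look roughly like this. It is a sketch under assumptions: the pool name `scratch` and the fill file name are made up, the pool is assumed to mount at its default `/<pool>` path, and the hypothetical `run` wrapper only prints commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: single-disk ZFS pool used purely as a read-back integrity check.
# `run`, DRY_RUN, the pool name and fill.bin are hypothetical.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

zfs_scrub_check() {
    pool=$1
    dev=$2
    run zpool create "$pool" "$dev"      # whole-disk pool, mounted at /$pool
    # Fill the pool; dd stops with an error once the filesystem is full.
    run dd if=/dev/urandom of="/$pool/fill.bin" bs=1M
    run zpool scrub "$pool"              # read all data back, verify checksums
    run zpool status -v "$pool"          # scrub progress and any errors found
}

# Usage (prints the commands; set DRY_RUN=0 to really run them as root):
#   zfs_scrub_check scratch /dev/sdd
```

The scrub runs in the background, so `zpool status` has to be checked again later to see the final result.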
Nice! It's not a particularly fast or large drive, and I wouldn't put it in anything I considered mission critical, but it will be useful as a boot drive for any of a number of lab systems that I fool around with. Oh, and now fetching SMART stats is no longer delayed. What I see is
Code:
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   100   100   050    Pre-fail  Always   -           13
  5 Reallocated_Sector_Ct   0x0033   100   100   010    Pre-fail  Always   -           8192
  9 Power_On_Hours          0x0032   100   100   001    Old_age   Always   -           27436
 12 Power_Cycle_Count       0x0032   100   100   001    Old_age   Always   -           1384
170 Grown_Failing_Block_Ct  0x0033   100   100   010    Pre-fail  Always   -           2
171 Program_Fail_Count      0x0032   100   100   001    Old_age   Always   -           0
172 Erase_Fail_Count        0x0032   100   100   001    Old_age   Always   -           0
173 Wear_Leveling_Count     0x0033   098   098   010    Pre-fail  Always   -           65
174 Unexpect_Power_Loss_Ct  0x0032   100   100   001    Old_age   Always   -           81
181 Non4k_Aligned_Access    0x0022   100   100   001    Old_age   Always   -           13 0 13
183 SATA_Iface_Downshift    0x0032   100   100   001    Old_age   Always   -           0
184 End-to-End_Error        0x0033   100   100   050    Pre-fail  Always   -           0
187 Reported_Uncorrect      0x0032   100   100   001    Old_age   Always   -           2727
188 Command_Timeout         0x0032   100   100   001    Old_age   Always   -           0
189 Factory_Bad_Block_Ct    0x000e   100   100   001    Old_age   Always   -           131
194 Temperature_Celsius     0x0022   100   100   000    Old_age   Always   -           0
195 Hardware_ECC_Recovered  0x003a   100   100   001    Old_age   Always   -           12898
196 Reallocated_Event_Count 0x0032   100   100   001    Old_age   Always   -           2
197 Current_Pending_Sector  0x0032   100   100   001    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0030   100   100   001    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x0032   100   100   001    Old_age   Always   -           38
202 Perc_Rated_Life_Used    0x0018   098   098   001    Old_age   Offline  -           2
206 Write_Error_Rate        0x000e   100   100   001    Old_age   Always   -           0
I'll comment on "5 Reallocated_Sector_Ct". 8192 decimal is 0x2000 in hexadecimal, in other words a single bit set (bit 13). I don't think this is accurately reported. I'll put more stock in there being no pending sectors (197), a 196 Reallocated_Event_Count of only 2, and a matching 170 Grown_Failing_Block_Ct of 2.
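The arithmetic behind that remark is easy to check from the shell. A small sketch, with hypothetical helper names: `raw_value` pulls the RAW_VALUE column (field 10) for one attribute ID out of `smartctl -A` output on stdin, `to_hex` prints a number in hexadecimal, and `is_single_bit` succeeds when exactly one bit is set.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# Print the RAW_VALUE column (field 10) for one attribute ID,
# reading `smartctl -A` output on stdin.
raw_value() {
    awk -v id="$1" '$1 == id { print $10 }'
}

# Print a decimal number as hexadecimal.
to_hex() {
    printf '0x%X\n' "$1"
}

# Succeed when exactly one bit is set: n > 0 and n & (n - 1) == 0.
is_single_bit() {
    [ "$1" -gt 0 ] && [ $(( $1 & ($1 - 1) )) -eq 0 ]
}

# Usage:
#   raw=$(smartctl -A /dev/sdd | raw_value 5)
#   to_hex "$raw"
#   is_single_bit "$raw" && echo "only one bit set"
```

Note that `raw_value` returns only the first word of a multi-word raw value such as the one on attribute 181.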
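(*) My thinking was that writing zeroes to an SSD tells the controller that the sector is not used and need not be erased before being overwritten. Or something like that. Strictly speaking, it is a TRIM (discard) command that tells the controller a sector is unused; most controllers store a zero-filled sector like any other data. Either way, performance is better when there are empty sectors for the SSD controller's wear leveling algorithms to use.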
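On Linux, the explicit way to tell an SSD that sectors are unused is a discard (TRIM) request rather than any particular data pattern. A sketch, assuming the util-linux tools `lsblk` and `blkdiscard` are installed; the `run` wrapper and `DRY_RUN` variable are hypothetical and only print the commands unless `DRY_RUN=0` is set.

```shell
#!/bin/sh
# Sketch only: `run` and DRY_RUN are hypothetical helpers.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

trim_whole_device() {
    dev=$1
    run lsblk --discard "$dev"   # non-zero DISC-GRAN/DISC-MAX: discard supported
    run blkdiscard "$dev"        # discard every sector: DESTROYS all data
}

# Usage (prints the commands; set DRY_RUN=0 to really run them as root):
#   trim_whole_device /dev/sdd
# For a mounted filesystem, `fstrim -v /mountpoint` trims only free space.
```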