(NB: I'm running Linux so questions and answers are in terms of that.)
I found that a drive had been dropped from a RAID1 a couple of days ago. At that time the drive no longer responded to SMART queries ('smartctl -a /dev/sdc'). After a reboot the drive responds again, but it has not been added back to the MD RAID1 array. (I'm not sure whether that happens automatically or requires manual intervention; this is the first time I've had a drive dropped from an MD RAID.)
Looking at the SMART diagnostics I find:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       201
  3 Spin_Up_Time            0x0027   196   178   021    Pre-fail  Always       -       5158
  4 Start_Stop_Count        0x0032   100   100   000    Old_age   Always       -       70
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   156   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   055   055   000    Old_age   Always       -       33491
 10 Spin_Retry_Count        0x0032   100   253   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   253   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       70
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       36
193 Load_Cycle_Count        0x0032   194   194   000    Old_age   Always       -       19264
194 Temperature_Celsius     0x0022   123   106   000    Old_age   Always       -       27
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   200   200   000    Old_age   Offline      -       0
I performed a 'long' self test ('smartctl -t long /dev/sdc') and it failed quickly. The results of all self tests are:
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       90%     33491         239126392
# 2  Extended offline    Completed without error       00%     30627         -
# 3  Short offline       Completed without error       00%     30619         -
I decided to perform a read/write test using 'badblocks' ('badblocks -v -s -n /dev/sdc'):
[email protected]:/home/hbarta# badblocks -v -s -n /dev/sdc
Checking for bad blocks in non-destructive read-write mode
From block 0 to 2930266583
Checking for bad blocks (non-destructive read-write test)
Testing with random pattern: 0.13% done, 58:30 elapsed. (0/0/0 errors)
If my arithmetic is right, it's going to take about a month to complete the badblocks scan. Of course if it gets to the bad block sooner, it may not need to finish.
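The "about a month" figure can be checked with a quick extrapolation from the progress line above. This is only a rough sketch: it assumes the scan rate stays constant over the whole drive (in practice the inner tracks are slower), and the function name 'estimate_total' is just something made up for the illustration.

```python
from datetime import timedelta


def estimate_total(elapsed: timedelta, percent_done: float) -> timedelta:
    """Extrapolate total run time, assuming the scan rate stays constant."""
    if percent_done <= 0:
        raise ValueError("percent_done must be positive")
    return elapsed / (percent_done / 100.0)


# Figures taken from the badblocks progress line:
# "0.13% done, 58:30 elapsed"
elapsed = timedelta(minutes=58, seconds=30)
percent_done = 0.13

total = estimate_total(elapsed, percent_done)
remaining = total - elapsed

print(f"estimated total:     {total.total_seconds() / 86400:.1f} days")
print(f"estimated remaining: {remaining.total_seconds() / 86400:.1f} days")
```

With those inputs the estimate comes out a little over 31 days, so "about a month" holds up, at least as long as the drive keeps responding.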
My understanding is that 'Current_Pending_Sector' counts sectors that could not be read. A pending sector is remapped the next time it is written, unless the write succeeds and the sector can be read back, in which case it is not remapped and the pending count is cleared.
I'm curious whether anyone has a better suggestion. I've purchased a copy of 'spinrite' and could use that to check out the drive, but I haven't taken anything apart to do that yet. (I'd have to remove the drive from the server.) I'm also wondering whether this is a minor issue or whether I should be ordering a replacement.
The drive is a WD 3TB Red NAS (model WD30EFRX), and of course it's out of warranty.
There's a lot more information in the SMART reply and if anyone is interested in that it can be viewed at https://pastebin.com/cs78mkMQ
Edit.01: 'badblocks' seems to make the drive unresponsive (as it was when I originally found it). That happened at about the time I expected it to reach the bad block (again, if my calculations were correct). I will need to pull the drive to perform further diagnostics. I'm also looking at a replacement drive. I'm thinking about an HGST 8TB helium-filled drive, model HUH728080ALE600. Backblaze is seeing good results with this drive. (Go big or go home!) Part of the reason for upping the size is a potential migration to ZFS. Because of heavy hard linking, my backups nearly double in size when migrated to a new FS; 'rsync' can't preserve hard links for large data sets. That drive is presently US$190 on Amazon.
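For anyone who wants to check the "when should it hit the bad block" estimate from Edit.01, here is a rough sketch of the arithmetic. It rests on two assumptions that are not stated in the output above: that the LBA in the self-test log counts 512-byte logical sectors, and that 'badblocks' was using its default 1024-byte block size (no '-b' was given). The helper names are made up for the illustration.

```python
SECTOR_BYTES = 512            # assumption: self-test LBA is in 512-byte sectors
BADBLOCKS_BLOCK_BYTES = 1024  # assumption: badblocks default block size


def lba_to_badblocks_block(lba: int,
                           sector_bytes: int = SECTOR_BYTES,
                           block_bytes: int = BADBLOCKS_BLOCK_BYTES) -> int:
    """Convert a drive LBA to the badblocks block number that contains it."""
    return lba * sector_bytes // block_bytes


def hours_to_reach(block: int, last_block: int,
                   percent_done: float, elapsed_minutes: float) -> float:
    """Hours until the scan reaches `block`, assuming a constant scan rate."""
    fraction = block / (last_block + 1)
    minutes_per_percent = elapsed_minutes / percent_done
    return fraction * 100 * minutes_per_percent / 60


first_error_lba = 239126392  # LBA_of_first_error from the self-test log
last_block = 2930266583      # "From block 0 to 2930266583"

block = lba_to_badblocks_block(first_error_lba)
hours = hours_to_reach(block, last_block, percent_done=0.13, elapsed_minutes=58.5)

print(f"badblocks block: {block}")
print(f"position:        {100 * block / (last_block + 1):.2f}% of the drive")
print(f"time to reach:   {hours:.1f} hours")
```

Under those assumptions the failing sector sits roughly 4% of the way into the drive, which the scan would reach after about 30 hours at the observed rate. 'badblocks' also accepts optional last-block and first-block arguments after the device name, so in principle a scan could be limited to a window around that block instead of starting from block 0.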
Edit.02: Spinrite doesn't seem to work with a 3TB drive on the PC I have available to run it on. Support request submitted.
Edit.02.1: I asked the vendor of the HGST drive listed above about the warranty and they replied with "30 day return policy." I did some searching and found that WD has a site where a user can check warranty status, and some drives come up as 'warranty provided by seller.' I've never had a drive fail within the warranty period, but I'm reluctant to rely on the seller. And now I need to check the warranty status of some other cheap HGST drives I purchased in the past.
Edit.03: Reply from GRC. Spinrite doesn't grok GPT partitioned drives or drives bigger than 2TB. I withdraw my previous recommendation for their product.