Zpool scrub

aftermath

Hello
Everything seems to be going pear-shaped.
Every time I do a zpool status -v I get a list of unrecoverable files, yet I have no errors on any disk.

I'm going to log in to the box to copy paste some stuff 1 sec


OK, I have an error on a new disk.
I have 1 Seagate and 5 WDs, sigh.
Code:
root@Baldr:/home/jackf# zpool status -v
  pool: ZStore
 state: ONLINE
status: One or more devices has experienced an error resulting in data
	corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
	entire pool from backup.
   see: http://zfsonlinux.org/msg/ZFS-8000-8A
  scan: scrub in progress since Fri Feb 14 19:26:06 2014
    889G scanned out of 13.5T at 492M/s, 7h28m to go
    8K repaired, 6.43% done
config:

	NAME                                            STATE     READ WRITE CKSUM
	ZStore                                          ONLINE       0     0     0
	  raidz1-0                                      ONLINE       0     0     0
	    wwn-0x600508b1001c1a6742975f7ff618a496      ONLINE       0     0     0
	    wwn-0x600508b1001cdc7b0db0090bfb75224b      ONLINE       0     0     0
	    wwn-0x600508b1001c183c1b8dc0416451ee85      ONLINE       0     0     0
	  raidz1-2                                      ONLINE       0     0     0
	    wwn-0x600508b1001c562e66e01a449473ba8e      ONLINE       0     0     0
	    wwn-0x600508b1001c1a70ffd312937841a491      ONLINE       0     0     0
	    wwn-0x600508b1001c841f1fd22b8a9328cba2      ONLINE       0     0     2  (repairing)
	logs
	  scsi-3600508b1001c27179a1954f6520c732b-part3  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

        ZStore/data:<0x7>
        /mnt/data/ISCSI/VR/lun1.img
        /mnt/data/FC/Freyja/lun-FR1tb.img
        ZStore/data:<0x10>
root@Baldr:/home/jackf#

So could it be that?
 
Code:
wwn-0x600508b1001c841f1fd22b8a9328cba2      ONLINE       0     0     2  (repairing)

That is the drive that is failing and that is certainly an error. You will need to replace it or figure out what is going on.
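
If the device list gets long, a small filter can pick out only the rows with non-zero counters. This is just a sketch: `zpool_errors` is a name I made up, and it assumes the five-column NAME/STATE/READ/WRITE/CKSUM layout shown in the paste above.

```shell
# zpool_errors: hypothetical helper. Reads `zpool status` text on stdin and
# prints "NAME READ WRITE CKSUM" for every device row with a non-zero counter.
zpool_errors() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }  # header row of the config table
        /^errors:/                    { in_config = 0 }        # the table ends before the errors section
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $3, $4, $5
        }
    '
}

# Usage: zpool status -v ZStore | zpool_errors
```

A non-zero CKSUM count only says that data read through that device did not match its checksum. It does not say on its own whether the disk, the cable or the controller is at fault.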
 
It's about 2-3 weeks old.
Could it be anything else? Cable? Expander? SAS Controller Cache?

Can I run low level diags with it as part of the pool?

I deleted and remade the *.img files that I use for LUNs today, but the errors keep recurring.

I have 4 IDE Seagate disks, 500G legends! They just work.
I got all paranoid that a batch of the same disks would be a world of pain. Planning to put a Hitachi in each Z1 in 6 months.

Can I run SeaTools from Linux? Time to go google!
 
Drives that are new fail pretty frequently compared to ones that have had time to "break in". It could be a cable, sure. I doubt it is the expander or cache, as other drives would have the same problems.

Check the SMART data with smartctl (mine is in the package smartmontools).

Code:
smartctl -H /dev/disk/by-id/wwn-0x600508b1001c841f1fd22b8a9328cba2
Then post the output here.
 
I don't have access to SMART directly AND I got the disk wrong.

It's one of the WDs. I have done a clear on the zpool and am going to see what happens.

Now even more paranoid about having a batch of identical drives.

Code:
root@Baldr:/home/jackf# smartctl -H /dev/disk/by-id/wwn-0x600508b1001c841f1fd22b8a9328cba2
smartctl 5.41 2011-06-09 r3365 [x86_64-linux-3.2.51-scst-enabled] (local build)
Copyright (C) 2002-11 by Bruce Allen, http://smartmontools.sourceforge.net

SMART Health Status: OK


Code:
=> ctrl all show config detail

Smart Array P212 in Slot 6
   Bus Interface: PCI
   Slot: 6
   Serial Number: PACCP9SXO3N6
   Cache Serial Number: PACCQ9SXD1EU
   RAID 6 (ADG) Status: Disabled
   Controller Status: OK
   Hardware Revision: C
   Firmware Version: 6.40
   Rebuild Priority: Medium
   Expand Priority: Medium
   Surface Scan Delay: 15 secs
   Surface Scan Mode: Idle
   Queue Depth: Automatic
   Monitor and Performance Delay: 60  min
   Elevator Sort: Enabled
   Degraded Performance Optimization: Disabled
   Inconsistency Repair Policy: Disabled
   Wait for Cache Room: Disabled
   Surface Analysis Inconsistency Notification: Disabled
   Post Prompt Timeout: 0 secs
   Cache Board Present: True
   Cache Status: OK
   Cache Ratio: 25% Read / 75% Write
   Drive Write Cache: Enabled
   Total Cache Size: 256 MB
   Total Cache Memory Available: 144 MB
   No-Battery Write Cache: Disabled
   Cache Backup Power Source: Batteries
   Battery/Capacitor Count: 1
   Battery/Capacitor Status: OK
   SATA NCQ Supported: True

   Array: A
      Interface Type: SAS
      Unused Space: 0  MB
      Status: OK
      Array Type: Data



      Logical Drive: 1
         Size: 558.7 GB
         Fault Tolerance: 5
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 512 KB
         Status: OK
         Caching:  Enabled
         Parity Initialization Status: Initialization Completed
         Unique Identifier: 600508B1001C27179A1954F6520C732B
         Disk Name: /dev/sda
         Mount Points: /media/usb0 8.7 GB, /var 2.8 GB, /usr 45.9 GB, /home 419.1 GB, /tmp 3.3 GB
         OS Status: LOCKED
         Logical Drive Label: A2C9EE43PACCRID10420XY191CD
         Drive Type: Data

      physicaldrive 1I:1:0
         Port: 1I
         Box: 1
         Bay: 0
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 15000
         Firmware Revision: HPD6
         Serial Number: 3QP02QCD00009846UDL3
         Model: HP      DF0300B8053     
         Current Temperature (C): 31
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown

      physicaldrive 1I:1:0
         Port: 1I
         Box: 1
         Bay: 0
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 15000
         Firmware Revision: HPD6
         Serial Number: 3QP02QCD00009846UDL3
         Model: HP      DF0300B8053     
         Current Temperature (C): 31
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown

      physicaldrive 1I:1:0
         Port: 1I
         Box: 1
         Bay: 0
         Status: OK
         Drive Type: Data Drive
         Interface Type: SAS
         Size: 300 GB
         Rotational Speed: 15000
         Firmware Revision: HPD6
         Serial Number: 3QP02QCD00009846UDL3
         Model: HP      DF0300B8053     
         Current Temperature (C): 31
         Maximum Temperature (C): 56
         PHY Count: 2
         PHY Transfer Rate: 3.0Gbps, Unknown


   Array: B
      Interface Type: SATA
      Unused Space: 0  MB
      Status: OK
      Array Type: Data



      Logical Drive: 2
         Size: 2.7 TB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Caching:  Enabled
         Unique Identifier: 600508B1001C1A6742975F7FF618A496
         Disk Name: /dev/sdb
         Mount Points: None
         Logical Drive Label: A2D969C1PACCRID10420XY15025
         Drive Type: Data

      physicaldrive 1I:1:0
         Port: 1I
         Box: 1
         Bay: 0
         Status: OK
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 3 TB
         Firmware Revision: 80.00A80
         Serial Number:      WD-WCC1T1177765
         Model: ATA     WDC WD30EFRX-68A
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 25
         Maximum Temperature (C): 36
         PHY Count: 1
         PHY Transfer Rate: 3.0Gbps


   Array: C
      Interface Type: SATA
      Unused Space: 0  MB
      Status: OK
      Array Type: Data



      Logical Drive: 3
         Size: 2.7 TB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Caching:  Enabled
         Unique Identifier: 600508B1001CDC7B0DB0090BFB75224B
         Disk Name: /dev/sdc
         Mount Points: None
         Logical Drive Label: A2D959E3PACCRID10420XY15667
         Drive Type: Data

      physicaldrive 1I:1:0
         Port: 1I
         Box: 1
         Bay: 0
         Status: OK
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 3 TB
         Firmware Revision: 80.00A80
         Serial Number:      WD-WCC1T1133367
         Model: ATA     WDC WD30EFRX-68A
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 25
         Maximum Temperature (C): 36
         PHY Count: 1
         PHY Transfer Rate: 3.0Gbps


   Array: D
      Interface Type: SATA
      Unused Space: 0  MB
      Status: OK
      Array Type: Data



      Logical Drive: 4
         Size: 2.7 TB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Caching:  Enabled
         Unique Identifier: 600508B1001C183C1B8DC0416451EE85
         Disk Name: /dev/sdd
         Mount Points: None
         Logical Drive Label: A2D949FCPACCRID10420XY1BFB3
         Drive Type: Data

      physicaldrive 1I:1:0
         Port: 1I
         Box: 1
         Bay: 0
         Status: OK
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 3 TB
         Firmware Revision: 80.00A80
         Serial Number:      WD-WCC1T1180348
         Model: ATA     WDC WD30EFRX-68A
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 25
         Maximum Temperature (C): 36
         PHY Count: 1
         PHY Transfer Rate: 3.0Gbps


   Array: E
      Interface Type: SATA
      Unused Space: 0  MB
      Status: OK
      Array Type: Data



      Logical Drive: 5
         Size: 2.7 TB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Caching:  Enabled
         Unique Identifier: 600508B1001C562E66E01A449473BA8E
         Disk Name: /dev/sde
         Mount Points: None
         Logical Drive Label: A2AB4670PACCRID10420XY1C487
         Drive Type: Data

      physicaldrive 1I:1:0
         Port: 1I
         Box: 1
         Bay: 0
         Status: OK
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 3 TB
         Rotational Speed: 5900
         Firmware Revision: SC43    
         Serial Number:             Z300NVFM
         Model: ATA     ST3000VN000-1H41
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 23
         Maximum Temperature (C): 26
         PHY Count: 1
         PHY Transfer Rate: 3.0Gbps


   Array: F
      Interface Type: SATA
      Unused Space: 0  MB
      Status: OK
      Array Type: Data



      Logical Drive: 6
         Size: 2.7 TB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Caching:  Enabled
         Unique Identifier: 600508B1001C1A70FFD312937841A491
         Disk Name: /dev/sdf
         Mount Points: /media/usb1 2.7 TB
         Logical Drive Label: A2AB5679PACCRID10420XY1D2AC
         Drive Type: Data

      physicaldrive 1I:1:0
         Port: 1I
         Box: 1
         Bay: 0
         Status: OK
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 3 TB
         Firmware Revision: 80.00A80
         Serial Number:      WD-WCC1T1167742
         Model: ATA     WDC WD30EFRX-68A
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 25
         Maximum Temperature (C): 36
         PHY Count: 1
         PHY Transfer Rate: 3.0Gbps


   Array: G
      Interface Type: SATA
      Unused Space: 0  MB
      Status: OK
      Array Type: Data



      Logical Drive: 7
         Size: 2.7 TB
         Fault Tolerance: 0
         Heads: 255
         Sectors Per Track: 32
         Cylinders: 65535
         Strip Size: 256 KB
         Full Stripe Size: 256 KB
         Status: OK
         Caching:  Enabled
         Unique Identifier: 600508B1001C841F1FD22B8A9328CBA2
         Disk Name: /dev/sdg
         Mount Points: None
         Logical Drive Label: A2AB667FPACCRID10420XY1B63B
         Drive Type: Data

      physicaldrive 1I:1:0
         Port: 1I
         Box: 1
         Bay: 0
         Status: OK
         Drive Type: Data Drive
         Interface Type: SATA
         Size: 3 TB
         Firmware Revision: 80.00A80
         Serial Number:      WD-WCC1T1156141
         Model: ATA     WDC WD30EFRX-68A
         SATA NCQ Capable: True
         SATA NCQ Enabled: True
         Current Temperature (C): 25
         Maximum Temperature (C): 36
         PHY Count: 1
         PHY Transfer Rate: 3.0Gbps


   Enclosure SEP (Vendor ID HP, Model HP14HDD) 248
      Device Number: 248
      Firmware Version: 2.02
      WWID: 50001C1071540013
      Port: 1I
      Box: 1
      Vendor ID: HP      
      Model: HP14HDD         

   Expander 250
      Device Number: 250
      Firmware Version: 2.02
      WWID: 50001C1071540000
      Port: 1I
      Box: 1
      Vendor ID: HP      

   SEP (Vendor ID PMCSIERA, Model  SRC 8x6G) 249
      Device Number: 249
      Firmware Version: RevC
      WWID: 50014380056D58E0
      Vendor ID: PMCSIERA
      Model:  SRC 8x6G
 
If it only spits out "SMART Health Status: OK" and no parameters, smartctl doesn't have access to read the drive's stats. You will need to pull the drive and test it in another system.
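
To make sure you pull the right drive: the wwn name in zpool status looks like the lowercased "Unique Identifier" of a logical drive in your controller dump, so the two can be matched up. A rough sketch, assuming the field layout in your paste; `wwn_to_serial` is a made-up name, not part of any tool.

```shell
# wwn_to_serial: hypothetical helper. Reads the controller's
# "show config detail" text on stdin and prints the serial number(s) of the
# physical drive(s) listed under the logical drive whose Unique Identifier
# matches the given wwn-0x... device name.
wwn_to_serial() {
    id=$(printf '%s' "${1#wwn-0x}" | tr 'a-f' 'A-F')   # the controller prints the ID in upper case
    awk -v id="$id" '
        /Array:|Logical Drive:/ { hit = 0 }                       # a new array or logical drive block starts
        /Unique Identifier:/    { hit = (($NF "") == (id "")) }   # "" forces a string comparison
        hit && /Serial Number:/ { print $NF }
    '
}

# Usage (controller output saved to a file first):
#   wwn_to_serial wwn-0x600508b1001c841f1fd22b8a9328cba2 < ctrl-config.txt
```

Going by the paste, that identifier belongs to Logical Drive 7 (/dev/sdg), whose physical drive is listed with serial WD-WCC1T1156141.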
 
FWIW, I had this issue when I added two more 4TB drives to my array. There was no consistency in the drive errors when I changed controllers, cables, etc. I think it went through two scrubs, the second completed without error, and I've never had any issues since.

It is worth noting, however, that I did not get any messages like you did with file names that have issues.

Do you have sync turned off on the pool that you're using for iSCSI targets?
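
In case the question isn't clear: sync is a per-dataset ZFS property (standard, always or disabled), and `zfs get sync` shows it. A minimal sketch, with the dataset name ZStore/data taken from your zpool status output; `check_sync` is just an illustrative name.

```shell
# check_sync: hypothetical helper that reports a dataset's sync setting.
# `zfs get -H -o value sync <dataset>` prints only the property value.
check_sync() {
    value=$(zfs get -H -o value sync "$1") || return 1
    if [ "$value" = "disabled" ]; then
        # With sync=disabled, synchronous writes are acknowledged before
        # they reach stable storage.
        echo "$1: sync=disabled, sync writes may be lost on a crash or power cut"
    else
        echo "$1: sync=$value"
    fi
}

# Usage: check_sync ZStore/data
```

If it reports disabled, the initiators writing to those LUN images can be told a write is safe before it actually is, which matters a lot if the box goes down uncleanly.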
 
Thanks, both, for your replies.
I'm guessing I don't as I don't understand the question.

I've been looking back through my logs, and I have not always been unmounting the ZFS filesystems. It looks like the SCST service my setup uses to host Fibre Channel targets does not stop and won't allow ZFS to unmount, but the system goes ahead and shuts down anyway.

Also, I would not have expected to lose files. One disk has an error and it's the RAID 5 equivalent, so surely the parity plus checksums would allow the files to continue being read? If they can't, then I'm stuffed.
:cry: I expect the corruption was caused by the unclean shutdown.
 
You're probably correct in your assumption about the unclean shutdown, especially if it's still writing or trying to write when the array goes down. I only use NFS, so I can't speak to exactly what you're trying to do here.
 