Sunday, April 1, 2012

turn xfs_repair up to eleven with ag_stride...

If you are in a hurry to repair an XFS volume (and honestly, who isn't?), it turns out you can use this trick to run xfs_repair in parallel. The ag_stride option carves the allocation groups into chunks, one worker per chunk, while bhash and ihash bump up the buffer and inode cache hash sizes for a big filesystem:
xfs_repair -o bhash=16384 -o ihash=16384 -o ag_stride=16 \
/dev/mapper/deep-deep_lv >& /tmp/repair.log

and away we go, carving the allocation groups into chunks of 16 (more on the thread math after this list):
- agno = 0
- agno = 16
- agno = 32
- agno = 48
- agno = 64
- agno = 80
- agno = 96
etc..
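For the curious, the number of worker threads you get is roughly agcount divided by ag_stride, so it pays to know how many allocation groups you have before picking a stride. A read-only poke at superblock 0 with xfs_db will tell you (the device path is just ours from above, and the agcount shown is a made-up example):

# read-only query of superblock 0 for the allocation group count
xfs_db -r -c 'sb 0' -c 'p agcount' /dev/mapper/deep-deep_lv
# agcount = 112  (hypothetical: with ag_stride=16 that works out to 7 chunks)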

You also get a spot of progress in the output:
- 12:44:35: process known inodes and inode discovery - 13,811,776 of 151,658,048 inodes done

- 12:44:35: Phase 3: elapsed time 14 minutes, 31 seconds - processed 951,442 inodes per minute

- 12:44:35: Phase 3: 9% done - estimated remaining time 2 hours, 24 minutes, 52 seconds

This is really useful when you want to let folks know that progress is being made on any given repair. Our file system was left extremely sick yesterday by a massive and mercifully rare hardware failure on the PCI riser:

http://blog.jcuff.net/2012/04/xfs-minus-fun-and-profit.html

So having this has been a huge help in guessing how long this part of the recovery is going to take. We are also restoring from backup, but I wanted to see if we could salvage the data on this 400T array while the guys worked like absolute trojans getting backups and read-only copies of important files to our customers. Here's where we are at as I type this: phase three recovery looks to be on track at about 2.5 hours, and hopefully we don't see anything horrible in phases four through seven. Our error log alone already stands at over 5GB...
[root@nssdeep /tmp]# du -sh repair.log 
5.4G repair.log  (eek!)

[root@nssdeep /tmp]# grep remaining repair.log
12:44:35: Phase 3: 9% done - estimated remaining time 2 hours, 24 minutes
12:59:35: Phase 3: 22% done - estimated remaining time 1 hour, 43 minutes
13:14:35: Phase 3: 35% done - estimated remaining time 1 hour, 22 minutes
13:29:35: Phase 3: 48% done - estimated remaining time 1 hour, 3 minutes
13:44:35: Phase 3: 60% done - estimated remaining time 47 minutes
13:59:35: Phase 3: 73% done - estimated remaining time 32 minutes
14:14:35: Phase 3: 86% done - estimated remaining time 16 minutes
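If you would rather not keep re-running that grep by hand, a one-liner will keep the freshest estimate on screen (just a convenience sketch against the same log file):

# redraw the most recent ETA line every 60 seconds
watch -n 60 "grep remaining /tmp/repair.log | tail -1"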

However, it turned out we also had a version of xfs_repair that was segfaulting, just like the one Tom Crane described on the XFS mailing list, so we followed his lead and pulled in the git repo this morning to see if we could speed up this four-day nightmare:
git clone git://oss.sgi.com/xfs/cmds/xfsprogs
cd xfsprogs/
yum install uuid-devel
yum install libuuid-devel
make
make install

and then we were off to the races! The current git copy:
[root@nssdeep /tmp]# xfs_repair -V
xfs_repair version 3.1.8

has all you need to get through a recovery without segfaulting. It reduced what was looking like a four-day recovery event into what looks like it will be four or so hours. You can see the system hitting the disks with "iotop" (never leave home without it!):
Total DISK READ: 88.41 M/s | Total DISK WRITE: 2.79 M/s
  TID  PRIO  USER     DISK READ  DISK WRITE  SWAPIN     IO>    COMMAND
16207 be/4 root     1995.30 K/s  109.76 K/s  0.00 % 99.99 % xfs_repair
16164 be/4 root        3.65 M/s   94.08 K/s  0.00 % 99.15 % xfs_repair
16194 be/4 root        3.77 M/s  156.80 K/s  0.00 % 99.11 % xfs_repair 
16187 be/4 root        2.89 M/s   54.88 K/s  0.00 % 98.41 % xfs_repair
16169 be/4 root        2.63 M/s  125.44 K/s  0.00 % 97.90 % xfs_repair 
16061 be/4 root        2.20 M/s  160.72 K/s  0.00 % 97.68 % xfs_repair
16082 be/4 root        2.87 M/s  101.92 K/s  0.00 % 97.10 % xfs_repair 
16104 be/4 root        3.78 M/s  152.88 K/s  0.00 % 96.90 % xfs_repair 
16093 be/4 root        4.11 M/s  172.48 K/s  0.00 % 96.86 % xfs_repair 
16074 be/4 root        4.05 M/s  117.60 K/s  0.00 % 96.84 % xfs_repair

etc..
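That snapshot is from interactive iotop; if you want to capture the same view into a log, batch mode does the trick (standard iotop flags, nothing exotic):

# one-shot batch mode, showing only processes actually doing I/O
iotop -b -o -n 1 | grep xfs_repair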
We noticed that without this version of xfs_repair you are stuck on a single thread, just hoping your file system will come back.
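A quick way to confirm you really are getting that parallelism is to count the threads on the repair process (plain ps and pgrep, assuming a single xfs_repair instance):

# number of lightweight processes (threads) backing the repair
ps -o nlwp= -p "$(pgrep -o xfs_repair)"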

We absolutely owe Christoph Hellwig (http://rechner.lst.de/~hch/), Tom Crane from Royal Holloway, and the ever-present Dave Chinner a huge debt of thanks for pointing us in the right direction on their mailing list. Not the first time their knowledge and skill have been a real bacon saver!



[any opinions here are all mine, and have absolutely nothing to do with my employer]
(c) 2011 James Cuff