Monday, November 15, 2010

ghetto HFS+ file recovery

So, this one is pretty sweet. My wonderful wife had a minor accident with a customer's sequence data: a slightly botched tar cvf (never forget the - option!) clobbered a 10GB file of non-recoverable data that had been shipped to us on a honking big USB drive. The disk was a 2TB HFS+ filesystem, always a brute to recover from; most folks say it is pretty much impossible.

So we set about it and downloaded Data Rescue III (very cool program). However, it was predicting >150 hours just to scan the drive for lost files. Even if successful, it would only recover 10MB in demo mode, so getting the file back would have meant $100 and possibly a week of scan time. But Michele had an idea of what the file contained, so she was able to cut this down to a few seconds by doing a random search through the disk with dd.
Really rather useful, and it yet again proves that dd is one of the true Swiss Army knife CLI applications of UNIX!
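A quick back-of-the-envelope (the ~10GB size and the block counts are from the post; the arithmetic is mine) shows why random probing converges so fast:

```shell
# Rough odds that one random 512-byte probe lands inside a ~10GB file
# on a 3905654784-block disk (the numbers used in the script below).
blocks=3905654784
file_blocks=$((10 * 1024 * 1024 * 1024 / 512))
ratio=$((blocks / file_blocks))
echo "one probe in ~$ratio lands in the file"
echo "so on the order of $ratio random reads to get a first hit"
```

At roughly 1-in-186 odds per read, a couple of hundred dd probes (seconds of wall time) are enough, which squares with finding a hit within "a few pages of output" below.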

[root] imichele:~ $ more search_disk

$count  = 1000000;
$blocks = 3905654784;
$bsize  = 512;

my $i = 0;
while ($i < $count) {
   $pos = int(rand()*$blocks);
   print "Seeking at $pos\n";
   $cmd = "dd if=/dev/disk2 iseek=$pos bs=$bsize count=1";

   open (IN, "$cmd |");

   while (<IN>) {
      print "Pos $pos $_";
   }
   close(IN);
   $i++;
}

Running this gives the following kind of thing:

Seeking at 1650713344

Seeking at 1330318437
Pos 3446386821 ghhhhgahhhhhh
Pos 3446386821 @HWUSI-EAS1759_0010_FC62LL1AAXX:5:79:6721:12735#0/1
Pos 3446386821 +HWUSI-EAS1759_0010_FC62LL1AAXX:5:79:6721:12735#0/1

This line is in the right file: 62LL1AAXX is the flowcell, :5: is the lane, and #0/1 is the read. It only took a few pages of output to find this. Of course, this only works well if you have few large files and know roughly what's in the file.

Back up the dd until you find the beginning of the file (we did this by hand):

 557  dd if=/dev/disk2 iseek=3409759900 count=1
 558  dd if=/dev/disk2 iseek=3409760000 count=1
 559  dd if=/dev/disk2 iseek=3409761000 count=1
 560  dd if=/dev/disk2 iseek=3409760500 count=1
 561  dd if=/dev/disk2 iseek=3409760600 count=1
 562  dd if=/dev/disk2 iseek=3409760300 count=1
 563  dd if=/dev/disk2 iseek=3409760400 count=1
 564  dd if=/dev/disk2 iseek=3409760350 count=1
 565  dd if=/dev/disk2 iseek=3409760310 count=1
 566  dd if=/dev/disk2 iseek=3409760320 count=1
 567  dd if=/dev/disk2 iseek=3409760311 count=1
 568  dd if=/dev/disk2 iseek=3409760312 count=1
 569  dd if=/dev/disk2 iseek=3409760313 count=1
 570  dd if=/dev/disk2 iseek=3409760314 count=1
 571  dd if=/dev/disk2 iseek=3409760315 count=1
 572  dd if=/dev/disk2 iseek=3409760316 count=1
 573  dd if=/dev/disk2 iseek=3409760317 count=1
 574  dd if=/dev/disk2 iseek=3409760318 count=1
 575  dd if=/dev/disk2 iseek=3409760319 count=1
 576  dd if=/dev/disk2 iseek=3409760320 count=1
 577  dd if=/dev/disk2 iseek=3409760319 count=1
 578  dd if=/dev/disk2 iseek=3409760320 count=1
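The manual narrowing above is really a bisection: blocks before the file hold garbage, blocks inside hold FASTQ text. Here's a hedged sketch of automating it, run against a small stand-in image (test.img, the planted block range, and the pattern are all illustrative; on the real disk you'd point DISK at /dev/disk2, set the bounds from your random hits, and grep for @HWUSI — note skip is the portable spelling of iseek):

```shell
DISK=test.img
# Build a stand-in disk: 2048 zero blocks, then plant a fake "file" of
# FASTQ-ish 512-byte records starting at block 1200 (an assumption for
# this demo, standing in for the real file's unknown start block).
dd if=/dev/zero of="$DISK" bs=512 count=2048 2>/dev/null
i=1200
while [ $i -lt 2048 ]; do
  printf '%-512s' "@HWUSI-EAS1759 fake read $i"
  i=$((i + 1))
done | dd of="$DISK" bs=512 seek=1200 conv=notrunc 2>/dev/null

lo=0      # known: a block before the file (garbage/zeros)
hi=2047   # known: a block inside the file (from a random hit)
while [ $((hi - lo)) -gt 1 ]; do
  mid=$(( (lo + hi) / 2 ))
  if dd if="$DISK" bs=512 skip=$mid count=1 2>/dev/null | grep -q '@HWUSI'; then
    hi=$mid    # midpoint looks like the file; move the upper bound down
  else
    lo=$mid    # midpoint is still garbage; move the lower bound up
  fi
done
echo "file starts at block $hi"
```

Each probe is one dd read, so even on a 2TB disk the bisection needs only ~32 reads instead of stepping block by block.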

Finally, run dd with a big block size into a new file:

dd if=/dev/disk2 iseek=3409760320 count=3905654784 > s_5_1_sequence.txt.recovered
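One caveat (my note, not from the original): with count=3905654784 that copy runs from the start block all the way to the end of the disk, so the recovered file carries trailing junk past the true end of the data. If you know the original byte count you can trim it afterwards; a sketch on a stand-in file (the sizes here are made up):

```shell
# Stand-in: 100 bytes of "real data" plus 8 bytes of trailing junk,
# mimicking a recovered file that runs past the true end.
{ head -c 100 /dev/zero; printf 'JUNKJUNK'; } > recovered.tmp
# Trim to the known size (for the real file, the original byte count,
# e.g. from the customer's records) and check the result.
head -c 100 recovered.tmp > trimmed.tmp
wc -c < trimmed.tmp
```

For a FASTQ file without a known byte count, eyeballing the tail for where well-formed records stop works too.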

Done and done!


[any opinions here are all mine, and have absolutely nothing to do with my employer]
(c) 2011 James Cuff