Sunday, December 5, 2010

swift DNA blastn, with a quick png picture ca. 2002 vintage

Sometimes even in 2010, you just really, really need to do things in a hurry. Like mapping a small set of reference sequences against a set of multiple DNA databases to prove existence or location - simple stuff. Then you also probably want to quickly draw a simple reference picture? Well back in the bad old days, this used to be a nightmare to do with any kind of velocity. Now with *much* faster CPUs, and way, way better code, it is fortunately trivial. However, you do have to remember that a considerable number of giants have gone before you; remember to always use their epic foo!

Here's an example from this year (2010) using some of the skills of (2002 vintage): http://www.bioperl.org/wiki/HOWTO:Graphics


Installing all of bioperl (in 2010) is now as simple as:

sudo apt-get install bioperl
sudo apt-get install libbio-graphics-perl
A quick edit of the example code to handle STDIN, and we are set:
my $searchio = Bio::SearchIO->new(-fh   => \*STDIN,
Then you can simply pipe the blast search of your reference DNA against the two databases you were unclear contained it, into your bioperl script, and output the png file you need right into imagemagick. We always have to love oneliners like this:
blast2 -p blastn -i ./reference.dna -d "./all.dna ./454Scaffolds.dna" | ./parse.pl | display


Hurray BioPerl!!!

[the old school parse.pl code from BioPerl that
draws the pictures comes after the break]


jcuff@srv:/data/assembly$ cat parse.pl
#!/usr/bin/perl
# This is code example 4 in the Graphics-HOWTO
use strict;
use Bio::Graphics;
use Bio::SearchIO;
use Bio::SeqFeature::Generic;
my $searchio = Bio::SearchIO->new(-fh => \*STDIN,
-format => 'blast') or die "parse failed";
my $result = $searchio->next_result() or die "no result";

my $panel = Bio::Graphics::Panel->new(
-length => $result->query_length,
-width => 600,
-pad_left => 5,
-pad_right => 5,
);

my $full_length = Bio::SeqFeature::Generic->new(
-start => 1,
-end => $result->query_length,
-display_name => $result->query_name,
);

$panel->add_track($full_length,
-glyph => 'arrow',
-tick => 2,
-fgcolor => 'black',
-double => 1,
-label => 1,
);

my $track = $panel->add_track(
-glyph => 'graded_segments',
-label => 1,
-connector => 'dashed',
-bgcolor => 'blue',
-font2color => 'red',
-sort_order => 'high_score',
-description => sub {
my $feature = shift;
return unless $feature->has_tag('description');
my ($description) = $feature->each_tag_value('description');
my $score = $feature->score;
"$description, score=$score";
},
);

while( my $hit = $result->next_hit ) {
next unless $hit->significance < 1E-5; my $feature = Bio::SeqFeature::Generic->new(
-score => $hit->raw_score,
-display_name => $hit->name,
);

while( my $hsp = $hit->next_hsp ) {
$feature->add_sub_SeqFeature($hsp,'EXPAND');
}
$track->add_feature($feature);
}

print $panel->png;


[any opinions here are all mine, and have absolutely nothing to do with my employer]
(c) 2011 James Cuff