Friday, July 31, 2015

How does your cluster sound?

So I was on the airplane coming back from XSEDE15 in St. Louis, and got to thinking about all the amazing visualizations that were on display. I wondered. What would a cluster sound like? On our HPC cluster we have millions of jobs running each month, and often 10-20,000 running simultaneously. So I decided to go on a hunt for a MIDI player and a MIDI file format generator.

Found both in seconds, the internet is awesome!

First up, a player for OSX (if you skip the --with-libsndfile, the render-to-file step at the end won't work):
brew install libsndfile lame
brew install --with-libsndfile fluidsynth

Now download a soundfont (wow this takes me back!)


and we have the musics!

Jamess-MacBook-Pro:GeneralUser GS 1.44 FluidSynth jcuff$ fluidsynth -i ./GeneralUser\ GS\ FluidSynth\ v1.44.sf2 demo\ MIDIs/All\ Night\ Long.mid
FluidSynth version 1.1.6
Copyright (C) 2000-2012 Peter Hanappe and others.
Distributed under the LGPL license.
SoundFont(R) is a registered trademark of E-mu Systems, Inc.

Ok so we can play a midi file from the CLI. Time to write one now. And yet again, the internet provides:

So here we go:
Jamess-MacBook-Pro:~ jcuff$ wget
Jamess-MacBook-Pro:~ jcuff$ unzip 
Jamess-MacBook-Pro:~ jcuff$ cd MIDIUtil-0.89
Jamess-MacBook-Pro:~ jcuff$ ls
Jamess-MacBook-Pro:~ jcuff$ python ./ install
Jamess-MacBook-Pro:~ jcuff$ sudo python ./ install
Jamess-MacBook-Pro:~ jcuff$ python ./examples/ 
Jamess-MacBook-Pro:~ jcuff$ file output.mid 

output.mid: Standard MIDI data (format 1) using 1 track at 1/960

Jamess-MacBook-Pro:MIDIUtil-0.89 jcuff$ fluidsynth -v -i ../GeneralUser\ GS\ 1.44\ FluidSynth/GeneralUser\ GS\ FluidSynth\ v1.44.sf2 output.mid 
FluidSynth version 1.1.6
Copyright (C) 2000-2012 Peter Hanappe and others.
Distributed under the LGPL license.
SoundFont(R) is a registered trademark of E-mu Systems, Inc.

fluidsynth: noteon 0 60 100 00000 0.975 1.113 0.000 0
fluidsynth: noteoff 0 60 0 00000 1.612 1

Ok - so we can play notes from the command line. Time to knock up a parser from sacct data, let's use -p and -l so we can actually parse the data that comes out… :-)

[root@sa01 tmp]# sacct -p -l > /tmp/sacct.dat
[root@sa01 tmp]# wc -l /tmp/sacct.dat
58651 /tmp/sacct.dat
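Before hard coding column numbers, it can help to read the `sacct -p` dump by header name. Here is a minimal sketch in Python 3, assuming the first line of the dump is the pipe-delimited header and that the `AllocCPUS`, `Elapsed` and `State` columns are present; the sample rows are made up for illustration and `read_sacct` is a hypothetical helper, not part of Slurm.

```python
def read_sacct(lines):
    """Yield one dict per sacct record, keyed by the header names."""
    lines = iter(lines)
    header = next(lines).rstrip("\n").split("|")
    for line in lines:
        fields = line.rstrip("\n").split("|")
        if len(fields) != len(header):
            continue  # skip short or malformed lines
        yield dict(zip(header, fields))


# Made-up sample in the same shape as `sacct -p` output (note the trailing '|').
sample = (
    "JobID|AllocCPUS|Elapsed|State|\n"
    "101|16|01:30:00|COMPLETED|\n"
    "102|1|00:00:05|FAILED|\n"
)

records = list(read_sacct(sample.splitlines()))
completed = [r for r in records if r["State"] == "COMPLETED"]
print(len(records), len(completed), completed[0]["Elapsed"])
```

With a real dump you would pass `open("/tmp/sacct.dat")` instead of the sample lines.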

Ok so we have some rich sacct data for our 58,651 jobs in the system right now. Let's look at the ones that completed and ran for up to a day, write some weird chord generator, and then add Michele Clamp's epic python from a rather extreme one hour pair programming episode, and we give you…


Here's the code for your enjoyment and modification.

So what does your cluster sound like?


You can turn the result into a sound file to post to your SoundCloud site with this:
Jamess-MacBook-Pro:MIDIUtil-0.89 jcuff$ python ./ -f tt -s 1 -e 4

Jamess-MacBook-Pro:MIDIUtil-0.89 jcuff$ fluidsynth -F out.wav -i ../GeneralUser\ GS\ 1.44\ FluidSynth/GeneralUser\ GS\ FluidSynth\ v1.44.sf2 output.mid 
FluidSynth version 1.1.6
Copyright (C) 2000-2012 Peter Hanappe and others.
Distributed under the LGPL license.
SoundFont(R) is a registered trademark of E-mu Systems, Inc.

Rendering audio to file 'out.wav'..

Jamess-MacBook-Pro:MIDIUtil-0.89 jcuff$ lame out.wav 
LAME 3.99.5 64bits (
Using polyphase lowpass filter, transition band: 16538 Hz - 17071 Hz
Encoding out.wav to out.mp3
Encoding as 44.1 kHz j-stereo MPEG-1 Layer III (11x) 128 kbps qval=3
    Frame          |  CPU time/estim | REAL time/estim | play/CPU |    ETA 
 27776/27776 (100%)|    0:17/    0:17|    0:18/    0:18|   40.386x|    0:00 
   kbps        LR    MS  %     long switch short %
  128.0       50.4  49.6        99.6   0.3   0.2
Writing LAME Tag...done
ReplayGain: +10.1dB

Jamess-MacBook-Pro:MIDIUtil-0.89 jcuff$ open out.mp3 

And finally, here's the code!
Jamess-MacBook-Pro:MIDIUtil-0.89 jcuff$ cat ~/Downloads/ 

# wget
# unzip 
# cd MIDIUtil-0.89
# sudo python ./ install

# Fluidsynth has better noises :
# brew install --with-libsndfile fluidsynth
# wget
# unzip this into a directory

# python -f sacct.dat -s 17 -e 25  (Writes into output.mid)

# fluidsynth -F output.wav -i ./GeneralUser\ GS\ 1.44\ FluidSynth/GeneralUser\ GS\ FluidSynth\ v1.44.sf2 output.mid

from argparse  import ArgumentParser
from random    import randint

import math
import re

from midiutil.MidiFile import MIDIFile

parser        = ArgumentParser(description = 'Convert sacct data to midi')

parser.add_argument('-f','--file'     ,      help="The sacct data file")
parser.add_argument('-s','--programstart'  , help="The instrument program start")
parser.add_argument('-e','--programend'  ,   help="The instrument program end")
parser.add_argument('-b','--bpm'  ,          help="Beats per minute")

args = parser.parse_args()

programstart = 1 
programend   = 1
bpm          = 120

if args.programstart is not None:
   programstart = int(args.programstart)

if args.programend is not None:
   programend= int(args.programend)

if args.bpm is not None:
   bpm = int(args.bpm)

fh = open(args.file)

MyMIDI = MIDIFile(1)

track = 0
time  = 0

MyMIDI.addTrackName(track,time,"Sample Track ")
MyMIDI.addTempo(track,time, bpm)

lnum = 0 
daysecs = 24*60*60

maxtime  = 0

for line in fh:
   lnum = lnum + 1

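   if lnum == 1:
      continue   # skip the sacct header line

   line    = line.rstrip('\n')
   ff      = line.split('|')

   cores   = int(ff[21])
   elapsed = ff[22]
   status  = ff[23]

   if lnum%10 != 0:
      continue   # only sample every tenth record

   if status != "COMPLETED":
      continue   # only completed jobs get to make a noise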

   tt = elapsed.split(':')
   channel  = 0

   duration = cores+1
   volume   = 90

   if len(tt) == 3 and elapsed != '' and cores != 0 and '-' not in elapsed:

     program = randint(programstart,programend)
     secs    = int(tt[0])*60*60 + int(tt[1])*60 + int(tt[2])
     newsecs = 10 + int(secs*(127-10)/daysecs)
     #time    = secs*120.0/float(daysecs)
     #pitch   = cores
     pitch    = newsecs 

     MyMIDI.addProgramChange(track,channel, time, program)

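     print tt[0],tt[1],tt[2],secs,newsecs,daysecs,program,time

     # Sound the note (assumed call: no addNote survives in this listing)
     MyMIDI.addNote(track,channel,pitch,time,duration,volume)

     tmppitch = pitch + 12
     if tmppitch > 127:
       tmppitch = 127

     # Add the same note an octave up to make the chord (also assumed)
     MyMIDI.addNote(track,channel,tmppitch,time,duration,volume)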


     if time > maxtime:
        maxtime = time

     tmp = float((math.sqrt(cores))/2.0)

     if tmp > 10.0:
       tmp = 10.0
     time = float(time) + tmp

print maxtime

i =  0


while i < maxtime:
   i = i + 1

# And write it to disk.
binfile = open("output.mid", 'wb')
MyMIDI.writeFile(binfile)
binfile.close()
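The two mappings buried in the script above (run time to pitch, and core count to how far the clock moves on) are easier to poke at as small functions. This is a Python 3 sketch of the same arithmetic; the function names are made up for illustration, and the clamp to 127 in `secs_to_pitch` is an extra guard that the script itself does not need, because it skips jobs longer than a day.

```python
import math

DAY_SECS = 24 * 60 * 60


def elapsed_to_secs(elapsed):
    """Convert an HH:MM:SS string to seconds."""
    hours, minutes, seconds = (int(part) for part in elapsed.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def secs_to_pitch(secs, low=10, high=127):
    """Scale zero to one day of run time linearly onto MIDI pitches low..high."""
    pitch = low + int(secs * (high - low) / DAY_SECS)
    return min(pitch, high)


def time_step(cores, cap=10.0):
    """Beats to advance after a job: sqrt(cores)/2, capped at `cap`."""
    return min(math.sqrt(cores) / 2.0, cap)


print(secs_to_pitch(elapsed_to_secs("12:00:00")))
print(time_step(16))
```

Short jobs land at the bottom of the keyboard and day-long jobs near the top, while big core counts leave longer gaps before the next note.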

Thursday, April 16, 2015

Of huge pages and huge performance hits, are we alone?

We do a fair amount of sequence analysis here. One thing we do a lot of is trimming sequence data. The files are somewhat large. I'm not allowed to call this "big data" :-) There's a neat trimming code called "trimmomatic" (awesome name eh?). It's a simple enough piece of Java, but it interacts poorly with our machines, and it turns out it is not the only code that does.

We have a huge page table issue.

A very big one.

So it turns out that khugepaged, the kernel thread that collapses regular pages into transparent huge pages, gets very busy when you have large codes such as this one that pull in a pair of 7GB data files, modify them, and then try to get them out to disk as fast as they can. That will stress any machine. We used local storage for this and kept it simple.
[jcuff@regal01 dist]# time java -jar ./jar/trimmomatic-0.33.jar PE -phred33 ./FR4_P_pilosa_CTTGTA.R1.fastq.gz ./FR4_P_pilosa_CTTGTA.R2.fastq.gz outr1.dat outr2.dat outr1un.dat outr2un.dat ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

TrimmomaticPE: Started with arguments: -phred33 ./FR4_P_pilosa_CTTGTA.R1.fastq.gz ./FR4_P_pilosa_CTTGTA.R2.fastq.gz outr1.dat outr2.dat outr1un.dat outr2un.dat ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Multiple cores found: Using 16 threads

ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences

Input Read Pairs: 25776503 Both Surviving: 23790480 (92.30%) Forward Only Surviving: 1901563 (7.38%) Reverse Only Surviving: 49159 (0.19%) Dropped: 35301 (0.14%)

TrimmomaticPE: Completed successfully

So how did we do?

real 57m47.317s
user 44m24.784s
sys 435m41.152s

Yeah that's pretty slow.

While we were running we saw khugepaged @ 100% in top and then in "perf":

25022 root      39  19    0    0    0 R 100.0  0.0  6:23.13 khugepaged

[root@regal01 dist]# perf top

Samples: 191K of event 'cycles', Event count (approx.): 104146992710
 75.44%  [kernel]             [k] _spin_lock_irqsave
  4.21%  [kernel]             [k] _spin_lock_irq
  1.02%              [.] logicalSubscript
  0.83%  [kernel]             [k] ____pagevec_lru_add

Never good to be in _spin_lock_irq

So now let's turn off THP (transparent huge pages):

[root@regal01 dist]# echo never > /sys/kernel/mm/transparent_hugepage/enabled
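To check which THP mode is actually in force, the same sysfs file can be read back: the active mode is the one in square brackets. A minimal Python 3 sketch follows, assuming the usual `always madvise [never]` layout of that file; `active_thp_mode` and `read_thp_mode` are hypothetical helper names.

```python
import re

THP_ENABLED = "/sys/kernel/mm/transparent_hugepage/enabled"


def active_thp_mode(text):
    """Return the bracketed (active) mode from the sysfs file contents."""
    match = re.search(r"\[(\w+)\]", text)
    if match is None:
        raise ValueError("no active mode found in %r" % text)
    return match.group(1)


def read_thp_mode(path=THP_ENABLED):
    """Read the file on a live Linux box and return the active mode."""
    with open(path) as fh:
        return active_thp_mode(fh.read())


print(active_thp_mode("always madvise [never]\n"))
```

As far as I know the echo does not survive a reboot, so a check like `read_thp_mode()` in a node health script is one way to notice when a machine has quietly gone back to the default.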

How did we do?
[jcuff@regal01 dist]# time java -jar ./jar/trimmomatic-0.33.jar PE -phred33 ./FR4_P_pilosa_CTTGTA.R1.fastq.gz ./FR4_P_pilosa_CTTGTA.R2.fastq.gz outr1.dat outr2.dat outr1un.dat outr2un.dat ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36

TrimmomaticPE: Started with arguments: -phred33 ./FR4_P_pilosa_CTTGTA.R1.fastq.gz ./FR4_P_pilosa_CTTGTA.R2.fastq.gz outr1.dat outr2.dat outr1un.dat outr2un.dat ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Multiple cores found: Using 16 threads

ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences

Input Read Pairs: 25776503 Both Surviving: 23790480 (92.30%) Forward Only Surviving: 1901563 (7.38%) Reverse Only Surviving: 49159 (0.19%) Dropped: 35301 (0.14%)

TrimmomaticPE: Completed successfully

Drum roll please!
real 3m23.022s
user 6m22.666s
sys 0m38.528s


That's a pretty big deal. I'll always take a 17x speed up in wall clock time if I can get it.
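To put numbers on the before and after, the two `time` outputs can be converted to seconds and divided. A small Python 3 sketch using only the timings shown above; `to_seconds` is a made-up helper name.

```python
import re


def to_seconds(stamp):
    """Convert a `time` value such as '57m47.317s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", stamp)
    if match is None:
        raise ValueError("unexpected time format: %r" % stamp)
    return int(match.group(1)) * 60 + float(match.group(2))


with_thp = {"real": "57m47.317s", "user": "44m24.784s", "sys": "435m41.152s"}
without_thp = {"real": "3m23.022s", "user": "6m22.666s", "sys": "0m38.528s"}

# Ratio of "THP on" to "THP off" for each of the three counters.
ratios = {
    key: to_seconds(with_thp[key]) / to_seconds(without_thp[key])
    for key in with_thp
}

for key in ("real", "user", "sys"):
    print("%-4s %7.1fx" % (key, ratios[key]))
```

The `sys` line is the one to watch: that is where all the spinlock time was going.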

Turns out we are not alone:

There's an ongoing discussion here about the current state of the art.

In the meantime we are going to disable THP.

Have others seen this? Happy to see the comments.

Thursday, February 5, 2015

Please, purchase my storage solution....

CUE: Story opens, In a small office at a research computing department:

Endearing Storage Vendor: ".... so, now you have seen our technology, you will want to purchase our one of a kind "storage solution" you will be inordinately happy and immediately absolved of any and all future storage issues... forever. We guarantee it! We would truly love to partner with you, we have a unique, one of a kind system. Once we install your system, you can basically take two weeks off, but also in the meantime we will arrange to get your hair to grow back, and I will buy you many beautiful steak dinners... you are feeling very sleepy... but very satisfied with your decision to partner with us, it is a one of a kind product, did I tell you that our CTO invented....."

(beautiful harp music plays in the background)

Research Computing Director [Dreaming] : Oh wow, this stuff sounds absolutely fantastic, I bet I could finally sleep at night, the milk would never spill or go sour anymore. Life would finally be full of unicorns and rainbows! I so much want to live in this fantastic land of flawless storage, unlimited capacity, endless feature sets, complete 100.1% reliability and uptime, oh it's going to be so utterly awesome. In this world storage never, ever goes bad. Hold that thought I NEED to live in this world!! I MUST buy this storage array... I have to raise a PO!....

Endearing Storage Vendor: When I click my fingers you will awake, refreshed and ready to place your purchase order... 3..... 2...... 1.....

[Click] (director wide awake)

Research Computing Director [Sweating]: Whoah! What! Hang on! Wait Nooooo!!

CUE: Fade to black...

So, all joking aside I've been doing this job, and jobs much like it for years. I actually do know the exact storage system it is that exists in this dream from our little story above. And, well so given we are all friends here I'll take a moment to share the answer with you, let's keep it our little secret though, we should not let this trade secret get out.

Ok, so are you ready? Ok, so it's this one:

Not a single one of them!

Yep - you heard right folks, not a single one. I know I'm like a total heretic right?

You've probably all heard the endearing storage vendor promises... I have them in the archive, somewhere, let me go dig them out for you...

CUE: The clip of "The top 50 Most Endearing Storage Vendor Quotes":

"The competition are light years behind our technology! They are slower, more expensive, and totally unreliable, I mean they basically have NO clue! Our CTO literally invented the binary system!"

"That custom Linux kernel you hand rolled may be clever, but it does not scale. Our custom fork of Plan9 we use to power our ARM powered ASICS - it's quite literally lightyears ahead of the competition"

"Here, how about this... you can try our storage for no cost. I'll ask my manager so you can have a little bit for free - don't worry, we can talk price after your first petabyte migration"

"We vet every single patch upgrade before we release to our customers - rolling upgrades from any point release result in zero downtime"

"This storage will basically NEVER fail - people like Harva... oops sorry I can't disclose our clients, but they think it's totally wonderful, I can set up a call with Dr. X, he will totally vouch for how awesome we are."

"Let's not talk price just yet, let me show you how we use quantum laser effects to increase our redundancy and reliability"

"The next version has a completely redesigned API and REST interface, oh and it will be a seamless data in place update - don't worry"

"I want to take a little time to explain to you about our differential value"

"Let me take a moment to explain how we use a stronger steel frame for our cabinets, it is a key differentiator"

"The drives have a perpetual motion device as bearings, you can basically think of them as "physical flash drives""

"We run one of the two top advanced storage manufacturing plants located south of Basildon"

"Our disk magnets are sourced from an ancient salt mine just south of Las Vegas"

"We are in one of the top one worldwide soda manufacturers, we would tell you but we keep our clients confidential"

"We are unique in the market place. Our product is one of a kind. You need to understand our differential value. Let me set up a call with our CTO, so he can explain how this works at a deep technical level. Did we tell you our CTO invented the binary system?"

"You guys shouldn't waste your time building your own storage. We have an end to end solution for you."

"Putting all your storage under our single name space with our amazing technology will just make everything easier."

"Did I tell you already that our CTO invented the binary system?"

"Would you mind if I called some of your Faculty directly so I can show them our value? I don't want to go over your head or anything, but I really need to show them the value of our system, so they can see why you should buy this system."

"... and this was when our founders invented magnetism"

"Great question! Cluster quorum is maintained by a remote software as a service cloud"

"Our storage array was certified by the TSA, and is in use at 5 of the national airlines that fly out of Canada, we could tell you but we want to keep our clients confidential"

"Through our technology we have effectively achieved 200 nines of reliability, and 800 days of uptime a year"

"We have essentially redesigned how RAID works, let us show you the following algebra..."

"It is essentially a software defined storage stack written into a dedicated FPGA so it's very flexible..."

"You basically don't need backups any more!"

"Great question! I'll circle back with engineering and get right back to you - Steve be sure to take a note on that - great question!"

"I'll skip over these marketing slides so we can do a deep dive on our technology... oh just one thing while we are here, we do as you can see from this slide sell to all of your competitors, but anyway, let's get to the technology, oh and this customer here purchased 500 petabytes, ok moving on..."

"We call this feature RAID ONE MILLION. Yeah I know right? It really is literally that good."

"Cache coherence is on our roadmap"

"Hey let's get a round table with your engineering team. I'll bring our top people in so we can show your team our differential value, once your engineers see this they will be ready to convince you to purchase this storage."

"Great question! File locking is absolutely due for the next release"

Oh and the best ever...?

"This product literally pays for itself!"

So... I dunno about you, but unless this disk array prints twenty freaking dollar bills, that thing ain't paying for anything, least of all itself!

So as I said, it's been my day job to be "sold" to for a number of years now. I've quite possibly heard them all. They also say the easiest thing in the world is to sell a salesman, and I've been told that I'm a bit of a salesman, or at least I've been seen to play one on the T.V...

Even so...



p.s. I shall never, ever disclose my sources of "ESV" tee hee :-)

Wednesday, December 17, 2014

Of big microscopes and even bigger data...

We recently installed one of these awesome electron microscopes... In the center I help PI, we are imaging brains, but more about that another time. Right now this is all about getting this thing running, and running at speed, and some lovely UNIX geekery... I don't get anywhere near enough time these days to get my paws on a CLI, but I needed to stick my nose into this one!

It's extremely cool looking eh? However, it needs a fair amount of horsepower to just even "catch" the data that streams off it.  It is also a scientific instrument so of course the file system obviously ends up being more than just a little bit hairy.  For reference, here is the output of a single "run":
[root@storage]# du -sh .
6.6T .

[root@storage]# ls -RU | wc -l

[root@storage]# find -P . -type f | rev | cut -d/ -f2- | rev |  cut -d/ -f1-2 | cut -d/ -f2- | sort | uniq -c
  65226 001
  62994 002
  67458 003
  67954 004
  65226 005
  62994 006
  67458 007
  67954 008
  65226 009
  62994 010
  67458 011
  67954 012
  65226 013
  62994 014
  67458 015
  67954 016
  65226 017
  62994 018
  67458 019
  67954 020
  65226 021
   8559 022

So about 1.4 million files in 6.6T with ca. 65,000 per dir and each image is about 612K...
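The find/rev/cut pipeline above is doing a per-directory tally, which can also be sketched in Python 3 with `os.walk`. This is an approximation under stated assumptions: it counts every non-directory entry (so symlinks to files are included, unlike `find -type f`), and it ignores files sitting directly in the top level. `files_per_top_dir` is a made-up name.

```python
import os
from collections import Counter


def files_per_top_dir(root):
    """Count non-directory entries under each first-level directory of root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        if rel == ".":
            continue  # files directly in root are not tallied
        top = rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts
```

Calling `files_per_top_dir(".")` from the run directory and printing the sorted items gives the same shape of listing, and `sum(counts.values())` gives the grand total in one go.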

Please stop me if you have heard *any* of this before :-)

Hehehehe :-)

Anyway, well, so our first task was to catch this stuff.  It flies in from the instrument at a rate of about 3TB an hour, out of eight distinct and separate windows acquisition servers writing out directly to a CIFS mount -- yeah I know, hashtag awesome right? More on SAMBA tuning at scale in another post...

So we benched our storage, a MD3260 with a couple of MD3260e expansion bricks making for a nice 0.6PB single image file system made out of 180 spindles tucked behind an R720.

Nothing too exotic, and at 3TB/hr design spec we need only a dedicated 10G, so we double bagged it, and popped a pair of 10 gee bee cards in the box, span up some LACP and so we were off to the races!

Until we weren't... Do you see the problem here:
41252 be/4 root 0.00 B/s  247.65 M/s  0.00 % 78.54 % dd if=/dev/zero of=test.dat bs=1024k count=1000000000

Yeah, so that's 250MB/s peak, on the box, with no network in the way, direct to disk with caches working - which is about 0.9TB/hour...  Oh and this was also at about the same time the imaging center director called our group telling us that the microscope was broken, and that he thinks our network is very broken... Yep - a bad day in paradise this sure was turning out to be... and I couldn't even blame it on the network this time! :-)
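The unit juggling here is easy to get wrong in your head, so here is the arithmetic as a Python 3 sketch. It assumes decimal units throughout (1 TB = 1,000,000 MB = 8,000 Gbit); the function names are invented for the example.

```python
def mb_per_s_to_tb_per_hour(mb_per_s):
    """Decimal units: 1 TB = 1,000,000 MB."""
    return mb_per_s * 3600 / 1e6


def tb_per_hour_to_gbit_per_s(tb_per_hour):
    """Decimal units: 1 TB = 8,000 Gbit."""
    return tb_per_hour * 8000 / 3600


# What the dd test delivered, and what the instrument needs on the wire.
print("%.2f TB/hour" % mb_per_s_to_tb_per_hour(250))
print("%.2f Gbit/s" % tb_per_hour_to_gbit_per_s(3))
```

So 250MB/s is well short of the 3TB/hour the instrument produces, while the network side of that same 3TB/hour fits inside a single 10G link.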

So we are really not doing so well here. I poked about inside some of our other boxes... we run loads of this stuff... At first I was seeing the same results on some, and on others we were just fine... until I stumbled across one that was pulling nearly 800MB/s... I looked just a little closer at the config for the one that was working as I thought it was...

The default shipping is 4K, which is no good for streaming writes!

Arrggh - the sort of things we used to worry about in the '90s were back and in full effect. Flipped the button, all better.  Still not quite seeing decent performance though, the design spec with this number of spindles and 4 x 6Gb/s SAS wires should peak at... urrm types into google...

So I bust out a copy of the awesome bwm-ng:

Doh! We are not striping, so only using one of the four available 6Gb/s SAS lines... LVM has two modes of operation, Linear and Striping... we were using the Linear one, which was no good... so let's go fix it!
[root@storage]# umount /fs

[root@storage]# lvremove /dev/store_md32xx_vg/store_md32xx_lv
Do you really want to remove active logical volume store_md32xx_lv? [y/n]: y
  Logical volume "store_md32xx_lv" successfully removed

[root@storage]# lvcreate --extents 100%FREE --stripes 10 --stripesize 256 --name store_md32xx_lv store_md32xx_vg
  Logical volume "store_md32xx_lv" created

[root@storage]# lvs --segments
  LV                  VG                  Attr       #Str Type    SSize  
  store_md32xx_lv     store_md32xx_vg     -wi-a-----   10 striped 545.73t
  lv_root             vg_root             -wi-ao----    1 linear    1.09t

[root@storage]# mkfs.xfs /dev/mapper/store_md32xx_vg-store_md32xx_lv 

[root@storage]# mount -a

[root@storage fs]# dd if=/dev/zero of=test.dat bs=1024k count=1000000

Yay! we are now cpu bound on this dd… :-)

17773 root      20   0  103m 2780 1656 R 100.0  0.0   8:26.73 dd 

And now we are striped - so much better, nice balance across luns!
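Why striping spreads the load is easier to see with a toy model of the two LVM layouts. This Python 3 sketch maps a byte offset in the logical volume to a device index for each mode; it is a simplification for illustration, not how device-mapper is implemented, and the 50 TiB device size is a made-up number.

```python
def linear_device(offset, device_size):
    """Linear LV: fill device 0 completely, then device 1, and so on."""
    return offset // device_size


def striped_device(offset, stripes, stripe_size):
    """Striped LV: consecutive stripe_size chunks rotate across devices."""
    return (offset // stripe_size) % stripes


STRIPES = 10
STRIPE_SIZE = 256 * 1024      # --stripesize 256 is in KiB
DEVICE_SIZE = 50 * 1024**4    # hypothetical size of each physical volume

# Walk a 10 MiB sequential write in 256 KiB steps.
offsets = range(0, 10 * 1024**2, STRIPE_SIZE)
linear_hits = {linear_device(o, DEVICE_SIZE) for o in offsets}
striped_hits = {striped_device(o, STRIPES, STRIPE_SIZE) for o in offsets}

print("linear touches devices: ", sorted(linear_hits))
print("striped touches devices:", sorted(striped_hits))
```

In the linear layout a streaming write hammers one device until it is full, while the striped layout touches all ten within the first few megabytes.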

Major kudos to Justin Weissig's great corner of the internet for helping out!

We also removed the cache mirror - this is a catch it as fast as you can system, we can take the risk of controller issues for this application (don't try this at home kids!)... we put it back in the end, but wanted to make sure it was not a bottleneck.

And here we are all finished running eight at a time and pushing 2.6GB/s (woot!):

At this rate we can also support two more microscopes all off the same kit... with each of them running at full tilt!
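The headroom claim is a one-liner to check. A Python 3 sketch, assuming decimal units and that every instrument runs at the same 3TB/hour design rate; `instruments_supported` is a made-up helper.

```python
def instruments_supported(array_gb_per_s, per_instrument_tb_per_hour):
    """How many instruments fit in the measured array bandwidth."""
    array_tb_per_hour = array_gb_per_s * 3600 / 1000
    return int(array_tb_per_hour // per_instrument_tb_per_hour)


# 2.6GB/s measured, 3TB/hour per microscope.
print(instruments_supported(2.6, 3))
```

The result counts the microscope we already have, so anything above one is room for more.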

How's about them apples for some serious price/performance eh? :-)

Oh and one more thing...

Hashtag BIG DATA... :-)

Thursday, October 23, 2014

OdyBot And Pointy Haired Alerting! AKA: Grumpy old man shakes fist at web two dot oh!

Hi all,

It's been ages since I last posted. We have been super busy in the day job running our monster high performance computing infrastructure and keeping lots of petabytes spinning and many scientists and researchers happy. I wanted to quickly note that we recently had a new addition to our group, supporting the Odyssey cluster.

Let me introduce:


Behind the scenes there are all sorts of fun activities, like checking that our data centers are neat and tidy and doing lots of awesome science:

from Harvard FAS Research Computing on Vimeo.

and sometimes just chilling out riding a skateboard around the yard:

OdyBot Gets Schooled
from Harvard FAS Research Computing on Vimeo.

You can find out all about OdyBot over at

Meanwhile we have had a couple of integration issues back at the ranch making sure that our awesome RC support staff are able to answer questions as our community asks them, and I think my old school methods finally got the better of me yesterday...

So, we use two online web services Userlike and hipchat to provide our external "voice" for OdyBot and for our internal communications. We wanted a quick way to post an alert to our main chat room when the operator count became zero.

Simple eh?

Well, kinda... and given that I'm not a child of the web 2.0 world I went about it in true UNIX style. A python script, with a unix pipe to a perl script... I'm sure sometimes I just do this stuff to wind up my team. :-) Anyway, here's the hipchat part, based on the awesome script with the following 2 second changes to allow it to read from stdin, and quote out the <CR>'s for all that down stream JSON cleverness...
bash-3.2$ diff 
>             "message=s"=> \$optionMessage,
< while (<>){
<   $optionMessage .= $_;
< }
< $optionMessage=~s/\n/\\n/g;

The nice thing about this is that you can quickly post random stdin stuffs to your chat rooms.
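For anyone who would rather not touch the Perl, the newline quoting in the diff above can be sketched in Python 3. The first function mirrors the substitution, the second lets the `json` module do all of the escaping instead. The `message` key is an assumption based on the script's `message=s` option, not a statement about the chat service's API.

```python
import json


def escape_newlines(text):
    """Replace each newline with a literal backslash-n, as the Perl substitution does."""
    return text.replace("\n", "\\n")


def as_json_payload(text):
    """Alternative: build the JSON body and let json.dumps handle escaping."""
    return json.dumps({"message": text})


message = "load average: 2.91\nall good\n"
print(escape_newlines(message))
print(as_json_payload(message))
```

The second form also copes with quotes and backslashes in the input, which the simple substitution does not.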

Then I busted out some extremely suspect python:
bash-3.2$ cat 
import httplib2
import os
import json

API_HOST = os.environ.get('API_HOST') or ''
c = 0 
h = httplib2.Http()

resp, content = h.request(
        'Authorization': API_TOKEN
if resp.status == 200:
    data = json.loads(content)
else:
    print 'Error status=%s' % resp.status

for x in data:
    if (x['slots']['online'] == 1):
      c = c + 1

cc = 1;

if (c>0):
    print "OdyBot Operators Online..."
    for x in data:
      if (x['slots']['online'] == 1):
        print ("%d) %s" % (cc, x['name']))
        cc = cc + 1
else:
    print "Warning!  No OdyBot Ops Are Online!!"

Which works by just querying the Userlike API:
bash-3.2$ ./ 
OdyBot Operators Online...
1) James Cuff
2) Bob Freeman
3) John Noss
4) Dan Caunt
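The counting logic is simple enough to pull out of the HTTP plumbing and test on its own. A Python 3 sketch, assuming the response is a list of operators with the same `name` and `slots.online` fields the script reads; the sample operators are invented and the function names are hypothetical.

```python
def online_operators(operators):
    """Return the names of operators whose slots report them as online."""
    return [op["name"] for op in operators if op["slots"]["online"] == 1]


def alert_text(operators):
    """Build the message that gets piped on to the chat room."""
    names = online_operators(operators)
    if not names:
        return "Warning!  No OdyBot Ops Are Online!!"
    lines = ["OdyBot Operators Online..."]
    lines += ["%d) %s" % (i, name) for i, name in enumerate(names, start=1)]
    return "\n".join(lines)


# Invented sample data in the shape the script expects.
sample = [
    {"name": "Operator A", "slots": {"online": 1}},
    {"name": "Operator B", "slots": {"online": 0}},
]

print(alert_text(sample))
print(alert_text([]))
```

Keeping the message building separate from the request makes it easy to swap the chat service later without touching the alert wording.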

And there you have it - simple alerting to the main chat room when you need it to tell you that folks are not on the wire taking our OdyBot support requests! This is actually pretty important, we absolutely don't want to have our community waiting, and we have invested a lot of time and effort into the OdyBot concept so that our community can contact us. We also have an open OdyBot community list that folks inside and outside of Harvard can use to post questions, tips and techniques, although it's only just starting to ramp up right now.

Question is:

Could you do this integration in an even more ghetto/bandaid/baling wire fashion? :-)

p.s. we are also testing SLACK as a replacement for hipchat as I type. For pretty much the exact same reasons we had to replace Zopim with Userlike yesterday - as you can see, even with shoddy perl and python scripts, it is all about the integration these days!

No matter how ghetto the methods ;-) #allhailtheunixpipe

Friday, July 18, 2014

Of style and science...

There are times in your career that you really, really remember.

This was one of those times.

My then head of department, the dearly departed and most wonderful Professor Dame Louise Johnson wrote this note to my D. Phil. supervisor Geoff. back in 1997.  Geoff. recently sent me a copy while clearing out space to move into his fabulous new building over in Dundee.

To this day, I love that Louise who was an absolute scientific powerhouse said of my research:

"we thought the science was fine"

Although more importantly, her feedback about their concern for my writing style was what has really stuck with me over the years!

Nowt's changed much for me, be at rest Louise

Tuesday, June 24, 2014

Ohai Linux! So you are a network switch now...

Decided to see what the fuss was all about surrounding these open source switches. Plus the rocket powered turtle really did pique my interest ;-)


I built all of this on a CentOS release 6.5 (Final), and I wanted to build everything from source to really see how ONIE worked from the ground up. Don't try this at home kids, there is no reason to try and damage yourself.
git clone

Needed to add some deps. It was a little painful to find what was missing (much make, fail, make, fail, repeat), but this should be enough for most folks so you don't have to go through the iterations I did - this is a monster build. I learned a lot here, never having used "realpath" for example, or any of the syslinux kit which is fab!
sudo yum install realpath
sudo yum install gperf
sudo yum install stgit
sudo yum install texinfo
sudo yum install glibc-static
sudo yum install libexpat-devel
sudo yum install python-devel
sudo yum install fakeroot
sudo yum install syslinux syslinux-devel syslinux-extlinux syslinux-perl
sudo ln -s /usr/share/syslinux /usr/lib/syslinux

Oh and get a fresh autoconf if you are on CentOS 6.5
tar zxvf autoconf-latest.tar.gz
cd autoconf-2.69/
./configure
make
sudo make install

And away we go!
[jcuff@jcair-vm build-config]$ make -j4 MACHINE=kvm_x86_64 all recovery-iso

mkdir: created directory `/home/jcuff/onie/build'
mkdir: created directory `/home/jcuff/onie/build/images'
mkdir: created directory `/home/jcuff/onie/build/download'
mkdir: created directory `/home/jcuff/onie/build/kvm_x86_64-r0'
mkdir: created directory `/home/jcuff/onie/build/kvm_x86_64-r0/stamp'
mkdir: created directory `/home/jcuff/onie/build/kvm_x86_64-r0/initramfs'
==== Getting Linux ====
2014-06-11 14:50:54 URL: [65143140/65143140] -> "/home/jcuff/onie/build/download/linux-3.2.35.tar.xz" [1]
linux-3.2.35.tar.xz: OK

wheee! (get a large beverage, this bit takes a while!)
[jcuff@jcair-vm build-config]$ ls -ltra ../build/images/
total 34212
drwxrwxr-x. 7 jcuff jcuff     4096 Jun 11 17:24 ..
-rw-rw-r--. 1 jcuff jcuff  3301792 Jun 11 18:29 kvm_x86_64-r0.vmlinuz
-rw-rw-r--. 1 jcuff jcuff  5284988 Jun 13 11:23 kvm_x86_64-r0.initrd
-rw-rw-r--. 1 jcuff jcuff  8603253 Jun 13 11:23 onie-updater-x86_64-kvm_x86_64-r0
drwxrwxr-x. 2 jcuff jcuff     4096 Jun 13 11:29 .
-rw-rw-r--. 1 jcuff jcuff 17825792 Jun 13 11:30 onie-recovery-x86_64-kvm_x86_64-r0.iso

Make a disk:
[root@jcair-vm onie]# dd if=/dev/zero of=/tmp/onie-x86-demo.img bs=1M count=256
256+0 records in
256+0 records out
268435456 bytes (268 MB) copied, 0.272711 s, 984 MB/s

Spin up the kvm!
[root@jcair-vm onie]# sudo /usr/libexec/qemu-kvm -m 1024 -name onie -boot order=cd,once=d -cdrom /tmp/onie.iso -net nic,model=e1000 -vnc -vga std -drive file=/tmp/onie-x86-demo.img,media=disk,if=virtio,index=0 -serial telnet:localhost:9000,server

And you are golden!
ONIE: Starting ONIE Service Discovery
Info: Found static url: file:///lib/onie/onie-updater
ONIE: Executing installer: file:///lib/onie/onie-updater
Verifying image checksum ... OK.
Preparing image archive ... OK.
ONIE: Version       : master-201406241118-dirty
ONIE: Architecture  : x86_64
ONIE: Machine       : kvm_x86_64
ONIE: Machine Rev   : 0
ONIE: Config Version: 1
Installing ONIE on: /dev/vda
Pre installation hook
Post installation hook

Remove the CD from your config and you can now boot the live version, and if everything has worked out, the discovery process will work and you can now ping the UK from the USA...
ONIE: Rescue Mode ...
Version   : master-201406241118-dirty
Build Date: 2014-06-24T11:40-0400
Info: Mounting kernel filesystems... done.
Info: Mounting LABEL=ONIE-BOOT on /mnt/onie-boot ...
Running demonstration platform init pre_arch routines...
Running demonstration platform init post_arch routines...
Info: Using eth0 MAC address: 52:54:00:2b:63:f6
Info: eth0:  Checking link... up.
Info: Trying DHCPv4 on interface: eth0
ONIE: Using DHCPv4 addr: eth0: /
Starting: dropbear ssh daemon... done.
Starting: telnetd... done.
discover: Rescue mode detected.  Installer disabled.

Please press Enter to activate this console. 

ONIE:/ # onie-sysinfo -a
VM-1234567890 52:54:00:2b:63:f6 master-201406241118-dirty 42623 kvm_x86_64 0 x86_64-kvm_x86_64-r0 x86_64 1 gpt 2014-06-24T11:40-0400

ONIE:/ # ping
PING ( 56 data bytes
64 bytes from seq=0 ttl=61 time=108.473 ms
64 bytes from seq=1 ttl=61 time=103.824 ms
64 bytes from seq=2 ttl=61 time=103.238 ms
--- ping statistics ---
3 packets transmitted, 3 packets received, 0% packet loss
round-trip min/avg/max = 103.238/105.178/108.473 ms

p.s. for extra twisted points this is ONIE running on linux KVM, inside virtualbox, on a mac on a pair of different layer three networks... it becomes a little confusing to run commands, but always makes me chuckle that a mac laptop is basically a little data center at this point :-)
jcair:~ jcuff$ uname -v

Darwin Kernel Version 13.2.0: Thu Apr 17 23:03:13 PDT 2014; root:xnu-2422.100.13~1/RELEASE_X86_64

jcair:~ jcuff$ ssh -p 2222 root@ ssh uname -a

Linux onie 3.2.35-onie+ #1 SMP Tue Jun 24 11:30:01 EDT 2014 x86_64 GNU/Linux


Monday, May 5, 2014

compressing DRAM with ZRAM for fun and profit?


Can you use compressed DRAM for science if you don't quite have enough memory?


I'm going to file this under "Great idea, but my execution is slightly suspect"

Anyway, here's an example setup of compressed swap devices:
[root@jcair-vm ~]# modprobe zram

[root@jcair-vm ~]# mkswap /dev/zram0
Setting up swapspace version 1, size = 104860756 KiB
no label, UUID=58476253-ad5a-4595-9bec-60bd09d76d30

[root@jcair-vm ~]# mkswap /dev/zram1
Setting up swapspace version 1, size = 104860756 KiB
no label, UUID=ed5d0f85-0245-472e-902e-0e94a743cbe0

[root@jcair-vm ~]# swapon -p5 /dev/zram0 
[root@jcair-vm ~]# swapon -p5 /dev/zram1

[root@jcair-vm ~]# swapon -s
Filename                                Type            Size    Used    Priority
/dev/zram0                              partition       104860752       0       5
/dev/zram1                              partition       104860752       0       5

Clearly without the zram setup above, stress fails right out of the gate:
[root@jcair-vm ~]# stress --vm-bytes 2344600024 -m 2 --vm-keep
stress: info: [6063] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd
stress: FAIL: [6063] (415) -- worker 6065 got signal 9
stress: WARN: [6063] (417) now reaping child worker processes
stress: FAIL: [6063] (451) failed run completed in 10s

But, running a stress test with a memory allocation much bigger than the host seems to work just fine and dandy once we have our zram swap files like those noted above:
[root@jcair-vm ~]# stress --vm-bytes 2344600024 -m 2 --vm-keep
stress: info: [5383] dispatching hogs: 0 cpu, 0 io, 2 vm, 0 hdd

top - 10:41:17 up 13 days, 22:03,  4 users,  load average: 2.91, 0.87, 0.29
Tasks: 192 total,   4 running, 188 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.1%us, 74.8%sy,  0.0%ni, 11.5%id,  0.0%wa,  0.1%hi, 13.4%si,  0.0%st
Mem:   3923468k total,  3840852k used,    82616k free,     5368k buffers
Swap: 209721504k total,   626964k used, 209094540k free,    36932k cached

 5385 root      20   0 2242m 1.2g  124 R 96.4 31.3   0:48.12 stress
 5384 root      20   0 2242m 1.2g  124 R 84.0 32.0   0:48.12 stress

Yay! So - this looks like it could work!
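For the curious, here is a quick sanity check in Python of why the first run died and the second one leaned on swap. The numbers are lifted straight from the stress command line and the top output above; the one assumption is that top's "k" means KiB.

```python
# Numbers copied from the stress command and the top output above.
VM_BYTES_PER_WORKER = 2344600024   # --vm-bytes
WORKERS = 2                        # -m 2
MEM_TOTAL_KIB = 3923468            # "Mem: 3923468k total" (assuming k = KiB)

requested = VM_BYTES_PER_WORKER * WORKERS   # bytes the two hogs ask for
physical = MEM_TOTAL_KIB * 1024             # bytes of RAM in the VM
shortfall = requested - physical            # what cannot fit in RAM

print(f"requested: {requested / 2**30:.2f} GiB")
print(f"physical:  {physical / 2**30:.2f} GiB")
print(f"shortfall: {shortfall / 2**20:.0f} MiB has to live somewhere else")
```

The shortfall ignores everything else running on the box, so treat it as a floor rather than a precise figure.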

And so here we go with a genome aligner to see if this works. This will be a good test as it writes real data structures into memory, whereas stress was just doing a block fill. So first up let's try w/o enough RAM:
[root@jcair-vm ~]# cat 
./bowtie2/bowtie2 -x ./hg19 -p 4  <( zcat Sample_cd1m_3rdrun_1_ATCACG.R1.fastq.gz)

[root@jcair-vm ~]# ./ 
Out of memory allocating the ebwt[] array for the Bowtie index.  Please try
again on a computer with more memory.

Error: Encountered internal Bowtie 2 exception (#1)

Command: /root/bowtie2/bowtie2-align-s --wrapper basic-0 -x ./hg19 -p 4 /dev/fd/63 
(ERR): bowtie2-align exited with value 1

Ok, fair enough, so we have a reproducer.

Let's now set up a run with the right amount of physical RAM:
[root@jcair-vm ~]# ./bowtie2/bowtie2 -x ./hg19 -p 4 <(cat cuff.fastq) -S out.dat &

7467 root 20 0 3606m 3.3g 1848 S 389.3 58.3 51:37.25 bowtie2-align-s

And we have a result!
[root@jcair-vm ~]# time ./bowtie2/bowtie2 -x ./hg19 -p 4 <(cat cuff.fastq)  -S out.dat 
13558597 reads; of these:
  13558597 (100.00%) were unpaired; of these:
  11697457 (86.27%) aligned 0 times
    545196 (4.02%) aligned exactly 1 time
   1315944 (9.71%) aligned >1 times
13.73% overall alignment rate
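
As a small cross-check on that summary, the sketch below recomputes the overall rate from the three buckets. It assumes that, for unpaired reads, "overall" simply means reads aligned once plus reads aligned more than once, divided by the total; that reading matches the figures above but is my interpretation, not something taken from the bowtie2 source.

```python
# Read counts copied from the bowtie2 summary above.
total_reads = 13558597
aligned_zero = 11697457
aligned_once = 545196
aligned_multi = 1315944

# The three buckets should account for every read.
assert aligned_zero + aligned_once + aligned_multi == total_reads

# Assumed definition: anything that aligned at least once counts.
overall = 100.0 * (aligned_once + aligned_multi) / total_reads
print(f"{overall:.2f}% overall alignment rate")
```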

Ok, so let's shrink the memory of the machine and see if we can run with zram.

Let's also set the same priority so the kernel round-robins between physical swap and zram; each gets its share of block reads and writes, which should make for nicely balanced I/O. The stress run worked, so our theory is that data and in-memory structures should compress, and we ought to get at least a 1:2 or 1:1.5 ratio out of the memory. I settled on a 3G machine with 3G of zram and some physical swap as well:
[jcuff@jcair-vm ~]$ swapon -s
Filename                                Type            Size    Used    Priority
/dev/zram0                              partition       2947008 614124  1
/dev/dm-1                               partition       4063224 614176  1
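
Equal priorities should make the kernel alternate pages between the two devices, and the Used column above suggests it did. Here is a small Python sketch that parses the swapon -s text to make that easy to check. The output is pasted in as a string, so nothing here touches a live system, and parse_swapon is just a name made up for this example.

```python
# `swapon -s` output copied from above; sizes are in KiB.
SWAPON_OUTPUT = """\
Filename                                Type            Size    Used    Priority
/dev/zram0                              partition       2947008 614124  1
/dev/dm-1                               partition       4063224 614176  1
"""

def parse_swapon(text):
    """Return a list of dicts, one per swap device, from `swapon -s` text."""
    rows = []
    for line in text.splitlines()[1:]:      # skip the header row
        if not line.strip():
            continue
        name, kind, size, used, prio = line.split()
        rows.append({"name": name, "type": kind, "size_kib": int(size),
                     "used_kib": int(used), "priority": int(prio)})
    return rows

devices = parse_swapon(SWAPON_OUTPUT)
for dev in devices:
    print(f'{dev["name"]:<12} prio={dev["priority"]} used={dev["used_kib"] / 1024:.0f} MiB')

# How far apart the two devices are, in KiB.
imbalance_kib = abs(devices[0]["used_kib"] - devices[1]["used_kib"])
print(f"imbalance: {imbalance_kib} KiB")
```

The two devices are within a few dozen KiB of each other, which is about as balanced as one could hope for.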

When running, it did result in a *much* smaller RES (982m vs 3.3G from the native example):

 2350 root      20   0 3606m 982m 1020 S 20.8 33.3  12:26.74 bowtie2-align-s
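
To put some rough numbers on the compression theory, here is a back-of-the-envelope model in Python. It is a sketch under loud assumptions, none of which come from a measurement: the zram device fills completely, every page compresses by the same ratio, the kernel's own overhead is zero, and physical swap is left out. The function name effective_capacity_gib is made up for the example. Note that the text writes the ratio as compressed:original (1:2), while the code takes original:compressed (2.0).

```python
def effective_capacity_gib(ram_gib, zram_gib, ratio):
    """Rough upper bound on memory available to processes (no disk swap).

    Assumes the zram device fills completely and every page compresses by
    `ratio` (original:compressed), so a full device costs zram_gib / ratio
    of real RAM. Ignores the kernel's own overhead entirely.
    """
    ram_eaten_by_zram = zram_gib / ratio
    return (ram_gib - ram_eaten_by_zram) + zram_gib

# 3G machine with 3G of zram, at a few hypothetical compression ratios.
for ratio in (2.0, 1.5, 1.0):
    print(f"ratio {ratio}:1 -> {effective_capacity_gib(3, 3, ratio):.1f} GiB")
```

On paper, even the pessimistic 1.5:1 case leaves room for an aligner that wanted about 3.3G resident. At 1:1 the zram device buys nothing at all, since every page it stores costs a full page of RAM.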

Things chugged along, but I was not seeing this ending any time soon, so I truncated the read file dramatically to ca. 5k reads to see if I could get a quick comparison between zram, hybrid zram and swap, and plain old boring swap files.

As you can see below, only "boring old swap" resulted in anything sensible. The zram alone caused some rather spectacular OOM errors and obvious system instability. It was kinda fun though. You can also see below the various versions we tried out, none of which actually worked, but we are also not totally alone here either.

Oh and: "Just the right amount of memory" - like Goldilocks, that one worked ;-)
Machine with memory too small:          (ERR): bowtie2-align exited
3G zram:                                sshd invoked oom-killer: gfp_mask=0x200da
Hybrid 3G zram + 4G physical swap:      6m 25.285s
Hybrid 500MB zram + 4G physical swap:   1m 51.029s
Regular /dev/dm-1 swap file:            0m 29.741s
Machine with enough ram:                0m 12.698s
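
To see just how badly each setup did, this little Python sketch converts the wall-clock times from the table above into seconds and divides by the "enough ram" run. Only the four runs that actually finished are included, and to_seconds is a helper name invented for the example.

```python
import re

# Wall-clock times copied from the table above, in `time` style "Xm Y.YYYs".
TIMINGS = {
    "Hybrid 3G zram + 4G physical swap": "6m 25.285s",
    "Hybrid 500MB zram + 4G physical swap": "1m 51.029s",
    "Regular /dev/dm-1 swap file": "0m 29.741s",
    "Machine with enough ram": "0m 12.698s",
}

def to_seconds(stamp):
    """Convert a string like '6m 25.285s' to seconds as a float."""
    match = re.fullmatch(r"(\d+)m\s*([\d.]+)s", stamp.strip())
    if match is None:
        raise ValueError(f"unrecognised time: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))

baseline = to_seconds(TIMINGS["Machine with enough ram"])
for label, stamp in TIMINGS.items():
    seconds = to_seconds(stamp)
    print(f"{label:<40} {seconds:8.3f}s  {seconds / baseline:5.1f}x")
```

That works out to roughly 30x slower for the 3G hybrid, about 9x for the 500MB hybrid, and a little over 2x for plain swap.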

In summary... NO PROFIT this time :-(

Still a neat idea - just don't try this at home kids!

Thursday, April 17, 2014

of painting, retirement plans and minimum wage

So my lovely "painting diva by night" Michele Clamp bangs out some epic watercolors...

Michele totally scored today! A great friend of ours bought one of her paintings. For ONE HUNDRED DOLLARS! Tonight we decided to look at how we are going to fund our new found retirement from paintings! Here's the transaction, I kid you not, she literally made ONE HUNDRED DOLLARS!

And here is the lovely (now sold!) "Pig in Clover" in his new rather resplendent frame waiting to go to the CCAE to hang out with his chums in the rest of Michele's exhibition...

I did think at the time that charging ONE HUNDRED BUCKS was a bit steep, especially to a great and close friend of ours, so I asked Michele to pull together the numbers.

It was rather sad as you can see:
Actual Painting                 1 hour
Buying frame                    1 hour
Framing                         1 hour
Ferrying to/from gallery        1 hour

Paint                           $1.00
Paper                           $1.00
Brush wear and tear             $1.00
Frame                           $15.00
Sale price                      $100.00
Minus fees to the lovely CCAE   $50.00

Net                             $32.00 

Income @ 4 hours                $8.00 /hour

Which, in the state of MA, is exactly the minimum wage: eight bucks an hour.
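
And because this is that sort of blog, here are the same sums in Python. Every figure is copied from the table above; the only assumption is that each of the four jobs really did take exactly one hour.

```python
# Figures copied from the table above, in dollars and hours.
sale_price = 100.00
gallery_fee = 50.00
costs = {"paint": 1.00, "paper": 1.00, "brush wear and tear": 1.00, "frame": 15.00}
hours = {"painting": 1, "buying frame": 1, "framing": 1, "ferrying": 1}

# Net is what is left after the gallery's cut and the materials.
net = sale_price - gallery_fee - sum(costs.values())
hourly = net / sum(hours.values())
print(f"net ${net:.2f}, or ${hourly:.2f} an hour")
```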

Clearly not quite time to retire yet!

Especially given our current sales rate is about one painting every six months, which puts Michele well... yeah best we don't even bother with that math, it would not keep us in adult beverages.


Thursday, April 10, 2014

of schools and of school districts

So folks in the USA worry a lot about where to send their kids to school. Entire family decisions are based upon relocating to the right regions, towns and cities in America so their kids can get "the best education they can afford".  It's a very big dealio.

For example has huge sections of school data built right into the purchase section for any property. Here for example is a $900,000 home in a town called Sudbury in Massachusetts (it's a bit posh, but I wanted to use it as an example - we would never live there!). The yearly council and property taxes for this particular place come in at over $16,000. But check this out for some of the local schools - you can clearly see where the money goes right?

Anyway so I want to tell you about where I carried out my high school education...

Tulketh High School, Tag Lane, Preston, England.

I was there in the mid to late eighties. Other than white socks being a formal, and required part of the male and female uniform - it was not all that bad a place.  Sure the bar was extremely low, it was a pretty poor neck of the woods. At the time we were living in nearby council assisted housing and didn't have two pennies to rub together. But there at Tulketh the teachers (for the most part - but I'll get to that later) tried their very best to teach us reprobates the three R's.

I remember fondly our English, Latin and French teachers in particular being absolutely great, and my Chemistry teacher, well, he was the chap who first introduced me to a 480z... and I guess given my current occupation you could say the rest is history!  So a good teaching staff in a pretty shitty location, but with absolute hearts of gold.  Certainly there was no $16,000 a year council tax, heck I imagine at the time you could probably buy our entire house for that!

For balance, just in case folks think I'm getting all wet and starry eyed, I did unfortunately have a math teacher who in my later years at Tulketh helped me achieve a very solid "D"...  I can tell you that "D" looked totally amazing next to all my other "A's" when I later applied to do my A levels! Anyway, I retook the math course and achieved a straight A on the second go round.  Once I had a teacher that actually taught me the syllabus... but as I said, the bar at Tulketh really was pretty gosh darn low.  I hold no grudges - it was utterly amazing I made it to University to be honest!

Unfortunately for Tulketh, it turned out that things did not get a whole lot better after I left the area either.  I've no background as to why this is, although I do have some personal ideas.  But mainly one.

Teachers are not paid anywhere near enough $ to stay in the profession. 

Couple that with the total reprobates (sorry pupils) that hung out at our school when I was there, I can hardly imagine how difficult it must have been to even get up in the morning to go to work...  Some of our classes were merely a study in chaos theory rather than anything approximating education.  I can't imagine that part ever really improved any.  Certainly not for the teachers.

For example, the latest performance figures from the BBC, back in 2004, show the school ranked 90th out of a possible 93, with a 15% success rate at GCSE.  That's altogether just on the other side of absolutely grim, however you want to do the statistics....

I've not been back to that part of the UK in about 10 years. Google Maps currently shows it to be not doing all that well, and I hear the rest of the building is also all boarded up now:

After failing at education fully in 2003 or so, they appear to have tried to make it a "sports centre of excellence", and then, from what I can see, basically gave up on the whole system and closed the school some time in 2009.

Very, very sad.

And worse still - it very much looks like the new rebooted version of this school, albeit at a new site with a big fancy building, and in a fancier postal code with huge multimillion pound building investments, (but with probably the very same dodgy pupils - remember I was one ;-)) is also still not doing so very well either...  Do we think the teachers all got paid more?  I doubt it.  And yet the report below STILL blames the teachers!  It can't be the twenty five million pound building, it has to be the teachers!


When will they ever learn?  Just pay the bloody teachers!

However, there is this recent and much more positive news:

"The report addressed suggestions raised including references made to Tulketh High School as a possible alternative to the proposed new secondary school.

A new secondary school is currently in phase two of the proposal, and the report said: “Tulketh High School is closed now and whether it should be retained in abeyance until such time as it might be needed is a matter for the County Council as the Local Education Authority.

“It is, however, an option that should be discussed further.”

It gives me hope that the place where I first learned to program a computer may well be able to dig itself out of the rather nasty corner it is in right now.

And for fun, I'll leave you with a picture of what prefects were given to denote their status (courtesy of Tina Kelly from a Facebook post).  I also remember holding one of these badges with great pride!  It was the only time that this little nerd could get his own back on those big bullies - basically by handing out detention slips!  Ah!  Happy days! :-)

Must say, it does have a slight imperial look to it... and I guess we know how well that kind of thing always works out - right?

Tulketh High, there will always be a place for you in my heart - you are clearly gone, but you will never, ever be forgotten, neither will all the amazing teachers there who helped me on my way!

[any opinions here are all mine, and have absolutely nothing to do with my employer]
(c) 2011 James Cuff