Monday, June 25, 2012

does science ever sleep? "on the psychology of cluster computing..."


It's been a while since I posted anything, and this one is hopefully both fun and informative for folks who either run, or run on, HPC cluster systems.

Ok here we go...

So we run ca. 20,000 processors in our cluster. It is all time shared and allows over 5,000 folks to submit their work in batches, at any time of the day or night. There are times when it all gets rather hectic, I can tell you!

Recently we have been working on a C/MySQL interface to our flat-file batch logs, covering the last three or so years, to look for trends and spot which groups are running which codes and when they are running them.

There are about 82 million rows in the database right now, so there are some pretty interesting SQL queries on the go, but I digress. I'll talk more about the interface and database in a follow-up post. It is Michele's project, and you can read about it here on her GitHub:

https://github.com/mclamp/harvard-informatics/tree/master/lsf

The C code is also pretty darn cool and groovy, but I'm biased ;-)

https://github.com/mclamp/harvard-informatics/tree/master/lsf/src

Anyway, it turns out it's pretty fun to look at the global stats. First up, let us look at how many jobs have been submitted each month. This one is pretty straightforward for tracking capacity trends:


Ignore the negative job number. This stuff is still a work in progress and will get better with time; it is all 10,000ft stuff right now ;-) Ok, good, so now let's look at all our jobs by day of the week (I was looking for good days to do downtime and updates):


Interesting... And finally, a look-see at jobs by hour of the day. Again, we could use this to spot good opportunities to carry out some deferred cluster maintenance:


Science NEVER Sleeps!

Is a little bit of a misnomer, don't ya think? :-) Mind you, when you consider our population of customers (grad student heavy), and correlate this with how many help tickets we get at around 6pm on any given evening because of a failed submission script, and then the number of requests we see on Saturday morning from folks upset about code failing...

It all makes sense.

Most jobs are fired off at about 5pm, on a Thursday or Friday night (nearly 38 million jobs!). In comparison, very little goes on on a Saturday night (only 13 million jobs), but there is a clear uptick in rate from Sunday night into Monday morning.
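
For the curious, the day-of-week and hour-of-day bucketing behind counts like these can be sketched in a few lines of Python. This is only an illustration, not the actual C/MySQL code: the function names and the sample timestamps are made up, and it assumes the submission times have already been parsed out of the batch logs into datetime objects.

```python
from collections import Counter
from datetime import datetime

# Day names indexed by datetime.weekday() (Monday is 0).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def submission_histograms(timestamps):
    """Count job submissions per weekday and per hour of day."""
    by_day = Counter()
    by_hour = Counter()
    for ts in timestamps:
        by_day[DAY_NAMES[ts.weekday()]] += 1
        by_hour[ts.hour] += 1
    return by_day, by_hour


def quietest_hour(by_hour):
    """Return the hour (0-23) with the fewest submissions.

    Ties go to the earliest hour. Hours with no submissions count as zero.
    """
    return min(range(24), key=lambda hour: by_hour.get(hour, 0))


# Hypothetical submission times, purely for illustration.
sample = [
    datetime(2012, 6, 21, 17, 5),   # Thursday evening
    datetime(2012, 6, 22, 17, 30),  # Friday evening
    datetime(2012, 6, 22, 17, 45),  # Friday evening
    datetime(2012, 6, 23, 6, 0),    # Saturday morning
    datetime(2012, 6, 25, 9, 15),   # Monday morning
]

by_day, by_hour = submission_histograms(sample)
for day in DAY_NAMES:
    print(day, by_day[day])
print("quietest hour:", quietest_hour(by_hour))
```

With real data you would feed in every submission time from the logs. The quietest hour is then a first guess at a maintenance window, though it says nothing about jobs that are already running at that time.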

I looked at my own jobs, and sure enough, other than some crazy 4:30-5:00am submission I halfway remember, I follow the trend fairly well. You can get a hint of our new charting method from the snap below. It has all sorts of neat JavaScript predictive text entry courtesy of Michele Clamp. You can also see we make use of the totally awesome http://www.highcharts.com/ charting software. Very cool.


Data-driven summary: smart kids will get up at 6am on Sat. to run their science...

Oh wait, no! That makes absolutely no sense. hehehe :-)



[any opinions here are all mine, and have absolutely nothing to do with my employer]
(c) 2011 James Cuff