Wednesday, July 18, 2012

scientific software as a service sprawl...

There's a lot of chatter in HPC circles about how fast your machine is, how many petabytes of storage you have, how low your latency is, how fast your fabric is, yada, yada, yada. One thing folks forget is how hard it is to build open source software with hundreds of dependencies and interactions with system libraries and interpreted language subsystems.

Oh yeah, and perl, python, ruby: I'm looking at each of y'all ;-)).

So, I took a quick audit today of our own software system. We use environment modules to pull in and manage all kinds of software: Fourier transform libraries, math and blas libraries, compilers, and big old honking perl pipelines. You know the sort of thing: foo requires bah, bah depends on blergh, and everything must be linked with piffle....
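For the uninitiated: each piece of software gets a small modulefile that sets up paths and declares its dependencies. Here's a minimal sketch of one (every name and path below is made up for illustration):

[jcuff@hpc ~]$ cat /sw/modulefiles/bio/foo/1.0
#%Module1.0
## hypothetical modulefile -- all names and paths are illustrative
module-whatis "foo 1.0: a big old honking perl pipeline"
# refuse to load unless perl is already on the stack
prereq hpc/perl
# pull in a library dependency automatically
module load math/blas
# put foo's binaries on the user's PATH
prepend-path PATH /sw/bio/foo/1.0/bin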

I knew the team had been adding software at a crazy rate; I just had no idea quite how far this had gone :-) Our faculty and researchers do love this stuff: we invest the time to build it once, and then the codes are available to everyone, quickly and easily, as a one-line "module load hpc/matlab" style operation. It is a huge cost saving, certainly for my organisation and the folks we support.
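For anyone who hasn't seen it in action, the end-user side really is a one-liner (the install path shown here is illustrative):

[jcuff@hpc ~]$ module load hpc/matlab
[jcuff@hpc ~]$ which matlab
/sw/hpc/matlab/bin/matlab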

Do it once, reuse many!

[jcuff@hpc ~]$ module avail |& wc -l
2192
Woof!

That's just shy of two thousand two hundred pieces of software available on our cluster!

[jcuff@hpc ~]$ module avail |& awk -F "/" '{print $1}' | sort | uniq -c | sort -nr
   1054 hpc
    618 bio
    194 math
    134 viz
    116 devel
     28 ncf
     28 chem
     14 geo
      2 mia32
The full list is here: http://rc.fas.harvard.edu/module_list/

Meanwhile, industry folks worry a lot about server sprawl, VM sprawl and disk sprawl.

So I guess my question is:

How are folks managing this new beast that is clearly "scientific software sprawl"?

If we are to believe the dream of SaaS models, it's only going to get way worse!
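I don't have a grand answer, but step one is probably measuring it. Reusing the same awk trick from above, and assuming our category/name/version naming scheme holds, this rough sketch counts how many versions of each package we're carrying:

[jcuff@hpc ~]$ module avail |& awk -F "/" 'NF>=3 {print $1"/"$2}' | sort | uniq -c | sort -nr | head

Anything that shows up with dozens of near-identical versions is a prime candidate for retirement before the sprawl really bites.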



[any opinions here are all mine, and have absolutely nothing to do with my employer]
(c) 2011 James Cuff