Tuesday, March 8, 2011

DIY $/TB and RC playing catch-up, quick!

I had the opportunity to speak on the phone with Gleb Budman back in early 2010 about how we could best use his Backblaze storage pod building blocks to support science with my Research Computing (RC) team. We both decided we were not quite ready and would postpone any grand ideas of pushing 2PB of our data to backblaze.com, at least for a little while... That reminds me, I need to catch up with Gleb again :-)


Anyway, time passed and my team in RC started to gain traction with what we came to call "commodity scalable" storage arrays. Basically, we take regular old "enterprise" arrays (think of your tier-one vendor that provides lights-out management, battery-backed RAID and 1U boxes with a boatload of SATA drives, plus a 1-800-IT-ISNOT-WORKING number for 24-hour parts, firmware, etc.). We then stack them together in 100-250TB building blocks and away we go. Building blocks of Linux with LVM/XFS storage and NFS worked well, as did some in Lustre configurations:

http://www.flickr.com/photos/jamesdotcuff/5210398730/
http://www.flickr.com/photos/jamesdotcuff/5210399550/

It is "cheap", or in Gartner quadrant terms, "cost effective". I've spoken about this approach before at a number of talks. It is not sexy, but it works well for us and keeps the cost down. We thought we had finally cracked the whole "scale it for low $" problem that we had been facing for years.

Well, we thought we had...

However, I do work at one of the most interesting universities on the planet, with faculty who always want to "push the envelope". They do, and in this case did, by taking *exactly* Gleb's parts list from the internet and building a pod themselves (I got these snaps in my inbox this morning from the research team):


Yeah, they need a few zip ties and some other bits, but the research team did this all by themselves, really quickly, and they all have day jobs:

http://cleanenergy.harvard.edu


We in team RC unfortunately have to stand on the sidelines and watch this one for just a little while. I don't quite have the stones to put this into production for our 2,000+ researchers. However, it does show that a couple of dedicated postdocs and a modest investment were enough to get their system up and running.

For now, we are sharing configurations, some of my team's awesome systems talent, lab space and ideas while we in RC watch. I know it is inevitable that we go this route; we just have to. I'm just not quite ready to jump in yet.

$/TB is the bottom line for what science cares about!
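Since $/TB is the number that matters, here is a minimal Python sketch of how one might compare building blocks on that basis. The `StorageBlock` class and every figure in the example are hypothetical, not our actual hardware or prices. It assumes usable capacity is simply raw capacity minus the drives given over to RAID parity, and it ignores filesystem overhead, hot spares, power and staff time.

```python
from dataclasses import dataclass


@dataclass
class StorageBlock:
    """A hypothetical storage building block; all figures are supplied by the caller."""

    drives: int           # total drives in the block (e.g. 45 in a pod)
    drive_tb: float       # raw capacity of each drive, in TB
    parity_drives: int    # drives consumed by RAID parity across all groups
    hardware_cost: float  # total purchase price, in dollars

    def usable_tb(self) -> float:
        # Simplification: ignores filesystem overhead, hot spares and TB vs TiB.
        return (self.drives - self.parity_drives) * self.drive_tb

    def dollars_per_tb(self) -> float:
        usable = self.usable_tb()
        if usable <= 0:
            raise ValueError("block has no usable capacity")
        return self.hardware_cost / usable


if __name__ == "__main__":
    # Hypothetical pod: 45 x 2TB drives in three RAID 6 groups (2 parity drives each),
    # with a made-up total price of $7,800.
    pod = StorageBlock(drives=45, drive_tb=2.0, parity_drives=6, hardware_cost=7800.0)
    print(f"usable: {pod.usable_tb()} TB, ${pod.dollars_per_tb():.2f}/TB")
```

With those made-up inputs the pod yields 78 TB usable at $100.00/TB; plugging in real quotes for a pod and for a vendor array is what would make the comparison meaningful.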

I wonder how long it will be before we prove out this system at multi-petabyte scientific scale and I end up posting something along the lines of:

http://blog.backblaze.com/2011/01/05/10-petabytes-visualized/

What does make me chuckle is that my RC team is supposed to be the fast-moving shim between the slow-moving, regular IT-provisioned services like email and accounts, and the fast-moving quantum chemists and atmospheric science researchers. However, this post clearly shows that there are times when even my crazy-fast, non-ITIL-compliant, technology-rich science research team just *can't* keep up with the pace of our actual science and researchers!

It is a wake-up call for us all to keep becoming more agile in supporting our science, wherever it happens and with whatever support mechanisms we have, even if it does mean plugging in 45 disk drives at a time :-)

Anyway, well done to all of Alán's team; we continue to watch you all carefully ;-)!


[any opinions here are all mine, and have absolutely nothing to do with my employer]
(c) 2011 James Cuff