Saturday, August 11, 2012

Barriers to storage as a service

So Monday I'm off to visit my good friends at UMASS medical to chat with my other good pals and colleagues in the MGHPCC about possible shared storage models. I was also speaking with the CIO of Boston University on Friday, and she and I were both wondering at what levels one could potentially share storage infrastructure between schools. Even more importantly, once you have done that, could you ever seamlessly share data and files between schools? We knew it was probably going to look something like an OSI model - most things in IT do end up that way ;-)

Anyway, I said I'd put a slide together at least to start our conversations. I've been wrestling with shared storage for years in my various day jobs. What to share? How to share it? Who to share it with? The list goes on and on.

As I was putting this slide together it finally hit me!

The reason for my astonishment and enlightenment today?

Simple - it's all about BARRIERS!!

Barriers get in the way of everything. They cause misery, confusion and blame. They are always put in place to stop people from doing things. The more barriers, the more control and the less freedom. What is academic data sharing all about? You got it, Sherlock! It is all about freedom! Freedom to explore, freedom to grow, freedom to push against barriers.

So there you have it, this post is basically stating the blindingly obvious!

"barriers get in the way of academic data sharing and freedom"

You see, each barrier you have to cross makes it harder and harder to support science and discovery. Not all barriers are technology problems either; actually very few are. I see them more like activation energy: once you get over the hump, the energy you need is lower, and quality ought to be higher... the challenge is finding the right catalyst(*) ;-))

Here's the chart anyway, I think it came out rather well:



(*)folks in our IT shops have often called me out as a "catalyst" - not sure they always mean it in a "good way" though... hehe.


Monday, August 6, 2012

MSL in real time...

Yep, we watched it all the way down... in real time via NASA TV and this totally amazing Java app on eyes.nasa.gov. The simulation and the TV feed were utterly amazing - never seen anything like it in my whole life. Stunning!






Wednesday, August 1, 2012

GCE part II - fun with the API


First make sure you use a git checkout; the API is a moving target:

git clone https://code.google.com/p/google-compute-engine-tools.gcelib/

Then simply:

#!/usr/bin/python

import logging
from gcelib import gce_util
from gcelib import gce_v1beta12
from gcelib import shortcuts

# Project-related configuration.
DEFAULT_PROJECT = 'PROJECT'  # Your project name goes here.
DEFAULT_ZONE = 'us-east1-a'
DEFAULT_IMAGE = 'projects/google/images/ubuntu-12-04-v20120621'
DEFAULT_MACHINE_TYPE = 'n1-standard-1'

# Change logging to INFO to see more or to DEBUG to see even more!
LOG_LEVEL = logging.ERROR

def main():
  # Performs the oauth2 dance.
  credentials = gce_util.get_credentials()
  defaults = gce_util.get_defaults()

  # Constructs an instance of GoogleComputeEngine.
  api = gce_v1beta12.GoogleComputeEngine(
      credentials,
      logging_level=LOG_LEVEL,
      default_project=defaults.project or DEFAULT_PROJECT,
      default_zone=defaults.zone or DEFAULT_ZONE,
      default_image=defaults.image or DEFAULT_IMAGE,
      default_machine_type=defaults.machine_type or DEFAULT_MACHINE_TYPE)

  # Creates 8 test instances synchronously.
  names = ['test-instance-{0}'.format(i) for i in xrange(8)]
  api.insert_instances(names, networkInterfaces=shortcuts.network())

  # Prints the names of all instances in the given project.
  for instance in api.all_instances():
    print instance.name
 
  api.wait_until_done()

if __name__ == '__main__':
  main()
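
The listing above is Python 2 and needs gcelib plus live credentials. Two plain-Python ideas in it can be tried on their own: falling back to a hard-coded default when no configured value exists, and generating the instance names. Here is a minimal Python 3 sketch of just those two pieces. The `Defaults` tuple, `pick` and `instance_names` are hypothetical stand-ins for illustration, not part of gcelib:

```python
from collections import namedtuple

# Hypothetical stand-in for the object returned by gce_util.get_defaults();
# only the four attributes used in the listing are modelled here.
Defaults = namedtuple('Defaults', 'project zone image machine_type')

DEFAULT_PROJECT = 'PROJECT'
DEFAULT_ZONE = 'us-east1-a'


def pick(configured, fallback):
    """Mirror the `defaults.x or DEFAULT_X` idiom: use the fallback when
    the configured value is None or empty."""
    return configured or fallback


def instance_names(count, prefix='test-instance'):
    """Build names like test-instance-0 ... test-instance-(count-1)."""
    return ['{0}-{1}'.format(prefix, i) for i in range(count)]


if __name__ == '__main__':
    defaults = Defaults(project=None, zone='us-central1-a',
                        image=None, machine_type=None)
    print(pick(defaults.project, DEFAULT_PROJECT))  # falls back
    print(pick(defaults.zone, DEFAULT_ZONE))        # configured value wins
    print(instance_names(8))
```

Note that `or` treats an empty string the same as `None`, which is usually what you want for an unset config value.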
          

Gets you up and running in seconds:

toofan:samples jcuff$ time ./go.py
test-instance-0
test-instance-1
test-instance-2
test-instance-3
test-instance-4
test-instance-5
test-instance-6
test-instance-7

real 0m10.847s
user 0m0.520s
sys 0m0.103s



yep, it really was that easy... 8 node cluster, 10 seconds.

jcuff% foreach h (0 1 2 3 4 5 6 7)
foreach? gcutil ssh test-instance-$h uname -a
foreach? end
Linux test-instance-0 2.6.39-gcg-201203291735 #1 SMP Thu Mar 29 17:37:55 PDT 2012
Linux test-instance-1 2.6.39-gcg-201203291735 #1 SMP Thu Mar 29 17:37:55 PDT 2012
Linux test-instance-2 2.6.39-gcg-201203291735 #1 SMP Thu Mar 29 17:37:55 PDT 2012
Linux test-instance-3 2.6.39-gcg-201203291735 #1 SMP Thu Mar 29 17:37:55 PDT 2012
Linux test-instance-4 2.6.39-gcg-201203291735 #1 SMP Thu Mar 29 17:37:55 PDT 2012
Linux test-instance-5 2.6.39-gcg-201203291735 #1 SMP Thu Mar 29 17:37:55 PDT 2012
Linux test-instance-6 2.6.39-gcg-201203291735 #1 SMP Thu Mar 29 17:37:55 PDT 2012
Linux test-instance-7 2.6.39-gcg-201203291735 #1 SMP Thu Mar 29 17:37:55 PDT 2012

Done
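
The csh loop above can also be driven from Python. This is only a sketch, assuming `gcutil` is on the PATH and already authorized. `ssh_command` and `run_on_all` are hypothetical helper names, and the only gcutil form used is the `gcutil ssh <instance> <command>` one shown in the transcript:

```python
import subprocess


def ssh_command(instance, remote_command):
    """Build the argv for `gcutil ssh <instance> <remote command...>`."""
    return ['gcutil', 'ssh', instance] + remote_command.split()


def run_on_all(instances, remote_command):
    """Run the remote command on each instance in turn and collect output.

    Assumes gcutil is installed and already authorized; raises
    CalledProcessError if any invocation exits non-zero.
    """
    results = {}
    for name in instances:
        results[name] = subprocess.check_output(
            ssh_command(name, remote_command), universal_newlines=True)
    return results


if __name__ == '__main__':
    # Dry run: print the commands rather than executing them.
    names = ['test-instance-{0}'.format(i) for i in range(8)]
    for name in names:
        print(' '.join(ssh_command(name, 'uname -a')))
```

Swap the dry-run loop for `run_on_all(names, 'uname -a')` once the instances are actually up.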

GCE FTW!

Turns out I know a gal who knows a guy who happens to be "a big deal" at g-plex... (thanks Phil!) ;-) So I totally need to retract this post:

http://blog.jcuff.net/2012/07/gce-denied.html

ok, so let's take it for a whirl...

start the clock!

toofan:~ jcuff$ date
Wed Aug 1 10:21:58 EDT 2012

Ok, off we go!!

toofan:~ jcuff$ mkdir gce
toofan:~ jcuff$ cd gce
toofan:gce jcuff$ curl http://dl.google.com/dl/compute/gcutil.tar.gz > gcutil.tgz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  693k  100  693k    0     0   759k      0 --:--:-- --:--:-- --:--:-- 1288k
toofan:gce jcuff$ tar zxvf gcutil.tgz 
x gcutil/share/lib/apiclient/http.py
x gcutil/share/lib/apiclient/discovery.py
x gcutil/share/lib/apiclient/mimeparse.py
[snip]

toofan:gcutil jcuff$ ./gcutil auth --project_id=harvardresearch
Go to the following link in your browser:


Enter verification code: 
Authentication successful.
INFO: Authorization succeeded for user jcuff

toofan:gcutil jcuff$ ./gcutil getproject 

+-----------------------------------------+---------------------------+
|                property                 |           value           |
+-----------------------------------------+---------------------------+
| name                                    | jcuff.net:harvardresearch |
| description                             |                           |
| creation time                           | 2012-08-01T14:17:32.902   |
| ips                                     |                           |
| INSTANCES quota usage / limit           | 0.0 / 8.0                 |
| CPUS quota usage / limit                | 0.0 / 8.0                 |
| EPHEMERAL_ADDRESSES quota usage / limit | 0.0 / 8.0                 |
| DISKS quota usage / limit               | 0.0 / 8.0                 |
| DISKS_TOTAL_GB quota usage / limit      | 0.0 / 100.0               |
| SNAPSHOTS quota usage / limit           | 0.0 / 1000.0              |
|                                         |                           |
| Common Instance Metadata                |                           |
+-----------------------------------------+---------------------------+

toofan:gcutil jcuff$ ./gcutil addinstance research
1: us-central1-a
2: us-central2-a
3: us-east1-a
>>> 3
1: n1-standard-1
2: n1-standard-1-d
3: n1-standard-2
4: n1-standard-2-d
5: n1-standard-4
6: n1-standard-4-d
7: n1-standard-8
8: n1-standard-8-d
>>> 8

Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
INFO: Waiting for insert of research. Current status: RUNNING. Sleeping for 3s.

toofan:gcutil jcuff$ ./gcutil ssh research

Identity added: /Users/jcuff/.ssh/google_compute_engine 

Welcome to Ubuntu 12.04 LTS (GNU/Linux 2.6.39-gcg-201203291735 x86_64)

 * Documentation:  https://help.ubuntu.com/
Running a Google Compute Engine VM Instance
 * You are running on an EPHEMERAL root disk, which is NOT PERSISTENT.
   For persistent data, use Persistent Disks:
     https://developers.google.com/compute/docs/disks#persistentdisks

0 packages can be updated.
0 updates are security updates.

The programs included with the Ubuntu system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.

Ubuntu comes with ABSOLUTELY NO WARRANTY, to the extent permitted by
applicable law.

jcuff@research:~$ uptime
 14:30:56 up 0 min,  1 user,  load average: 0.14, 0.04, 0.02

jcuff@research:~$ uname -a  
Linux research 2.6.39-gcg-201203291735 #1 SMP Thu Mar 29 17:37:55 PDT 2012 x86_64 x86_64 x86_64 GNU/Linux

jcuff@research:~$ cat /proc/cpuinfo | grep "model name"
model name : Intel(R) Xeon(R) CPU @ 2.60GHz
model name : Intel(R) Xeon(R) CPU @ 2.60GHz
model name : Intel(R) Xeon(R) CPU @ 2.60GHz
model name : Intel(R) Xeon(R) CPU @ 2.60GHz
model name : Intel(R) Xeon(R) CPU @ 2.60GHz
model name : Intel(R) Xeon(R) CPU @ 2.60GHz
model name : Intel(R) Xeon(R) CPU @ 2.60GHz
model name : Intel(R) Xeon(R) CPU @ 2.60GHz

jcuff@research:~$ logout
Connection closed.
toofan:gcutil jcuff$ ./gcutil deleteinstance research
Delete instance research? [y/N]
>>> y


Stop the clock!

toofan:~ jcuff$ date
Wed Aug 1 10:32:06 EDT 2012

Oh my! That was EPIC! Just over 10 mins, and I had to read some docs... sweet!
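
For the record, the elapsed time falls straight out of the two `date` stamps. A small sketch of the arithmetic follows. `parse_stamp` is a hypothetical helper, and it drops the weekday and time zone tokens, which is safe here only because both stamps are in the same zone:

```python
from datetime import datetime


def parse_stamp(stamp):
    """Parse `date` output such as 'Wed Aug 1 10:21:58 EDT 2012'."""
    _weekday, month, day, clock, _zone, year = stamp.split()
    return datetime.strptime(' '.join([month, day, clock, year]),
                             '%b %d %H:%M:%S %Y')


start = parse_stamp('Wed Aug 1 10:21:58 EDT 2012')
stop = parse_stamp('Wed Aug 1 10:32:06 EDT 2012')
elapsed = stop - start

print(elapsed)                  # 0:10:08
print(elapsed.total_seconds())  # 608.0
```

That is 10 minutes and 8 seconds from the first `date` to the last, docs reading included.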

This thing is going to change the world yet again!!

UPDATE: once you know what you are doing, you can boot 8 nodes in 10 seconds!

http://blog.jcuff.net/2012/08/gce-part-ii-fun-with-api.html


[any opinions here are all mine, and have absolutely nothing to do with my employer]
(c) 2011 James Cuff