Thursday, January 13, 2011

vanity publication: why we have to take our faculty to the clouds

These opinions, as always, are mine; they do not
in any way represent any affiliated .edu official policies

We (well, my awesome team) spin up and manage many hundreds of virtual machines dedicated to "websites" in our day jobs. Many of the instances we end up managing start off with this rather simple-looking question:

From: faculty member @ dot edu
Subject: web site

Dear research computing,

We have a very important set of research data that we wish to publish on the web. Would you be able to install a "virtual machine" with drupal, php and ssh access to the world that mounts our 200TB storage array to display our data?

kthxbai

p.s if we could have world accessible FTP access to the data that would be awesome!

Prof. Famous.

We have all seen it before. Thousands of times. Anyone who works in academic computing is used to installing boatloads of PHP, Perl CGI, Ruby, Drupal, GlassFish, Tomcat, Confluence, Plone, Joomla, MediaWiki, Skeletonz, etc.

Why do we continue to do it? Just because it has "always been that way" does not mean that we need to continue to do the same thing. Or should we?

In my job, folks at all levels fear control issues. Everyone has to "hug" and manage their content and data. I totally get it - it is a big deal! The information is rather important - does not matter if you are a lowly RA or a CIO - "it's the data stoopid"!

However, do we really need to continue to install all this stuff at scale for thousands of folks to be able to publish and show off their (have to admit, pretty amazing) work?

Probably not. Most results are a subset of the total data. Yes, it is nice to have a cifs:\\bigfs\data1 mounted on some random IIS/Apache instance under our, and the faculty's, full control. However, not many data sets are all that big.

So the question is - who is in control?

Answer - the DNS servers for assorted dot edus.

A dot edu third or fourth level domain alias to a web site is approximately three keystrokes away for any netmanager / systems admin, for any faculty that wants it. I've seen it done. BIND/CNR/Infoblox etc. are not large cost centers. However, they are held close and tight by all Universities. These subdomain aliases cost the University little money - way less than the FTEs needed to make [website.edu x 1,000]. But they continue to drive the faculty nuts, because they have to publish from "their.edu". The faculty are attached to "their.edu", and rightly so! They want to publish from "their.edu" - we can't blame them. We have to support them.
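To make the "three keystrokes" point concrete, here is a minimal sketch of what such an alias amounts to: a single CNAME line in a BIND-style zone file. The function name, the default TTL, and both host names ("famouslab.their.edu" and "famouslab.example.com") are made up for illustration; your DNS team's actual tooling will look different.

```python
def cname_record(alias: str, target: str, ttl: int = 3600) -> str:
    """Return one BIND-style zone file line that aliases `alias` to `target`."""
    # In a zone file, a name without a trailing dot is treated as relative
    # to the zone origin, so make both names fully qualified.
    fq_alias = alias if alias.endswith(".") else alias + "."
    fq_target = target if target.endswith(".") else target + "."
    return f"{fq_alias} {ttl} IN CNAME {fq_target}"


if __name__ == "__main__":
    # Hypothetical faculty alias pointing at a hypothetical hosted site.
    print(cname_record("famouslab.their.edu", "famouslab.example.com"))
```

One record like this is all it takes for a hosted site to answer under the university's name, which is rather the point.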

Am I right?

Does that mean each dot edu has to staff up and fund this "vanity"?

The community clearly wants to see it, and once that whole http:// thing starts - the data they want to show is public. All bets on privacy and control are off. It is, as they say, "out there". No self-respecting faculty is going to misrepresent their research, or their university - they need less rather than more control.

So what to do?

It is now 2011, and there are umpteen methods to expose data on the web. Over the last few months I've explored "clouds". All sorts of them, ranging from complex instances (awesome control) to simple sites (less control). Each took only seconds to set up. Vanity domains are but a GoDaddy/pairNIC click away.

They are a thing of beauty.

So why are academic departments and expensive University IT organizations still attempting to build "mongo websites" and other such FTE intensive "CMS" systems? I for one simply don't get it. The industry is more than keen to "outsource". However, we clearly miss this simple idea of what our faculty are actually trying to do: expose their reduced scientific datasets. It is the very definition of outsourcing - the researchers just want to get their data out there!

Yes there are issues where large data needs to be exposed, and that's where my team can help - we don't fear the petabyte... But really - how large are most result sets? And even more so, do Universities really need to host vanity pages for all their assorted faculty and PIs?

A quick brain dump of *really* simple methods to expose data, twiddle with style sheets, set decoration and themes for any faculty/postdoc/scientist w/o having to engage a single system admin, or "web developer" and each for inordinately low cost @ varying levels of complexity:

[list is limited - trust me there are absolutely 1000's out there]

None of these sites are currently sponsored by your local dot edu DNS servers. I just hope some of your senior faculty get to lobby for independence of their domain name systems. Maybe they can get their research out there faster than ever before, and still have it attributable to their home institute!

Summary
Websites for faculty and PIs: BASICALLY DONE!

This is clearly a solved problem, much as Google solved email in 2004 with 1GB mailboxes, albeit dot edus still struggle with email...

So maybe it is not just as simple as delegating a portion of DNS name space after all... :-)


[any opinions here are all mine, and have absolutely nothing to do with my employer]
(c) 2011 James Cuff