This morning we saw a ticket in RT from Mario Juric, who is doing some amazing work on our cluster as a Hubble Fellow over at the Center for Astrophysics. Mario had found a problem with our production build of Python 2.7.1.
Mario's request was super simple (stuff we see all the time):
I'm using Python 2.7 installed on Odyssey. Two days ago I managed to
trace a serious bug in the implementation of one of its components that
caused our codes to (on very rare occasions) produce erroneous results.
The bug has been reported upstream as:
http://bugs.python.org/issue11875
I'd really appreciate if this could be done. Otherwise, we'd have to build
a separate version of Python (and numpy, and scipy, ...) just to get this
fix in.
Mario also gave us the full patch details: where to find information about the error and what kind of error it was. Here is the ticket history from Python HQ:
Date                 User        Action  Args
---------------------------------------------------------------------
2011-04-20 06:39:51  rhettinger  set     status: open -> closed
2011-04-20 00:19:26  python-dev  set     messages: + msg134110
2011-04-20 00:01:12  python-dev  set     messages: + msg134109
2011-04-19 21:08:45  durban      set     nosy: + durban
2011-04-19 20:58:09  amaury      set     status: closed -> open
2011-04-19 20:26:37  mjuric      set     messages: + msg134096
2011-04-19 19:08:53  rhettinger  set     status: open -> closed
2011-04-19 18:15:34  python-dev  set     messages: + msg134085
2011-04-19 16:55:00  python-dev  set     nosy: + python-dev
2011-04-19 09:34:04  rhettinger  set     messages: + msg134027
2011-04-19 08:46:24  rhettinger  set     assignee: rhettinger
2011-04-19 08:28:24  mjuric      create
This was all based on Mario's original filing:
The implementation of OrderedDict.__reduce__() in Python 2.7.1 is not thread safe because of the following four lines:
tmp = self.__map, self.__root
del self.__map, self.__root
inst_dict = vars(self).copy()
self.__map, self.__root = tmp
If one thread is pickling an OrderedDict while another accesses it, a race condition occurs if the accessing thread touches the dict after self.__map and self.__root have been deleted and before they've been set again (above).
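To make the race in Mario's report concrete, here is a minimal sketch, assuming a toy `Container` class rather than the real OrderedDict (in Python 3 OrderedDict is implemented in C, so its internals can't be poked like this). `Container`, `lookup`, `state_unsafe`, `state_safe`, `count_race_failures` and the `in_window` hook are hypothetical names made up for illustration. The hook forces the unlucky interleaving deterministically: a reader thread runs while `_map` has been deleted, which is exactly the window described above. The safe variant never mutates the instance; it copies `vars(self)` and then drops the internal keys from the copy, which, as far as we understand it, is close in spirit to how later 2.7.x releases fixed `__reduce__`.

```python
import threading


class Container:
    """Toy stand-in for OrderedDict (hypothetical, for illustration only).

    _map plays the role of OrderedDict's private __map/__root bookkeeping.
    """

    def __init__(self):
        self._map = {"a": 1}

    def lookup(self, key):
        # Raises AttributeError if _map has been deleted from the instance.
        return self._map[key]

    def state_unsafe(self, in_window=lambda: None):
        # 2.7.1-style pattern: delete internals, copy vars(), then restore.
        tmp = self._map
        del self._map
        in_window()  # test hook: code that runs while _map is missing
        inst_dict = vars(self).copy()
        self._map = tmp
        return inst_dict

    def state_safe(self):
        # Fixed pattern: never touch self; copy first, then filter the copy.
        inst_dict = vars(self).copy()
        for k in vars(Container()):  # keys a fresh instance always has
            inst_dict.pop(k, None)
        return inst_dict


def count_race_failures(use_safe):
    """Run a reader thread during the critical step and count AttributeErrors."""
    c = Container()
    c.label = "user data"  # an ordinary user-set attribute worth pickling
    errors = []

    def reader():
        try:
            c.lookup("a")
        except AttributeError as exc:
            errors.append(exc)

    def in_window():
        t = threading.Thread(target=reader)
        t.start()
        t.join()  # wait for the reader, forcing the interleaving deterministically

    if use_safe:
        state = c.state_safe()
        in_window()  # same reader, but no attributes were ever deleted
    else:
        state = c.state_unsafe(in_window)
    assert state == {"label": "user data"}
    return len(errors)


print(count_race_failures(use_safe=False))  # 1
print(count_race_failures(use_safe=True))   # 0
```

Both variants return the same state dict; the difference is that the safe one only ever reads the instance, so there is no window in which another thread can find the internal attributes missing.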
Following Raymond Hettinger's close-out of the ticket at Python HQ at 6:39am on 20 April, our own RT ticket flowed like this. At this point I started thinking about it some more and realized that this is not an ordinary "patch a piece of code" kind of ticket...
Wed Apr 20 08:56:20 2011  Mario Juric   - Ticket created
Wed Apr 20 09:10:52 2011  Chris Walker  - Taken
Wed Apr 20 09:19:12 2011  Chris Walker  - Correspondence added
Wed Apr 20 09:36:59 2011  Mario Juric   - Correspondence added
Seriously, one conversation: [broken] -> [fixed?] -> [yeah, fixed!]
Done!
So what is important here? Why am I making so much fuss about a patch of a kind that happens every single day in the open-source community and in our research community?
1: Our researchers are amazing: finding a threading issue that deep inside a code base is not trivial!
2: Open-source community bug tracking blows proprietary release cycles out of the water!
3: The open-source community takes issues seriously; they are not messing about!
4: If you are not fast, proactive and aware in research computing, you are dead! The science community has no need for you; they can do all this IT stuff themselves. Remember, they can find galaxies, collide particles and annotate the human genome, so sticking a few computers together is just cake and biscuits to these boys and girls!
Think about how long it would take to get such a change to a core Python library approved in a production enterprise IT shop. There is no chance that such a seat-of-the-pants fix would be considered sensible or practical on an enterprise production floor. In RC, however, this is exactly what we need to do.
So that is why research computing is different, at least to me and my team!