Sunday, May 25, 2014

Python reimports

We've been working a lot recently on two kinds of optimization in music21: improving speed and then using some of the speed increases to add functionality and stability, so that new features can be added without slowing down the process. One of the places we found where we could make changes is in our over-cautious use of imports. 

While everyone says that in Python you can import a module inside a function without it going through the overhead of actually reimporting, there is some real overhead still, especially if the function is called a lot of times:

Here I compare ten million calls to reference an object vs. doing the same while also importing a module that is already imported:

>>> from timeit import timeit as t # number = ten million; output in secs to 3 decimals
>>> t('x', setup='import weakref; x=5', number=10000000)
0.278
>>> t('import weakref; x', setup='import weakref; x=5', number=10000000)
7.810

So it's approximately two orders of magnitude slower than direct access alone.  Even with using the module and creating the weakref itself, the check-for-reimport timedominates five-fold over the creation of the weakref:

>>> t('weakref.ref(x)', setup='import weakref; from music21 import pitch; x=pitch.Pitch()', number=10000000)
2.098
>>> t('import weakref; weakref.ref(x)', setup='import weakref; from music21 import pitch; x=pitch.Pitch()', number=10000000)
9.823

for historical reasons (porting to systems without weakref, etc.) the “common.wrapWeakref” function of music21 (which does a try: except to see if a weakref could be made) did the import within the function.  Moving it outside the function sped it up considerably and made it only half the speed of calling weakref.ref(x) directly -- worth it for the extra safety--and only an order of magnitude slower than direct access to x itself:

before, with common.wrapWeakref doing a safety "import weakRef" call 
>>> t('common.wrapWeakref(x)', setup='from music21 import common,pitch; x=pitch.Pitch()', number=10000000)
17.112

after, without it:
>>> t('common.wrapWeakref(x)', setup='from music21 import common,pitch; x=pitch.Pitch()', number=10000000)
4.171

So this is the speedup in music21 that you'd find if you managed to grab the GitHub repository right now.  But we're planning on using the speedup to make things more functional.

As a practical consideration, one of the things that I’ve never been able to fix in music21 is the ability of elements embedded in a Stream to change their duration without telling their sites that things have changed for an element. There are expensive operations such as calculating that the length of a Measure, the last object, etc. which we cache as long as no .append(), .insert(), .remove() etc. are called.  But a Note inside the measure may have changed length so that the information in the cache is no longer accurate. I've been wanting to fix this for a while.

The problem is that the Note object itself has no idea that its duration has changed, because while the Note has a reference to the Duration, the Duration does not have a reference to Note -- it can't have a normal reference because this would create a circular reference (Note.duration = Duration; Duration.client = Note). With a circular reference, neither the Note nor the Duration will ever disappear even after they're not needed anymore, causing memory leaks. The obvious solution is to use a weak reference which behaves mostly like a normal reference but does not cause circular references. If the Note should disappear then the Duration.client weakref is not strong enough to keep the two objects alive.

With the speed increases, it should be possible to store a weakref on Duration and also Pitch to the object they’re attached to so that they can inform their “client” that they’ve changed.  The client can then inform its Sites (measures, etc.) that it has changed and clear the appropriate cache.  The extra overhead of creating the weakref ends up being only about 20% of object creation time; a small price to pay for the security of knowing that nothing can change and screw up the overall system:

>>> t('d=duration.Duration();', setup='from music21 import common,duration,pitch; x=pitch.Pitch()', number=10000000)
19.382
>>> t('d=duration.Duration(); pitchRef = common.wrapWeakref(x)', setup='from music21 import common,duration,pitch; x=pitch.Pitch()', number=10000000)
23.787

Expect to see more functionality like this in a forthcoming release of music21.

No comments:

Post a Comment