Sunday, May 25, 2014

Python reimports

We've been working a lot recently on two kinds of optimization in music21: improving speed and then using some of the speed increases to add functionality and stability, so that new features can be added without slowing down the process. One of the places we found where we could make changes is in our over-cautious use of imports. 

While everyone says that in Python you can import a module inside a function without it going through the overhead of actually reimporting, there is some real overhead still, especially if the function is called a lot of times:

Here I compare ten million calls to reference an object vs. doing the same while also importing a module that is already imported:

>>> from timeit import timeit as t # number = ten million; output in secs to 3 decimals
>>> t('x', setup='import weakref; x=5', number=10000000)
>>> t('import weakref; x', setup='import weakref; x=5', number=10000000)

So it's approximately two orders of magnitude slower than direct access alone.  Even with using the module and creating the weakref itself, the check-for-reimport timedominates five-fold over the creation of the weakref:

>>> t('weakref.ref(x)', setup='import weakref; from music21 import pitch; x=pitch.Pitch()', number=10000000)
>>> t('import weakref; weakref.ref(x)', setup='import weakref; from music21 import pitch; x=pitch.Pitch()', number=10000000)

for historical reasons (porting to systems without weakref, etc.) the “common.wrapWeakref” function of music21 (which does a try: except to see if a weakref could be made) did the import within the function.  Moving it outside the function sped it up considerably and made it only half the speed of calling weakref.ref(x) directly -- worth it for the extra safety--and only an order of magnitude slower than direct access to x itself:

before, with common.wrapWeakref doing a safety "import weakRef" call 
>>> t('common.wrapWeakref(x)', setup='from music21 import common,pitch; x=pitch.Pitch()', number=10000000)

after, without it:
>>> t('common.wrapWeakref(x)', setup='from music21 import common,pitch; x=pitch.Pitch()', number=10000000)

So this is the speedup in music21 that you'd find if you managed to grab the GitHub repository right now.  But we're planning on using the speedup to make things more functional.

As a practical consideration, one of the things that I’ve never been able to fix in music21 is the ability of elements embedded in a Stream to change their duration without telling their sites that things have changed for an element. There are expensive operations such as calculating that the length of a Measure, the last object, etc. which we cache as long as no .append(), .insert(), .remove() etc. are called.  But a Note inside the measure may have changed length so that the information in the cache is no longer accurate. I've been wanting to fix this for a while.

The problem is that the Note object itself has no idea that its duration has changed, because while the Note has a reference to the Duration, the Duration does not have a reference to Note -- it can't have a normal reference because this would create a circular reference (Note.duration = Duration; Duration.client = Note). With a circular reference, neither the Note nor the Duration will ever disappear even after they're not needed anymore, causing memory leaks. The obvious solution is to use a weak reference which behaves mostly like a normal reference but does not cause circular references. If the Note should disappear then the Duration.client weakref is not strong enough to keep the two objects alive.

With the speed increases, it should be possible to store a weakref on Duration and also Pitch to the object they’re attached to so that they can inform their “client” that they’ve changed.  The client can then inform its Sites (measures, etc.) that it has changed and clear the appropriate cache.  The extra overhead of creating the weakref ends up being only about 20% of object creation time; a small price to pay for the security of knowing that nothing can change and screw up the overall system:

>>> t('d=duration.Duration();', setup='from music21 import common,duration,pitch; x=pitch.Pitch()', number=10000000)
>>> t('d=duration.Duration(); pitchRef = common.wrapWeakref(x)', setup='from music21 import common,duration,pitch; x=pitch.Pitch()', number=10000000)

Expect to see more functionality like this in a forthcoming release of music21.

Friday, May 23, 2014

Speed, Speed, Speed, ... and news.

The newest GitHub repository contains a huge change to the under-the-hood processing of .getContextByClass() which is used in about a million places in music21.  It is the function that lets any note know what its current TimeSignature (and thus beatStrength, etc.) is, lets us figure out whether the sharp on a given note should be displayed or not given the current KeySignature, etc.  While we had tried to optimize the hell out of it, it’s been a major bottleneck in music21 for working with very large scores. We sped up parsing (at least the second time through) a lot the last commit. This was the time to speed up Context searching.  We now use a form of AVL tree implemented in a new Stream.timespans module — it’s not well-documented yet, so we’re only exposing it directly in one place, stream.asTimespans(recurse=True|False).  You don’t need to know about this unless you’re a developer; but I wanted to let you know that the results are extraordinary.

Here’s a code snippet that loads a score with three parts and 126 measures and many TimeSignatures and calculates the TimeSignature active for every note, clef, etc. and then prints the time it takes to run:

>>> c = corpus.parse('luca/gloria')
>>> def allContext(c):
...     for n in c.recurse():
...         k = n.getContextByClass('TimeSignature')
>>> from time import time as t
>>> x = t(); allContext(c); print t() - x

with the 1.8 release of Music21:
42.9 seconds

with the newest version in GitHub:
0.70 seconds

There’s a lot of caching that happens along the way, so the second call is much faster:

second call with 1.8 release:
44.6 seconds ( = same within a margin of error)
with the newest version in GitHub if the score hasn’t changed:
0.18 seconds

You’ll see the speedup immensely in places where every combination of notes, etc. needs to be found.  For instance, finding all parallel fifths in a large score of 8 parts could have taken hours before. Now you’ll likely get results in under a few seconds.

I have not heard of any issues arising from the change in sorting from the last posting on April 26, so people who were afraid of updating can breath a bit more easily and update to the version of music21 at least as of yesterday. The newer version, like all GitHub commits, should be used with caution until we make a release.

Thanks to the NEH and the Digging into Data Challenge for supporting the creation of tools for working with much bigger scores than before.

In other news: 

Music21j — a Javascript implementation of music21’s core features — is running rapidly towards a public release.  See for an example of usage.  We’ll be integrating it with the Python version over the summer.

Ian Quinn’s review of Music21 appeared in the Journal of the American Musicological society yesterday.  Prior to this issue, no non-book had ever been reviewed. It’s a great feeling to have people not on this list know about the software as well.

Oh, and MIT was foolhardy enough to give me tenure! Largely on the basis of music21.  If you’re an academic working on a large digital project, I still advice proceeding with caution, but know that it can be done.  Thanks everyone for support.